StructVIO : Visual-inertial Odometry with Structural Regularity of Man-made Environments

Danping Zou - Institute for Sensing and Navigation @ Shanghai Jiao Tong University
dpzou@sjtu.edu.cn

In this project, we develop a novel visual-inertial odometry approach that exploits the structural regularity of man-made environments. Instead of the Manhattan world assumption, we use the Atlanta world model to describe such regularity. An Atlanta world contains multiple local Manhattan worlds with different heading directions. Each local Manhattan world is detected on the fly, and its heading is gradually refined by the state estimator as new observations arrive. By fully exploiting the structural lines aligned with each local Manhattan world, our visual-inertial odometry method becomes more accurate and robust, as well as much more flexible across different kinds of complex man-made environments. Extensive benchmark and real-world tests show that the proposed approach outperforms state-of-the-art visual-inertial systems in large-scale man-made environments.

[Figure: Manhattan world vs. Atlanta world]

The paper is currently under review; please click here to download the arXiv paper.

Executable

We provide a binary executable for testing StructVIO. Currently it is only tested on Ubuntu 16.04, 17.04, and 18.04.

Dataset


Our datasets for evaluating visual-inertial odometry methods were collected inside and outside three buildings: Soft, Mech, and Micro. The indoor parts include typical scenes such as narrow passages, staircases, large halls, cluttered workshops, open offices, and corridor junctions. The outdoor parts include trees, roads, parking lots, and building entrances. Challenging cases such as over- or under-exposure, texture-less walls, distant features, and fast camera motions can be found in our datasets.

The performance of VIO is evaluated by the end-to-end error, which is computed from the ArUco pattern placed at the starting point. We measured the accuracy of the ArUco-based poses with VICON and found that the average positional error is about 3 centimeters.

Click the following links to download the datasets.

Software Engineering building

Download link	Soft-01	Soft-02	Soft-03	Soft-04
Traveling distance	315m	438m	348m	400m

Mechanical Engineering building

Download link	Mech-01	Mech-02	Mech-03	Mech-04
Traveling distance	341m	389m	318m	650m

Microelectronics Engineering building

Download link	MicroA-01	MicroA-02	MicroA-03	MicroA-04
Traveling distance	258m	190m	388m	238m

Download link	MicroB-01	MicroB-02	MicroB-03	MicroB-04
Traveling distance	339m	306m	485m	357m

File format & structure

We use the same format and directory structure as the EuRoC datasets. For example, the file structure after extracting Soft-01.zip is:

Soft-01
+-- Soft-01
|   +-- cam0
|   |  +-- data           #storing the images
|   |  data.csv           #list of images
|   +-- imu0
|   |  data.csv           #imu measurements
|   Soft-01-ArUco-a.txt   #ArUco ground-truth pose file
|   Soft-01-ArUco-b.txt   #ArUco ground-truth pose file
|   tango_pose.txt        #Tango VIO result

The file structure differs slightly for datasets that use VICON as the reference, such as MicroA-03. For example, MicroB-03 is laid out as:

MicroB-03
+-- MicroB-03
|   +-- cam0
|   |  +-- data           #storing the images
|   |  data.csv           #list of images
|   +-- imu0
|   |  data.csv           #imu measurements
|   vicon.txt             #ground-truth pose file from VICON
|   tango_pose.txt        #Tango VIO result

The extra files ‘Soft-01-ArUco-a.txt’ and ‘Soft-01-ArUco-b.txt’ describe the camera poses computed from the ArUco tags. The ‘vicon.txt’ file contains the motion capture data from the VICON system. They all use the following format:

<timestamp (in seconds)> <x> <y> <z> <quat_w> <quat_x> <quat_y> <quat_z>
... ...
2580.831073 0.14008 -0.10809 1.003 -0.21064 0.97635 0.025657 -0.041372
... ...
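
The ground-truth pose format above can be read with a few lines of Python. This is a minimal sketch, not part of the released tools: the names `RefPose`, `parse_ref_pose_line`, and `load_ref_poses` are hypothetical, and it assumes every non-empty line of the file holds exactly the eight whitespace-separated fields listed above.

```python
from typing import List, NamedTuple, Tuple


class RefPose(NamedTuple):
    """One ground-truth pose (hypothetical container, not from StructVIO)."""
    t: float                         # timestamp in seconds
    position: Tuple[float, ...]      # (x, y, z)
    quat_wxyz: Tuple[float, ...]     # (quat_w, quat_x, quat_y, quat_z)


def parse_ref_pose_line(line: str) -> RefPose:
    """Parse '<timestamp> <x> <y> <z> <quat_w> <quat_x> <quat_y> <quat_z>'."""
    fields = [float(v) for v in line.split()]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    return RefPose(fields[0], tuple(fields[1:4]), tuple(fields[4:8]))


def load_ref_poses(path: str) -> List[RefPose]:
    """Read every non-empty line of an ArUco or VICON pose file."""
    with open(path) as f:
        return [parse_ref_pose_line(ln) for ln in f if ln.strip()]


# The sample line shown above.
pose = parse_ref_pose_line(
    "2580.831073 0.14008 -0.10809 1.003 -0.21064 0.97635 0.025657 -0.041372"
)
print(pose.t, pose.position, pose.quat_wxyz)
```

Keeping the quaternion in the file's own (w, x, y, z) order avoids silently mixing conventions; reorder only at the point where another tool requires it.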

The ‘tango_pose.txt’ file is the VIO result from the Project Tango tablet. Its format is

<second> <nano second> <1 - a number for future use> <quat_w> <quat_x> <quat_y> <quat_z> <x> <y> <z>
... ...
000004369   611521000   1   0.737397    0.675205    -0.011286   -0.0147454  -0.0636032  0.0252636   -0.000515278
... ...
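
Note that the Tango file puts the quaternion before the position and splits the timestamp into seconds and nanoseconds, unlike the ArUco and VICON files. The following sketch reads one line under those assumptions; `parse_tango_line` is a hypothetical helper name, and the third field (reserved for future use) is simply skipped.

```python
NS_PER_S = 1_000_000_000


def parse_tango_line(line: str) -> dict:
    """Parse '<sec> <nsec> <reserved> <qw> <qx> <qy> <qz> <x> <y> <z>'."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    sec, nsec = int(fields[0]), int(fields[1])   # leading zeros are accepted
    return {
        "t_ns": sec * NS_PER_S + nsec,           # integer nanoseconds
        "quat_wxyz": tuple(float(v) for v in fields[3:7]),
        "position": tuple(float(v) for v in fields[7:10]),
    }


# The sample line shown above.
rec = parse_tango_line(
    "000004369   611521000   1   0.737397    0.675205    -0.011286   "
    "-0.0147454  -0.0636032  0.0252636   -0.000515278"
)
print(rec["t_ns"], rec["position"])
```

Keeping the timestamp as an integer number of nanoseconds avoids the rounding that a floating-point seconds value could introduce.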

Running StructVIO

After downloading the binary file and extracting ‘Soft-01.zip’ to the ‘Soft-01’ folder, run the following command to start VIO.

./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml

Here, ‘structvio_data.yaml’ is the configuration file for the algorithm; it includes the camera and IMU parameters.

‘structvio_data.yaml’ and ‘euroc_data.yaml’ are the default configurations for the StructVIO datasets and the EuRoC datasets, respectively. You can type

./structvio --help

to see the usage of the available arguments. Some of them are listed below.

-g,  --gui_on
  Display the GUIs

-t <Image latency (nanoseconds)>,  --img_latency <Image latency
   (nanoseconds)>
  Time latency of image

-i <root path of the input data>,  --input_dir <root path of the input
   data>
  (required)  The root path of the single set of data

-n <name of the data>,  --data_name <name of the data>
  (required)  The name of the data

-r <result dir>,  --result_dir <result dir>
  (required)  The folder to save the results

-c <.yaml file of configuration>,  --cfg_yaml <.yaml file of
   configuration>
  (required)  The .yaml file that specifies the sensor parameters and
  program options

-p <number>,  --point_num <number>
  Number of points used

-l <0|1|2>,  --line_type <0|1|2>
  Type of lines used: 0-structlines, 1-general lines, 2-both

-f <0|1|2>,  --feature_type <0|1|2>
  Type of features used: 0 - point, 1 - line, 2 - both

If you want to run point-only VIO, please type

./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml -f 0

You can also change the number of points allowed to be detected by using the ‘-p’ option. For example, to set the number to 50:

./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml -f 0 -p 50

To run StructVIO on the EuRoC datasets, download the configuration file euroc_data.yaml and type

./structvio -i ./MH_01_easy -n mav0 -r MH_01_easy-res -c euroc_data.yaml
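
When running several sequences or option combinations, it can help to assemble the command line programmatically. The sketch below uses only the flags documented above (-i, -n, -r, -c, -f, -p); the helper names `build_structvio_cmd` and `run_structvio` are hypothetical, and the binary is assumed to sit in the current directory.

```python
import subprocess
from typing import List, Optional


def build_structvio_cmd(data_dir: str, data_name: str, result_dir: str,
                        cfg_yaml: str, feature_type: Optional[int] = None,
                        point_num: Optional[int] = None,
                        binary: str = "./structvio") -> List[str]:
    """Build the argument list for one StructVIO run."""
    cmd = [binary, "-i", data_dir, "-n", data_name,
           "-r", result_dir, "-c", cfg_yaml]
    if feature_type is not None:      # 0 - point, 1 - line, 2 - both
        cmd += ["-f", str(feature_type)]
    if point_num is not None:         # number of points used
        cmd += ["-p", str(point_num)]
    return cmd


def run_structvio(cmd: List[str]) -> None:
    """Launch the binary; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


# Point-only VIO with 50 points on Soft-01 (printed, not executed here).
cmd = build_structvio_cmd("./Soft-01", "Soft-01", "Soft-01-res",
                          "structvio_data.yaml", feature_type=0, point_num=50)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when dataset paths contain spaces.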

Note that the new timestamp of an image with non-zero image latency is computed as $t_{new} \rightarrow t_{old} + t_{latency}$ (in nanoseconds).
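
As a small illustration of this timestamp correction, the sketch below adds a latency to an integer nanosecond timestamp and splits the result into seconds and nanoseconds. The function names and the sample values are hypothetical; only the relation between the new timestamp, the old timestamp, and the latency comes from the note above.

```python
NS_PER_S = 1_000_000_000


def apply_image_latency(t_old_ns: int, latency_ns: int) -> int:
    """Return t_new = t_old + t_latency, all in integer nanoseconds."""
    return t_old_ns + latency_ns


def split_sec_nsec(t_ns: int) -> tuple:
    """Split a nanosecond timestamp into (seconds, nanoseconds)."""
    return divmod(t_ns, NS_PER_S)


# Hypothetical image stamped at 3081 s + 990 ms with a 20 ms latency.
t_new = apply_image_latency(3081 * NS_PER_S + 990_000_000, 20_000_000)
print(split_sec_nsec(t_new))
```

Working in integer nanoseconds keeps the carry into the seconds field exact, which floating-point seconds would not guarantee.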

Camera intrinsics

The camera intrinsics are specified by a $3\times3$ matrix in the ‘.yaml’ file:

$K = \begin{bmatrix} f_x & 0 & c_x\\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$

For example, if you have $f_x = 250, f_y = 250, c_x = 320, c_y = 240$, you need to specify the following in the ‘.yaml’ configuration file.

#camera intrinsic matrix
K: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [ 250.0,0.0,320.0,0.0,250.0,240.0,0.0, 0.0, 1.0]
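
The nine entries of `data` are the matrix K written row by row. The sketch below generates the block from the four intrinsics; `intrinsics_to_opencv_yaml` is a hypothetical helper, not a StructVIO tool, and it only reproduces the layout of the example above.

```python
def intrinsics_to_opencv_yaml(fx: float, fy: float,
                              cx: float, cy: float) -> str:
    """Format K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] in row-major order."""
    data = [fx, 0.0, cx,
            0.0, fy, cy,
            0.0, 0.0, 1.0]
    body = ", ".join(repr(float(v)) for v in data)
    return "\n".join([
        "#camera intrinsic matrix",
        "K: !!opencv-matrix",
        "   rows: 3",
        "   cols: 3",
        "   dt: d",
        f"   data: [{body}]",
    ])


yaml_text = intrinsics_to_opencv_yaml(250, 250, 320, 240)
print(yaml_text)
```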

Camera distortion

StructVIO supports three distortion models - Radial-Tangential, FOV and Equidistant models.

The Radial-Tangential model is the default camera distortion model in the Caltech camera calibration toolbox and OpenCV. It consists of five parameters, $D = [D_1,D_2,D_3,D_4,D_5]$, where $D_1,D_2,D_5$ are coefficients for different orders of the radial distortion and $D_3,D_4$ are coefficients for the tangential distortion. Here is an example of the specification of the Radial-Tangential model in the ‘.yaml’ file.

#camera distortion parameters (five parameters for Radial-Tangential model)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.325283107218,0.118269672106,6.91877400175e-05,0.000704614881902,-0.0199427782057]
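
To make the role of the five parameters concrete, the sketch below applies the Radial-Tangential model to a normalized image point. It assumes that StructVIO follows the OpenCV ordering, i.e. $D = [k_1, k_2, p_1, p_2, k_3]$, which matches the description of $D_1,D_2,D_5$ as radial and $D_3,D_4$ as tangential terms; the function names are hypothetical.

```python
def distort_radtan(x: float, y: float, D) -> tuple:
    """Distort a normalized point (x, y) with D = [k1, k2, p1, p2, k3]."""
    k1, k2, p1, p2, k3 = D
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def project_radtan(x: float, y: float, D,
                   fx: float, fy: float, cx: float, cy: float) -> tuple:
    """Map a normalized point to pixel coordinates through distortion and K."""
    x_d, y_d = distort_radtan(x, y, D)
    return fx * x_d + cx, fy * y_d + cy


# The kc values from the example above, with the illustrative intrinsics
# fx = fy = 250, cx = 320, cy = 240.
D = [-0.325283107218, 0.118269672106, 6.91877400175e-05,
     0.000704614881902, -0.0199427782057]
print(project_radtan(0.1, -0.05, D, 250.0, 250.0, 320.0, 240.0))
```

The point at the optical axis (0, 0) is left unchanged by the model, so it always projects to the principal point (cx, cy).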

The FOV model was introduced for modeling fisheye cameras. It has been widely used in visual SLAM systems such as PTAM. Compared with other fisheye distortion models, it has only one parameter and has a closed-form solution for the undistorted coordinates. We always choose this distortion model for fisheye cameras in our VIO or VSLAM implementations. This is an example of an FOV model specified in the ‘.yaml’ file.

#distortion parameters (one parameter FOV model, -10000 represents Invalid values)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [0.92378997802734375,-10000.0,-10000.0,-10000.0,-10000.0]
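
The closed-form undistortion mentioned above can be sketched as follows. This assumes the usual one-parameter FOV formulation used in PTAM, $r_d = \frac{1}{\omega}\arctan(2 r_u \tan\frac{\omega}{2})$, with the single `kc` entry taken as $\omega$ in radians; whether StructVIO uses exactly this convention is an assumption, and the function names are hypothetical.

```python
import math


def fov_distort_radius(r_u: float, w: float) -> float:
    """Undistorted radius -> distorted radius (normalized coordinates)."""
    return math.atan(2.0 * r_u * math.tan(w / 2.0)) / w


def fov_undistort_radius(r_d: float, w: float) -> float:
    """Closed-form inverse: distorted radius -> undistorted radius."""
    return math.tan(r_d * w) / (2.0 * math.tan(w / 2.0))


def fov_undistort_point(x_d: float, y_d: float, w: float) -> tuple:
    """Undistort a normalized point by rescaling its radius."""
    r_d = math.hypot(x_d, y_d)
    if r_d < 1e-12:
        # Limit of r_u / r_d as r_d -> 0.
        scale = w / (2.0 * math.tan(w / 2.0))
    else:
        scale = fov_undistort_radius(r_d, w) / r_d
    return x_d * scale, y_d * scale


w = 0.92378997802734375          # the kc value from the example above
r_d = fov_distort_radius(0.5, w)
print(r_d, fov_undistort_radius(r_d, w))
```

Because the inverse is a single tangent evaluation, no iterative solver is needed, which is the practical advantage over multi-parameter fisheye models.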

The Equidistant model introduced in OpenCV 3.0 is also supported. However, it contains only $4$ parameters, so the last number must be set to the invalid value ($-10000$). This model has not been well tested in our system, so we do not recommend using it.

#camera distortion parameters (four parameters for Equidistant model)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.0133484559473,-0.0618812617667,0.076141294789,-0.0275889068285,-10000]

IMU parameters

The IMU parameters are from Kalibr. After running Kalibr, we get a ‘.txt’ file that stores all the calibration results. Here is the correspondence between the variables in the StructVIO .yaml file and those in the Kalibr output .txt file.

R_ga (StructVIO yaml) <-> Gyroscope.C_gyro_i (Kalibr output .txt)
gyro_M (StructVIO yaml) <-> Gyroscope.M (Kalibr output .txt)
acc_M (StructVIO yaml) <-> Accelerometer.M (Kalibr output .txt)

Relative transformation between the camera and IMU

In StructVIO’s yaml file, the relative transformation is described as the transformation from the camera frame to the IMU frame.

#transformation from the camera frame to the IMU frame (quaternion+translation)
# or a 3x4 matrix: [R t] corresponding to T_ic of Kalibr's output .txt file
imu_cam_trans: !!opencv-matrix
   rows: 1
   cols: 7
   dt: d
   data: [-0.115784,0.993255,0.0033886,0.0051271,0.00648922,-0.0123893,-0.00512952]

Results of StructVIO

After running StructVIO, the state estimates at each time step are saved as ‘state.txt’ in the ‘Soft-01-res’ directory. Each line of ‘state.txt’ has the following format:

<image id> <seconds> <nano seconds> <1x4 quat_wi> <1x3 p_wi> <1x3 v_wi> <1x3 gyro bias> <1x3 acc bias> <1x4 quat_ic> <1x3 p_ic>
... ...
64,3081,46683334,0.989035,0.128174,-0.0731408,0.00554781,0.139563,0.0427987,-0.0472643,-0.0868763,0.199802,0.0756805,-0.00352615,-0.00140592,-0.00106152,-0.229281,-0.176175,0.0945192,-0.117153,0.992984,0.0145482,0.00677828,0.00157036,-0.0129524,-0.00040645,
... ...

where the quat_wi, p_wi, v_wi, quat_ic, and p_ic are defined as:

quat_wi - quaternion from the IMU frame to the world frame
p_wi - position in the world frame
v_wi - velocity in the world frame
(quat_ic, p_ic) - transformation from the camera frame to the IMU frame.
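
The sample line above is comma-separated with a trailing comma and holds 26 values. The sketch below splits such a line into named fields and also shows how a pose could be written in TUM trajectory order (timestamp, position, then quaternion as x y z w). It is not the authors' conv2tum.py: the names `VioState`, `parse_state_line`, and `to_tum_line` are hypothetical, and the quaternions are assumed to be stored w-first, as in the other pose files.

```python
from typing import NamedTuple, Tuple

NS_PER_S = 1_000_000_000


class VioState(NamedTuple):
    image_id: int
    t_ns: int                      # seconds and nanoseconds combined
    quat_wi: Tuple[float, ...]     # assumed order (w, x, y, z)
    p_wi: Tuple[float, ...]
    v_wi: Tuple[float, ...]
    gyro_bias: Tuple[float, ...]
    acc_bias: Tuple[float, ...]
    quat_ic: Tuple[float, ...]
    p_ic: Tuple[float, ...]


def parse_state_line(line: str) -> VioState:
    """Parse one comma-separated line of 'state.txt'."""
    fields = [f for f in line.strip().split(",") if f.strip()]
    if len(fields) != 26:
        raise ValueError(f"expected 26 fields, got {len(fields)}")
    v = [float(f) for f in fields[3:]]
    t_ns = int(fields[1]) * NS_PER_S + int(fields[2])
    return VioState(int(fields[0]), t_ns, tuple(v[0:4]), tuple(v[4:7]),
                    tuple(v[7:10]), tuple(v[10:13]), tuple(v[13:16]),
                    tuple(v[16:20]), tuple(v[20:23]))


def to_tum_line(s: VioState) -> str:
    """Format as 'timestamp tx ty tz qx qy qz qw' (TUM trajectory order)."""
    stamp = f"{s.t_ns // NS_PER_S}.{s.t_ns % NS_PER_S:09d}"
    qw, qx, qy, qz = s.quat_wi
    values = list(s.p_wi) + [qx, qy, qz, qw]
    return " ".join([stamp] + [repr(x) for x in values])


# The sample line shown above.
state = parse_state_line(
    "64,3081,46683334,0.989035,0.128174,-0.0731408,0.00554781,0.139563,"
    "0.0427987,-0.0472643,-0.0868763,0.199802,0.0756805,-0.00352615,"
    "-0.00140592,-0.00106152,-0.229281,-0.176175,0.0945192,-0.117153,"
    "0.992984,0.0145482,0.00677828,0.00157036,-0.0129524,-0.00040645,"
)
print(to_tum_line(state))
```

The timestamp string is built from the integer parts so that all nine nanosecond digits are preserved.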

Scripts for evaluation

For the StructVIO datasets, we provide a script, vio_eva.py, for computing the end-to-end positional error of StructVIO as described in the paper. Please install evo_tools first.

vio_eva.py -r <path to the result file - 'state.txt'> -d <path to the folder of data - e.g. 'Mech-01'>

For the EuRoC datasets, we provide a simple script, conv2tum.py, to convert the StructVIO result and the EuRoC ground truth into TUM’s format for comparison.

#convert the results and ground truth into TUM's format.
conv2tum.py -t structvio -i state.txt -o result.tum
conv2tum.py -t euroc -i xx/mav0/state_groundtruth_estimate0/data.csv -o gt.tum
#call evo for comparison
#(absolute positional error)
evo_ape tum gt.tum result.tum -va --save_results res.zip
#(relative positional error)
evo_rpe tum gt.tum result.tum -d 1 -r full

2018-10-18 (First release)
2019-03-01 (Scripts for evaluation)