Danping Zou - Institute for Sensing and Navigation @ Shanghai Jiao Tong University
dpzou@sjtu.edu.cn
In this project, we develop a novel visual-inertial odometry approach that exploits the structural regularity of man-made environments. Instead of the Manhattan world assumption, we use the Atlanta world model to describe such regularity. An Atlanta world contains multiple local Manhattan worlds with different heading directions. Each local Manhattan world is detected on the fly, and its heading is gradually refined by the state estimator as new observations arrive. By fully exploiting the structural lines aligned with each local Manhattan world, our visual-inertial odometry method becomes more accurate and robust, as well as much more flexible across different kinds of complex man-made environments. Extensive benchmark and real-world tests show that the proposed approach outperforms state-of-the-art visual-inertial systems in large-scale man-made environments.
The paper is currently under review; please click here to download the arXiv version.
We provide a binary executable for testing StructVIO. Currently it is only tested on Ubuntu 16.04, 17.04, and 18.04.
Our datasets for evaluating visual-inertial odometry methods were collected inside and outside three buildings: Soft, Mech, and Micro. The indoor parts include typical scenes such as narrow passages, staircases, large halls, cluttered workshops, open offices, and corridor junctions. The outdoor parts include trees, roads, parking lots, and building entrances. Challenging cases such as over- or under-exposure, texture-less walls, distant features, and fast camera motions can be found in our datasets.
The performance of VIO is evaluated by the end-to-end error computed from the ArUco pattern placed at the starting point. We measured the accuracy of this ArUco-based reference with VICON and found an average positional error of about 3 centimeters.
Click the following links to download the datasets.
Download link | Soft-01 | Soft-02 | Soft-03 | Soft-04 |
---|---|---|---|---|
Traveling distance | 315m | 438m | 348m | 400m |
Download link | Mech-01 | Mech-02 | Mech-03 | Mech-04 |
---|---|---|---|---|
Traveling distance | 341m | 389m | 318m | 650m |
Download link | MicroA-01 | MicroA-02 | MicroA-03 | MicroA-04 |
---|---|---|---|---|
Traveling distance | 258m | 190m | 388m | 238m |
Download link | MicroB-01 | MicroB-02 | MicroB-03 | MicroB-04 |
---|---|---|---|---|
Traveling distance | 339m | 306m | 485m | 357m |
We use the same format and directory structure as the EuRoC datasets. For example, the file structure after extracting Soft-01.zip is shown below:
Soft-01
+-- Soft-01
|   +-- cam0
|   |   +-- data              #storing the images
|   |   data.csv              #list of images
|   +-- imu0
|   |   data.csv              #imu measurements
|   Soft-01-ArUco-a.txt       #ArUco ground-truth pose file
|   Soft-01-ArUco-b.txt       #ArUco ground-truth pose file
|   tango_pose.txt            #Tango VIO result
The file structure differs slightly for datasets that use VICON as the reference, such as MicroA-03. For example, MicroB-03 is laid out as follows:
MicroB-03
+-- MicroB-03
|   +-- cam0
|   |   +-- data              #storing the images
|   |   data.csv              #list of images
|   +-- imu0
|   |   data.csv              #imu measurements
|   vicon.txt                 #ground-truth pose file from VICON
|   tango_pose.txt            #Tango VIO result
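A quick sanity check of an extracted sequence can be sketched as follows. The helper name `missing_entries` is hypothetical and not part of the StructVIO release. The list of expected paths is taken from the two directory trees above and covers only the entries they share, so the ArUco and VICON files are not checked.

```python
from pathlib import Path

# Entries shared by both layouts shown above, relative to <root>/<name>.
COMMON_ENTRIES = ("cam0/data", "cam0/data.csv", "imu0/data.csv", "tango_pose.txt")


def missing_entries(root, name):
    """Return the expected entries that are absent from <root>/<name>.

    Hypothetical helper: `root` plays the role of the -i argument and
    `name` the role of the -n argument of the structvio binary.
    """
    base = Path(root) / name
    return [entry for entry in COMMON_ENTRIES if not (base / entry).exists()]
```

For the Soft-01 example, `missing_entries("./Soft-01", "Soft-01")` should return an empty list when the archive was extracted as described.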
The extra files ‘Soft-01-ArUco-a.txt’ and ‘Soft-01-ArUco-b.txt’ describe the camera poses computed from the ArUco tags. The ‘vicon.txt’ file contains the motion capture data from the VICON system. Both use the following format:
<timestamp (in seconds)> <x> <y> <z> <quat_w> <quat_x> <quat_y> <quat_z>
... ...
2580.831073 0.14008 -0.10809 1.003 -0.21064 0.97635 0.025657 -0.041372
... ...
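As an illustration of the format above, the following minimal sketch parses one pose line. The function name `parse_pose_line` is hypothetical, and the code assumes whitespace-separated fields exactly as in the sample line.

```python
def parse_pose_line(line):
    """Parse '<t> <x> <y> <z> <quat_w> <quat_x> <quat_y> <quat_z>'.

    Returns (timestamp in seconds, position (x, y, z), quaternion (w, x, y, z)).
    Hypothetical helper; assumes whitespace-separated fields.
    """
    values = [float(tok) for tok in line.split()]
    if len(values) != 8:
        raise ValueError(f"expected 8 fields, got {len(values)}")
    timestamp = values[0]
    position = tuple(values[1:4])
    quat_wxyz = tuple(values[4:8])
    return timestamp, position, quat_wxyz


t, p, q = parse_pose_line(
    "2580.831073 0.14008 -0.10809 1.003 -0.21064 0.97635 0.025657 -0.041372"
)
```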
The ‘tango_pose.txt’ file is the VIO result from the Project Tango tablet. Its format is
<second> <nano second> <1 - a number for future use> <quat_w> <quat_x> <quat_y> <quat_z> <x> <y> <z>
... ...
000004369 611521000 1 0.737397 0.675205 -0.011286 -0.0147454 -0.0636032 0.0252636 -0.000515278
... ...
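Note that this format lists the quaternion before the position, unlike the ArUco and VICON files. A minimal parsing sketch follows; the function name `parse_tango_line` is hypothetical, and combining the two time fields into a single floating-point timestamp is a convenience chosen here, not something the dataset prescribes.

```python
def parse_tango_line(line):
    """Parse '<sec> <nsec> <reserved> <qw> <qx> <qy> <qz> <x> <y> <z>'.

    Returns (timestamp in seconds, position (x, y, z), quaternion (w, x, y, z)).
    Hypothetical helper; the third field is reserved and ignored.
    """
    tokens = line.split()
    if len(tokens) != 10:
        raise ValueError(f"expected 10 fields, got {len(tokens)}")
    seconds = int(tokens[0])        # leading zeros are accepted by int()
    nanoseconds = int(tokens[1])
    timestamp = seconds + nanoseconds * 1e-9
    quat_wxyz = tuple(float(v) for v in tokens[3:7])
    position = tuple(float(v) for v in tokens[7:10])
    return timestamp, position, quat_wxyz


stamp, pos, quat = parse_tango_line(
    "000004369 611521000 1 0.737397 0.675205 -0.011286 -0.0147454 "
    "-0.0636032 0.0252636 -0.000515278"
)
```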
After downloading the binary and extracting ‘Soft-01.zip’ to the ‘Soft-01’ folder, run the following command to start VIO.
./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml
Here, ‘structvio_data.yaml’ is the configuration file for the algorithm; it includes the camera and IMU parameters.
The files structvio_data.yaml and euroc_data.yaml are the default configurations for the StructVIO and EuRoC datasets, respectively. You can type
./structvio --help
to see the usage of each argument. Some of them are listed below.
-g, --gui_on
    Display the GUIs
-t <Image latency (nanoseconds)>, --img_latency <Image latency (nanoseconds)>
    Time latency of image
-i <root path of the input data>, --input_dir <root path of the input data>
    (required) The root path of the single set of data
-n <name of the data>, --data_name <name of the data>
    (required) The name of the data
-r <result dir>, --result_dir <result dir>
    (required) The folder to save the results
-c <.yaml file of configuration>, --cfg_yaml <.yaml file of configuration>
    (required) The .yaml file that specifies the sensor parameters and program options
-p <number>, --point_num <number>
    Number of points used
-l <0|1|2>, --line_type <0|1|2>
    Type of lines used: 0 - structlines, 1 - general lines, 2 - both
-f <0|1|2>, --feature_type <0|1|2>
    Type of features used: 0 - point, 1 - line, 2 - both
If you want to run point-only VIO, please type
./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml -f 0
You can also change the maximum number of detected points with the ‘-p’ option. For example, the following sets it to 50:
./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml -f 0 -p 50
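When running many sequences, the command lines above can be assembled programmatically. The sketch below is only an illustration: `structvio_command` is a hypothetical helper, and it uses only the flags listed in the usage text above.

```python
def structvio_command(input_dir, data_name, result_dir, cfg_yaml,
                      feature_type=None, point_num=None, binary="./structvio"):
    """Build the argument list for one StructVIO run (hypothetical helper)."""
    cmd = [binary, "-i", input_dir, "-n", data_name,
           "-r", result_dir, "-c", cfg_yaml]
    if feature_type is not None:
        # 0 - point, 1 - line, 2 - both (as listed in the usage text)
        if feature_type not in (0, 1, 2):
            raise ValueError("feature_type must be 0, 1, or 2")
        cmd += ["-f", str(feature_type)]
    if point_num is not None:
        cmd += ["-p", str(point_num)]
    return cmd


# Point-only VIO with 50 points on Soft-01.
cmd = structvio_command("./Soft-01", "Soft-01", "Soft-01-res",
                        "structvio_data.yaml", feature_type=0, point_num=50)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library.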
To run StructVIO on the EuRoC datasets, download the configuration file euroc_data.yaml and type
./structvio -i ./MH_01_easy -n mav0 -r MH_01_easy-res -c euroc_data.yaml
The camera intrinsics are specified by a 3x3 matrix K:
#camera intrinsic matrix
K: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [ 250.0,0.0,320.0,0.0,250.0,240.0,0.0, 0.0, 1.0]
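To make the role of K concrete, the sketch below applies the standard pinhole projection to a 3D point in the camera frame, ignoring lens distortion. It assumes the `data` array stores K in row-major order, that is [fx, 0, cx, 0, fy, cy, 0, 0, 1]; the function name `project` is hypothetical.

```python
def project(K, point_cam):
    """Project a 3D point (camera frame) to pixel coordinates, no distortion.

    K is the flat, row-major 3x3 intrinsic matrix (an assumption about the
    layout of the yaml 'data' array).
    """
    fx, _, cx, _, fy, cy = K[:6]
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return fx * x / z + cx, fy * y / z + cy


K = [250.0, 0.0, 320.0, 0.0, 250.0, 240.0, 0.0, 0.0, 1.0]
# A point on the optical axis lands on the principal point (cx, cy).
u, v = project(K, (0.0, 0.0, 2.0))
```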
StructVIO supports three distortion models: Radial-Tangential, FOV, and Equidistant. In each case kc has five columns, and unused entries are set to -10000.
#camera distortion parameters (five parameters for Radial-Tangential model)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.325283107218,0.118269672106,6.91877400175e-05,0.000704614881902,-0.0199427782057]
#distortion parameters (one parameter for FOV model; -10000 represents invalid values)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [0.92378997802734375,-10000.0,-10000.0,-10000.0,-10000.0]
#camera distortion parameters (four parameters for Equidistant model)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.0133484559473,-0.0618812617667,0.076141294789,-0.0275889068285,-10000]
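In the three examples above, the number of valid entries in kc differs per model (five, one, and four). The sketch below classifies a kc array on that basis. Whether StructVIO itself selects the model this way is an assumption; the binary may rely on a separate option in the yaml file. The names `distortion_model` and `INVALID` are hypothetical.

```python
INVALID = -10000.0  # placeholder value used for unused kc entries

# Number of valid parameters per model, as in the examples above.
MODEL_BY_COUNT = {5: "radial-tangential", 1: "fov", 4: "equidistant"}


def distortion_model(kc):
    """Guess the distortion model from the number of valid kc entries.

    Hypothetical helper: this mirrors the examples in the text, not
    necessarily the logic inside the StructVIO binary.
    """
    valid = [v for v in kc if v != INVALID]
    if len(valid) not in MODEL_BY_COUNT:
        raise ValueError(f"unexpected number of valid parameters: {len(valid)}")
    return MODEL_BY_COUNT[len(valid)], valid
```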
The IMU parameters come from Kalibr. After running Kalibr, we get a ‘.txt’ file that stores all the calibration results. The variables in the StructVIO .yaml file correspond to those in the Kalibr output .txt file as follows:
R_ga (StructVIO yaml) <-> Gyroscope.C_gyro_i (Kalibr output .txt)
gyro_M (StructVIO yaml) <-> Gyroscope.M (Kalibr output .txt)
acc_M (StructVIO yaml) <-> Accelerometer.M (Kalibr output .txt)
In StructVIO’s yaml file, the relative transformation is expressed as the transformation from the camera frame to the IMU frame.
#transformation from the camera frame to the IMU frame (quaternion+translation)
# or a 3x4 matrix: [R t] corresponding to T_ic of Kalibr's output .txt file
imu_cam_trans: !!opencv-matrix
   rows: 1
   cols: 7
   dt: d
   data: [-0.115784,0.993255,0.0033886,0.0051271,0.00648922,-0.0123893,-0.00512952]
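The two accepted forms, a 1x7 quaternion plus translation and a 3x4 matrix [R t], are related by the usual quaternion-to-rotation conversion. The sketch below shows that conversion under two assumptions that the text does not confirm: the quaternion is ordered (w, x, y, z), and it follows the Hamilton convention. Under the JPL convention, which some filter-based VIO systems use, the rotation would be the transpose. The function names are hypothetical.

```python
def quat_to_rot(qw, qx, qy, qz):
    """Rotation matrix of a quaternion (Hamilton convention, assumed)."""
    n = (qw * qw + qx * qx + qy * qy + qz * qz) ** 0.5
    qw, qx, qy, qz = qw / n, qx / n, qy / n, qz / n  # normalize first
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def to_3x4(imu_cam_trans):
    """Convert [qw, qx, qy, qz, tx, ty, tz] (assumed order) to rows of [R t]."""
    qw, qx, qy, qz, tx, ty, tz = imu_cam_trans
    R = quat_to_rot(qw, qx, qy, qz)
    return [row + [t] for row, t in zip(R, (tx, ty, tz))]


T_ic = to_3x4([-0.115784, 0.993255, 0.0033886, 0.0051271,
               0.00648922, -0.0123893, -0.00512952])
```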
After running StructVIO, the state estimate at each time step is saved to ‘state.txt’ in the ‘Soft-01-res’ directory. Each line of ‘state.txt’ has the following format:
<image id> <seconds> <nano seconds> <1x4 quat_wi> <1x3 p_wi> <1x3 v_wi> <1x3 gyro bias> <1x3 acc bias> <1x4 quat_ic> <1x3 p_ic>
... ...
64,3081,46683334,0.989035,0.128174,-0.0731408,0.00554781,0.139563,0.0427987,-0.0472643,-0.0868763,0.199802,0.0756805,-0.00352615,-0.00140592,-0.00106152,-0.229281,-0.176175,0.0945192,-0.117153,0.992984,0.0145482,0.00677828,0.00157036,-0.0129524,-0.00040645,
... ...
where quat_wi, p_wi, v_wi, quat_ic, and p_ic are defined as:
quat_wi - quaternion from the IMU frame to the world frame
p_wi - position in the world frame
v_wi - velocity in the world frame
(quat_ic, p_ic) - transformation from the camera frame to the IMU frame.
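The layout above amounts to 26 comma-separated values per line, followed by a trailing comma in the sample. A minimal parsing sketch follows; `parse_state_line` is a hypothetical helper, and the dictionary keys are chosen here for readability.

```python
# (name, number of values) for each block after the three leading fields.
FIELDS = (("quat_wi", 4), ("p_wi", 3), ("v_wi", 3), ("gyro_bias", 3),
          ("acc_bias", 3), ("quat_ic", 4), ("p_ic", 3))


def parse_state_line(line):
    """Parse one line of 'state.txt' into a dict (hypothetical helper)."""
    # Drop the empty token produced by the trailing comma.
    tokens = [tok for tok in line.strip().split(",") if tok.strip()]
    if len(tokens) != 26:
        raise ValueError(f"expected 26 fields, got {len(tokens)}")
    state = {
        "image_id": int(tokens[0]),
        "timestamp": int(tokens[1]) + int(tokens[2]) * 1e-9,  # seconds
    }
    offset = 3
    for name, size in FIELDS:
        state[name] = tuple(float(t) for t in tokens[offset:offset + size])
        offset += size
    return state


sample = ("64,3081,46683334,0.989035,0.128174,-0.0731408,0.00554781,"
          "0.139563,0.0427987,-0.0472643,-0.0868763,0.199802,0.0756805,"
          "-0.00352615,-0.00140592,-0.00106152,-0.229281,-0.176175,0.0945192,"
          "-0.117153,0.992984,0.0145482,0.00677828,"
          "0.00157036,-0.0129524,-0.00040645,")
state = parse_state_line(sample)
```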
For the StructVIO datasets, we provide a script, vio_eva.py, for computing the end-to-end positional error of StructVIO as described in the paper. Please install evo_tools first.
vio_eva.py -r <path to the result file - 'state.txt'> -d <path to the folder of data - e.g. 'Mech-01'>
For the EuRoC datasets, we provide a simple script, conv2tum.py, to convert the StructVIO result and the EuRoC ground truth into TUM’s format for comparison.
#convert the results and ground truth into TUM's format.
conv2tum.py -t structvio -i state.txt -o result.tum
conv2tum.py -t euroc -i xx/mav0/state_groundtruth_estimate0/data.csv -o gt.tum
#call evo for comparison
#(absolute positional error)
evo_ape tum gt.tum result.tum -va --save_results res.zip
#(relative positional error)
evo_rpe tum gt.tum result.tum -d 1 -r full
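For reference, a TUM trajectory line has the form `timestamp tx ty tz qx qy qz qw`, with the scalar part of the quaternion last. The sketch below shows how one ‘state.txt’ line could be mapped to that layout. It is not the provided conv2tum.py, whose exact behavior is not described here; in particular, using (quat_wi, p_wi) directly as the output pose is an assumption, and `state_to_tum` is a hypothetical name.

```python
def state_to_tum(line):
    """Map one 'state.txt' line to a TUM-format line (illustrative sketch).

    Assumes the IMU pose (quat_wi, p_wi) is the pose to export.
    """
    tokens = [tok for tok in line.strip().split(",") if tok.strip()]
    seconds, nanoseconds = int(tokens[1]), int(tokens[2])
    qw, qx, qy, qz = (float(t) for t in tokens[3:7])
    x, y, z = (float(t) for t in tokens[7:10])
    timestamp = seconds + nanoseconds * 1e-9
    # TUM order: timestamp, translation, then quaternion as x y z w.
    return f"{timestamp:.9f} {x} {y} {z} {qx} {qy} {qz} {qw}"


tum_line = state_to_tum(
    "64,3081,46683334,0.989035,0.128174,-0.0731408,0.00554781,"
    "0.139563,0.0427987,-0.0472643,-0.0868763,0.199802,0.0756805,"
    "-0.00352615,-0.00140592,-0.00106152,-0.229281,-0.176175,0.0945192,"
    "-0.117153,0.992984,0.0145482,0.00677828,"
    "0.00157036,-0.0129524,-0.00040645,"
)
```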