StructVIO: Visual-inertial Odometry with Structural Regularity of Man-made Environments

Danping Zou - Institute for Sensing and Navigation @ Shanghai Jiao Tong University

In this project, we develop a novel visual-inertial odometry approach that exploits the structural regularity of man-made environments. Instead of the Manhattan world assumption, we use the Atlanta world model to describe this regularity. An Atlanta world contains multiple local Manhattan worlds with different heading directions. Each local Manhattan world is detected on the fly, and its heading is gradually refined by the state estimator as new observations arrive. By fully exploiting the structural lines aligned with each local Manhattan world, our visual-inertial odometry method becomes more accurate and robust, as well as much more flexible across different kinds of complex man-made environments. Extensive benchmark and real-world tests show that the proposed approach outperforms state-of-the-art visual-inertial systems in large-scale man-made environments.

Manhattan world vs. Atlanta world

The paper is currently under review. Please click here to download the arXiv paper.


We provide a binary executable for testing StructVIO. Currently it is only tested on Ubuntu 16.04, 17.04, and 18.04.



Our datasets for evaluating visual-inertial odometry methods were collected inside and outside three buildings: Soft, Mech, and Micro. The indoor parts include typical scenes such as narrow passages, staircases, large halls, cluttered workshops, open offices, and corridor junctions. The outdoor parts include trees, roads, parking lots, and building entrances. The datasets also contain challenging cases such as over- or under-exposure, texture-less walls, distant features, and fast camera motions.

Snapshots of StructVIO datasets

The performance of VIO is evaluated by the end-to-end error computed from the ArUco pattern placed at the starting point. We measured the accuracy of the ArUco-based poses with VICON and found an average positional error of about 3 centimeters.

ArUco’s accuracy evaluated by VICON

Click the following links to download the datasets.

Download link        Soft-01     Soft-02     Soft-03     Soft-04
Traveling distance   315m        438m        348m        400m

Download link        Mech-01     Mech-02     Mech-03     Mech-04
Traveling distance   341m        389m        318m        650m

Download link        MicroA-01   MicroA-02   MicroA-03   MicroA-04
Traveling distance   258m        190m        388m        238m

Download link        MicroB-01   MicroB-02   MicroB-03   MicroB-04
Traveling distance   339m        306m        485m        357m

File format & structure

We use the same format and directory structure as the EuRoC datasets. For example, the file structure after extracting the downloaded file is as follows:

+-- Soft-01
|   +-- cam0
|   |  +-- data           #storing the images
|   |  data.csv           #list of images
|   +-- imu0
|   |  data.csv           #imu measurements
|   Soft-01-ArUco-a.txt   #ArUco ground-truth pose file
|   Soft-01-ArUco-b.txt   #ArUco ground-truth pose file
|   tango_pose.txt        #Tango VIO result

The file structure differs slightly for datasets that use VICON as the reference, such as MicroA-03. For example:

+-- MicroB-03
|   +-- cam0
|   |  +-- data           #storing the images
|   |  data.csv           #list of images
|   +-- imu0
|   |  data.csv           #imu measurements
|   vicon.txt             #ground-truth pose file from VICON
|   tango_pose.txt        #Tango VIO result

The extra files ‘Soft-01-ArUco-a.txt’ and ‘Soft-01-ArUco-b.txt’ describe the camera poses computed from the ArUco tags. The ‘vicon.txt’ file contains the motion capture data from the VICON system. These files share the following format:

<timestamp (in seconds)> <x> <y> <z> <quat_w> <quat_x> <quat_y> <quat_z>
... ...
2580.831073 0.14008 -0.10809 1.003 -0.21064 0.97635 0.025657 -0.041372
... ...
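As an illustration of the pose format above, here is a minimal Python sketch that parses one line into its timestamp, position, and quaternion. The names `Pose`, `parse_pose_line`, and `read_pose_file` are hypothetical helpers, not part of the StructVIO release, and the sketch assumes fields are separated by whitespace.

```python
from typing import List, NamedTuple, Tuple


class Pose(NamedTuple):
    """One line of an ArUco or VICON pose file."""
    timestamp: float                                # seconds
    position: Tuple[float, float, float]            # (x, y, z)
    quaternion: Tuple[float, float, float, float]   # (w, x, y, z), file order


def parse_pose_line(line: str) -> Pose:
    """Parse '<timestamp> <x> <y> <z> <quat_w> <quat_x> <quat_y> <quat_z>'."""
    fields = [float(v) for v in line.split()]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    t, x, y, z, qw, qx, qy, qz = fields
    return Pose(t, (x, y, z), (qw, qx, qy, qz))


def read_pose_file(path: str) -> List[Pose]:
    """Read every non-empty line of a pose file (assumes no header line)."""
    with open(path) as f:
        return [parse_pose_line(ln) for ln in f if ln.strip()]


sample = "2580.831073 0.14008 -0.10809 1.003 -0.21064 0.97635 0.025657 -0.041372"
pose = parse_pose_line(sample)
print(pose.timestamp, pose.position, pose.quaternion)
```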

The ‘tango_pose.txt’ file is the VIO result from the Project Tango tablet. Its format is:

<second> <nano second> <1 - a number for future use> <quat_w> <quat_x> <quat_y> <quat_z> <x> <y> <z>
... ...
000004369   611521000   1   0.737397    0.675205    -0.011286   -0.0147454  -0.0636032  0.0252636   -0.000515278
... ...
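Since the Tango format lists the quaternion before the position, it can help to reorder the fields before comparing trajectories. The sketch below converts one line into the TUM trajectory layout (`timestamp tx ty tz qx qy qz qw`). The function name `tango_line_to_tum` is hypothetical, and the sketch assumes the nanosecond field is the fractional part of the second.

```python
def tango_line_to_tum(line: str) -> str:
    """Convert one 'tango_pose.txt' line to a TUM trajectory line.

    Input : <second> <nano second> <reserved> <qw> <qx> <qy> <qz> <x> <y> <z>
    Output: timestamp tx ty tz qx qy qz qw   (TUM order, quaternion w last)
    """
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    sec, nsec = int(f[0]), int(f[1])
    # Assumption: nsec is in [0, 999999999] and is the fraction of the second.
    timestamp = f"{sec}.{nsec:09d}"
    qw, qx, qy, qz, x, y, z = f[3:10]
    # Values are kept as strings so that no precision is lost.
    return " ".join([timestamp, x, y, z, qx, qy, qz, qw])


sample = ("000004369   611521000   1   0.737397    0.675205    -0.011286   "
          "-0.0147454  -0.0636032  0.0252636   -0.000515278")
print(tango_line_to_tum(sample))
```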

Running StructVIO

After downloading the binary file and extracting the dataset to the ‘Soft-01’ folder, run the following command to start VIO.

./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml

Here, ‘structvio_data.yaml’ is the configuration file for the algorithm; it includes the camera and IMU parameters.

The files structvio_data.yaml and euroc_data.yaml are the default configurations for the StructVIO datasets and the EuRoC datasets, respectively. You can type

./structvio --help

to see the usage of all arguments. Some of them are listed below.

-g,  --gui_on
  Display the GUIs

-t <Image latency (nanoseconds)>,  --img_latency <Image latency (nanoseconds)>
  Time latency of image

-i <root path of the input data>,  --input_dir <root path of the input data>
  (required)  The root path of the single set of data

-n <name of the data>,  --data_name <name of the data>
  (required)  The name of the data

-r <result dir>,  --result_dir <result dir>
  (required)  The folder to save the results

-c <.yaml file of configuration>,  --cfg_yaml <.yaml file of configuration>
  (required)  The .yaml file that specifies the sensor parameters and
  program options

-p <number>,  --point_num <number>
  Number of points used

-l <0|1|2>,  --line_type <0|1|2>
  Type of lines used: 0-structlines, 1-general lines, 2-both

-f <0|1|2>,  --feature_type <0|1|2>
  Type of features used: 0 - point, 1 - line, 2 - both

To run point-only VIO, type

./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml -f 0

You can also change the maximum number of points to detect with the ‘-p’ option. For example, the following command sets it to 50.

./structvio -i ./Soft-01 -n Soft-01 -r Soft-01-res -c structvio_data.yaml -f 0 -p 50
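
When running several sequences, the command lines above can be generated rather than typed by hand. The following is a minimal sketch, assuming the binary sits in the current directory and each sequence is extracted to a folder of the same name; `build_structvio_cmd` is a hypothetical helper that only uses the options listed above.

```python
def build_structvio_cmd(seq, cfg="structvio_data.yaml",
                        feature_type=None, point_num=None,
                        binary="./structvio"):
    """Build the argument list for one StructVIO run.

    seq          : sequence name, e.g. 'Soft-01' (also used as the folder name)
    feature_type : 0 - point, 1 - line, 2 - both (omitted if None)
    point_num    : number of points used (omitted if None)
    """
    cmd = [binary, "-i", f"./{seq}", "-n", seq, "-r", f"{seq}-res", "-c", cfg]
    if feature_type is not None:
        if feature_type not in (0, 1, 2):
            raise ValueError("feature_type must be 0, 1, or 2")
        cmd += ["-f", str(feature_type)]
    if point_num is not None:
        cmd += ["-p", str(point_num)]
    return cmd


for seq in ["Soft-01", "Soft-02"]:
    # Pass the list to subprocess.run(cmd, check=True) to actually launch it.
    print(" ".join(build_structvio_cmd(seq, feature_type=0, point_num=50)))
```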

To run StructVIO on the EuRoC datasets, download the configuration file euroc_data.yaml and type

./structvio -i ./MH_01_easy -n mav0 -r MH_01_easy-res -c euroc_data.yaml

Camera intrinsics

The camera intrinsics are specified by a 3×3 matrix K = [fx, 0, cx; 0, fy, cy; 0, 0, 1] in the ‘.yaml’ file, stored in row-major order.

For example, given fx=250, fy=250, cx=320, cy=240, specify the matrix in the ‘.yaml’ configuration file as follows.

#camera intrinsic matrix
K: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [ 250.0,0.0,320.0,0.0,250.0,240.0,0.0, 0.0, 1.0]
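
To make the row-major layout of the `data` field explicit, here is a small Python sketch that flattens the intrinsic matrix from its four parameters. `intrinsics_to_row_major` is a hypothetical helper for illustration only.

```python
def intrinsics_to_row_major(fx, fy, cx, cy):
    """Return the 9 entries of K in row-major order, as used in 'data'."""
    K = [
        [fx,  0.0, cx],
        [0.0, fy,  cy],
        [0.0, 0.0, 1.0],
    ]
    return [float(v) for row in K for v in row]


data = intrinsics_to_row_major(fx=250, fy=250, cx=320, cy=240)
print("data: [" + ", ".join(str(v) for v in data) + "]")
```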

Camera distortion

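StructVIO supports three distortion models: Radial-Tangential, FOV, and Equidistant. In each of the examples below, ‘kc’ is a 1×5 vector, and entries the model does not use are set to -10000 (invalid).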

#camera distortion parameters (five parameters for Radial-Tangential model)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.325283107218,0.118269672106,6.91877400175e-05,0.000704614881902,-0.0199427782057]
#camera distortion parameters (one parameter for FOV model; -10000 represents invalid values)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [0.92378997802734375,-10000.0,-10000.0,-10000.0,-10000.0]
#camera distortion parameters (four parameters for Equidistant model)
kc: !!opencv-matrix
   rows: 1
   cols: 5
   dt: d
   data: [-0.0133484559473,-0.0618812617667,0.076141294789,-0.0275889068285,-10000]
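
The three examples above all store five values and fill unused slots with -10000. A minimal sketch of that padding, inferred from the examples rather than from StructVIO's source, could look like this; `pad_kc` is a hypothetical helper.

```python
INVALID = -10000.0  # marker used in the examples above for unused entries


def pad_kc(params, size=5):
    """Pad a list of distortion parameters to 'size' entries with INVALID."""
    params = [float(p) for p in params]
    if not 1 <= len(params) <= size:
        raise ValueError(f"expected 1 to {size} parameters, got {len(params)}")
    return params + [INVALID] * (size - len(params))


print(pad_kc([0.92378997802734375]))  # FOV model: one parameter
print(pad_kc([-0.0133484559473, -0.0618812617667,
              0.076141294789, -0.0275889068285]))  # Equidistant: four
```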

IMU parameters

The IMU parameters come from Kalibr. Running Kalibr produces a ‘.txt’ file that stores all the calibration results. The variables in the StructVIO .yaml file correspond to those in the Kalibr output .txt file as follows.

R_ga (StructVIO yaml) <-> Gyroscope.C_gyro_i (Kalibr output .txt)
gyro_M (StructVIO yaml) <-> Gyroscope.M (Kalibr output .txt)
acc_M (StructVIO yaml) <-> Accelerometer.M (Kalibr output .txt)

Relative transformation between the camera and IMU

In StructVIO’s yaml file, the relative transformation is the transformation from the camera frame to the IMU frame.

#transformation from the camera frame to the IMU frame (quaternion+translation)
# or a 3x4 matrix: [R t] corresponding to T_ic of Kalibr's output .txt file
imu_cam_trans: !!opencv-matrix
   rows: 1
   cols: 7
   dt: d
   data: [-0.115784,0.993255,0.0033886,0.0051271,0.00648922,-0.0123893,-0.00512952]

Results of StructVIO

After running StructVIO, the state estimates at each time step are saved as ‘state.txt’ in the ‘Soft-01-res’ directory. Each line of ‘state.txt’ has the following format:

<image id> <seconds> <nano seconds> <1x4 quat_wi> <1x3 p_wi> <1x3 v_wi> <1x3 gyro bias> <1x3 acc bias> <1x4 quat_ic> <1x3 p_ic>
... ...
... ...

where quat_wi, p_wi, v_wi, quat_ic, and p_ic are defined as:

quat_wi - quaternion from the IMU frame to the world frame
p_wi - position in the world frame
v_wi - velocity in the world frame
(quat_ic, p_ic) - transformation from the camera frame to the IMU frame.
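
The ‘state.txt’ layout described above amounts to 26 values per line. As a rough sketch, the Python function below splits one line into named fields. `parse_state_line` is a hypothetical helper; it assumes whitespace-separated values and integer id and time fields, and it leaves the quaternion components in file order because the component order is not stated here.

```python
def parse_state_line(line):
    """Split one 'state.txt' line into named fields.

    Layout: <image id> <seconds> <nano seconds> <quat_wi x4> <p_wi x3>
            <v_wi x3> <gyro bias x3> <acc bias x3> <quat_ic x4> <p_ic x3>
    """
    f = line.split()
    if len(f) != 26:
        raise ValueError(f"expected 26 fields, got {len(f)}")
    v = [float(x) for x in f[3:]]  # the 23 floating-point values
    return {
        "image_id": int(f[0]),
        "sec": int(f[1]),
        "nsec": int(f[2]),
        "quat_wi": v[0:4],     # IMU frame to world frame
        "p_wi": v[4:7],        # position in the world frame
        "v_wi": v[7:10],       # velocity in the world frame
        "gyro_bias": v[10:13],
        "acc_bias": v[13:16],
        "quat_ic": v[16:20],   # camera frame to IMU frame (rotation)
        "p_ic": v[20:23],      # camera frame to IMU frame (translation)
    }


# Synthetic line for illustration only; not real StructVIO output.
sample = "7 100 500000000 " + " ".join(str(i) for i in range(23))
state = parse_state_line(sample)
print(state["image_id"], state["p_wi"], state["p_ic"])
```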

Scripts for evaluation

For StructVIO datasets, we provide a script for computing the end-to-end positional error of StructVIO as described in the paper. Please install evo_tools first. The script takes the following arguments:

-r <path to the result file - 'state.txt'> -d <path to the folder of data - e.g. 'Mech-01'>

For EuRoC datasets, we provide a simple script to convert the StructVIO result and EuRoC ground truth into TUM’s format for comparison.

#convert the results and ground truth into TUM's format.
-t structvio -i state.txt -o result.tum
-t euroc -i xx/mav0/state_groundtruth_estimate0/data.csv -o gt.tum
#call evo for comparison
#(absolute positional error)
evo_ape tum gt.tum result.tum -va --save_results
#(relative positional error)
evo_rpe tum gt.tum result.tum -d 1 -r full