Deep Learning for Bronchoscopy Navigation

In bronchoscopy, computer vision systems for navigation assistance are an attractive low-cost solution for guiding the endoscopist to peripheral lesions targeted for biopsy and histological analysis. We propose a decoupled deep learning architecture that projects input frames onto the domain of CT renderings, thus allowing offline training from patient-specific CT data.

The proposed workflow is shown in Fig. 1 below. First, given a dataset of input bronchoscopy frames and their corresponding CT renderings, an RGB transcoder network (1) is trained to map the input frames into a textureless representation resembling the renderings. This requires a database of renderings aligned to the input frames, whose creation is discussed in the paper. Separately, a depth estimator network (2) is trained to map renderings to their corresponding depth maps. The two networks are then chained together, so that depth is inferred automatically from an input RGB frame.

Fig. 1. System architecture
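At inference time the two stages are simply composed. The sketch below illustrates only this data flow: `rgb_transcoder` and `depth_estimator` are hypothetical toy stand-ins for the trained networks (1) and (2), not the fully convolutional models themselves, and the inverse-square intensity falloff used in the toy depth estimator is our own placeholder assumption.

```python
import numpy as np

# Hypothetical stand-ins for the two trained networks. In the real system both
# are fully convolutional networks; here they are toy functions so that the
# data flow (RGB frame -> rendering-like image -> depth) can be run.

def rgb_transcoder(frame_rgb: np.ndarray) -> np.ndarray:
    """Stand-in for network (1): map an HxWx3 RGB frame in [0, 1] to an HxW
    textureless, rendering-like intensity image."""
    # Placeholder: keep luminance only, discarding colour/texture cues.
    weights = np.array([0.299, 0.587, 0.114])
    return np.clip(frame_rgb @ weights, 0.0, 1.0)

def depth_estimator(rendering: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Stand-in for network (2): map a rendering to per-pixel depth.
    Placeholder assumption: with the light source at the camera, intensity
    falls off roughly as 1/d^2, so d is approximately 1/sqrt(I)."""
    return 1.0 / np.sqrt(np.maximum(rendering, eps))

def infer_depth(frame_rgb: np.ndarray) -> np.ndarray:
    # The two independently trained stages are chained at inference time.
    return depth_estimator(rgb_transcoder(frame_rgb))

frame = np.full((4, 4, 3), 0.25)   # uniform grey toy frame
depth = infer_depth(frame)         # HxW depth map
```

Because the stages are decoupled, either one can be retrained or replaced without touching the other, which is what allows network (2) to be trained on patient-specific renderings alone.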

The reason behind this architecture lies in the nature and availability of training data. Consider a hypothetical network mapping RGB inputs directly to depth. It could be trained either online, with a SLAM system that iteratively registers the CT volume to the input images, or offline, using databases of scope images and depth data from other patients. In the first case, navigation could already be performed by SLAM, making dense 3D reconstruction redundant; in the second, patient specificity is lost.

However, current SLAM systems are still inadequate for this task, leaving offline training as the only alternative. A popular way to mitigate the lack of tailored data is to pre-train networks on application-specific renderings and then fine-tune them on real data. We propose to reverse this paradigm: the real frames are projected onto the domain of CT renderings, and the mapping from renderings to depth is learned from large amounts of automatically generated, patient-specific renderings.

In this way, patient specificity is retained by training with arbitrarily large amounts of pre-operative renderings that reflect the individual patient's morphological characteristics. At the same time, assuming a constant scope model and setup, textural differences between subjects are less pronounced, so the mapping from input frames to renderings can be learned from offline databases of different subjects.
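Since network (2) only ever sees CT-derived data, its training set can be made as large as needed. A minimal sketch of one way to do this, under our own assumptions, is to sample virtual camera poses along an airway centreline extracted from the patient's CT; each pose would then be handed to a renderer (not shown) that produces a rendering and its depth map. The helper name, the polyline centreline input and the jitter parameters are all hypothetical.

```python
import numpy as np

def sample_virtual_poses(centreline: np.ndarray, n: int, jitter: float = 0.5,
                         seed: int = 0):
    """Hypothetical helper: sample n virtual camera poses along a polyline
    centreline (Kx3, consecutive points assumed distinct).
    Returns positions (n, 3) and unit viewing directions (n, 3)."""
    rng = np.random.default_rng(seed)
    seg = np.diff(centreline, axis=0)                   # (K-1, 3) segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])   # arc length at each vertex
    s = rng.uniform(0.0, cum[-1], size=n)               # random arc-length samples
    idx = np.clip(np.searchsorted(cum, s, side="right") - 1, 0, len(seg) - 1)
    t = (s - cum[idx]) / seg_len[idx]                   # fraction along each segment
    pos = centreline[idx] + t[:, None] * seg[idx]
    # Assumed: Gaussian off-centre offsets so views are not all on the axis.
    pos = pos + rng.normal(scale=jitter, size=pos.shape)
    tangent = seg[idx] / seg_len[idx][:, None]
    # Assumed: small random perturbation of the viewing direction.
    dirs = tangent + 0.1 * rng.normal(size=tangent.shape)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return pos, dirs

route = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 10.0], [0.0, 5.0, 20.0]])
positions, directions = sample_virtual_poses(route, n=100, jitter=0.0)
```

Increasing `n` (or the number of seeds) yields as many rendering/depth pairs as the training budget allows, all specific to the patient's own anatomy.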

The fully convolutional network architecture is implemented on a GPU and tested on a phantom dataset of 32 video sequences and ∼60k frames with aligned ground truth and renderings, which is made available as the first public dataset for bronchoscopy navigation.


For registration, rather than performing a full 3D point-cloud-to-mesh alignment, we explore the possibility of using bronchial bifurcations as anchor points for initialising the registration. To this end, we produce binary masks highlighting bronchial splits. These are generated by physically “painting” the mesh and rendering views along the same virtual routes, but with a shader that outputs only texture information, with no shading due to structure. The output is shown in the video below.
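A simple way to turn such a mask into candidate anchor points is to take the centroid of each connected region. The sketch below does this with a breadth-first flood fill; 4-connectivity, the `min_area` filter and the helper name `mask_centroids` are our assumptions rather than details from the paper. In a SLAM pipeline, the 2D centroids would still need to be associated with 3D map points.

```python
from collections import deque

import numpy as np

def mask_centroids(mask: np.ndarray, min_area: int = 1):
    """Hypothetical helper: return (row, col) centroids of the 4-connected
    foreground regions of a binary mask, in raster-scan order of discovery.
    Regions smaller than min_area pixels are discarded as specks."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    centroids = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if len(pixels) >= min_area:
                rows, cols = zip(*pixels)
                centroids.append((sum(rows) / len(rows), sum(cols) / len(cols)))
    return centroids

toy_mask = np.zeros((6, 6), dtype=bool)
toy_mask[0:2, 0:2] = True    # first "bifurcation" blob
toy_mask[4:6, 3:6] = True    # second blob
blobs = mask_centroids(toy_mask)
```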

For effective 3D-3D registration, initialisation is critical. This is especially true for point clouds produced by visual SLAM systems, whose scale is also unknown. We hypothesise that a scene segmentation system, similar to those used in autonomous driving, can help with this task. A working bronchial bifurcation segmentation would automatically label some of the points in the SLAM point cloud as bifurcations. Since the route is planned in advance, the sequence of bifurcations that should be encountered is also known in advance. It would therefore be possible to use the bifurcation centroids as anchor points for an initial alignment between the 3D point cloud generated by SLAM and the CT mesh. The alignment can then be refined by conventional methods.
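For the initial alignment itself, one standard choice, which we assume here rather than take from the paper, is Umeyama's closed-form least-squares estimate of a similarity transform (scale, rotation, translation). It suits this setting because it recovers the unknown SLAM scale directly from a handful of corresponding anchors, with correspondences given by the known order of bifurcations along the route. The function name is hypothetical.

```python
import numpy as np

def similarity_from_anchors(slam_pts: np.ndarray, ct_pts: np.ndarray):
    """Hypothetical helper: least-squares similarity (s, R, t) such that
    ct ~= s * R @ slam + t (Umeyama's method). Row i of both Nx3 arrays is
    assumed to be the same bifurcation, i.e. correspondences come from the
    known order along the planned route. Needs N >= 3 non-collinear points."""
    mu_x, mu_y = slam_pts.mean(axis=0), ct_pts.mean(axis=0)
    x, y = slam_pts - mu_x, ct_pts - mu_y
    cov = y.T @ x / len(x)                 # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # force a proper rotation, not a reflection
    R = U @ S @ Vt
    var_x = (x ** 2).sum() / len(x)        # variance of the SLAM anchors
    s = np.trace(np.diag(D) @ S) / var_x   # recovers the unknown SLAM scale
    t = mu_y - s * R @ mu_x
    return s, R, t
```

The estimated (s, R, t) maps SLAM points into CT coordinates as `s * pts @ R.T + t`, and can then serve as the starting point for a conventional refinement such as ICP.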


All data is available at the Dropbox folder linked below. Raw data (>300 GB) is also being added gradually. Please contact me with any questions or if you need further data for your project.


(10/2017): Added a Jupyter Python notebook in folder “Scripts/” to load experimental data (from folder “Data/”).

(07/2017): The folder has been updated with additional data and instructions.

Dropbox folder


Related papers

Visentini-Scarzanella, Marco; Sugiura, Takamasa; Kaneko, Toshimitsu; Koto, Shinichiro

Deep Monocular 3D Reconstruction for Assisted Navigation in Bronchoscopy

International Conference on Information Processing in Computer-Assisted Interventions (IPCAI), 2017.


Visentini-Scarzanella, Marco; Sugiura, Takamasa; Kaneko, Toshimitsu; Koto, Shinichiro

Deep Monocular 3D Reconstruction for Assisted Navigation in Bronchoscopy

International Journal of Computer Assisted Radiology and Surgery (IJCARS), 2017.

Visentini-Scarzanella, Marco; Kawasaki, Hiroshi

Simultaneous Camera, Light Position and Radiant Intensity Distribution Calibration

Image and Video Technology, pp. 557–571, Springer Nature, 2015.
