2D-3D Fusion for Layer Decomposition of Urban Facades

IEEE International Conference on Computer Vision (ICCV 2011)

Yangyan Li1    Qian Zheng1    Andrei Sharf2, 1    Daniel Cohen-Or3    Baoquan Chen1   Niloy J. Mitra4  
1SIAT, China    2Ben-Gurion University    3Tel Aviv University   4University College London


Figure 1: Given a 2D photograph and a 3D LiDAR scan of a building (left), we overlay the scan on the rectified photograph (semitransparent) (mid-left). Analyzing the fusion of the two modes allows decomposing the scene into depth layers (distinctively colored, mid-right), followed by per-layer symmetry detection that completes and augments the LiDAR scan with enhanced texture information (two buildings, right).


We present a method for fusing two acquisition modes, 2D photographs and 3D LiDAR scans, for depth-layer decomposition of urban facades. The two modes have complementary characteristics: point cloud scans are coherent and inherently 3D, but are often sparse, noisy, and incomplete; photographs, on the other hand, are of high resolution, easy to acquire, and dense, but view-dependent and inherently 2D, lacking critical depth information. In this paper we exploit the fact that photographs are easy to capture, versatile, and can cover more of a building to enhance the acquired LiDAR data.

Our key observation is that with an initial registration of the 2D and 3D datasets we can decompose the input photographs into rectified depth layers. We decompose the input photographs into rectangular planar fragments and diffuse depth information from the corresponding 3D scan onto the fragments by solving a multi-label assignment problem. Our layer decomposition enables accurate repetition detection in each planar layer, and we use the detected repetitions to propagate geometry, remove outliers, and enhance the 3D scan. Finally, the algorithm produces an enhanced, layered, textured model. We evaluate our algorithm on complex multi-planar building facades, where direct autocorrelation methods for repetition detection fail. We demonstrate how 2D photographs help improve the 3D scans by exploiting data redundancy, and transferring high-level structural information to (plausibly) complete large missing regions.
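To illustrate the flavor of the per-fragment depth assignment described above, the sketch below assigns each planar image fragment a depth-layer label and then diffuses labels to fragments the sparse scan never hit. This is a deliberately simplified stand-in, not the paper's actual optimization: the paper solves a multi-label assignment problem, whereas here we use a greedy majority vote over projected scan samples plus breadth-first neighbor propagation. All function and variable names are hypothetical.

```python
from collections import Counter, deque

def assign_depth_layers(fragments, scan_points, layer_depths, adjacency):
    """Greedy stand-in for the multi-label depth assignment.

    fragments    : dict frag_id -> set of (x, y) pixel coordinates
    scan_points  : list of ((x, y), depth) projected LiDAR samples
    layer_depths : candidate layer depths, e.g. [0.0, 1.5, 4.0]
    adjacency    : dict frag_id -> list of neighboring frag_ids
    Returns dict frag_id -> index into layer_depths.
    """
    def nearest_layer(d):
        # snap a raw scan depth to the closest candidate layer
        return min(range(len(layer_depths)),
                   key=lambda i: abs(layer_depths[i] - d))

    # 1) each scan sample votes for a layer inside the fragment it lands in
    votes = {fid: Counter() for fid in fragments}
    for (x, y), depth in scan_points:
        for fid, pixels in fragments.items():
            if (x, y) in pixels:
                votes[fid][nearest_layer(depth)] += 1
                break

    labels = {fid: c.most_common(1)[0][0] for fid, c in votes.items() if c}

    # 2) diffuse labels to scan-free fragments via BFS from labeled ones,
    #    mimicking the "diffuse depth onto fragments" step
    frontier = deque(labels)
    while frontier:
        fid = frontier.popleft()
        for nb in adjacency.get(fid, []):
            if nb not in labels:
                labels[nb] = labels[fid]
                frontier.append(nb)
    return labels
```

For example, a fragment covered by scan samples near depth 0 gets layer 0, one covered by samples near depth 4 gets layer 1, and an unscanned fragment inherits the label of its labeled neighbor.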






Figure 2: Top floors of buildings are barely visible to the LiDAR scanner, resulting in large missing parts in the 3D scan. The figure shows the input data, the repetition pattern, and the enhanced geometry and textures (shown from two different viewpoints to emphasize that the result is in 3D).


Figure 3: Multi-modal data fusion produces a textured polygonal model, with missing parts in the LiDAR data completed using extracted repetition patterns. In this example, two sets of repetitions, in dark blue (top floors) and light blue (lower floors), are discovered.


Figure 4: Photographs can have significant parts occluded due to trees and other obstacles (top row). The occlusions, however, are usually different across views thus resulting in improved geometry and texture consolidation when we use more images. In this example we added another photograph from a different view to significantly improve the resulting 3D model (bottom row).

Figure 5: Evaluation of our multi-modal method on synthetic data. We virtually scan a 3D model using a ray-casting sampling technique (left). We accurately compute depth layers (middle) and utilize repetitions to compute a textured polygonal 3D facade (right).


Testing Data



We would like to thank Guowei Wan for initial experiments on this project and Dror Aiger for inspiring discussions.



  title = {2D-3D Fusion for Layer Decomposition of Urban Facades},
  author = {Yangyan Li and Qian Zheng and Andrei Sharf and Daniel Cohen-Or and Baoquan Chen and Niloy J. Mitra},
  booktitle = {Computer Vision (ICCV), 2011 IEEE International Conference on},
  pages = {882--889},
  year = {2011},
  month = nov,
  address = {Barcelona, Spain}