Pixel2point: 3D Object Reconstruction From a Single Image Using CNN and Initial Sphere

Received December 8, 2020, accepted December 20, 2020, date of publication December 23, 2020,
date of current version January 4, 2021.
Digital Object Identifier 10.1109/ACCESS.2020.3046951

AHMED J. AFIFI 1, JANNES MAGNUSSON 2, TOUFIQUE A. SOOMRO 3, (Member, IEEE), AND OLAF HELLWICH 1, (Senior Member, IEEE)
1 Computer Vision and Remote Sensing, Technische Universität Berlin, 10587 Berlin, Germany
2 Institute for Image Science and Computational Modeling in Cardiovascular Medicine, Charité–Universitätsmedizin Berlin, 13353 Berlin, Germany
3 Department of Electronic Engineering, Quaid-e-Awam University of Engineering, Science, and Technology, Nawabshah 67480, Pakistan

Corresponding author: Ahmed J. Afifi (ahmed.afifi@campus.tu-berlin.de)
This work was supported by the German Research Foundation and the Open Access Publication Fund of TU Berlin.

ABSTRACT 3D reconstruction from a single image has many useful applications. However, it is a challenging and ill-posed problem, as many candidate shapes can explain the same input. In this paper, we propose a simple, yet powerful, CNN model that generates a point cloud of an object from a single image. 3D data can be represented in different ways, and point clouds have proven to be a common and simple representation. The proposed model was trained end-to-end on synthetic data with 3D supervision. It takes a single image of an object and generates a point cloud with a fixed number of points. An initial point cloud of a sphere shape is used to improve the generated point cloud. The proposed model was tested on synthetic and real data. Qualitative evaluations demonstrate that the generated point clouds are very close to the ground-truth, and that the initial point cloud improves the final results by distributing the points evenly over the object surface. Furthermore, the proposed method outperforms the state-of-the-art on this problem quantitatively and qualitatively on synthetic and real images, and it generalizes well to new and unseen images and scenes.

   INDEX TERMS Single-view reconstruction, deep learning, point cloud, CNN.

I. INTRODUCTION
Single-view reconstruction is a long-standing ill-posed problem and fundamental to many applications such as object recognition and scene understanding. Single-view 3D reconstruction means using a single image of an object and utilizing it to infer the 3D structure of the object so that it can be viewed from all directions. For multi-view scenarios, a large variety of methods has been proposed that produce high-quality reconstruction results [10]. The challenge appears when only a single input image is available for the reconstruction process. Many approaches were proposed with restrictions and special assumptions on the input image to predict 3D geometry [19]. Single-view 3D reconstruction is a hard problem, and it mainly depends on the available information and the assumptions imposed on the target object. This information, or cues, provides prior knowledge that helps in generating 3D shapes with plausible precision [19].

Before the deep learning era, many approaches were proposed to solve single-view reconstruction depending on the object nature. Some of them were applied to real-world images without any knowledge of the image formation, and the output of these approaches is plausible. One class of these methods focuses on curved objects and tries to produce smooth objects. These methods define an energy function that minimizes the object surface with respect to some constraints such as a fixed area or volume [18], [20], [28]. Other methods focus on piecewise planar objects and utilize semantic knowledge of object locations, such as the sky and the ground locations in the image [7].

With the astonishing results obtained by applying deep learning to different computer vision problems, many 3D-based models have made great progress in solving different tasks using 3D data directly, such as classification, object part segmentation, and 3D shape completion. Also, the availability of large-scale datasets [5] encourages researchers to formulate and tackle the single-view reconstruction problem. Volumetric methods were first used to infer the 3D structure of an object from a single view [6]. However, the volumetric representation suffers from information sparsity and heavy computation during training, and it is ineffective for high-resolution outputs. To overcome this issue, recent works have used point clouds [8], as they are samples on the surface of the objects and effectively capture more object details.

(The associate editor coordinating the review of this manuscript and approving it for publication was Xiaohui Yuan.)


FIGURE 1. A general sketch of the proposed CNN model with different setups. Top: the proposed CNN without an initial point cloud. Bottom: the proposed CNN with the initial point cloud. E: Encoder, G: Generator, PC: Point Cloud, FV: Feature Vector.

In single-view reconstruction, the reprojection from 2D to 3D is ambiguous due to the loss of the depth information. To this end, we propose a CNN model that solves the task of single-view reconstruction. The model has an encoder-generator shape, where the encoder extracts useful features from the input image and the generator infers the point cloud of the object shown in the 2D image. To generate more accurate point clouds, an initial point cloud is used to improve the reconstruction quality. We find that starting from an initial point cloud enforces the points to distribute evenly over the shape surface and preserves the object parts. We summarize our contributions as follows: (1) We design a CNN model that can infer the 3D geometry of an object from a single image; the 3D object is represented as a point cloud. (2) Instead of directly inferring the point cloud, we propose to utilize an initial point cloud of a sphere shape to generate the final object point cloud. The experimental results (Sec. V-C) show that using an initial point cloud helps in generating better and more accurate reconstructions (Figure 1). (3) We evaluate the proposed model on synthetic and real data quantitatively and qualitatively. Our model outperforms the state-of-the-art methods and shows significant results for the task of single-view reconstruction.

II. RELATED WORK
Inferring the 3D structure of an object from a single image is an ill-posed problem, but many attempts have been made, such as SfM and SLAM [3], [9]. Moreover, Shape-from-X, where X can be shadow, texture, etc., requires prior knowledge about the nature of the input image [2].

When applying deep learning models to generate 3D shapes or to solve other tasks such as segmentation, recognition, or object classification, the object representation plays an important role in designing the network. The 3D data representations most commonly used in deep learning are volumetric data, meshes, and point clouds.

To extend 2D convolutions to 3D, the volumetric representation has mostly been used. Volumetric data can be represented as a regular grid in 3D space [27]. Voxels are used to visualize 3D data and show the distribution of the 3D object in 3D space. Each voxel in the 3D space that describes the object can be classified as visible, occluded, or self-occluded according to the viewpoint. This representation is simple to implement and compatible with 3D convolutional neural networks. 3D-GAN [26] proposed a generative adversarial network (GAN) to generate 3D objects from a probabilistic space using a volumetric CNN; by mapping a low-dimensional probabilistic space to the 3D object space, it outperforms other unsupervised learning methods. Moreover, recurrent networks have been suggested to estimate the 3D shape of an object: 3D-R2N2 [6] uses long short-term memory (LSTM) to infer the 3D geometry from many images of the target object taken from different perspectives. Recently, 3D-FHNet, a 3D fusion hierarchical reconstruction method, was proposed that can perform 3D object reconstruction from any number of views [14]. The critical limitation of the volumetric representation in the above-mentioned methods is the computational and memory cost and the restriction on the output resolution. Also, fine-grained shape parts get lost because each voxel is represented as either occupied or unoccupied.

To avoid the limitations of the volumetric representation, mesh representation is more attractive for real applications, as shape details can be modeled accurately. 3D meshes are commonly used to represent 3D shapes. The structure of a 3D mesh comprises a set of polygons, called faces [4], which are described using a set of vertices that specify how the mesh coordinates exist in 3D space. Besides the 3D coordinates of the vertices, there is a connectivity list that specifies how the vertices are connected to each other. Applying deep learning models directly to generate meshes is a challenge as they are not regularly structured. A parameterization-based 3D reconstruction is proposed in [22] that generates geometry images which encode x, y, z surface coordinates. Three separate encoder-decoder networks take an RGB image or a depth image as input and learn the x, y, and z geometry images, respectively. Other methods estimate a deformation field from an input image and apply it to a template 3D shape to generate the reconstructed 3D model. Kuryenkov et al. [12] proposed DeformNet, which takes an image and the nearest 3D shape to that image from a dataset as input; the template shape is then deformed to match the input image using a Free-Form Deformation (FFD) layer. Pixel2Mesh [25] is an end-to-end deep learning model that was proposed to generate a triangulated 3D mesh from a single image.


The proposed network represents the 3D mesh with a graph-based CNN (GCNN). It deforms an initial ellipsoid by leveraging the perceptual features extracted from the input image, and it adopts a coarse-to-fine strategy that makes the deformation process stable. A limitation of using meshes for reconstruction is that the generated output is limited mostly to the initial mesh or the selected template used as the initial shape to be deformed.

To overcome the above-mentioned limitations, point clouds are used to represent the 3D data. A 3D point cloud is a set of unordered 3D points that approximate the geometry of 3D objects [8]. Points can be represented either as a matrix of size N × 3, as a 3-channel grid of size H × W × 3 where each pixel encodes the (x, y, z) coordinates and H × W equals the number of points, or as depth maps from different known viewpoints. The Point Set Generation Network (PSGN) [8] was the first proposed model to generate a point cloud of an object from a single image, outperforming the volumetric approaches. In RealPoint3D [29], the proposed network has two encoders: the first extracts 2D features from the input image, and the second extracts 3D features from the most similar shape to the input image retrieved from the ShapeNet dataset. The extracted features from both encoders are integrated and forwarded to a decoder to generate fine-grained point clouds; the point cloud from the retrieved shape guides the inference process towards finer point clouds. 3D-LMNet [16] trained a 3D point cloud auto-encoder and then learned the mapping from the 2D images to the learned embedded features. Another direction is to generate depth images from different perspectives and fuse them into the final point cloud. In [13], a generative modeling framework used 2D convolutional operations to predict multiple pre-defined depth images and used them to generate a dense 3D model. In [15], a two-stage dense point cloud generation network was proposed: in the first stage, the network takes a single RGB image and generates a sparse point cloud; in the second stage, a generator network densifies the sparse point cloud into a dense one. After training the two stages, the model becomes an end-to-end network that generates a dense point cloud from a single RGB image.

Our proposed model is different from the mentioned work. It has a simple design and utilizes an initial point cloud to predict the final point cloud accurately. The model has a single input and generates the point cloud directly, without retrieving and utilizing a 3D model similar to the input image as proposed in [29]. Also, it does not use other 2D supervision, such as silhouettes, to infer the 3D object structure.

III. METHODOLOGY
Our main goal is to infer a complete 3D shape of an object from a single RGB image. We select point clouds to represent the generated output (Eq. 1) and set the number of points generated by the CNN to N = 2048. From our experiments, this number of points is sufficient to cover the whole surface of the object and preserve the major structures.

   S = { (x_i, y_i, z_i) : i = 1, ..., N }                        (1)

A. 3D CNN MODEL
The proposed network is illustrated in Figure 2. It consists of two parts: the encoder part and the generator part. The encoder part is a set of consecutive 2D convolutional layers, each followed by ReLU as a non-linear activation function. These layers extract the object features from the 2D input images. To predict the 3D point cloud of the object, an initial point cloud of a sphere shape is used. The initial point cloud is concatenated with the features extracted by the encoder and then fed into the generator part to obtain the final point cloud of the object, where fully connected (FC) layers generate an N × 3 matrix in which each row contains the coordinates of one point. Each network part is described in detail below.

FIGURE 2. The proposed CNN architecture with the initial point cloud.

Encoder Net. The role of the encoder is to extract distinctive features from the input image that correctly describe the object in detail. It consists of consecutive 2D convolutional layers and ReLU layers. There are seven convolutional layers: the first three have 32, 64, and 128 filters, respectively, and the remaining layers have 256 filters.


All convolutional layers have a kernel size of 3 × 3 and a stride of 2. The stride of 2 decreases the spatial size of the features as pooling layers do; compared to pooling layers, strided convolutional layers are trainable and can extract useful features. The size of the input image is 128 × 128, and the feature extracted by the encoder has a size of 1 × 1 × 256, which is reshaped and concatenated with the initial point cloud.

Generator Net. The generator part is a simple network consisting of four fully connected (FC) layers. The feature vector extracted by the encoder is reshaped to 1 × 256 and then concatenated with the initial point cloud, which has a sphere shape consisting of 256 equally spaced points. The reshaped feature is concatenated with each point of the initial point cloud, so the new feature has a size of 256 × (3 + 256). Figure 2 shows the reshape and concatenation process. The new feature is fed into the generator: after three FC layers, each followed by ReLU, the generator ends with a fully connected layer that predicts the final point cloud with a shape of 2048 × 3.

The proposed network differs from other single-view reconstruction models in that it utilizes an initial point cloud with a sphere shape for better inference of the final point cloud. In the results section, we discuss the importance of this setup and show how it improves the final results.
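To make the architecture concrete, the sketch below assembles this encoder-generator design in TensorFlow/Keras. The convolution count, strides, feature size, sphere size, and output shape follow the description above; the hidden widths of the three intermediate FC layers and the Fibonacci-lattice construction of the 256 equally spaced sphere points are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def fibonacci_sphere(n=256):
    # Roughly equally spaced points on the unit sphere (assumed construction of the
    # initial point cloud; the paper only states "256 equally spaced points").
    i = np.arange(n, dtype=np.float32)
    phi = np.arccos(1.0 - 2.0 * (i + 0.5) / n)        # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i            # golden-angle azimuth
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)            # (n, 3)

def build_pixel2point(num_points=2048, sphere_points=256):
    img = layers.Input(shape=(128, 128, 3))

    # Encoder: seven 3x3 convolutions with stride 2 (32, 64, 128, then 256 filters);
    # the 128x128 input is reduced to a 1x1x256 feature.
    x = img
    for filters in (32, 64, 128, 256, 256, 256, 256):
        x = layers.Conv2D(filters, 3, strides=2, padding='same', activation='relu')(x)
    feat = layers.Reshape((1, 256))(x)
    feat = layers.Lambda(lambda t: tf.tile(t, [1, sphere_points, 1]))(feat)  # (B, 256, 256)

    # Concatenate the tiled feature with each point of the initial sphere: (B, 256, 3 + 256).
    init_pc = tf.constant(fibonacci_sphere(sphere_points), dtype=tf.float32)
    def concat_sphere(f):
        pc = tf.tile(init_pc[None], [tf.shape(f)[0], 1, 1])
        return tf.concat([pc, f], axis=-1)
    x = layers.Lambda(concat_sphere)(feat)

    # Generator: three FC+ReLU layers and a final FC layer predicting 2048 x 3.
    # The hidden widths (2048) are assumptions, not taken from the paper.
    x = layers.Flatten()(x)
    for units in (2048, 2048, 2048):
        x = layers.Dense(units, activation='relu')(x)
    x = layers.Dense(num_points * 3)(x)
    return tf.keras.Model(img, layers.Reshape((num_points, 3))(x))
```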
B. LOSS FUNCTION
Selecting a suitable loss function to train the CNN model is a critical step. The nature of the problem, the dataset representation, and the output values should all be considered when designing the loss function. The loss function measures the error between the inferred output and the corresponding ground-truth, and the model weights are optimized and updated according to this error. In our case, the loss function measures the distance between the generated point cloud and the ground-truth shape. It should fulfill the following conditions: (1) it should be efficient to compute and differentiable, so that it can be used in the back-propagation step, and (2) it should be robust against outliers [8].

So, the required loss function L between two 3D shapes, S_pred, S_gt ⊆ R^3, is defined as:

   L({S_pred}, {S_gt}) = Σ d(S_pred, S_gt)                        (2)

where S_pred and S_gt are the predicted 3D shape and the corresponding ground-truth shape, respectively.

Since the point cloud is an orderless representation, the loss function should be invariant to the ordering of the points. To this end, we propose to use and compare two different loss functions: the Chamfer Distance (CD) [24] and the Earth Mover's Distance (EMD) [21].

Chamfer Distance (CD). The Chamfer Distance between S1, S2 ⊆ R^3 is defined as:

   d_CD(S1, S2) = Σ_{x∈S1} min_{y∈S2} ||x − y||_2^2 + Σ_{y∈S2} min_{x∈S1} ||x − y||_2^2          (3)

In the first term of Eq. 3, for each point in the predicted point cloud, CD finds the nearest neighbor in the ground-truth point cloud and sums up the squared distances; the second term does the same from the ground-truth point cloud to the predicted point cloud. CD is piecewise smooth and continuous, and the search is independent for each point, so this function is parallelizable and produces high-quality results. The lower the value, the better and more accurate the generated shape. The drawback of CD is that there is no clear mechanism to enforce uniformity of the generated point cloud, because the optimization process can reach a minimum where a subset of points accounts for the whole shape while the remaining points cluster together.
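A direct TensorFlow implementation of Eq. 3 can be written with a pairwise distance matrix. This is a minimal sketch, not the authors' released code; it materializes an N × M distance matrix, which is fine for a few thousand points but not memory-optimal.

```python
import tensorflow as tf

def chamfer_distance(pred, gt):
    # pred: (B, N, 3), gt: (B, M, 3). Symmetric sum of squared
    # nearest-neighbour distances in both directions (Eq. 3).
    diff = tf.expand_dims(pred, 2) - tf.expand_dims(gt, 1)           # (B, N, M, 3)
    dist = tf.reduce_sum(tf.square(diff), axis=-1)                   # squared pairwise distances
    pred_to_gt = tf.reduce_sum(tf.reduce_min(dist, axis=2), axis=1)  # first term of Eq. 3
    gt_to_pred = tf.reduce_sum(tf.reduce_min(dist, axis=1), axis=1)  # second term of Eq. 3
    return tf.reduce_mean(pred_to_gt + gt_to_pred)                   # averaged over the batch
```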
Earth Mover's Distance (EMD). The EMD between S1, S2 ⊆ R^3 is defined as:

   d_EMD(S1, S2) = min_{φ: S1→S2} Σ_{x∈S1} ||x − φ(x)||_2            (4)

where φ : S1 → S2 is a bijection and the two sets have equal size, s = |S1| = |S2|.

In EMD, φ maps each point of S1 to one unique point of S2, enforcing a point-to-point assignment between the two point clouds. EMD is differentiable and parallelizable, but computationally expensive with respect to time and memory for high-resolution point clouds.
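For equally sized point sets, the bijection in Eq. 4 can be computed exactly as a linear assignment problem. The sketch below uses SciPy's Hungarian solver, which is only practical for small point sets; training-time implementations usually rely on an approximate, GPU-friendly matching instead.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_exact(s1, s2):
    # s1, s2: (N, 3) arrays with the same number of points.
    cost = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=-1)  # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)                         # optimal bijection phi
    return cost[rows, cols].sum()                                    # Eq. 4
```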
IV. EVALUATION
In this section, we outline the implementation details of the proposed architecture and the datasets used for training. We also discuss the testing datasets that are used to evaluate and compare the proposed method against the state-of-the-art.

A. IMPLEMENTATION DETAILS
We implemented and trained the proposed model in TensorFlow [1]. The input image size is 128 × 128, and a separate model is trained for each object category. The encoder generates a latent feature of dimension 256, and the generator network outputs a point cloud of size 2048 × 3. The Adam optimizer [11] was used to optimize the network parameters with a learning rate of 5e−5 and a minibatch size of 32. We trained the model until the validation accuracy stopped increasing.
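Assuming a tf.data pipeline that yields (image, ground-truth point cloud) minibatches of size 32, the training loop reduces to a standard gradient-tape step. The snippet below reuses the build_pixel2point and chamfer_distance sketches from above and is only an illustration of the stated settings, not the authors' training script.

```python
import tensorflow as tf

model = build_pixel2point()
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)   # settings from Sec. IV-A

@tf.function
def train_step(images, gt_points):
    with tf.GradientTape() as tape:
        pred = model(images, training=True)
        loss = chamfer_distance(pred, gt_points)            # or an EMD-based loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```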
B. DATASET PREPARATION
1) ShapeNet
ShapeNet [5] is a large-scale synthetic 3D dataset that is widely used in 3D research such as 3D model retrieval and reconstruction. We use ShapeNetCore, a subset of ShapeNet that is manually cleaned and aligned; it has more than 50K unique 3D models covering 55 common object categories. We focus on 13 categories and use the 80%–20% train-test split provided by [5]. The 2D input images used for training and testing are provided by [6], where each model in ShapeNet is rendered from 24 different azimuth angles.

2) Pix3D
To show the generalization of the proposed method on real images, we test it using the Pix3D dataset [23]. Pix3D is a publicly available dataset of aligned real-world image and 3D model pairs. It contains a large diversity of object shapes and backgrounds and is highly challenging. We test and report the performance of the proposed method on the chair, sofa, and table categories of Pix3D.

C. BASELINES
We test the proposed model trained on the ShapeNet dataset. First, we test it on synthetic images and show that it can generate point clouds that describe the object in the input image. Then, we validate the benefit of using the initial point cloud to improve the final point clouds. Also, we compare the proposed model against PSGN [8] and 3D-LMNet [16] qualitatively and quantitatively; CD (Eq. 3) and EMD (Eq. 4) are used for the quantitative evaluation. Finally, we test the proposed model on real images to validate its generalizability to unseen images.

V. EXPERIMENTAL RESULTS & COMPARISONS
To test the performance of the proposed model, we evaluate it from different directions. First, we show general results generated by the proposed model on the ShapeNet dataset for different classes. Then, we compare the results of the proposed model against similar approaches that target the same problem using a point cloud representation, quantitatively and qualitatively.

After that, we validate the proposed architecture with an ablation study, examining the benefit of using the initial point cloud (the 3D sphere) to generate more accurate results. We also show how the proposed model deals with input images that have an ambiguous view (some object structures are hidden). Moreover, we show that the learned latent vector can be used to transfer useful information from one shape to another by applying arithmetic operations on the extracted features.

To check the model's generality, we demonstrate the performance of the proposed model on the Pix3D dataset of real images and compare the results against other methods quantitatively and qualitatively. Finally, we report some failure cases caused by unusual shapes or parts that do not usually exist in normal examples.

A. GENERAL RESULTS ON ShapeNet DATASET
We test the proposed model on the testing set of ShapeNet. The proposed model was trained on synthetic images of objects rendered from different viewpoints, and the testing was performed on 13 different categories. Figure 3 shows qualitative results for 8 different categories. It clearly demonstrates that the point clouds generated from a single view are very close to the ground-truth and capture the object geometry. The proposed model also keeps salient features, such as the free spaces between the slats in the back of the chair and the holes between the back and the seat of the bench. Moreover, it successfully learned to generate some thin and rare parts, such as the stretchers between the chair legs, even though these parts are not common in the chair category. Many categories contain various geometrical shapes, such as the top surfaces of tables; in Figure 3 (last row), the proposed model accurately generates the circular surface of the input table with its cylindrical pillar and four small legs. Furthermore, the proposed model generates complete and plausible shapes, and the generated points are evenly distributed and cover all parts of the objects.

B. COMPARISON RESULTS AGAINST OTHER METHODS
We benchmark our proposed model against PSGN and 3D-LMNet; both models were trained on the same training set of ShapeNet. PSGN is the first CNN-based model to solve the problem of single-view reconstruction by generating point clouds. In [8], the reported results show that point cloud-based models outperform the state-of-the-art voxel-based models significantly. Table 1 reports the comparison of our proposed model against PSGN [8] and 3D-LMNet [16] on the ShapeNet dataset. It shows that our proposed model outperforms PSGN in 8 out of 13 categories in the Chamfer metric and in all 13 categories in the EMD metric. Also, our proposed model outperforms 3D-LMNet in 6 out of 13 categories in the Chamfer metric and in all 13 categories in the EMD metric. Overall, the average performance of our proposed model surpasses both models in both metrics, even though our proposed model is simpler. Looking deeper into Table 1, the EMD values better reflect the visual quality of the generated point clouds. Since EMD is a point-to-point distance, it penalizes mismatched points heavily, and the two point cloud sets must have the same number of points. In the Chamfer distance, the nearest points are used to calculate the distance in a forward manner (from the generated point cloud to the ground-truth) and in a backward manner (from the ground-truth to the generated point cloud); it is not necessary that the generated point cloud and the corresponding ground-truth have the same number of points.

TABLE 1. Quantitative comparison of single-view reconstruction results on ShapeNet. The metrics are computed on 1024 points after performing ICP alignment with the ground truth point cloud. All metrics are scaled by 100.
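The evaluation protocol of Table 1 (1024 sampled points, ICP alignment with the ground truth, metrics scaled by 100) could be reproduced roughly as follows. This is a simple point-to-point ICP sketch reusing emd_exact from the EMD example above, not the authors' exact alignment procedure.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_align(src, dst, iters=30):
    # Rigidly align src (N, 3) to dst (M, 3) with a basic point-to-point ICP.
    src = src.copy()
    for _ in range(iters):
        corr = dst[cKDTree(dst).query(src)[1]]            # nearest-neighbour correspondences
        mu_s, mu_d = src.mean(0), corr.mean(0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (corr - mu_d))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        src = (src - mu_s) @ R.T + mu_d                   # apply best-fit rotation/translation
    return src

def evaluate_pair(pred, gt, n=1024):
    # Subsample both clouds, align the prediction, and report scaled metrics.
    pred = pred[np.random.choice(len(pred), n, replace=False)]
    gt = gt[np.random.choice(len(gt), n, replace=False)]
    pred = icp_align(pred, gt)
    d = np.linalg.norm(pred[:, None] - gt[None], axis=-1)
    cd = (d.min(1) ** 2).sum() + (d.min(0) ** 2).sum()    # Chamfer distance, Eq. 3
    return 100.0 * cd, 100.0 * emd_exact(pred, gt)        # both metrics scaled by 100
```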


     FIGURE 3. Qualitative results of ShapeNet on different categories. From left to right: input image, ground-truth, generated point cloud.

Figure 4 highlights the qualitative comparison. It clearly shows that the point clouds generated by our proposed model look better than the ones generated by PSGN and 3D-LMNet. Our proposed model captures the details of the object and generates the object parts more accurately. In the rifle image (Figure 4, 4th row on the right), the small parts of the rifle are captured in more detail than in the point clouds of 3D-LMNet, which does not generate the grip or the magazine, and of PSGN, where these parts are almost fused together and cannot be separated. Also, our proposed model generates a well-distributed point cloud in which the points are spread fairly over the whole shape rather than concentrated in one part or at the center of the shape.


FIGURE 4. Comparison results between different methods on ShapeNet. From left to right: input image, ground-truth, results generated from PSGN, results generated from 3D-LMNet, and results generated from the proposed model.

Moreover, this can be noticed in the chair image (Figure 4, 3rd row), where our proposed model successfully generates and separates the chair legs and the armrest, while the other models treat them either as one part fully connected to the chair (e.g., the armrest) or as a single merged part (e.g., the chair legs). This is thanks to the initial point cloud, which helps in generating a well-distributed point cloud.

Moreover, we compare our proposed model against Pixel2Mesh [25]. Different from the proposed model, Pixel2Mesh uses an ellipsoidal mesh as an initial shape. It utilizes features extracted at different stages of the image feature network and applies them in the mesh deformation network to deform the mesh and add more details in a coarse-to-fine fashion. Table 2 reports the quantitative comparison between Pixel2Mesh and our proposed model. With respect to CD, our model outperforms Pixel2Mesh in some categories and is comparable in others, and the average performance of our model is better. In EMD, our model outperforms Pixel2Mesh in all categories, and the average performance of the proposed model is better by a large margin, as reported in Table 2.

TABLE 2. Quantitative comparison between Pixel2Mesh [25] and ours on ShapeNet.

C. EFFECT OF THE INITIAL POINT CLOUD
To test the efficacy of using the initial point cloud in reconstructing a finer point cloud, we conduct an experiment to evaluate the performance of two different setups of the proposed model (Figure 1). The first setup is the model of Figure 2, which uses an initial point cloud. The second setup has the same architecture as Figure 2 but without the initial point cloud, so the point cloud is reconstructed directly from the input image. Both setups were trained on the training set of ShapeNet and tested on the testing set of the same dataset.

Qualitatively, Figure 5 illustrates the results of the different setups of the proposed model. The point clouds generated by the proposed model without the initial point cloud suffer from an uneven distribution of points over the whole shape: many points gather at some parts of the shape. In the chair example, many points are grouped at the back corners of the seats and fewer points are on the legs, whereas the model with the initial point cloud produces chairs with well-distributed points and well-reconstructed legs. Also, in the table examples, the point clouds generated without an initial point cloud have poorly reconstructed legs, but the legs are reconstructed well when the initial point cloud is used during training. In the plane examples, the engines and the tail are not reconstructed and the points are concentrated on the body of the plane, but they are reconstructed accurately when using the initial point cloud. From Figure 5, we conclude that adding the initial point cloud to the proposed model improves the reconstructed point cloud, distributes the points evenly over all shape parts, and generates the object details accurately. Quantitatively, Table 3 reports a comparison between the different setups of the model. The model with the initial point cloud clearly outperforms the same model without it by a large margin in both metrics.
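For reference, the ablation setup without the initial point cloud can be expressed by simply dropping the sphere concatenation and feeding the encoder feature straight into the fully connected generator. The sketch below is an assumed variant of the earlier build_pixel2point example (same imports, same assumed hidden widths), not the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_pixel2point_no_sphere(num_points=2048):
    img = layers.Input(shape=(128, 128, 3))
    x = img
    for filters in (32, 64, 128, 256, 256, 256, 256):
        x = layers.Conv2D(filters, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Flatten()(x)                               # 256-d feature, no initial sphere
    for units in (2048, 2048, 2048):
        x = layers.Dense(units, activation='relu')(x)
    x = layers.Dense(num_points * 3)(x)
    return tf.keras.Model(img, layers.Reshape((num_points, 3))(x))
```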


FIGURE 5. Qualitative results of the different setups of the proposed model on ShapeNet (Figure 1). From left to right: input image, ground-truth, results
generated by the proposed model without the initial point cloud, and results generated by the proposed model with the initial point cloud.

TABLE 3. Quantitative comparison of different setups of the proposed model on ShapeNet.

D. GENERATING PLAUSIBLE SHAPES FROM AMBIGUOUS 2D INPUTS
To validate the performance of the proposed model, we conducted an experiment to test whether it can recognize and generate plausible shapes from 2D images of the chair class in which much of the object geometry is hidden (the back view of the chair). Figure 6 shows the qualitative results of this experiment. For each image, we show the back and side views of the reconstructed model along with the ground-truth from the same viewpoints. The proposed model succeeds in guessing the 3D geometry of the input image and generates plausible shapes that are consistent with the input images and the ground-truth. Also, the proposed model manages to memorize and reconstruct the chair parts, such as the legs and the arms, without seeing them in the 2D input images. Figure 6 shows that the proposed model can generate plausible shapes that are consistent with the ambiguous 2D images and are close enough to the ground-truth.

E. ARITHMETIC OPERATIONS ON THE 2D INPUT IMAGE FEATURE VECTOR
Another interesting experiment is to check whether the 2D features extracted from the input images carry meaningful information. To do so, we extract the 2D features from different 2D images of the same category and apply arithmetic operations on them to generate a new 3D shape. In [17], it was shown that vector(King) − vector(Man) + vector(Woman) gives a vector whose nearest neighbor is the vector for Queen. Our experiment follows the same idea: we select random triples, extract their 2D features using the encoder network, and apply the arithmetic operation (fv1 − fv2 + fv3). The resulting feature is then passed to the generator to produce the 3D point cloud.
                                                                                     the 3D point cloud.
                                                                                        Figure 7 shows the results of applying the arithmetic oper-
                                                                                     ations of some categories. The first experiment was applied
                                                                                     to the airplane category. In Figure 7a, the first image is an
D. GENERATING PLAUSIBLE SHAPES FROM AMBIGUOUS                                        airplane with two engines on each side and the second image
2D INPUTS                                                                            is an airplane with one engine on each side. We subtract the
To validate the performance of the proposed model, we con-                           extracted features of both images and then add the difference
ducted an experiment to test the model whether it can rec-                           to the third image of an airplane that has just one engine on
ognize and generate plausible shapes from 2D images of                               each side. As shown in Figure 7a, the generated new shape
the chair class where the geometry of the objects is almost                          is an airplane that has two engines on each side. This means
covered (the back-view of the chair). Figure 6 shows the qual-                       that the difference between the first two images generates a
itative results of this experiment. For each image, we show the                      feature of an engine and then adds it to the third image results
back and the side views of the reconstructed model along with                        in a new airplane with two engines.
the ground-truth with the same viewpoint. It is clearly shown                           The second example was applied to the chair category. The
that the proposed model succeeded in guess the 3D geometry                           main image is for a chair with arms. The other images are
of the input image and generates plausible shapes that are                           chairs without arms. We want to test if we can subtract the
consistent with the input images and the ground-truth. Also,                         arms from the first shape and add them to the new shape.
the proposed model manages to memorize and reconstruct                               Figure 7b shows that when subtracting the feature of a chair
the chair parts such as the legs and the arms without seeing                         that doesn’t have arms from a chair that has arms and then
them in the 2D input images. Figure 6 proves that the pro-                           adds the new feature to a third one we get the same shape of
posed model can generate plausible shapes that are consistent                        the third chair but with arms. This means that the difference
with the ambiguous 2D images and are close enough to the                             between the two features generates a feature that has the chair
ground-truth.                                                                        arms information. And when adding this feature to a new


FIGURE 6. Qualitative results of 3D reconstruction for the ambiguous 2D inputs. From left to right: 2D input image, ground-truth view-1, generated output view-1, ground-truth view-2, generated output view-2.

A third example was applied to the table category. The first image is of a table with a bottom shelf and the second image is of a table without the bottom shelf. When we subtract the feature of the second image from the feature of the first image and add the result to the feature of a new, third image, we obtain a table with the bottom shelf. The generated table is similar to the third image plus the bottom shelf. As can be seen in Figure 7c, the generated tables are similar to the third images; for example, the table with long legs preserves its geometry after adding the new feature.

As shown in Figure 7, the proposed model extracts features that carry meaningful information, and these features can be used to generate realistic shapes that have extra parts.

FIGURE 7. Results of applying arithmetic operations on 2D features extracted by the encoder for different shapes.


F. Pix3D DATASET RESULTS
The proposed model was trained on synthetic images that are clean and in which the objects appear clearly. To test the performance of the model in real scenarios, the Pix3D dataset is used. This dataset contains a large collection of real images and the corresponding metadata, such as masks, along with ground-truth 3D CAD models of different object categories. The categories shared between ShapeNet and Pix3D are used to test and evaluate the proposed method. The testing images are preprocessed: each image is cropped so that the object of interest is centered, the background is masked out using the corresponding mask, and the image is then resized to match the training image size (128 × 128). The proposed model is not fine-tuned on the Pix3D dataset; it is tested directly on the dataset images.
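The preprocessing described here (mask the background, crop around the object, resize to 128 × 128) might look roughly like the following. The file paths and the exact crop policy are assumptions for illustration, not taken from the paper.

```python
import numpy as np
from PIL import Image

def preprocess_pix3d(img_path, mask_path, out_size=128):
    img = np.array(Image.open(img_path).convert('RGB'))
    mask = np.array(Image.open(mask_path).convert('L')) > 0
    img = (img * mask[..., None]).astype(np.uint8)         # mask out the background
    ys, xs = np.nonzero(mask)
    side = max(ys.max() - ys.min(), xs.max() - xs.min())   # square crop around the object
    cy, cx = (ys.min() + ys.max()) // 2, (xs.min() + xs.max()) // 2
    y0, x0 = max(cy - side // 2, 0), max(cx - side // 2, 0)
    crop = img[y0:y0 + side, x0:x0 + side]
    return np.array(Image.fromarray(crop).resize((out_size, out_size)))
```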
Table 4 reports the quantitative results of comparing the proposed model against PSGN and 3D-LMNet on Pix3D images. The three models were trained on ShapeNet and tested on Pix3D; the reported performance of PSGN and 3D-LMNet is taken from [16]. Table 4 shows that the proposed model outperforms the other models by a large margin in both metrics and on all object categories, which demonstrates the efficiency of the proposed model on real data.

TABLE 4. Single-view reconstruction results on the real-world Pix3D dataset. All metrics are scaled by 100.

Figure 8 visualizes the reconstruction results of some selected Pix3D images generated by the proposed model along with 3D-LMNet. 3D-LMNet performs well on real-world images, but our model performs better: the generated point clouds are more accurate and very similar to the ground-truth, and our model distributes the points evenly over the whole object shape and covers the object parts accurately. This shows that the proposed model generalizes well to real-world images and generates accurate models that describe the input images, even though the images come from a different distribution than the training set.

G. FAILURE CASES
The proposed model fails to generate very accurate shapes in some cases. Figure 9 shows some failure cases. Most thin and narrow parts of the objects are missed, such as the chair armrests and the airplane tail. Objects with extra parts that do not usually exist are also missed, such as a monitor with two bases or a table with three legs on each side. Normally, the narrow and extra parts are missed because the network did not learn to predict them. However, when this happens, the network tries to generate the closest shape to the input image, as with the six-legged table in Figure 9 (last row): the proposed model reconstructs a plausible point cloud that is close to the input image but misses the leg in the middle.


FIGURE 8. Qualitative results on chair, sofa, and table categories from Pix3D dataset. From left to right: input image, ground-truth, results generated
from 3D-LMNet, and results generated from the proposed model.

                       FIGURE 9. Failure cases of our method on ShapeNet. Failures happen because of extra unexpected or
                       thin and narrow parts. From left to right: input image, ground-truth, generated point cloud.

VI. DISCUSSION & CONCLUSION
Though single-view 3D object reconstruction is a challenging task, the human visual system is able to infer and predict the geometry of a scene and the objects within it from a single image. Even in more complicated scenarios, such as heavy occlusion of the objects, the human brain can guess a number of plausible shapes that could match what is seen. This is possible because of the prior information that is stored in the human brain and is retrieved, utilized, and updated when new scenes are observed. Recently, several research fields have exploited single-image object reconstruction in applications such as object grasping and manipulation in robotics. However, it is an ill-posed problem, and due to this uncertainty many plausible reconstructions can be a solution for one single view.
   In this paper, we have proposed a simple, yet powerful, CNN model that generates the point cloud of an object from a single image. 3D data can be represented in different ways, and point clouds have proven to be a common and simple representation. The proposed model was trained end-to-end on synthetic data with 3D supervision. It takes a single image of an object and generates a point cloud with a fixed number of points (N = 2048). Qualitative and quantitative evaluations on synthetic and real data demonstrate that the proposed model is able to generate point clouds that are very close to the ground-truth and more accurate than those of other methods. Moreover, we show that the initial point cloud improves the final results, as it distributes the points evenly over the whole object shape. Without the initial point cloud, the qualitative results show that the points are grouped densely in some object parts while other parts receive fewer points. Furthermore, the performance of the proposed model on the real-world dataset illustrates its outstanding generalization to new and unseen images and scenes.
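As a small illustration of the initial point cloud mentioned above, the sketch below generates N = 2048 points spread roughly evenly over a unit sphere. The paper does not prescribe a particular sampling scheme, so the Fibonacci-lattice construction used here is only one reasonable, assumed choice.

```python
import numpy as np

def initial_sphere(num_points: int = 2048, radius: float = 1.0) -> np.ndarray:
    """Return a (num_points, 3) array of points spread roughly evenly over a
    sphere, built with a Fibonacci lattice. This is an assumed construction
    for the initial point cloud, not necessarily the paper's exact one."""
    indices = np.arange(num_points, dtype=np.float64) + 0.5
    # Latitude: spread z uniformly in [-1, 1] so horizontal bands have equal area.
    z = 1.0 - 2.0 * indices / num_points
    # Longitude: golden-angle increments avoid clustering along meridians.
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    theta = golden_angle * indices
    r_xy = np.sqrt(np.clip(1.0 - z * z, 0.0, 1.0))
    points = np.stack([r_xy * np.cos(theta), r_xy * np.sin(theta), z], axis=1)
    return radius * points

sphere = initial_sphere()
print(sphere.shape)                                  # (2048, 3)
print(np.linalg.norm(sphere, axis=1).round(3)[:5])   # all radii ~1.0
```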

REFERENCES
[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, and M. Kudlur, ‘‘TensorFlow: A system for large-scale machine learning,’’ in Proc. 12th USENIX Symp. Operating Syst. Design Implement., 2016, pp. 265–283.
[2] J. T. Barron and J. Malik, ‘‘Shape, illumination, and reflectance from shading,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 8, pp. 1670–1687, Aug. 2015.
[3] G. Bresson, Z. Alsayed, L. Yu, and S. Glaser, ‘‘Simultaneous localization and mapping: A survey of current trends in autonomous driving,’’ IEEE Trans. Intell. Vehicles, vol. 2, no. 3, pp. 194–220, Sep. 2017.
[4] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, ‘‘Geometric deep learning: Going beyond Euclidean data,’’ IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, Jul. 2017.
[5] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, ‘‘ShapeNet: An information-rich 3D model repository,’’ 2015, arXiv:1512.03012. [Online]. Available: http://arxiv.org/abs/1512.03012
[6] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, ‘‘3D-R2N2: A unified approach for single and multi-view 3D object reconstruction,’’ in Proc. Eur. Conf. Comput. Vis. Springer, 2016, pp. 628–644.

[7] A. Criminisi, I. Reid, and A. Zisserman, ‘‘Single view metrology,’’ Int. J. Comput. Vis., vol. 40, no. 2, pp. 123–148, 2000.
[8] H. Fan, H. Su, and L. Guibas, ‘‘A point set generation network for 3D object reconstruction from a single image,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 605–613.
[9] K. Häming and G. Peters, ‘‘The structure-from-motion reconstruction pipeline–a survey with focus on short image sequences,’’ Kybernetika, vol. 46, no. 5, pp. 926–937, 2010.
[10] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[11] D. P. Kingma and J. Ba, ‘‘Adam: A method for stochastic optimization,’’ 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[12] A. Kurenkov, J. Ji, A. Garg, V. Mehta, J. Gwak, C. Choy, and S. Savarese, ‘‘DeformNet: Free-form deformation network for 3D shape reconstruction from a single image,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2018, pp. 858–866.
[13] C.-H. Lin, C. Kong, and S. Lucey, ‘‘Learning efficient point cloud generation for dense 3D object reconstruction,’’ in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 1–8.
[14] Q. Lu, Y. Lu, M. Xiao, X. Yuan, and W. Jia, ‘‘3D-FHNet: Three-dimensional fusion hierarchical reconstruction method for any number of views,’’ IEEE Access, vol. 7, pp. 172902–172912, 2019.
[15] Q. Lu, M. Xiao, Y. Lu, X. Yuan, and Y. Yu, ‘‘Attention-based dense point cloud reconstruction from a single image,’’ IEEE Access, vol. 7, pp. 137420–137431, 2019.
[16] P. Mandikal, K. L. Navaneet, M. Agarwal, and R. V. Babu, ‘‘3D-LMNet: Latent embedding matching for accurate and diverse 3D point cloud reconstruction from a single image,’’ in Proc. BMVC, 2018.
[17] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, ‘‘Distributed representations of words and phrases and their compositionality,’’ in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 3111–3119.
[18] M. R. Oswald, E. Töppe, K. Kolev, and D. Cremers, ‘‘Non-parametric single view reconstruction of curved objects using convex optimization,’’ in Proc. Joint Pattern Recognit. Symp. Springer, 2009, pp. 171–180.
[19] M. R. Oswald, E. Töppe, C. Nieuwenhuis, and D. Cremers, ‘‘A review of geometry recovery from a single image focusing on curved object reconstruction,’’ in Innovations for Shape Analysis. Springer, 2013, pp. 343–378.
[20] M. Prasad and A. Fitzgibbon, ‘‘Single view reconstruction of curved surfaces,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2, Dec. 2006, pp. 1345–1354.
[21] Y. Rubner, C. Tomasi, and L. J. Guibas, ‘‘The Earth mover’s distance as a metric for image retrieval,’’ Int. J. Comput. Vis., vol. 40, no. 2, pp. 99–121, Nov. 2000.
[22] A. Sinha, A. Unmesh, Q. Huang, and K. Ramani, ‘‘SurfNet: Generating 3D shape surfaces using deep residual networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6040–6049.
[23] X. Sun, J. Wu, X. Zhang, Z. Zhang, C. Zhang, T. Xue, J. B. Tenenbaum, and W. T. Freeman, ‘‘Pix3D: Dataset and methods for single-image 3D shape modeling,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 2974–2983.
[24] M.-P. Tran, ‘‘3D contour closing: A local operator based on Chamfer distance transformation,’’ Tech. Rep., Version v2, Mar. 2013. [Online]. Available: https://hal.archives-ouvertes.fr/hal-00802068
[25] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, ‘‘Pixel2Mesh: Generating 3D mesh models from single RGB images,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 52–67.
[26] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum, ‘‘Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling,’’ in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 82–90.
[27] Y. Xiang, W. Choi, Y. Lin, and S. Savarese, ‘‘Data-driven 3D voxel patterns for object category recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1903–1911.
[28] L. Zhang, G. Dugas-Phocion, J.-S. Samson, and S. M. Seitz, ‘‘Single-view modelling of free-form scenes,’’ J. Visualizat. Comput. Animation, vol. 13, no. 4, pp. 225–235, 2002.
[29] Y. Zhang, Z. Liu, T. Liu, B. Peng, and X. Li, ‘‘RealPoint3D: An efficient generation network for 3D object reconstruction from a single image,’’ IEEE Access, vol. 7, pp. 57539–57549, 2019.

AHMED J. AFIFI was born in 1985. He received the bachelor’s and M.Sc. degrees in computer engineering from the Islamic University of Gaza, in 2008 and 2011, respectively. He is currently pursuing the Ph.D. degree with the Computer Vision and Remote Sensing Research Group, Technische Universität Berlin. During his master’s degree, he was interested in digital image processing and pattern recognition. His research interests include computer vision, deep learning, 3D object reconstruction from a single image, and medical image analysis.

JANNES MAGNUSSON was born in Berlin, Germany, in 1994. He received the master’s degree in computer science from Technische Universität Berlin, in 2020, specializing in computer vision and medical image processing. He is currently working as a Research Associate with the Institute for Image Science and Computational Modeling in Cardiovascular Medicine, Charité–Universitätsmedizin Berlin.

TOUFIQUE A. SOOMRO (Member, IEEE) received the B.E. degree in electronic engineering from the Mehran University of Engineering and Technology, Pakistan, in 2008, the M.Sc. degree in electrical and electronic engineering by research from Universiti Teknologi PETRONAS, Malaysia, in 2014, and the Ph.D. degree in AI and image processing from the School of Computing and Mathematics, Charles Sturt University, Bathurst, NSW, Australia, in 2018. He was a Research Assistant for six months with the School of Business Analytics, Cluster of Big Data Analysis, The University of Sydney, Sydney, NSW, Australia. He is currently an Assistant Professor with the Department of Electronic Engineering, QUEST, Larkana, Pakistan. His research interests include most aspects of image enhancement methods, segmentation methods, classification methods, and image analysis for medical images.

OLAF HELLWICH (Senior Member, IEEE) was born in 1962. He received the B.S. degree in surveying engineering from the University of New Brunswick, Fredericton, NB, Canada, in 1986, and the Ph.D. degree from the Technische Universität München, München, Germany, in 1997, with a dissertation on line extraction from SAR data using a Markov random field model (Linienextraktion aus SAR-Daten mit einem Markoff-Zufallsfeld-Modell). He headed the Remote Sensing Group, Department of Photogrammetry and Remote Sensing, Technische Universität München. Since 2001, he has been a Professor with the Technische Universität Berlin (TUB), Berlin, Germany, initially for photogrammetry and cartography, and since 2004 for computer vision and remote sensing. From 2006 to 2009, he was the Dean of the Faculty of Electrical Engineering and Computer Science, TUB. His research interests include 3-D object reconstruction, object recognition, synthetic aperture radar remote sensing, and the discovery and use of object shape priors in 3-D reconstruction. He was a recipient of the Hansa Luftbild Prize of the German Society for Photogrammetry and Remote Sensing, in 2000.
