
Focal-WNet: An Architecture Unifying Convolution and Attention for Depth Estimation

Publication Type : Conference Proceedings

Publisher : IEEE

Source : 2022 IEEE 7th International Conference for Convergence in Technology (I2CT), 2022, pp. 1-7

Url : https://ieeexplore.ieee.org/document/9824488

Campus : Amritapuri

School : School of Computing

Year : 2022

Abstract : Extracting depth information from a single RGB image is a fundamental and challenging task in computer vision with wide-ranging applications. Because only one view is available, the task cannot be solved with traditional methods like multi-view geometry and is instead addressed with deep learning. Existing methods using convolutional neural nets produce inconsistent and blurry results because they lack long-range dependencies. Motivated by the recent success of Transformer networks in computer vision, which can process information both locally and globally, we propose a novel architecture named Focal-WNet in this paper. The architecture consists of two separate encoders and a single decoder, and its main aim is to learn most monocular depth cues, such as relative scale, contrast differences, and texture gradient. We incorporate focal self-attention instead of vanilla self-attention to reduce the computational complexity of the network. Alongside the focal transformer layers, we leverage a convolutional architecture to learn depth cues that a transformer alone cannot capture, since cues like occlusion require a local receptive field and are easier for a conv-net to learn. Extensive experiments show that the proposed Focal-WNet achieves competitive results on two challenging datasets.
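The abstract's key efficiency idea is focal self-attention: each token attends at fine granularity only to its local neighborhood and at coarse granularity to pooled summaries of the rest of the sequence, rather than to every other token. The following is a minimal, illustrative 1-D sketch of that idea in NumPy; it is not the paper's implementation, and the `window` and `pool` parameters, function names, and pooling scheme are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def focal_attention_1d(tokens, window=2, pool=4):
    """Toy 1-D focal self-attention (illustrative only).

    Each token attends to (a) its fine-grained local window and
    (b) coarse average-pooled summaries of the whole sequence,
    instead of attending to every individual token as in vanilla
    self-attention. `tokens` has shape (n, d).
    """
    n, d = tokens.shape
    # coarse global context: average-pool the sequence into n // pool summaries
    coarse = tokens[: n - n % pool].reshape(-1, pool, d).mean(axis=1)
    out = np.empty_like(tokens)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # keys = local fine tokens + global coarse summaries
        keys = np.concatenate([tokens[lo:hi], coarse], axis=0)
        attn = softmax(tokens[i] @ keys.T / np.sqrt(d))
        out[i] = attn @ keys
    return out
```

For a sequence of n tokens this attends to O(window + n / pool) keys per query instead of O(n), which is the source of the complexity reduction the abstract mentions; the real model applies the same principle over 2-D image patches with multiple pooling levels.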

Cite this Research Publication : G. Manimaran and J. Swaminathan, "Focal-WNet: An Architecture Unifying Convolution and Attention for Depth Estimation," 2022 IEEE 7th International Conference for Convergence in Technology (I2CT), 2022, pp. 1-7, doi: 10.1109/I2CT54291.2022.9824488.
