2022 IEEE International Symposium on Multimedia (ISM)

Abstract

Extracting building information from satellite and remote sensing (RS) data has become a valuable tool for a variety of applications, such as damage detection, infrastructure construction, land-use management, and building energy consumption estimation. Recently, deep learning methods have made considerable progress in extracting building footprints from RS imagery, but many challenges persist. Convolutional Neural Networks (CNNs) have been the fundamental way to segment buildings, but they cannot accurately capture the global connectivity of representations. To overcome this limitation, researchers proposed Vision Transformers, which have achieved state-of-the-art accuracy in computer vision tasks [1]. In building extraction from RS imagery in particular, several Transformer-based architectures have been proposed recently. However, their differing experimental setups make them difficult to compare and hinder meaningful conclusions. Considering this, the current manuscript presents an analytical comparison of diverse Transformer-based semantic segmentation architectures, evaluating their predictive performance and computational efficiency on three building footprint extraction RS imagery datasets. Moreover, this work introduces four new architectures, which are extensively compared with literature baselines.
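To illustrate the kind of Transformer-based semantic segmentation pipeline the abstract describes, below is a minimal, self-contained PyTorch sketch: a patch embedding, a Transformer encoder whose self-attention captures global context (the property CNNs lack), and a simple upsampling head that predicts a per-pixel building/background mask. All names and hyperparameters here are illustrative assumptions, not the architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class MiniViTSeg(nn.Module):
    """Illustrative (not the paper's) Transformer segmentation model:
    patch embedding -> Transformer encoder -> upsampling decode head."""

    def __init__(self, img_size=64, patch=8, dim=128, depth=2, heads=4,
                 n_classes=2):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Non-overlapping patch embedding via a strided convolution.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Learnable positional embedding, one vector per patch token.
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Decode head: 1x1 conv to class logits, then bilinear upsampling
        # back to the input resolution.
        self.head = nn.Conv2d(dim, n_classes, kernel_size=1)
        self.up = nn.Upsample(scale_factor=patch, mode="bilinear",
                              align_corners=False)

    def forward(self, x):
        b = x.shape[0]
        t = self.embed(x)                    # (B, dim, H/p, W/p)
        h, w = t.shape[2], t.shape[3]
        t = t.flatten(2).transpose(1, 2)     # (B, N, dim) token sequence
        t = self.encoder(t + self.pos)       # global self-attention
        t = t.transpose(1, 2).reshape(b, -1, h, w)
        return self.up(self.head(t))         # (B, n_classes, H, W)

model = MiniViTSeg()
logits = model(torch.randn(2, 3, 64, 64))   # two fake RGB tiles
mask = logits.argmax(dim=1)                 # per-pixel class prediction
```

A real building-footprint model would add skip connections, a stronger decoder, and pretraining, but the flow above (tokenize patches, attend globally, decode to a dense mask) is the common structure shared by the Transformer segmentation architectures being compared.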
