Relational Part-Aware Learning for Complex Composite Object Detection in High-Resolution Remote Sensing Images


Click here to get full paper: File:Relational Part-Aware Learning for Complex Composite Object Detection in High-Resolution Remote Sensing Images.pdf

Abstract

In high-resolution remote sensing images (RSIs), complex composite object detection (e.g., coal-fired power plant detection and harbor detection) is challenging because such objects consist of multiple discrete parts with variable layouts, leading to complex yet weak inter-relationships and blurred boundaries, rather than forming a clearly defined single object. To address this issue, this article proposes an end-to-end framework, i.e., relational part-aware network (REPAN), to explore the semantic correlation and extract discriminative features among multiple parts. Specifically, we first design a part region proposal network (P-RPN) to locate discriminative yet subtle regions. With butterfly units (BFUs) embedded, feature-scale confusion problems stemming from aliasing effects can be largely alleviated. Second, a feature relation Transformer (FRT) plumbs the depths of the spatial relationships by part-and-global joint learning, exploring correlations between various parts to enhance significant part representation. Finally, a contextual detector (CD) classifies and detects parts and the whole composite object through multirelation-aware features, where part information guides the localization of the whole object. We collect three remote sensing object detection datasets with four categories to evaluate our method. Consistently surpassing the performance of state-of-the-art methods, the results of extensive experiments underscore the effectiveness and superiority of our proposed method.

Introduction

Object detection in remote sensing, one of the most interesting yet formidable problems, lays the groundwork for interpreting and understanding remote sensing images (RSIs). Owing to advances in high-resolution RSI datasets and deep learning algorithms, tremendous progress has been made in the accuracy and efficiency of object detection in remote sensing. However, most existing algorithms are designed to detect clearly defined single objects, such as vehicles, and overlook the many complex composite objects in optical RSIs (e.g., coal-fired power plants and airports) that should be treated as a whole. These complex composite objects provide essential support for society (e.g., power plants for electricity generation and airports for transportation), so monitoring them in RSIs is equally important. Identifying these combined complexes, which have multiple parts and nonrigid layouts, is a challenging research problem, made harder by complicated backgrounds and blurred boundaries.
Compared with single-object detection, complex composite object detection in RSIs is difficult for two reasons; Fig. 1 compares complex composite objects with single objects. First, these objects are characterized by intricate parts with various layouts. For example, a coal-fired power plant contains chimneys and condensing towers, and such a complex detection target involves complex spatial relationships between parts and nonrigid boundaries. Nonrigid boundaries can enlarge the bounding boxes and decrease precision. The composite structure means that the parts are discrete, and other textures between the parts weaken and disturb the spatial relationships among them, making it difficult to detect a composite object precisely as a whole. Second, complex composite objects are frequently situated amid surroundings with similar textures, further complicating detection. For instance, coal-fired power plants are often located in industrial areas, where other similar industrial infrastructure may hamper detection performance. Similar surroundings contribute to blurred boundaries and confound bounding box localization. Unlike single objects such as cars or ships, which have a unified structure and a single semantic meaning without significant internal complexity, composite objects such as coal-fired power plants have a more complex structure composed of multiple semantic meanings. As the red lines and green ovals in Fig. 1(a) and (b) indicate, complex yet weak spatial inter-relationships and blurred boundaries caused by multiple components with various layouts make composite objects harder to detect than single objects.
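The remark that enlarged bounding boxes decrease precision can be made concrete with intersection over union (IoU), the overlap measure commonly used to decide whether a predicted box counts as a correct detection. The following is a minimal sketch, not taken from the paper: the box coordinates, the margins, and the helper names `iou` and `enlarge` are hypothetical, and the 0.5 threshold mentioned afterward is a common convention rather than a setting stated in the text.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def enlarge(box, margin):
    """Grow a box by `margin` pixels on every side (hypothetical helper)."""
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)


# Hypothetical 200 x 200 ground-truth box for a composite object.
ground_truth = (100.0, 100.0, 300.0, 300.0)

for margin in (0, 20, 50):
    predicted = enlarge(ground_truth, margin)
    print(margin, round(iou(ground_truth, predicted), 3))
# 0 1.0
# 20 0.694
# 50 0.444
```

Under these assumed numbers, a prediction that overshoots the true extent by 50 pixels on each side falls below an IoU of 0.5 and would be scored as a false positive at that threshold, even though it fully contains the object.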
Nevertheless, commonly used CNN-based object detection methods rely on feature extraction from local regions and use these features to generate bounding boxes. For composite object detection, these algorithms may fail to handle the semantic gap between low-level features and high-level understanding of objects caused by complex and diverse spatial inter-relationships between parts. Additionally, the highly variable appearance of parts makes it difficult to generalize across different instances of the same object. Consequently, the direct application of existing algorithms to composite object detection is ill-advised, and part-based methods are better for discovering discriminative and subtle components.
Part-based methods are used in fine-grained visual classification tasks, aiming to generate rich feature representations or to localize parts for feature enhancement. By modeling a complex structure as an assemblage of distinct parts that can be localized and recognized individually, part-based methods offer heightened efficacy for composite object detection. Recently, a few efforts have been made on part-based methods for composite object detection in RSIs. For example, Sun et al. proposed a unified part-based CNN consisting of a part localization module and a context refinement module to localize the most representative part features. Although previous work has achieved promising results, its focus on constraining local feature learning and its simple concatenation of part features cause it to overlook discriminative parts and the potential of long-range spatial inter-relationships. We argue that investigating the potential correlation between parts and constructing a global semantic understanding of objects can significantly benefit composite object detection in RSIs.
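The contrast between simple concatenation of part features and modeling correlations between parts can be illustrated with a small example. The sketch below uses generic scaled dot-product self-attention without learned projections; it is not the paper's feature relation Transformer, and the feature vectors, their labels, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concatenate_parts(part_features):
    """Baseline: stack part features end to end, with no interaction."""
    return [x for feat in part_features for x in feat]


def attention_weights(part_features):
    """Row i holds how strongly part i attends to every part."""
    dim = len(part_features[0])
    weights = []
    for query in part_features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in part_features
        ]
        weights.append(softmax(scores))
    return weights


def relate_parts(part_features):
    """Replace each part feature by an attention-weighted mix of all parts."""
    dim = len(part_features[0])
    return [
        [sum(w * feat[i] for w, feat in zip(row, part_features)) for i in range(dim)]
        for row in attention_weights(part_features)
    ]


# Hypothetical 4-dimensional part features.
parts = [
    [1.0, 0.0, 0.0, 0.0],  # assumed "chimney" feature
    [0.9, 0.1, 0.0, 0.0],  # assumed "condensing tower" feature
    [0.0, 0.0, 1.0, 0.0],  # assumed unrelated background region
]

weights = attention_weights(parts)
related = relate_parts(parts)
print(len(concatenate_parts(parts)), len(related), len(related[0]))
```

In this toy setting the first part attends more strongly to the second, similar part than to the unrelated region, so its updated feature carries information about related parts regardless of where they sit in the list. Concatenation, by contrast, leaves each part feature unchanged and relies on later layers to discover any relationship.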