ISBN: 9781665445092 (print)
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images, which can be regarded as the unified task of pedestrian detection and person re-identification (re-id). Most existing works employ two-stage detectors like Faster-RCNN, yielding encouraging accuracy but with high computational overhead. In this work, we present the Feature-Aligned Person Search Network (AlignPS), the first anchor-free framework to efficiently tackle this challenging task. AlignPS explicitly addresses the major challenges, which we summarize as the misalignment issues in different levels (i.e., scale, region, and task), when accommodating an anchor-free detector for this task. More specifically, we propose an aligned feature aggregation module to generate more discriminative and robust feature embeddings by following a "re-id first" principle. Such a simple design directly improves the baseline anchor-free model on CUHK-SYSU by more than 20% in mAP. Moreover, AlignPS outperforms state-of-the-art two-stage methods, with a higher speed. The code is available at https://***/daodaofr/AlignPS.
ISBN: 9781665445092 (print)
Anomaly localization, which aims to segment the anomalous regions within images, is challenging due to the large variety of anomaly types. Existing methods typically train deep models by treating the entire image as a whole, yet put little effort into learning the local distribution, which is vital for this pixel-precise task. In this work, we propose an unsupervised patch-based approach that gives due consideration to both global and local information. More concretely, we employ a Local-Net and a Global-Net to extract features from an individual patch and from its surroundings, respectively. Global-Net is trained to mimic the local feature, so that an abnormal patch can be detected easily when its feature mismatches the one inferred from the context. We further introduce an Inconsistency Anomaly Detection (IAD) head and a Distortion Anomaly Detection (DAD) head to capture the discrepancy between global and local features. A scoring function derived from the multi-head design facilitates high-precision anomaly localization. Extensive experiments on several real-world datasets suggest that our approach outperforms state-of-the-art competitors by a large margin.
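The mismatch idea in this abstract can be illustrated with a minimal sketch. It assumes each patch already has a local feature vector and a context-predicted feature vector, and it uses a plain Euclidean distance with a fixed threshold as a stand-in for the scoring function; the abstract's actual score comes from the IAD and DAD heads, which are not reproduced here. The names `feature_mismatch`, `flag_anomalous_patches`, and `threshold` are hypothetical.

```python
import math


def feature_mismatch(local_feat, global_feat):
    """Euclidean distance between a patch's local feature and the
    feature predicted for it from its surroundings (assumed inputs)."""
    if len(local_feat) != len(global_feat):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((l - g) ** 2 for l, g in zip(local_feat, global_feat)))


def flag_anomalous_patches(local_feats, global_feats, threshold):
    """Score every patch and flag those whose mismatch exceeds the
    (hypothetical) threshold. Returns (scores, flags)."""
    scores = [feature_mismatch(l, g) for l, g in zip(local_feats, global_feats)]
    flags = [s > threshold for s in scores]
    return scores, flags


if __name__ == "__main__":
    local_feats = [[0.0, 0.0], [3.0, 4.0]]
    global_feats = [[0.0, 0.0], [0.0, 0.0]]
    print(flag_anomalous_patches(local_feats, global_feats, threshold=1.0))
```

A distance threshold is the simplest possible decision rule; a learned head, as the abstract describes, would replace it in practice.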
ISBN: 9781665448994 (print)
This paper proposes an attention-based multi-level model with a multi-scale backbone for thermal image super-resolution. The thermal image dataset is provided by PBVS 2020 in their thermal image super-resolution challenge. This dataset contains images at three different resolution scales (low, medium, high) [1]. However, only the medium- and high-resolution images are used to train the proposed architecture to generate super-resolution images at x2 and x4 scales. The proposed architecture uses Res2net blocks as the backbone of the network, together with a coordinate convolution layer and dual attention. Further, multi-level supervision is implemented to supervise the similarity between the output image and the real image at each block during training. To test the robustness of the proposed model, we evaluated it on the Thermal-6 dataset [20]. The results show that our model achieves state-of-the-art results on the PBVS dataset. Further, the results on the Thermal-6 dataset show that the model has a decent generalization capacity.
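The coordinate convolution layer mentioned above is commonly understood as a convolution applied to an input augmented with coordinate channels. The sketch below shows only that augmentation step for a single-channel image, assuming coordinates normalized to [-1, 1]; how the paper wires this into its network is not stated in the abstract. The function name `add_coord_channels` is hypothetical.

```python
def add_coord_channels(image):
    """Append normalized x and y coordinate maps to a single-channel image.

    image: list of H rows, each a list of W floats.
    Returns [image, x_map, y_map], where coordinates run from -1 to 1
    (a size-1 axis gets the value 0.0).
    """
    height = len(image)
    width = len(image[0])

    def normalize(index, size):
        # Map index 0..size-1 onto the interval [-1, 1].
        return 0.0 if size == 1 else 2.0 * index / (size - 1) - 1.0

    x_map = [[normalize(c, width) for c in range(width)] for _ in range(height)]
    y_map = [[normalize(r, height) for _ in range(width)] for r in range(height)]
    return [image, x_map, y_map]


if __name__ == "__main__":
    channels = add_coord_channels([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
    print(channels[1][0])  # x coordinates of the first row
    print(channels[2])     # y coordinates of every row
```

A subsequent convolution over these three channels can then condition on pixel position, which is the usual motivation for this layer.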
ISBN: 9781665445092 (print)
We show that explicit modeling of composition rules benefits image cropping. Image cropping is considered a promising way to automate aesthetic composition in professional photography. Existing efforts, however, only model such professional knowledge implicitly, e.g., by ranking from comparative candidates. Inspired by the observation that natural composition traits always follow a specific rule, we propose to learn such rules in a discriminative manner, and more importantly, to incorporate learned composition clues explicitly in the model. To this end, we introduce the concept of the key composition map (KCM) to encode the composition rules. The KCM can reveal the common laws hidden behind different composition rules and can inform the cropping model of what is important in composition. With the KCM, we present a novel cropping-by-composition paradigm and instantiate a network to implement composition-aware image cropping. Extensive experiments on two benchmarks show that our approach enables effective, interpretable, and fast image cropping.
ISBN: 9781665445092 (print)
Recent works on multi-source domain adaptation focus on learning a domain-agnostic model whose parameters are static. However, such a static model has difficulty handling conflicts across multiple domains and suffers from performance degradation in both the source domains and the target domain. In this paper, we present dynamic transfer to address domain conflicts, where the model parameters are adapted to samples. The key insight is that adapting the model across domains is achieved by adapting the model across samples. Thus, it breaks down source domain barriers and turns multi-source domains into a single-source domain. This also simplifies the alignment between source and target domains, as it only requires the target domain to be aligned with any part of the union of source domains. Furthermore, we find that dynamic transfer can be simply modeled by aggregating residual matrices and a static convolution matrix. Experimental results show that, without using domain labels, our dynamic transfer outperforms the state-of-the-art method by more than 3% on the large multi-source domain adaptation dataset DomainNet.
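One way to read "aggregating residual matrices and a static convolution matrix" is as a per-sample weight of the form static matrix plus a weighted sum of residual matrices. The sketch below illustrates that reading on small dense matrices. It is an assumption-laden illustration, not the paper's implementation: the per-sample coefficients are passed in directly (in practice some small network would presumably produce them from the input), and the names `dynamic_weight`, `apply_weight`, and `coeffs` are hypothetical.

```python
def dynamic_weight(static_w, residuals, coeffs):
    """Return static_w + sum_k coeffs[k] * residuals[k].

    static_w:  matrix (list of rows) shared by all samples.
    residuals: list of matrices with the same shape as static_w.
    coeffs:    per-sample scalars, one per residual (assumed given).
    """
    if len(residuals) != len(coeffs):
        raise ValueError("need exactly one coefficient per residual matrix")
    rows, cols = len(static_w), len(static_w[0])
    weight = [row[:] for row in static_w]  # copy so static_w is not mutated
    for coeff, residual in zip(coeffs, residuals):
        for i in range(rows):
            for j in range(cols):
                weight[i][j] += coeff * residual[i][j]
    return weight


def apply_weight(weight, x):
    """Matrix-vector product: the sample-specific linear map applied to x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


if __name__ == "__main__":
    static_w = [[1.0, 0.0], [0.0, 1.0]]
    residuals = [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]
    w = dynamic_weight(static_w, residuals, coeffs=[2.0, 0.5])
    print(w)
    print(apply_weight(w, [1.0, 1.0]))
```

Because the coefficients depend on the sample rather than on a domain label, two samples from different domains can share or diverge in their effective weights without any explicit domain boundary.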
This article explores the intricate and complex system pattern recognition algorithm based on backpropagation (BP) neural networks, which are efforts to leverage leverage's capabilities. Complicated systems, as se...
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
We aim to provide a comprehensive view of the inference efficiency of DETR-style detection models. We explore the effect of basic efficiency techniques and identify the factors that are easy to implement yet effectively improve the efficiency-accuracy trade-off. Specifically, we investigate the effect of input resolution, multi-scale feature enhancement, and backbone pre-training. Our experiments support that 1) adjusting the input resolution is a simple yet effective way to achieve a better efficiency-accuracy trade-off, 2) multi-scale feature enhancement can be lightened with a marginal decrease in accuracy, and 3) improved backbone pre-training can further improve the trade-off.
The introduction of transformer models, which utilize a self-attention mechanism within deep neural networks, represents a notable breakthrough in natural language processing. This advancement has spurred researchers ...
Nowadays, computers have become a necessity for all. Computers have made a great leap for us, and with their help we are able to move to a golden age of Artificial Intelligence. Artificial Intelligence or A.I has h...
ISBN: 9781665445092 (print)
Detecting out-of-distribution (OOD) inputs is a central challenge for safely deploying machine learning models in the real world. Existing solutions are mainly driven by small datasets, with low resolution and very few class labels (e.g., CIFAR). As a result, OOD detection for large-scale image classification tasks remains largely unexplored. In this paper, we bridge this critical gap by proposing a group-based OOD detection framework, along with a novel OOD scoring function termed MOS. Our key idea is to decompose the large semantic space into smaller groups with similar concepts, which simplifies the decision boundaries between in- and out-of-distribution data for effective OOD detection. Our method scales substantially better to high-dimensional class spaces than previous approaches. We evaluate models trained on ImageNet against four carefully curated OOD datasets, spanning diverse semantics. MOS establishes state-of-the-art performance, reducing the average FPR95 by 14.33% while achieving a 6x speedup in inference compared to the previous best method.
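A group-based score of this kind can be sketched as follows. The sketch rests on an assumption the abstract does not spell out: each group is given an extra "others" slot (index 0 here), a softmax is taken within each group, and the score is the negated minimum "others" probability across groups, so that an input claimed confidently by at least one group scores higher. The names `group_ood_score` and `others_index` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one group's logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def group_ood_score(group_logits, others_index=0):
    """Score an input from per-group logits.

    group_logits: list of logit lists, one per group; each group is
    assumed to reserve position `others_index` for an "others" class.
    Returns -min_k p_others^k; higher means more in-distribution.
    """
    p_others = [softmax(logits)[others_index] for logits in group_logits]
    return -min(p_others)


if __name__ == "__main__":
    # One group confidently claims the input: likely in-distribution.
    in_dist = group_ood_score([[-10.0, 10.0], [10.0, -10.0]])
    # No group claims the input over "others": more likely OOD.
    ood = group_ood_score([[0.0, 0.0], [0.0, 0.0]])
    print(in_dist > ood)
```

Thresholding this score would then separate in-distribution from OOD inputs; the threshold itself would be chosen on validation data.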