ISBN: (print) 9798350336702
In emergency rescue scenarios, rapid identification of human casualties is a critical first step in enhancing emergency medical response. This task can be limited by the physical and cognitive capacity of rescue personnel, who are exposed to significant risk. The use of small unmanned aerial systems (sUAS) equipped with autonomous casualty assessment abilities can reduce these limitations and risks by enabling remote casualty detection, identification, and vitals assessment, providing standoff protection, and eliminating the need for human personnel to access the potentially hazardous scene. This paper presents a vision-based casualty assessment framework and specifically discusses our casualty identification software, which is designed to recognize the faces of casualties and identify their nametapes in images captured by sUAS under realistic conditions. Our approach addresses the limitations of the sUAS-captured long-distance images to enable accurate identification in challenging casualty monitoring situations. The face and nametape recognition algorithms will be integrated into the larger casualty perception framework and embedded into sUAS platforms to assist with emergency rescue operations. The total casualty perception system will detect, identify, and evaluate the condition of casualties from a remote location, providing standoff protection to first responders and rapid information to inform a suitable medical treatment plan.
The introduction of affordable wearable cameras and eye trackers has led to a massive amount of egocentric (or first-person view) videos, bringing new challenges to the computer vision community for understanding and...
ISBN: (print) 9781728198354
Manipulating 3D objects has been among the active research topics in 3D vision. With the development and success of the neural radiance field (NeRF) [1] in scene modeling, synthesizing and manipulating 3D objects using such a representation has become desirable. In this paper, we introduce a semantic-aware generative NeRF, which is able to interpret the latent representation learned by category-specific generative NeRFs and to edit particular part attributes. Given a pretrained generative NeRF, we propose to deploy a semantic segmentor to perform part segmentation on the object category. This allows both rendering of the 2D image and prediction of the corresponding segmentation mask. Our proposed scheme learns to manipulate the resulting latent representation, which is optimized to edit the object part of interest to varying degrees. We conduct experiments on various object categories from benchmark datasets, and the results verify the effectiveness and practicality of our proposed model.
To address the problem of human-computer interaction with teaching robots in secondary vocational schools and to improve teaching quality, this paper conducts a study on the speech enhancement of secondary voc...
ISBN: (print) 9781728198354
Anomaly detection in computer vision seeks to identify samples outside of a predefined distribution, and includes texture defect detection and semantic anomaly detection. However, existing methods struggle to achieve high performance on both types of anomaly detection simultaneously. To address this issue, we propose a new flow-based anomaly detection method. First, we use semantic features extracted from a pre-trained backbone to learn the distribution of normal data from a semantic perspective. Second, we introduce a multi-frequency feature fusion module to aggregate semantic and texture information, which substantially improves performance for both types of anomaly detection at the same time. Extensive experiments on multiple well-known datasets demonstrate that our proposed method performs well in both types of anomaly detection and, in particular, achieves state-of-the-art performance in one-class anomaly detection. The code will be available at https://***/SYLan2019/FOADMFFF.
Current visual captioning technologies typically transform 3D/2D visual information into one-dimensional sequential data and employ language models to generate corresponding descriptions. This approach, however, compr...
ISBN: (print) 9781728198354
Enhancing low-light images is challenging because it requires handling global and local content simultaneously. This paper presents a new solution that incorporates the vision transformer (ViT) into a Laplacian pyramid and explores cross-layer dependence within the pyramid. It first applies a Laplacian pyramid to decompose the low-light image into a low-frequency (LF) component and several high-frequency (HF) components. As the LF component has a low resolution and mainly contains global attributes, ViT is applied to it to explore the interdependence among global contents. Since there is strong spatial correlation among different frequency components, the refined features from a lower pyramid layer are used to assist the refinement of upper-layer features. Experiments demonstrate that our approach achieves better performance than state-of-the-art methods while maintaining a relatively small model size and low computational complexity. Our source code and trained model will be released at https://***/Xinjie-Wei/DLEN.
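The decomposition step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and 2x2 average pooling with nearest-neighbour upsampling is assumed in place of the Gaussian filtering a typical Laplacian pyramid would use. It splits a grayscale image into one LF component and several HF residuals, then rebuilds the image from them.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 average pooling (assumed stand-in for blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return (lf, hf_list); hf_list[0] is the finest high-frequency residual."""
    hf = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        hf.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    return current, hf

def reconstruct(lf, hf):
    """Invert the decomposition, going from the coarsest level to the finest."""
    current = lf
    for residual in reversed(hf):
        current = upsample(current) + residual
    return current

# Height and width must be divisible by 2**levels in this simplified version.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
lf, hf = laplacian_pyramid(image, levels=3)
restored = reconstruct(lf, hf)
```

In this sketch the LF component is only 8x8 for a 64x64 input, which is consistent with the abstract's point that a global model such as ViT can be applied to the LF component at low cost.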
Generating heatmaps is one explanation method for showing which regions a model uses to make predictions in vision tasks. GradCAM is a popular approach for producing such heatmaps. However, GradCAM is post-hoc, and its he...
This paper aims to measure the deflection of structures, mainly beams, in a non-contact, installation-free, and marker-less way with the aid of stereo vision, edge detection, projection transformation, spline interpolation ...
ISBN: (print) 9781728198354
Conventional image compression techniques are mostly developed for the human visual system. However, with the extensive use of deep neural networks (DNNs), more and more images will be consumed by DNN-based intelligent machines, which makes it crucial to develop image compression techniques that are customized for DNN vision while remaining JPEG compliant. In this paper, we first propose a new distortion measure, dubbed the sensitivity weighted error (SWE). Then, we develop OptS, a DNN-oriented compression algorithm with full JPEG compatibility, which designs optimal quantization tables for DNN models based on SWE. To test the performance of our algorithm, image classification experiments are conducted on the ImageNet dataset for two prevailing DNN models. Results demonstrate that our algorithm achieves better rate-accuracy (R-A) performance than the default JPEG. For some DNN models, the compression ratio of our algorithm can reach 8.3x, reducing the compression rate (bits per pixel, bpp) of the default JPEG by 57.4% with no accuracy loss. Our source code is available at https://***/zkxufo/***.
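To make the role of a quantization table concrete, the following sketch quantizes one 8x8 block in the DCT domain and scores the result with a per-coefficient weighted squared error. The abstract does not define SWE, so the weighting here is only an assumed illustration: the `sensitivity` array, the flat example table, and all function names are hypothetical and are not taken from OptS.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

D = dct_matrix()

def quantize_block(block, table):
    """Level-shift, transform, quantize, and dequantize one 8x8 pixel block."""
    coeffs = D @ (block - 128.0) @ D.T
    dequantized = np.round(coeffs / table) * table
    return coeffs, dequantized

def weighted_error(coeffs, dequantized, sensitivity):
    """Weighted squared quantization error (hypothetical weighting, not the paper's SWE)."""
    return float(np.sum(sensitivity * (coeffs - dequantized) ** 2))

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)
table = np.full((8, 8), 16.0)   # assumed flat table; real JPEG tables vary by frequency
sensitivity = np.ones((8, 8))   # assumed uniform weights for illustration only
coeffs, dequantized = quantize_block(block, table)
error = weighted_error(coeffs, dequantized, sensitivity)
```

Because each coefficient is rounded to the nearest multiple of its table entry, its error is at most half that entry, so larger table entries trade accuracy for fewer bits. A table-design method would choose entries to balance that trade-off under whatever weighting it adopts.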