To ensure that the wall-climbing robot can accurately walk along the predetermined straight route, it is necessary to obtain the real-time offset and deflection of the wall-climbing robot to provide information feedba...
ISBN (print): 9781728198354
Vision Transformers (ViTs) are widely adopted in medical imaging tasks, and some existing efforts have been directed towards vision-language training for chest X-rays (CXRs). However, we believe there is still room for improvement in vision-only training for CXRs using ViTs by aggregating information from multiple scales, which has proven beneficial for non-transformer networks. Hence, we have developed LT-ViT, a transformer that utilizes combined attention between image tokens and randomly initialized auxiliary tokens that represent labels. Our experiments demonstrate that LT-ViT (1) surpasses the state-of-the-art performance using pure ViTs on two publicly available CXR datasets, (2) is generalizable to other pre-training methods and therefore is agnostic to model initialization, and (3) enables model interpretability without Grad-CAM and its variants.
ISBN (print): 9798350323658
Machine learning techniques rely on large and diverse datasets for generalization. Computer vision, natural language processing, and other applications can often reuse public datasets to train many different models. However, due to differences in physical configurations, it is challenging to leverage public datasets for training robotic control policies on new robot platforms or for new tasks. In this work, we propose a novel framework, ExAug, to augment the experiences of different robot platforms from multiple datasets in diverse environments. ExAug leverages a simple principle: by extracting 3D information in the form of a point cloud, we can create much more complex and structured augmentations, using both synthetic image generation and geometric-aware penalization, that would have been suitable in the same situation for a different robot with a different size, turning radius, and camera placement. The trained policy is evaluated on two new robot platforms with three different cameras in indoor and outdoor environments with obstacles.
ISBN (print): 9798350349405; 9798350349399
This paper presents a real-time semantic video communication method for general scenes, combining lossy semantic map coding with motion compensation to achieve reduced bit rates while maintaining perceptual and semantic quality. Our findings show that semantic image synthesis effectively adapts to minute errors resulting from motion estimation, eliminating the need to transmit the residuals. We recommend the Group of Pictures approach as a more efficient alternative. Comparative assessments against HEVC and VVC confirm the method's effectiveness. This research paves the way for efficient real-time semantic video communication, addressing the demands of data-intensive visual applications.
ISBN (print): 9798350374520; 9798350374513
The ICASSP-SP Grand Challenge on Hyperspectral Skin Vision aims to democratize skin analysis by leveraging low-cost consumer-grade cameras to reconstruct vital spectral reflectance data. Addressing the accessibility limitation of costly hyperspectral equipment, this challenge tasks participants with decoding skin spectral information crucial for assessing melanin and hemoglobin concentrations. The provided Hyper-Skin dataset is carefully curated following ethical guidelines and consists of a total of 306 hyperspectral samples from 51 human subjects. Evaluation with the Spectral Angle Mapper (SAM) provides a fair and accurate comparison of spectral reconstruction methods, encouraging advancements with real-world applications. The challenge attracted 51 teams and yielded 9 complete submissions. The top 5 teams achieved strong performance, with an average SAM score of 0.09486. This achievement underscores the potential of this initiative to make skin analysis more accessible and to drive interdisciplinary progress with societal impact.
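For readers unfamiliar with the metric, the Spectral Angle Mapper score is the angle between a reference spectrum and a reconstructed spectrum treated as vectors. The following is a minimal sketch of that standard definition, not the challenge's official evaluation code; the function names are hypothetical, and the choice of radians and of a plain per-pixel mean are assumptions, since the abstract does not state either.

```python
import math


def spectral_angle(reference, estimate):
    """Return the angle in radians between two spectra with the same number of bands."""
    if len(reference) != len(estimate):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(r * e for r, e in zip(reference, estimate))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_est = math.sqrt(sum(e * e for e in estimate))
    if norm_ref == 0.0 or norm_est == 0.0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_ref * norm_est)))
    return math.acos(cosine)


def mean_spectral_angle(reference_pixels, estimated_pixels):
    """Average the per-pixel spectral angle (assumed aggregation, not from the source)."""
    angles = [spectral_angle(r, e) for r, e in zip(reference_pixels, estimated_pixels)]
    return sum(angles) / len(angles)
```

Because the angle depends only on the direction of each spectrum, a reconstruction that is a scaled copy of the reference scores close to zero, which is why SAM is often paired with a magnitude-sensitive metric.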
ISBN (print): 9798350349405; 9798350349399
In recent years, event cameras have attracted significant attention due to their advantages over conventional cameras: high dynamic range, no motion blur, and high temporal resolution. Unlike traditional cameras, which generate intensity frames, event cameras output a stream of asynchronous events based on brightness changes. There is extensive ongoing research on performing computer vision tasks such as object detection and classification with event cameras. However, due to the unconventional output format of the event camera, it is difficult to perform computer vision tasks directly on the event stream. Most works reconstruct the intensity image from the event stream and then perform such tasks. One crucial task is feature detection and description. Scale-invariant feature transform (SIFT) is a widely used keypoint detector and descriptor that is robust to changes in scale, rotation, noise, and illumination. In this work, given an event voxel, we directly generate the LoG pyramid for SIFT keypoint detection. We fit a 3rd-degree polynomial and calculate the polynomial roots to compute the scale-space extrema response for SIFT keypoint detection. Since the extrema computation is performed after LoG thresholding, the solution is computationally less expensive. Experimental results validate the effectiveness of our system.
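The polynomial step can be illustrated with a small sketch. The abstract does not say which samples are fitted or which roots are taken, so the code below makes explicit assumptions: it interpolates a cubic through four LoG responses sampled along the scale axis and takes the roots of the cubic's derivative as candidate extrema. All function names are hypothetical and this is not the authors' implementation.

```python
import math


def fit_cubic(xs, ys):
    """Interpolate a cubic through 4 points; returns [a0, a1, a2, a3] (lowest power first)."""
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("exactly four samples are required")
    # Newton divided differences, computed in place.
    coef = [float(y) for y in ys]
    for level in range(1, 4):
        for i in range(3, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    # Convert the Newton form to monomial coefficients with a Horner-style expansion.
    poly = [coef[3]]
    for k in (2, 1, 0):
        shifted = [0.0] * (len(poly) + 1)
        for i, c in enumerate(poly):
            shifted[i + 1] += c          # multiply by x
            shifted[i] -= xs[k] * c      # multiply by -x_k
        shifted[0] += coef[k]
        poly = shifted
    return poly


def stationary_points(poly):
    """Real roots of the derivative 3*a3*x^2 + 2*a2*x + a1, in ascending order."""
    _, a1, a2, a3 = poly
    qa, qb, qc = 3.0 * a3, 2.0 * a2, a1
    if abs(qa) < 1e-12:                  # derivative is linear or constant
        return [] if abs(qb) < 1e-12 else [-qc / qb]
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)])


def scale_extrema(scales, responses):
    """Return (scale, response) pairs for stationary points inside the sampled range."""
    poly = fit_cubic(scales, responses)
    a0, a1, a2, a3 = poly
    lo, hi = min(scales), max(scales)
    return [(x, a0 + a1 * x + a2 * x * x + a3 * x ** 3)
            for x in stationary_points(poly) if lo <= x <= hi]
```

Running this only on pixels that already passed a LoG threshold, as the abstract describes, would keep the number of cubic fits small.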
ISBN (print): 9798350349405; 9798350349399
Deep ensembles are capable of achieving state-of-the-art results on classification and out-of-distribution (OOD) detection tasks. However, their effectiveness is limited due to the homogeneity of learned patterns within ensembles. To overcome this issue, our study introduces Saliency-Diversified Deep Ensembles (SDDE), a novel approach that promotes diversity among ensemble members by leveraging saliency maps. Through incorporating saliency map diversification, our method outperforms conventional ensemble techniques and improves calibration on multiple classification and OOD detection tasks. In particular, the proposed method achieves state-of-the-art OOD detection quality, calibration, and accuracy on multiple benchmarks, including CIFAR10/100 and large-scale ImageNet datasets.
ISBN (print): 9798350388084; 9798350388077
We propose a wheelchair differential speed control method based on the coupling of control signals and environmental information. First, the motion state of the wheelchair is determined from the displacement offset in order to generate the corresponding control signals. Then, ultrasound, vision, and IMU sensors are used to collect distance, image, and angle information respectively, and the environmental information is fused using fuzzy control to obtain an integrated environmental factor. Finally, the control signals and the integrated environmental factor are coupled to calculate the expected speeds of the left and right wheels under different motion states. The method fully reflects the environmental information and couples it with the control signal, establishes a differential speed model of the wheelchair that accounts for the environment, ensures the safety and reliability of the wheelchair during operation, and realizes intelligent control of the wheelchair.
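The final coupling step can be pictured with standard differential-drive kinematics. The sketch below is only an illustration under stated assumptions: the abstract does not give the coupling formula or the fuzzy fusion rules, so the environmental factor is taken here as a hypothetical scalar in [0, 1] that scales the commanded linear and angular speeds, and `wheel_speeds`, `track_width`, and `env_factor` are names introduced for this example.

```python
def wheel_speeds(linear_speed, angular_speed, track_width, env_factor=1.0):
    """Return (left, right) wheel speeds for a differential-drive base.

    env_factor is an assumed scalar in [0, 1]: 1.0 means an unobstructed
    environment, 0.0 means the base should stop. It is not the paper's model.
    """
    if track_width <= 0.0:
        raise ValueError("track_width must be positive")
    if not 0.0 <= env_factor <= 1.0:
        raise ValueError("env_factor must lie in [0, 1]")
    # Scale the commanded motion by the environmental factor (assumption).
    v = linear_speed * env_factor
    w = angular_speed * env_factor
    half_track = track_width / 2.0
    # Standard differential-drive relation: a positive w turns the base left.
    return v - w * half_track, v + w * half_track
```

For example, `wheel_speeds(1.0, 1.0, 0.5, env_factor=0.5)` halves the commanded motion before splitting it between the two wheels, giving a slower right wheel lead than the unscaled command would.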
ISBN (print): 9798350349405; 9798350349399
Prohibited items detection refers to the non-contact inspection of passenger baggage for potential threats using X-ray images. Because manual security screening is unreliable, previous research has mainly concentrated on directly transferring general-purpose detection frameworks designed for natural images and on designing enhancement or awareness modules around salient features such as edges and color. With the increasing complexity of X-ray security inspection in both categories and quantities, the unreliability of direct transfer and the complexity of task-specific design make it difficult for existing algorithms to adapt reliably and efficiently to complex security inspection. To address this challenge, we propose the Adapter in X-ray (AdaptXray), which is the first to explore a pre-trained Vision Transformer with powerful representations together with a Parameter-Efficient Transfer Learning method for prohibited items detection. Specifically, we design a Color Prior Extractor to perceive local prior features from different color spaces. Subsequently, we develop a Global-aware Self-Adapter to adaptively perceive and optimize the global universal features in the backbone. Additionally, we propose a Local-aware Interactive Adapter to incorporate prior knowledge into the pre-trained backbone. Thorough experimentation on two public baggage datasets, namely OPIXray and PIDray, demonstrates the effectiveness of our proposed method, which outperforms existing well-known CNN-based detection approaches.
Channel reconstruction transforms a subsampled multispectral image into hyperspectral, offering hyperspectral imaging benefits without a dedicated camera. MST++ is a state of the art channel reconstruction technique, ...