ISBN:
(Print) 9798400709647
Epileptic seizure prediction algorithms based on EEG signals can help epilepsy patients take timely measures to avoid risks. However, EEG signals possess high dimensionality, nonlinearity, and strong temporal dependencies, making it difficult for models to integrate global and local features and capture long-term dependencies. To address these issues, we propose EEG VMamba, which uses the Visual State Space (VSS) block as its backbone and introduces a Convolutional Neural Network (CNN) at the later stage. This approach fully combines the local perception capability of the CNN with the global modeling capability of the VSS block. It was evaluated on the publicly available CHB-MIT dataset to demonstrate its effectiveness in seizure prediction, achieving a sensitivity of 91.1% and an AUC of 0.914. Compared to a seizure prediction method based on the Vision Transformer, EEG VMamba demonstrates superior performance, evidenced by higher sensitivity and a higher AUC score.
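The hybrid design this abstract describes (a global state-space backbone with local convolution added at a later stage) can be sketched in miniature. The toy below is our own illustration, not the paper's implementation: the decay constant, kernel weights, and function names are all assumptions made for demonstration.

```python
import numpy as np

def global_scan(x, decay=0.9):
    """Exponentially decayed cumulative scan: a toy stand-in for the
    linear recurrence a state-space (VSS-style) block computes."""
    h, out = 0.0, np.empty_like(x)
    for i, v in enumerate(x):
        h = decay * h + (1 - decay) * v
        out[i] = h
    return out

def local_conv1d(x, kernel):
    """Depthwise 1-D convolution: the local feature extraction a CNN
    stage contributes on top of the global features."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.dot(xp[i:i + len(kernel)], kernel)
                     for i in range(len(x))])

x = np.sin(np.linspace(0, 4 * np.pi, 64))        # toy single-channel EEG segment
features = local_conv1d(global_scan(x), np.array([0.25, 0.5, 0.25]))
print(features.shape)  # (64,)
```

The ordering mirrors the abstract: global sequence mixing first, local refinement afterwards.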
ISBN:
(Print) 9798400703232
This study's long-term goal is the development of a communication robot that can act as a partner, sustaining conversation about the specific things the user wants to talk about and is interested in. To achieve this goal, we developed an interviewer robot that adapts topics based on the user's multimodal attitudes. The robot, utilizing the Japanese GPT-NeoX-3.6, selects questions based on the estimated topic continuance level. We regard the topic continuance level as the degree of the user's willingness to speak (i.e., willingness to continue the current topic). This paper aims to validate the multimodal topic continuance recognition model and its adaptive question selection strategy. First, we trained the model on the "Hazumi" dialog corpus, which includes user multimodal behavior in human-virtual agent interactions. Second, 10 participants were interviewed by the robot equipped with the trained model. After the interviews, we asked the participants whether the robot's decisions to continue or change topics were appropriate, and validated the estimation accuracy.
To ensure that the wall-climbing robot can accurately walk along the predetermined straight route, it is necessary to obtain the real-time offset and deflection of the wall-climbing robot to provide information feedba...
This study examines multi-stage robotic production line performance via queuing analysis to enable more accurate resource planning. It is one of a select few studies of this type designed to boost the efficiency of ma...
ISBN:
(Print) 9781728198354
Vision Transformers (ViTs) are widely adopted in medical imaging tasks, and some existing efforts have been directed towards vision-language training for Chest X-rays (CXRs). However, we envision that there still exists potential for improvement in vision-only training for CXRs using ViTs by aggregating information from multiple scales, which has been proven beneficial for non-transformer networks. Hence, we have developed LT-ViT, a transformer that utilizes combined attention between image tokens and randomly initialized auxiliary tokens that represent labels. Our experiments demonstrate that LT-ViT (1) surpasses the state-of-the-art performance of pure ViTs on two publicly available CXR datasets, (2) is generalizable to other pre-training methods and is therefore agnostic to model initialization, and (3) enables model interpretability without Grad-CAM and its variants.
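The "combined attention between image tokens and randomly initialized label tokens" can be sketched as a single self-attention pass over the concatenation of both token sets. This is our own minimal reading of the abstract, not LT-ViT's actual architecture; the dimensions and token counts below are arbitrary assumptions (14 label tokens is merely a plausible number of CXR findings).

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, n_img, n_lab = 8, 16, 14
img = rng.normal(size=(n_img, d))        # image patch tokens
lab = rng.normal(size=(n_lab, d))        # randomly initialized label tokens

# Combined attention: every token (image or label) attends to every token.
tokens = np.concatenate([img, lab])
attn = softmax(tokens @ tokens.T / np.sqrt(d))
out = attn @ tokens
print(out.shape)  # (30, 8)
```

Because label tokens attend directly to image tokens, their attention rows can be inspected per label, which is one way such a design could yield interpretability without Grad-CAM.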
ISBN:
(Print) 9798350323658
Machine learning techniques rely on large and diverse datasets for generalization. Computer vision, natural language processing, and other applications can often reuse public datasets to train many different models. However, due to differences in physical configurations, it is challenging to leverage public datasets for training robotic control policies on new robot platforms or for new tasks. In this work, we propose a novel framework, ExAug, to augment the experiences of different robot platforms from multiple datasets in diverse environments. ExAug leverages a simple principle: by extracting 3D information in the form of a point cloud, we can create much more complex and structured augmentations, utilizing both synthetic image generation and geometry-aware penalization, producing experiences that would have been suitable in the same situation for a different robot with a different size, turning radius, and camera placement. The trained policy is evaluated on two new robot platforms with three different cameras in indoor and outdoor environments with obstacles.
ISBN:
(Print) 9798350349405; 9798350349399
This paper presents a real-time semantic video communication method for general scenes, combining lossy semantic map coding with motion compensation to achieve reduced bit rates while maintaining perceptual and semantic quality. Our findings show that semantic image synthesis effectively adapts to minute errors resulting from motion estimation, eliminating the need to transmit the residuals. We recommend the Group of Pictures approach as a more efficient alternative. Comparative assessments against HEVC and VVC confirm the method's effectiveness. This research paves the way for efficient real-time semantic video communication, addressing the demands of data-intensive visual applications.
ISBN:
(Print) 9798350374520; 9798350374513
The ICASSP-SP Grand Challenge on Hyperspectral Skin Vision aims to democratize skin analysis by leveraging low-cost consumer-grade cameras to reconstruct vital spectral reflectance data. Addressing the accessibility limitations of costly hyperspectral equipment, this challenge tasks participants with decoding skin spectral information crucial for assessing melanin and hemoglobin concentrations. The provided Hyper-Skin dataset is carefully curated following ethical guidelines and consists of a total of 306 hyperspectral data samples from 51 human subjects. Through comprehensive evaluation with the Spectral Angle Mapper (SAM), fairness and accuracy in spectral reconstruction methods are ensured, encouraging advancements with real-world applications. The challenge attracted 51 teams and yielded 9 complete submissions. The top 5 teams achieved significant performance, with an average SAM score of 0.09486. This achievement underscores the transformative potential of this initiative in reshaping skin analysis accessibility and driving interdisciplinary progress with profound societal implications.
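The Spectral Angle Mapper used for evaluation here is a standard metric: the angle between a reconstructed spectrum and the reference spectrum, which makes it insensitive to overall brightness scaling. A minimal sketch (our own helper, not challenge-provided code):

```python
import numpy as np

def spectral_angle_mapper(pred, ref):
    """Angle (radians) between two spectra; 0 means identical shape."""
    cos = np.dot(pred, ref) / (np.linalg.norm(pred) * np.linalg.norm(ref))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding

a = np.array([0.2, 0.5, 0.9])          # toy reflectance spectrum
print(spectral_angle_mapper(a, a))     # ≈ 0.0
print(spectral_angle_mapper(a, 2 * a)) # ≈ 0.0: scale-invariant
```

Lower is better, so the reported average score of 0.09486 radians corresponds to reconstructed spectra closely aligned with the ground truth.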
ISBN:
(Print) 9798350349405; 9798350349399
In recent years, event cameras have attracted significant attention due to their advantages over conventional cameras: high dynamic range, no motion blur, and high temporal resolution. In contrast to traditional cameras, which generate intensity frames, event cameras output a stream of asynchronous events based on brightness changes. There is extensive ongoing research on performing computer vision tasks such as object detection and classification via the event camera. However, due to the unconventional output format of the event camera, it is difficult to perform computer vision tasks directly on the event stream; most works first reconstruct the intensity image from the event stream and then perform such tasks. An important and crucial task is feature detection and description. The scale-invariant feature transform (SIFT) is a widely used keypoint detector and descriptor that is invariant to transformations such as scale, rotation, noise, and illumination. In this work, given an event voxel, we directly generate the LoG pyramid for SIFT keypoint detection. We fit a 3rd-degree polynomial and calculate its roots to compute the scale-space extrema response for SIFT keypoint detection. Since the extrema computation is performed after LoG thresholding, the solution is computationally less expensive. Experimental results validate the effectiveness of our system.
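The cubic-fit step can be illustrated concretely: fit a 3rd-degree polynomial to the LoG response sampled across scales, then take the roots of its derivative as candidate scale-space extrema. This is our own NumPy sketch of that idea as we read it from the abstract, not the paper's code; the sample scales and responses are made up.

```python
import numpy as np

def scale_extrema(scales, responses):
    """Fit a cubic to LoG responses over scale; return extrema locations
    (roots of the derivative) that fall inside the sampled scale range."""
    coeffs = np.polyfit(scales, responses, deg=3)
    deriv = np.polyder(coeffs)          # quadratic; roots are extrema
    roots = np.roots(deriv)
    real = roots[np.isreal(roots)].real
    return [r for r in real if scales[0] <= r <= scales[-1]]

# Toy response peaking at scale 2: r(s) = -(s - 2)^2
s = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
r = -(s - 2.0) ** 2
ext = scale_extrema(s, r)
print(ext)  # one extremum, near 2.0
```

Because the fit is only evaluated at thresholded LoG responses, a closed-form root computation like this is cheaper than a dense scale-space search.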
ISBN:
(Print) 9798350349405; 9798350349399
Deep ensembles are capable of achieving state-of-the-art results on classification and out-of-distribution (OOD) detection tasks. However, their effectiveness is limited by the homogeneity of the patterns learned within an ensemble. To overcome this issue, our study introduces Saliency-Diversified Deep Ensembles (SDDE), a novel approach that promotes diversity among ensemble members by leveraging saliency maps. By incorporating saliency map diversification, our method outperforms conventional ensemble techniques and improves calibration on multiple classification and OOD detection tasks. In particular, the proposed method achieves state-of-the-art OOD detection quality, calibration, and accuracy on multiple benchmarks, including CIFAR10/100 and the large-scale ImageNet dataset.
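One simple way to quantify the saliency-map homogeneity this abstract targets is the mean pairwise cosine similarity between flattened saliency maps, which can then be minimized as a diversification term. The sketch below is our own illustration of that idea; it is not SDDE's actual loss, and the toy maps are fabricated for demonstration.

```python
import numpy as np

def saliency_diversity_loss(maps):
    """Mean pairwise cosine similarity between flattened saliency maps.
    Minimizing this pushes ensemble members to attend to different regions."""
    flat = [m.ravel() / np.linalg.norm(m.ravel()) for m in maps]
    sims = [np.dot(flat[i], flat[j])
            for i in range(len(flat)) for j in range(i + 1, len(flat))]
    return float(np.mean(sims))

identical = [np.ones((4, 4)), np.ones((4, 4))]            # fully homogeneous
disjoint = [np.eye(4), np.fliplr(np.eye(4))]              # non-overlapping focus
print(saliency_diversity_loss(identical))  # ≈ 1.0 (no diversity)
print(saliency_diversity_loss(disjoint))   # ≈ 0.0 (maximal diversity)
```

In training, a term like this would be added to the usual classification loss so that members stay accurate while diverging in where they look.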