In applications where robot arms collaborate with humans, the corresponding shared workspaces are very dynamic and not as predictable as purely robotic ones. Unknown obstacles, such as arbitrary boxes or tools placed ...
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Semantic segmentation is a well-addressed topic in the computer vision literature, but the design of fast and accurate video processing networks remains challenging. In addition, to run on embedded hardware, computer vision models often have to compromise on accuracy to reach the required speed, so a latency/accuracy trade-off usually lies at the heart of these real-time systems' design. For videos specifically, models can additionally reuse computations made for previous frames to mitigate the accuracy loss while remaining real-time. In this work, we propose to tackle fast future video segmentation prediction through convolutional layers with time-dependent channel masking. This technique updates only a chosen subset of the feature maps at each time step, which simultaneously reduces computation and latency and allows the network to leverage previously computed features. We apply this technique to several fast architectures and experimentally confirm its benefits for the future prediction subtask.
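The time-dependent channel masking described in this abstract can be illustrated with a minimal sketch, assuming a simple round-robin schedule in which channel c is recomputed only when c % period == t % period, while all other channels are reused from earlier frames. The schedule, the helper names (`masked_update`, `toy_channel`, `demo`) and the toy per-channel computation are hypothetical; the abstract only says that "a chosen subset" of feature maps is updated at each time step.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for computing one output channel of a convolutional layer.
ChannelFn = Callable[[List[float], int], float]


def masked_update(cache: List[float], frame: List[float], t: int,
                  period: int, compute_channel: ChannelFn) -> List[int]:
    """Recompute only the channels selected for time step t and keep the
    cached (previous-frame) values for all other channels.

    Assumed schedule: channel c is refreshed when c % period == t % period,
    so each channel is updated once every `period` frames.
    Returns the indices of the recomputed channels.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    updated = []
    for c in range(len(cache)):
        if c % period == t % period:
            cache[c] = compute_channel(frame, c)
            updated.append(c)
    return updated


def toy_channel(frame: List[float], c: int) -> float:
    # Toy "convolution": a channel-specific weight times the sum of the input.
    return (c + 1) * sum(frame)


def demo() -> Tuple[List[float], List[List[int]]]:
    cache = [0.0] * 4  # feature maps carried over between frames
    frames = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    steps = []
    for t, frame in enumerate(frames):
        steps.append(masked_update(cache, frame, t, period=2,
                                   compute_channel=toy_channel))
    return cache, steps
```

With `period=2`, each frame recomputes only half of the channels (`[0, 2]`, then `[1, 3]`, then `[0, 2]`), and the remaining channels keep values computed on earlier frames. This is where the saving in computation and latency comes from; the cost is that the reused channels lag behind the current frame.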
Due to delayed notice and incorrect identification, wrong-way driving is a major traffic safety concern that frequently results in accidents. We describe a computer vision and deep learning based real-time wrong-way v...
ISBN (print): 9783031821493; 9783031821509
Early detection of breast cancer has become an essential medical procedure for reducing mortality rates in patients. Although various deep learning methods, including feature extraction techniques, transfer learning, image fusion, and data augmentation, enhance cancer analysis and detection, challenges such as limited data and class imbalance impede progress in early cancer detection. This study introduces a new approach that combines dual ensemble learning with MixUp data techniques to simultaneously (1) tackle class imbalance and (2) improve the diversity of ultrasound images. The proposed training methodology combines several strategies that integrate original and augmented datasets to sustain model training and improve performance. Through comprehensive evaluations on the benchmark breast ultrasound images (BUSI) dataset, the proposed approach demonstrates substantial improvement and confirms its feasibility for the multi-class classification of breast cancer.
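The MixUp component mentioned in this abstract can be shown as a minimal sketch of standard MixUp: a mixed sample is the convex combination x̃ = λ·x_i + (1 − λ)·x_j, its label is ỹ = λ·y_i + (1 − λ)·y_j on one-hot vectors, and λ is drawn from Beta(α, α). The abstract does not say how the dual ensemble pairs images, which α is used, or how mixing targets class imbalance, so the pairing, α, and the helper names `mixup` and `sample_lambda` are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw the mixing coefficient lambda ~ Beta(alpha, alpha), as in standard MixUp."""
    return rng.betavariate(alpha, alpha)


def mixup(x_i: Sequence[float], y_i: Sequence[float],
          x_j: Sequence[float], y_j: Sequence[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Return the convex combination of two samples and of their one-hot labels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_i) != len(x_j) or len(y_i) != len(y_j):
        raise ValueError("paired samples and labels must have matching lengths")
    # Pixel-wise interpolation of the two (flattened) images.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    # The same interpolation applied to the label vectors gives soft targets.
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y
```

For example, `mixup([0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0], [0.0, 0.0, 1.0], 0.75)` returns `([0.25, 0.75], [0.75, 0.0, 0.25])`. The mixed image is 75% of the first sample, and its soft label splits the probability mass between the two source classes in the same proportion.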
An accurate and fast fire smoke detection algorithm is urgently needed to support emergency linkage measures that prevent early fire spread, as well as measures taken after a fire. In this paper, the depth matrix of motion difference is c...
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Vision Transformer (ViT) has recently been introduced into the computer vision (CV) field, where its self-attention mechanism has achieved remarkable performance. However, ViT cannot simply be applied to hyperspectral image (HSI) classification because 1) ViT is a spatial-only self-attention model, whereas HSI contains rich spectral information; 2) ViT needs sufficient training samples, but HSI suffers from limited samples; 3) ViT does not learn local features well; and 4) multi-scale features are not considered in ViT. Furthermore, methods that combine convolutional neural networks (CNNs) and ViT generally suffer from a large computational burden. Hence, this paper designs a pure ViT-based model suited to HSI classification, built on the following points: 1) a spectral-only vision transformer with aggregation of all tokens; 2) a spatial-only local-global transformer; 3) cross-scale local-global feature fusion; and 4) a cooperative loss function that unifies the spectral and spatial features. As a result, the proposed model achieves classification performance on three public datasets that is competitive with other state-of-the-art methods.
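The first design point, spectral-only self-attention with aggregation of all tokens, can be illustrated with a minimal sketch under several assumptions. One pixel's spectrum is split into consecutive band groups that serve as tokens. Scaled dot-product attention is then applied across those tokens using identity query/key/value projections, and the output tokens are averaged instead of reading a single class token. The band grouping, the identity projections, mean pooling, and all helper names are hypothetical simplifications, not the paper's architecture.

```python
import math
from typing import List

Vector = List[float]


def spectral_tokens(spectrum: Vector, group: int) -> List[Vector]:
    """Split one pixel's spectrum into consecutive band groups, one token per group."""
    if group < 1 or len(spectrum) % group != 0:
        raise ValueError("number of bands must be a positive multiple of group")
    return [spectrum[i:i + group] for i in range(0, len(spectrum), group)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention across spectral tokens.

    Simplification: queries, keys and values are the tokens themselves
    (identity projections); a trained ViT learns separate Q/K/V weights.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)  # attention weights over all tokens sum to 1
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out


def aggregate_all_tokens(tokens: List[Vector]) -> Vector:
    """Pool by averaging every output token (an assumed reading of 'all tokens' aggregation')."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]
```

Because every attention output is a weighted average of the tokens, each spectral group can draw information from every other group. This property is what a spatial-only ViT lacks. Averaging all outputs lets every band group contribute to the pixel-level representation.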
New energy prediction is an important issue in energy management and optimization. By analyzing the difficulties of new energy prediction under the influence of meteorological factors and the advantages of generative ...
Cardiac arrest is a leading cause of mortality worldwide, with limited opportunities for intervention. This project introduces a novel machine-learning approach to predict and process cardiac arrest risk in high...
ISBN (print): 9783031724398; 9783031724404
Many museums and libraries have made efforts to digitize their assets, and many historic documents are now available as digital images. However, these documents are not directly accessible to retrieval systems that rely on written text rather than images. In this study, we examine the optical character recognition (OCR) ability of the novel GPT4-vision in cases where established methods, such as Tesseract, may have difficulties. We find that GPT4-vision provides excellent results even in cases where humans struggle. We also identified a number of key limitations, including the long runtime, which implies high energy requirements, the inability to handle rotated images, the need for layout hints, and limitations regarding image size. Even with these limitations, large language models and vision transformers are expected to play an important role in making historical documents more accessible, either for further processing or directly to users.
Image fusion aims to integrate complementary information from images collected by multiple source channels into a single fused image. It can extract the favorable information in each channel to the maximum extent and g...