Simulator sickness induced by 360° stereoscopic video content is a long-standing challenge in Virtual Reality (VR) systems. Current machine learning models for simulator sickness prediction ignore the underlying interdependencies and correlations across the multiple visual features that may lead to simulator sickness. We propose a model for sickness prediction that automatically learns and adaptively integrates multi-level mappings from stereoscopic video features to simulator sickness scores. First, saliency, optical flow and disparity features are extracted from the videos to reflect the factors causing simulator sickness: human attention area, motion velocity and depth information. These features are then embedded and fed into a 3-dimensional convolutional neural network (3D CNN) to extract the underlying multi-level knowledge, which includes low-level and higher-order visual concepts as well as a global image descriptor. Finally, an attentional mechanism adaptively fuses the multi-level information with attentional weights for sickness score estimation. The proposed model is trained end-to-end and validated on a public dataset. Comparisons with state-of-the-art models and ablation studies demonstrate improved performance in terms of Root Mean Square Error (RMSE) and Pearson Linear Correlation Coefficient.
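As a rough illustration of this kind of pipeline, the following is a minimal PyTorch sketch: saliency, optical-flow and disparity maps are stacked as input channels to a 3D CNN, and two feature levels are attention-weighted into a regressed score. The layer sizes, the two-level split and all module names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SicknessPredictor(nn.Module):
    def __init__(self, in_channels=4):  # e.g. saliency + 2-ch optical flow + disparity
        super().__init__()
        # 3D convolutions extract low-level and higher-order spatiotemporal features.
        self.block1 = nn.Sequential(nn.Conv3d(in_channels, 16, 3, padding=1),
                                    nn.ReLU(), nn.MaxPool3d(2))
        self.block2 = nn.Sequential(nn.Conv3d(16, 32, 3, padding=1),
                                    nn.ReLU(), nn.MaxPool3d(2))
        self.pool = nn.AdaptiveAvgPool3d(1)   # global descriptor per level
        # Attention produces one weight per feature level for adaptive fusion.
        self.attn = nn.Linear(16 + 32, 2)
        self.head = nn.Linear(16 + 32, 1)     # regress the sickness score

    def forward(self, x):                     # x: (B, C, T, H, W) feature volume
        f1 = self.block1(x)
        f2 = self.block2(f1)
        g1 = self.pool(f1).flatten(1)         # low-level summary
        g2 = self.pool(f2).flatten(1)         # higher-order summary
        w = torch.softmax(self.attn(torch.cat([g1, g2], 1)), dim=1)
        fused = torch.cat([w[:, :1] * g1, w[:, 1:] * g2], 1)
        return self.head(fused).squeeze(1)    # predicted simulator sickness score
```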
Image restoration, a critical task in computer vision and image processing, focuses on recovering degraded or damaged images to their original, high-quality state. This paper introduces an innovative approach to image...
ISBN:
(Print) 9781665464680
Computer vision is one of the important areas and directions of deep learning research; due to the complexity and diversity of vision tasks, different approaches must be chosen for different fields. In the field of aviation, existing image resources still fall far short of real needs because of the constraints of realistic scenes and the difficulty of image acquisition. More detailed and comprehensive images can better provide reliable technical support and a basis for applications, and in turn enable more accurate decisions, which requires generating more effective images to expand the data. Generative Adversarial Networks (GANs) are the fastest-growing and most effective generation method of recent years, so this experiment investigates the application of GANs to aviation data, taking images of airplanes, cars and ships as examples for a quantitative study. The performance of the GAN is studied from the perspectives of image size, number of images, number of iterations, and image category, in order to obtain better parameter settings for generating effective images. This provides a theoretical and experimental basis for subsequently applying GANs in the aviation field to generate more images with similar characteristics and alleviate the problem of insufficient data.
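For reference, here is a minimal DCGAN-style generator/discriminator pair in PyTorch of the general kind whose knobs (image size, dataset size, iteration count) such an experiment varies; the 64x64 resolution, channel counts and optimizer settings are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

LATENT = 100  # latent dimension is an assumption

G = nn.Sequential(  # latent vector (B, 100, 1, 1) -> 64x64 RGB image
    nn.ConvTranspose2d(LATENT, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())

D = nn.Sequential(  # 64x64 RGB image -> real/fake logit
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 8), nn.Flatten())  # 8x8 kernel collapses to one logit

loss = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real):                    # real: (B, 3, 64, 64) aviation images
    z = torch.randn(real.size(0), LATENT, 1, 1)
    fake = G(z)
    # Discriminator: real images labelled 1, generated images labelled 0.
    d_loss = (loss(D(real), torch.ones(real.size(0), 1)) +
              loss(D(fake.detach()), torch.zeros(real.size(0), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: try to make the discriminator output "real" for fakes.
    g_loss = loss(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```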
As a very important branch of computer science and engineering, graphics and image processing is a research topic of capturing, storing, and manipulating information from reflected electromagnetic waves from objects ...
This study addresses the pressing need for computer systems to interpret digital media images with a level of sophistication comparable to human visual perception. By leveraging Convolutional Neural Networks (CNNs), w...
1. Animal phenotypic traits are utilised in a variety of studies. Often the traits are measured from images. The processing of a large number of images can be challenging; nevertheless, image analytical applications based on neural networks can be an effective tool for automatic trait collection.
2. Our aim was to develop a stand-alone application to effectively segment an arthropod from an image and to recognise individual body parts: namely, head, thorax (or prosoma), abdomen and four pairs of appendages. It is based on a convolutional neural network with U-Net architecture, trained on more than a thousand images showing dorsal views of arthropods (mainly of wingless insects and spiders). The segmentation model gave very good results, with the automatically generated segmentation masks usually requiring only slight manual adjustments.
3. The application, named MAPHIS, can further (1) organise and preprocess the images; (2) adjust segmentation masks using a simple graphical editor; and (3) calculate various size, shape, colouration and pattern measures for each body part, organised in a hierarchical manner. In addition, a special plug-in function can align body profiles of selected individuals to match a median profile and enable comparison among groups. The usability of the application is shown in three practical examples.
4. The application can be used in a variety of fields where measures of phenotypic diversity are required, such as taxonomy, ecology and evolution (e.g. mimetic similarity). Currently, the application is limited to arthropods, but it can be easily extended to other animal taxa.
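As a sketch of the underlying technique, here is a minimal two-level U-Net-style segmentation network in PyTorch. It is not the MAPHIS model itself; the channel counts, depth and the eight-class output (head, thorax, abdomen, four appendage pairs, background) are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self, n_classes=8):  # assumed: 7 body-part classes + background
        super().__init__()
        self.enc1, self.enc2 = conv_block(3, 32), conv_block(32, 64)
        self.down = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = conv_block(64, 32)            # skip connection doubles channels
        self.out = nn.Conv2d(32, n_classes, 1)   # per-pixel body-part logits

    def forward(self, x):
        e1 = self.enc1(x)                        # full-resolution features
        e2 = self.enc2(self.down(e1))            # coarser, more abstract features
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return self.out(d)                       # (B, n_classes, H, W) mask logits
```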
ISBN:
(Digital) 9789819738106
ISBN:
(Print) 9789819738090
Lane detection is a challenging problem that has drawn the attention of the computer vision community for many years. Computer vision and machine learning algorithms struggle with this multi-feature identification task. Although several machine learning approaches can be used for lane identification, they are typically employed for classification rather than feature development. Contemporary machine learning techniques, by contrast, can discover features with high recognition value and have shown success in feature identification tests, yet these strategies have not been applied properly, which compromises their efficiency and accuracy for lane recognition. In this study, we provide a fresh approach to the problem: a new preprocessing and Region of Interest (ROI) selection method. The main objective is to extract white features using the HSV color transformation, add preliminary edge feature detection during preprocessing, and then select the ROI based on the proposed preprocessing. With this preprocessing strategy, the lane can be found. We envision an integrated autonomous vehicle controlled by a Robot Operating System that is capable of making intelligent driving choices. The digital image-processing algorithm responsible for the vehicle's best performance was based on novel filtering and noise-reduction techniques applied to the visual feedback by the processing unit. Within the control system, we used two separate control units, a master and a slave: the master control unit is in charge of visual processing and filtering, while the slave control unit is in charge of the vehicle's propulsion.
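A hedged OpenCV sketch of the described preprocessing chain: HSV conversion to isolate white features, preliminary edge detection, then a trapezoidal ROI mask. The thresholds and the ROI geometry are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

def preprocess(frame):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # White pixels: low saturation, high value (bounds are illustrative).
    white = cv2.inRange(hsv, (0, 0, 200), (180, 40, 255))
    edges = cv2.Canny(white, 50, 150)             # preliminary edge features
    h, w = edges.shape
    roi = np.zeros_like(edges)
    poly = np.array([[(0, h), (int(0.45 * w), int(0.6 * h)),
                      (int(0.55 * w), int(0.6 * h)), (w, h)]], dtype=np.int32)
    cv2.fillPoly(roi, poly, 255)                   # keep only the road region
    return cv2.bitwise_and(edges, roi)             # masked edge map for lane fitting
```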
ISBN:
(Print) 9783031723582; 9783031723599
The development of deep learning (DL) models has dramatically improved marker-free human pose estimation, including the important task of hand tracking. However, for applications in real-time-critical and embedded systems, e.g. in robotics or augmented reality, hand tracking based on standard frame-based cameras is too slow and/or power hungry. Latency is already limited by the frame rate of the image sensor, and any subsequent DL processing widens the latency gap further while requiring substantial power. Dynamic vision sensors, on the other hand, offer sub-millisecond time resolution and output sparse signals that can be processed with an efficient Sigma Delta Neural Network (SDNN) model, which preserves the sparsity advantage within the neural network. This paper presents the training and evaluation of a small SDNN for hand detection, based on event data from the DHP19 dataset and deployed on Intel's Loihi 2 neuromorphic development board. We found it possible to deploy the hand detection model on a neuromorphic hardware backend without a notable performance difference from the original GPU implementation, at an estimated mean dynamic power consumption of approximately 7 mW for the network running on the chip.
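To illustrate why an SDNN preserves sparsity, here is a minimal NumPy sketch of sigma-delta message passing between two units: only activation changes above a threshold are transmitted, and the receiver integrates them back. The threshold is an assumption, and this is plain NumPy, not the Lava/Loihi 2 API.

```python
import numpy as np

def sigma_delta_encode(activations, threshold=0.1):
    """Emit a sparse message only when the activation moved by >= threshold."""
    messages, reference = [], 0.0
    for t, a in enumerate(activations):
        delta = a - reference
        if abs(delta) >= threshold:
            messages.append((t, delta))   # sparse event: (time step, change)
            reference += delta            # sender and receiver stay in sync
    return messages

def sigma_delta_decode(messages, length):
    """Receiver integrates (sigma) the received deltas back into a signal."""
    out, value, i = np.zeros(length), 0.0, 0
    for t in range(length):
        while i < len(messages) and messages[i][0] == t:
            value += messages[i][1]
            i += 1
        out[t] = value
    return out

x = np.sin(np.linspace(0, 2 * np.pi, 50))       # slowly varying activation
msgs = sigma_delta_encode(x)
print(f"{len(msgs)} events for 50 time steps")  # far fewer messages than steps
```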
ISBN:
(Print) 9798350354249; 9798350354232
In recent years, significant progress has been achieved in medical image analysis, mainly due to substantial advances in deep learning methods. In the past decade, the Convolutional Neural Network (CNN) was the best model for image classification, demonstrating remarkable success in various medical applications. However, the advent of Vision Transformers (ViTs) has challenged the dominance of CNN approaches. This study aims to explore the potential of ViTs in healthcare by comparing their performance with that of CNN models. The latter have traditionally excelled at image feature extraction through convolutional operations; ViTs, on the other hand, relying on self-attention mechanisms, exhibit a unique ability to capture long-range dependencies, enabling them to model complex patterns within images effectively. In this study, after analysing the two architectures, we assessed the behaviour of from-scratch and pre-trained models, highlighting their differences in performance and shedding light on the applicability of the Transfer Learning (TL) approach in the healthcare scenario.
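A hedged sketch of how such a comparison can be set up with torchvision, building from-scratch and ImageNet-pre-trained variants of a CNN and a ViT with task-specific heads. ResNet-50, ViT-B/16 and the binary class count stand in for the paper's unspecified choices.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # e.g. a binary medical classification task (assumption)

def build(name, pretrained=True):
    if name == "cnn":
        m = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2
                            if pretrained else None)
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)      # replace classifier head
    else:
        m = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1
                            if pretrained else None)
        m.heads.head = nn.Linear(m.heads.head.in_features, NUM_CLASSES)
    return m

# Four conditions: {CNN, ViT} x {from scratch, transfer learning}.
variants = {(n, p): build(n, p) for n in ("cnn", "vit") for p in (False, True)}
```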
ISBN:
(Print) 9781713899921
Humans outperform object recognizers despite the fact that models perform well on current datasets, including those explicitly designed to challenge machines with debiased images or distribution shift. This problem persists, in part, because we have no guidance on the absolute difficulty of an image or dataset, making it hard to objectively assess progress toward human-level performance, to cover the range of human abilities, and to increase the challenge posed by a dataset. We develop a dataset difficulty metric, MVT (Minimum Viewing Time), that addresses these three problems. Subjects view an image that flashes on screen and then classify the object in the image; images that require only brief flashes to recognize are easy, while those that require seconds of viewing are hard. We compute the ImageNet and ObjectNet image difficulty distributions, which we find significantly undersample hard images: nearly 90% of current benchmark performance is derived from images that are easy for humans. Rather than hoping that harder datasets will emerge, we can for the first time objectively guide dataset difficulty during development. We can also break down recognition performance as a function of difficulty: model performance drops precipitously while human performance remains stable. Difficulty provides a new lens through which to view model performance, one which uncovers new scaling laws: vision-language models stand out as the most robust and human-like, while all other techniques scale poorly. We release tools to automatically compute MVT, along with image sets tagged by difficulty. Objective image difficulty has practical applications (one can measure how hard a test set is before deploying a real-world system) and scientific applications, such as discovering the neural correlates of image difficulty and enabling new object recognition techniques that eliminate the benchmark-vs-real-world performance gap.
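A minimal sketch of how an MVT-style score could be computed for one image from flash-duration trials: the shortest presentation at which subjects classify the object above an accuracy threshold. The durations, the 50% threshold and the data layout are assumptions, since the released tools' exact interface is not shown here.

```python
def minimum_viewing_time(trials, threshold=0.5):
    """trials: {duration_ms: [True/False correct responses]} for one image."""
    for duration in sorted(trials):
        responses = trials[duration]
        if sum(responses) / len(responses) >= threshold:
            return duration          # easy images clear the bar at brief flashes
    return float("inf")              # never recognized reliably -> hardest bucket

# Example: recognized reliably only once the flash lasts 170 ms.
trials = {17: [False, False, True], 50: [False, True, False],
          170: [True, True, True], 1000: [True, True, True]}
print(minimum_viewing_time(trials))  # -> 170
```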