In this paper, simulation is based on the ESM model, which is traditionally characterized by high nonlinearity and high uncertainty. The complexity of the system poses a great challenge for its overall characterizatio...
With the development of the information society and the rapid rise of new-generation information technology industries represented by the Smart Earth, the Internet of Things, and cloud computing, the data transmission and...
An Internet server, service, or network may be the target of a distributed denial-of-service (DDoS) attack, in which the attacker attempts to disrupt regular traffic by flooding the victim with an excessive amount...
The skin, which protects bones and muscles, is the body's largest organ. Numerous people suffer from skin cancer these days. It has now become the...
The growth of social media in recent years has contributed to the spread of fake news on the Internet. Since multimodal contents such as pictures have a huge impact on the spread of news in social media, researchers a...
ISBN:
(Print) 9798400701085
The transformer model has gained a lot of success in various computer vision tasks owing to its capacity for modeling long-range dependencies. However, its application has been limited in the area of high-resolution unpaired image translation using GANs due to its quadratic complexity with respect to the spatial resolution of input features. In this paper, we propose a novel transformer-based GAN for high-resolution unpaired image translation named Swin-UNIT. A two-stage generator is designed which consists of a global style translation (GST) module and a recurrent detail supplement (RDS) module. The GST module focuses on translating low-resolution global features using the ability of self-attention. The RDS module offers quick information propagation from the global features to the detail features at high resolution using cross-attention. Moreover, we customize a dual-branch discriminator to guide the generator. Extensive experiments demonstrate that our model achieves state-of-the-art results on unpaired image translation tasks.
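The core mechanism the abstract describes for the RDS module is cross-attention, where high-resolution detail tokens query low-resolution global features. A minimal numpy sketch of scaled dot-product cross-attention (a toy illustration, not the paper's Swin-UNIT implementation; all shapes and names here are assumptions for the example):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: detail tokens (queries)
    gather information from global tokens (keys/values)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_detail, n_global)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values                  # (n_detail, d)

# toy example: 64 high-resolution detail tokens query 16 global tokens
rng = np.random.default_rng(0)
detail = rng.normal(size=(64, 32))
global_feats = rng.normal(size=(16, 32))
out = cross_attention(detail, global_feats, global_feats)
print(out.shape)  # (64, 32)
```

Because the attention map is only (n_detail, n_global) rather than (n_detail, n_detail), this propagation step avoids the quadratic cost in the high spatial resolution that the abstract cites as the bottleneck for pure self-attention.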
More and more evidence has shown that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activ...
Computer vision has recently developed into a crucial tool in agriculture for tracking plant health and treating illnesses. In this paper, we present a VGG19-based deep learning technique for recognising diseases in P...
ISBN:
(Print) 9783031538292; 9783031538308
Human activity recognition (HAR) is a crucial component of many current applications, including those in the healthcare, security, and entertainment sectors. At the current state of the art, deep learning outperforms machine learning thanks to its ability to automatically extract features. Autoencoders (AE) and convolutional neural networks (CNN) are types of neural networks known for their good performance in dimensionality reduction and image classification, respectively. Since most methods introduced for classification purposes are limited to sensor-based approaches, this paper mainly focuses on vision-based HAR, where we present a combination of AE and CNN for the classification of labeled data: a convolutional AE (conv-AE) is utilized for two functions, dimensionality reduction and feature extraction, while a CNN is employed for classifying the activities. For the proposed model's implementation, the public benchmark datasets KTH and Weizmann are considered, on which we attained recognition rates of 96.3% and 94.89%, respectively. A comparative analysis of the proposed model on the above-mentioned datasets is provided.
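The two-stage pipeline the abstract describes (an autoencoder bottleneck for dimensionality reduction, then a classifier on the compressed features) can be sketched schematically in numpy. This toy stands in a fixed linear projection for the conv-AE encoder and a nearest-centroid rule for the CNN classifier; every dimension, label, and stage here is an illustrative assumption, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in data: 100 flattened 32x32 grayscale "frames", 4 activity classes.
n, d, k, n_classes = 100, 1024, 64, 4
X = rng.normal(size=(n, d))
y = np.arange(n) % n_classes          # every class is represented

# Stage 1 (stand-in for the conv-AE bottleneck): a fixed linear projection
# that reduces each frame from d=1024 to k=64 dimensions.
W_enc = rng.normal(size=(d, k)) / np.sqrt(d)
Z = X @ W_enc                         # (n, k) compressed features

# Stage 2 (stand-in for the CNN classifier): nearest class centroid
# in the compressed feature space.
centroids = np.stack([Z[y == c].mean(axis=0) for c in range(n_classes)])
dists = ((Z[:, None, :] - centroids[None]) ** 2).sum(axis=-1)  # (n, 4)
pred = dists.argmin(axis=1)
accuracy = (pred == y).mean()
print(Z.shape, pred.shape)
```

The design point the abstract makes is the separation of concerns: the encoder is trained (in the real system, by reconstruction loss) independently of the classifier, so the same compressed representation can feed different downstream classifiers.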
ISBN:
(Print) 9781665491907
In end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare against new driving models. So far, some recent models achieve better performance than CILRS by using expensive sensor suites and/or large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS both by processing higher-resolution images using a human-inspired HFOV as an inductive bias and by incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning.
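The "conditional" in conditional imitation learning refers to routing shared perception features through a branch selected by a high-level navigation command, and supervising only on the demonstrated vehicle signals. A minimal toy sketch of that branching and its imitation loss (the command names, feature size, and linear heads are all illustrative assumptions, not the CILRS/CIL++ architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

COMMANDS = ["follow_lane", "turn_left", "turn_right", "go_straight"]

# One small linear head per navigation command (the "conditional" part):
# shared features -> (steering, acceleration).
feat_dim = 8
heads = {c: rng.normal(size=(feat_dim, 2)) * 0.1 for c in COMMANDS}

def predict(features, command):
    """Route shared perception features through the branch selected
    by the high-level navigation command."""
    return features @ heads[command]

features = rng.normal(size=(feat_dim,))
steer, accel = predict(features, "turn_left")

# Imitation loss against a human demonstration: only vehicle signals
# (steering, acceleration) supervise the model, no labeled sensor data.
demo = np.array([0.3, 0.1])
loss = ((predict(features, "turn_left") - demo) ** 2).mean()
print(loss >= 0.0)  # True
```

This branching is what lets a single perception backbone serve multiple driving intentions while keeping supervision limited to the cheap vehicle-signal channel the abstract emphasizes.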