Image captioning is the process of producing written descriptions that effectively represent the meaning and context of an image. To integrate visual and textual data, it needs to blend computer vision and natural language processing ...
The purpose of this work is to improve traffic sign recognition using machine learning, so as to improve the comprehension of traffic signs. The novel Artificial Neural Network (ANN) method is compared with the Recurrent Neural Network (RNN) ...
ISBN (Print): 1577358872
Neural ordinary differential equations (NODEs), one of the most influential lines of work in differential equation-based deep learning, generalize residual networks to the continuous-depth setting and opened a new field. They are currently utilized for various downstream tasks, e.g., image classification, time series classification, and image generation. Their key component is how to model the time-derivative of the hidden state, denoted dh(t)/dt. People have habitually used conventional neural network architectures, e.g., fully-connected layers followed by non-linear activations. In this paper, however, we present a neural operator-based method to define the time-derivative term. Neural operators were initially proposed to model the differential operator of partial differential equations (PDEs). Since the time-derivative of NODEs can be understood as a special type of differential operator, our proposed method, called branched Fourier neural operator (BFNO), makes sense. In our experiments with general downstream tasks, our method significantly outperforms existing methods.
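As a rough illustration of the idea (not the paper's BFNO implementation), the sketch below parameterizes the time-derivative dh(t)/dt of a NODE with a single Fourier-operator-style layer and integrates it with a fixed-step RK4 solver; the layer sizes, mode count, and solver are arbitrary assumptions.

```python
# Hypothetical sketch: a Fourier-operator parameterization of dh/dt inside a Neural ODE.
import torch
import torch.nn as nn

class FourierDerivative(nn.Module):
    """Models f(t, h) = dh/dt by mixing the low Fourier modes of the hidden state."""
    def __init__(self, dim, modes=16):
        super().__init__()
        self.dim, self.modes = dim, min(modes, dim // 2 + 1)
        scale = 1.0 / dim
        self.spectral_w = nn.Parameter(scale * torch.randn(self.modes, dtype=torch.cfloat))
        self.pointwise = nn.Linear(dim, dim)

    def forward(self, t, h):                          # h: (batch, dim)
        h_hat = torch.fft.rfft(h, dim=-1)             # spectral view of the hidden state
        out_hat = torch.zeros_like(h_hat)
        out_hat[..., : self.modes] = h_hat[..., : self.modes] * self.spectral_w
        spectral = torch.fft.irfft(out_hat, n=self.dim, dim=-1)
        return torch.tanh(spectral + self.pointwise(h))

def odeint_rk4(f, h0, t0=0.0, t1=1.0, steps=10):
    """Minimal fixed-step RK4 integrator standing in for an adaptive ODE solver."""
    h, dt = h0, (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * dt
        k1 = f(t, h)
        k2 = f(t + dt / 2, h + dt / 2 * k1)
        k3 = f(t + dt / 2, h + dt / 2 * k2)
        k4 = f(t + dt, h + dt * k3)
        h = h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return h

h1 = odeint_rk4(FourierDerivative(dim=64), torch.randn(8, 64))   # final hidden state
```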
Vision transformers have become popular as a possible substitute for convolutional neural networks (CNNs) in a variety of computer vision applications. These transformers, with their ability to focus on global relationships in images, offer large learning capacity. However, they may suffer from limited generalization, as they do not tend to model local correlations in images. Recently, hybridization of the convolution operation and the self-attention mechanism has emerged in vision transformers to exploit both local and global image representations. These hybrid vision transformers, also referred to as CNN-Transformer architectures, have demonstrated remarkable results in vision applications. Given the rapidly growing number of hybrid vision transformers, it has become necessary to provide a taxonomy and explanation of these hybrid architectures. This survey presents a taxonomy of recent vision transformer architectures and, more specifically, of hybrid vision transformers. Additionally, key features of these architectures, such as attention mechanisms, positional embeddings, multi-scale processing, and convolution, are discussed. In contrast to previous survey papers that focus primarily on individual vision transformer architectures or CNNs, this survey uniquely emphasizes the emerging trend of hybrid vision transformers. By showcasing the potential of hybrid vision transformers to deliver exceptional performance across a range of computer vision tasks, this survey sheds light on the future directions of this rapidly evolving architecture.
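A minimal sketch of the hybrid idea follows: one block that combines a depthwise convolution (local correlation) with multi-head self-attention (global relationships). It does not correspond to any specific architecture from the survey; the dimensions and layer choices are illustrative assumptions.

```python
# Illustrative CNN-Transformer hybrid block: convolutional local branch + attention global branch.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depthwise conv
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):                                # x: (batch, dim, H, W)
        x = x + self.local(x)                            # local branch: convolutional inductive bias
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2)) # (batch, H*W, dim) token sequence
        attended, _ = self.attn(tokens, tokens, tokens)  # global branch: self-attention
        x = x + attended.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(x)

out = HybridBlock(dim=64)(torch.randn(2, 64, 14, 14))    # shape preserved: (2, 64, 14, 14)
```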
ISBN (Print): 9798350350920
Convolutional neural networks (CNNs) exhibit exceptional performance within the image processing domain. The acceleration of convolutions for CNNs has consistently been a focal point of machine learning hardware accelerators. However, with the continuous development of CNNs, the design costs and engineering workloads of hardware accelerators have significantly increased. To enhance accelerator performance while reducing time-related expenses, it is necessary to determine a series of optimal design parameters during the early stages of accelerator design. To achieve this objective, the concept of design space exploration (DSE) for CNN accelerators has been proposed. However, as neural networks become increasingly complex, the demands on DSE methods have also grown, rendering existing methods unable to meet the real-time requirements of accelerators or to discover the optimal design. In this paper, we introduce a DSE framework based on the Genetic Simulated Annealing (GSA) algorithm. The proposed framework autonomously generates hardware design parameters, such as parallelism degrees, based on the resource constraints and the CNN model. Our method is evaluated on two typical CNN accelerators. Experimental results show that our method largely improves DSE efficiency, reducing exploration time by up to 73.7x compared to existing DSE methods.
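The following is a generic sketch of genetic simulated annealing applied to accelerator design-space exploration; the parameter ranges, resource limit, and cost model are invented placeholders, not the paper's framework.

```python
# Toy GSA loop: genetic crossover/mutation with a simulated-annealing acceptance rule.
import math
import random

PARAM_RANGES = {"pe_rows": range(1, 33), "pe_cols": range(1, 33), "tile": range(4, 65, 4)}
RESOURCE_LIMIT = 512                      # hypothetical cap on total processing elements

def cost(design):                         # placeholder latency proxy; lower is better
    pes = design["pe_rows"] * design["pe_cols"]
    if pes > RESOURCE_LIMIT:
        return float("inf")               # violates the resource constraint
    return 1e6 / (pes * design["tile"]) + 0.05 * design["tile"]

def random_design():
    return {k: random.choice(list(v)) for k, v in PARAM_RANGES.items()}

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in PARAM_RANGES}

def mutate(d):
    d = dict(d)
    k = random.choice(list(PARAM_RANGES))
    d[k] = random.choice(list(PARAM_RANGES[k]))
    return d

def gsa(pop_size=20, generations=50, t0=100.0, alpha=0.9):
    population = [random_design() for _ in range(pop_size)]
    best, temp = min(population, key=cost), t0
    for _ in range(generations):
        children = [mutate(crossover(*random.sample(population, 2))) for _ in population]
        next_pop = []
        for parent, child in zip(population, children):
            delta = cost(child) - cost(parent)
            # Annealing acceptance: keep a worse child with probability exp(-delta / T)
            accept = delta < 0 or random.random() < math.exp(-delta / temp)
            next_pop.append(child if accept else parent)
        population, temp = next_pop, temp * alpha
        best = min([best] + population, key=cost)
    return best

print(gsa())   # best parallelism/tiling parameters found under the toy cost model
```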
The Hopfield network is an example of an artificial neural network used to implement associative memories. In a traditional Hopfield neural network, a neuron's state is represented by a binary digit. Inspired by the human brain's ability to cope simultaneously with multiple sensorial inputs, this paper presents three multi-modal Hopfield-type neural networks that treat multi-dimensional data as a single entity. In the first model, called the vector-valued Hopfield neural network, the neuron's state is a vector of binary digits. Synaptic weights are modeled as finite impulse response (FIR) filters in the second model, yielding the so-called convolutional associative memory. Finally, the synaptic weights are modeled by linear time-varying (LTV) filters in the third model. Besides their potential applications for multi-modal intelligence, the new associative memories may also be used for signal and image processing and to solve optimization and classification tasks.
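For context, here is a minimal NumPy sketch of the classical bipolar Hopfield associative memory (Hebbian storage plus sign-based recall) that the multi-modal models above generalize; it shows only the scalar-state baseline, not the vector-valued, FIR, or LTV variants.

```python
# Classical Hopfield associative memory: store bipolar patterns, recall from a noisy probe.
import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: W is the (normalized) sum of outer products of +/-1 patterns."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)                       # no self-connections
    return W

def recall(W, probe, steps=10):
    """Synchronous update s <- sign(W s) until convergence or the step budget runs out."""
    s = probe.copy()
    for _ in range(steps):
        new_s = np.sign(W @ s)
        new_s[new_s == 0] = 1
        if np.array_equal(new_s, s):
            break
        s = new_s
    return s

rng = np.random.default_rng(0)
stored = rng.choice([-1.0, 1.0], size=(3, 64))                          # three bipolar patterns
noisy = stored[0] * rng.choice([1.0, -1.0], size=64, p=[0.9, 0.1])      # flip ~10% of the bits
print(np.array_equal(recall(train_hopfield(stored), noisy), stored[0])) # pattern recovered?
```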
ISBN (Digital): 9781510661936
ISBN (Print): 9781510661929; 9781510661936
The Hierarchical Bayesian Convolutional Neural Network (HCNN) is a machine learning algorithm that attempts to exploit the natural hierarchical structure of data. HCNN has demonstrated gains in robustness, accuracy, and reporting capabilities by addressing the technical challenge of classifying data at different levels of a hierarchical structure. There is a significant operational benefit in classifying at different levels of an ontology where the extracted knowledge is used for future decision-making, especially when classification at the finest level is infeasible.
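As a hypothetical illustration of hierarchical classification (not the HCNN algorithm itself), the sketch below attaches one head per level of a two-level ontology to a shared feature extractor and reports the coarse label whenever the fine-grained prediction is not confident; the threshold and network sizes are arbitrary assumptions.

```python
# Toy two-level classifier: shared backbone, coarse head, fine head, confidence-based back-off.
import torch
import torch.nn as nn

class TwoLevelClassifier(nn.Module):
    def __init__(self, feat_dim=128, n_coarse=5, n_fine=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU())
        self.coarse_head = nn.Linear(feat_dim, n_coarse)   # top level of the ontology
        self.fine_head = nn.Linear(feat_dim, n_fine)       # finest level of the ontology

    def forward(self, x):
        feats = self.backbone(x)
        return self.coarse_head(feats), self.fine_head(feats)

def predict(model, x, fine_threshold=0.6):
    """Report the fine label only when its softmax confidence clears the threshold."""
    coarse_logits, fine_logits = model(x)
    fine_prob, fine_label = fine_logits.softmax(-1).max(-1)
    coarse_label = coarse_logits.argmax(-1)
    level = (fine_prob >= fine_threshold).long()           # 1 = report fine, 0 = back off to coarse
    return level, coarse_label, fine_label

print(predict(TwoLevelClassifier(), torch.randn(4, 3, 32, 32)))
```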
Convolutional neural networks (CNNs) have become a common choice for industrial quality control, as well as for other critical applications in Industry 4.0. When these CNNs behave in ways unexpected to human users or developers, severe consequences can arise, such as economic losses or an increased risk to human life. Concept extraction techniques can be applied to increase the reliability and transparency of CNNs by generating global explanations for trained neural network models. The decisive features of image datasets in quality control often depend on the features' scale; for example, the size of a hole or an edge. However, existing concept extraction methods do not correctly represent scale, which leads to problems interpreting these models, as we show herein. To address this issue, we introduce the Scale-Preserving Automatic Concept Extraction (SPACE) algorithm as a state-of-the-art alternative concept extraction technique for CNNs, focused on industrial applications. SPACE is specifically designed to overcome the aforementioned problems by avoiding scale changes throughout the concept extraction process. SPACE proposes an approach based on square slices of input images, which are selected and then tiled before being clustered into concepts. Our method provides explanations of the models' decision-making process in the form of human-understandable concepts. We evaluate SPACE on three image classification datasets in the context of industrial quality control. Through experimental results, we illustrate how SPACE outperforms other methods and provides actionable insights into the decision mechanisms of CNNs. Finally, code for the implementation of SPACE is provided.
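A rough sketch of the scale-preserving idea described above, with assumed details rather than the published algorithm: images are cut into fixed-size square slices without resizing, each slice is embedded (here with a toy per-channel mean instead of CNN activations), and the embeddings are clustered into candidate concepts.

```python
# Toy concept-extraction pipeline: fixed-size square slices -> embeddings -> k-means clusters.
import numpy as np
from sklearn.cluster import KMeans

def square_slices(image, size=32, stride=32):
    """Yield fixed-size square crops; keeping `size` constant preserves feature scale."""
    h, w, _ = image.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield image[y:y + size, x:x + size]

def embed(patch):
    # Placeholder embedding; a real pipeline would use activations of the inspected CNN here.
    return patch.mean(axis=(0, 1))                   # per-channel mean as a toy descriptor

def extract_concepts(images, n_concepts=5):
    patches = [p for img in images for p in square_slices(img)]
    features = np.stack([embed(p) for p in patches])
    labels = KMeans(n_clusters=n_concepts, n_init=10).fit_predict(features)
    return patches, labels                           # each cluster is one candidate concept

imgs = [np.random.rand(128, 128, 3) for _ in range(4)]
patches, labels = extract_concepts(imgs)
print(len(patches), np.bincount(labels))             # patch count and cluster sizes
```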
Underwater image-capturing technology has advanced over the years, and a variety of artificial intelligence-based applications have been developed on digital and synthetic images. Low-quality and low-resolution underwater images are challenging for existing image processing and computer vision applications. Degraded or low-quality photos are a common issue in the underwater imaging process due to natural factors such as low illumination and scattering. Recent techniques use deep learning architectures such as CNNs, GANs, or other models for image enhancement. Although adversarial architectures provide good perceptual quality, they perform worse in quantitative tests than convolution-based networks. A hybrid technique is proposed in this paper that blends both designs to gain the advantages of the CNN and GAN architectures. The generator component produces images, which contributes to the creation of a sizable training set. The EUVP dataset is used for model training and testing. The PSNR score is used to measure the visual quality of the resultant images produced by the models. The proposed system was able to provide improved images with higher PSNR and SSIM scores compared with state-of-the-art methods.
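For reference, the PSNR metric reported above can be computed directly from its definition, PSNR = 10 log10(MAX^2 / MSE); the arrays below are synthetic and only illustrate the formula.

```python
# Peak signal-to-noise ratio between a reference image and an enhanced/degraded image.
import numpy as np

def psnr(reference, enhanced, max_val=255.0):
    mse = np.mean((reference.astype(np.float64) - enhanced.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                         # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0, 10, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR of the degraded image: {psnr(ref, noisy):.2f} dB")
```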
ISBN (Print): 9798350359329; 9798350359312
Large-kernel convolutional neural networks (CNNs) have recently achieved remarkable performance comparable to Vision Transformers (ViTs) in high-level vision tasks. However, two critical drawbacks hinder their widespread application in image dehazing. 1) Most large-kernel designs focus on expanding the kernel size even further to model stronger long-range dependencies, but this approach brings a substantial amount of computational overhead. 2) As the kernel size increases, the network tends to focus more on the shape of an object than on its texture, potentially affecting the details of the recovered image. To overcome these issues, we propose an effective multi-scale large separable kernel attention module (MLSKA) that can simultaneously build long-range and local dependencies in a cost-effective manner to facilitate high-quality image reconstruction. Specifically, MLSKA combines an efficient convolutional decomposition design with multi-scale learning, realizing multi-scale receptive fields while significantly reducing the computational cost and parameter count of large-kernel convolution. In addition, we introduce a deformable attention feed-forward network (DAFN) to aggregate contextual information. In DAFN, a novel deformable attention gate is designed to provide holistic attention to the feed-forward network (FFN), thereby improving its utilization of critical features. Integrating these two designs into a U-shaped backbone, the proposed multi-scale large-kernel network (MLANet) outperforms state-of-the-art methods on several dehazing benchmarks, achieving the best parameter-performance trade-off.
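As a hedged sketch of the large-kernel decomposition idea (following the common depthwise + dilated-depthwise + pointwise factorization, not the exact MLSKA module), the block below obtains a large effective receptive field without a dense large kernel and uses the result as an attention-style gate; the kernel sizes and dilation are assumptions.

```python
# Separable large-kernel attention: three cheap convolutions approximate one large dense kernel.
import torch
import torch.nn as nn

class SeparableLargeKernel(nn.Module):
    def __init__(self, dim, small_k=5, dilated_k=7, dilation=3):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, small_k, padding=small_k // 2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, dilated_k, dilation=dilation,
                                    padding=dilation * (dilated_k // 2), groups=dim)
        self.pw = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))   # large effective receptive field
        return x * attn                               # attention-style gating of features

x = torch.randn(1, 32, 64, 64)
print(SeparableLargeKernel(32)(x).shape)              # torch.Size([1, 32, 64, 64])
```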