ISBN:
(Print) 1577358872
This research investigates the generalization capabilities of neural networks in deep learning when applied to real-world scenarios where data often contains imperfections, focusing on their adaptability to both noisy and noise-free data in image retrieval tasks. Our study explores approaches that preserve all available data, regardless of quality, across diverse tasks. Evaluation criteria vary per task, since the ultimate goal is to develop a technique that extracts relevant information while disregarding noise in the final network design for each specific task. The aim is to enhance the accessibility and efficiency of AI across diverse tasks, particularly for individuals or countries with limited resources and no access to high-quality data. This work is dedicated to fostering inclusivity and unlocking the potential of AI for widespread societal benefit.
ISBN:
(Print) 9798350359329; 9798350359312
With the proliferation of social media data, Multimodal Named Entity Recognition (MNER) has received much attention; using different data modalities is crucial for the development of natural language processing and neural networks. However, existing methods suffer from two drawbacks: 1) text-image pairs in the data do not always correspond to each other, and the short texts typical of social media provide little contextual information to fall back on; 2) despite the introduction of visual information, heterogeneity gaps can arise in previous complex fusion methods, leading to misidentification. This paper proposes a new synthetic image with selected graphic alignment network (SAMNER) to address these challenges and construct a matching relationship between external images and text. To solve the text-image mismatch problem, we use a stable diffusion model to generate images and perform entity labeling. Specifically, the stable diffusion model generates candidate images, the candidates are filtered against an internal image set to select the image that best matches the text, and multimodal fusion is then performed to predict the entity labels. We design a simple and effective multimodal attention alignment mechanism to obtain a better visual representation, and we conduct extensive experiments. The experiments show that our model produces competitive results on two publicly available datasets.
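The abstract does not specify the attention alignment mechanism in detail; one plausible reading is scaled dot-product attention from text tokens to image-region features. The sketch below is an illustrative assumption (shapes, feature dimensions, and function names are made up, not taken from SAMNER):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def align_text_to_image(text_feats, image_feats):
    """Scaled dot-product attention: each text token attends to
    image-region features and receives a visual context vector.
    text_feats: (T, d), image_feats: (R, d) -- hypothetical shapes."""
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (T, R)
    weights = softmax(scores, axis=-1)                 # attention per token
    attended = weights @ image_feats                   # (T, d) visual context
    return attended, weights

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 64))     # 5 text tokens
regions = rng.normal(size=(9, 64))  # 9 image regions
ctx, w = align_text_to_image(text, regions)
```

Each row of `w` sums to 1, so every token's visual context is a convex combination of region features, which is what lets the fused representation ignore mismatched regions.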
Neural networks (NNs) have made significant progress in recent years and have been applied in a broad range of applications, including speech recognition, image classification, automated driving, and natural language processing. The hardware implementation of NNs presents challenges, and research communities have explored various analog and digital neuronal and synaptic devices for resource-efficient implementation. However, these hardware NNs face several challenges, such as overheads imposed by peripheral circuitry, speed-area tradeoffs, non-idealities associated with memory devices, low on-off resistance ratios, sneak-path issues, low weight precision, and power-inefficient converters. This article reviews different synaptic devices and discusses the challenges associated with implementing these devices in hardware, along with corresponding solutions, applications, and prospective future research directions. Several categories of emerging synaptic devices are explored and compared, including resistive random-access memory (RRAM), phase-change memory (PCM), analog-to-digital hybrid volatile memory-based, ferroelectric field-effect transistor (FeFET)-based, and spintronic devices based on spin transfer, spin-orbit effects, magnetic domain walls (DWs), and skyrmions. This study provides insights for researchers engaged in the field of hardware neural networks.
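The low on-off resistance ratio mentioned above can be illustrated with a toy crossbar model: even a "zero" weight leaks current through the off-state conductance, distorting the analog matrix-vector multiply. This is an illustrative sketch under made-up conductance values, not a model of any specific device from the review:

```python
import numpy as np

def map_to_conductance(w, g_on=1e-4, on_off_ratio=10.0):
    """Map normalized weights in [0, 1] to device conductances.
    A low on/off ratio compresses the usable range: G_off is not zero."""
    g_off = g_on / on_off_ratio
    return g_off + w * (g_on - g_off)

def crossbar_mvm(w, v_in, on_off_ratio):
    """Analog MVM: column currents are Kirchhoff sums of G * V."""
    g = map_to_conductance(w, on_off_ratio=on_off_ratio)
    return g @ v_in

rng = np.random.default_rng(1)
w = rng.random((4, 8))   # 4 outputs, 8 inputs, normalized weights
v = rng.random(8)        # input voltages
ideal = crossbar_mvm(w, v, on_off_ratio=1e6)   # near-ideal device
lossy = crossbar_mvm(w, v, on_off_ratio=5.0)   # realistic low ratio
err = np.linalg.norm(lossy - ideal) / np.linalg.norm(ideal)
```

Shrinking the on/off ratio from 1e6 to 5 adds a systematic offset current to every column, which is one reason the review flags it as a key non-ideality.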
ISBN:
(Print) 9783031723582; 9783031723599
The development of deep learning (DL) models has dramatically improved marker-free human pose estimation, including the important task of hand tracking. However, for applications in real-time-critical and embedded systems, e.g. in robotics or augmented reality, hand tracking based on standard frame-based cameras is too slow and/or power-hungry. The latency is already limited by the frame rate of the image sensor, and any subsequent DL processing widens the latency gap further while requiring substantial power. Dynamic vision sensors, on the other hand, enable sub-millisecond time resolution and output sparse signals that can be processed with an efficient Sigma-Delta Neural Network (SDNN) model that preserves the sparsity advantage within the neural network. This paper presents the training and evaluation of a small SDNN for hand detection, based on event data from the DHP19 dataset, deployed on Intel's Loihi 2 neuromorphic development board. We found it possible to deploy a hand detection model on a neuromorphic hardware backend without a notable performance difference from the original GPU implementation, at an estimated mean dynamic power consumption of approximately 7 mW for the network running on the chip.
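The Sigma-Delta idea behind the SDNN's sparsity advantage is to transmit only significant changes (deltas) and integrate them on the receiving side, so a slowly varying signal costs almost nothing. A minimal scalar illustration (not the Loihi 2 implementation; the threshold is arbitrary):

```python
import numpy as np

def sigma_delta_encode(signal, threshold=0.1):
    """Delta side: emit an event only when the input has moved more
    than `threshold` since the last transmitted value."""
    events, last = [], 0.0
    for t, x in enumerate(signal):
        delta = x - last
        if abs(delta) >= threshold:
            events.append((t, delta))
            last = x
    return events

def sigma_delta_decode(events, length):
    """Sigma side: accumulate received deltas to reconstruct the signal."""
    out, acc, i = np.zeros(length), 0.0, 0
    for t in range(length):
        while i < len(events) and events[i][0] == t:
            acc += events[i][1]
            i += 1
        out[t] = acc
    return out

sig = np.sin(np.linspace(0, 2 * np.pi, 100))
ev = sigma_delta_encode(sig, threshold=0.1)
rec = sigma_delta_decode(ev, len(sig))
```

Far fewer events than samples are transmitted, with reconstruction error bounded by the threshold; on event-driven hardware, fewer events translate directly into lower dynamic power.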
This paper investigates advanced techniques in image recognition and classification by integrating deep learning and machine learning approaches to achieve higher accuracy. Through the implementation of sophisticated ...
Stereo camera self-calibration is a complex challenge in computer vision applications such as robotics, object tracking, surveillance, and 3D reconstruction. To address this, we propose an efficient, fully automated end-to-end AI-based system for stereo camera self-calibration with varying intrinsic parameters, using only two images of any 3D scene. Our system combines deep convolutional neural networks (CNNs) with transfer learning and fine-tuning. First, our optimized end-to-end convolutional neural network model extracts matching points between a pair of stereo images. These matching points, together with their 3D scene correspondences, are used to formulate a non-linear cost function, and direct optimization is then performed to estimate the intrinsic camera parameters by minimizing this cost function. Following this initial optimization, a fine-tuning layer refines the intrinsic parameters for increased accuracy. Our hybrid approach is characterized by a specially optimized architecture that leverages the strengths of end-to-end CNNs for image feature extraction and processing, together with the non-linear cost function formulation and fine-tuning, to offer a robust and accurate method for stereo camera self-calibration. Extensive experiments on synthetic and real data demonstrate the superior performance of the proposed technique compared to traditional camera self-calibration methods in terms of precision and convergence speed.
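The core optimization step, minimizing a non-linear reprojection cost over intrinsic parameters, can be sketched with SciPy on synthetic pinhole data. The cost function, the reduced intrinsics (f, cx, cy), and the synthetic points below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np
from scipy.optimize import least_squares

def project(points_3d, f, cx, cy):
    """Pinhole projection with hypothetical intrinsics (f, cx, cy)."""
    x, y, z = points_3d.T
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)

def residuals(params, points_3d, observed_2d):
    """Reprojection error, flattened for least_squares."""
    f, cx, cy = params
    return (project(points_3d, f, cx, cy) - observed_2d).ravel()

rng = np.random.default_rng(2)
pts = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))  # 3D scene points
true = (800.0, 320.0, 240.0)                             # ground-truth intrinsics
obs = project(pts, *true) + rng.normal(scale=0.2, size=(50, 2))

fit = least_squares(residuals, x0=[500.0, 300.0, 200.0], args=(pts, obs))
```

With sub-pixel observation noise the solver recovers the intrinsics closely from a rough initial guess, which is the role the paper assigns to its direct-optimization stage before fine-tuning.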
Computer vision technology for detecting objects in a complex environment often draws on other key technologies, including pattern recognition, artificial intelligence, and digital image processing. It has been shown that fast Convolutional Neural Networks (CNNs) with You Only Look Once (YOLO) are well suited to differentiating similar objects, handling constant motion, and coping with low image quality. The proposed study addresses these issues by implementing three different object detection algorithms: You Only Look Once (YOLO), Single Shot Detector (SSD), and Faster Region-Based Convolutional Neural Networks (Faster R-CNN). This paper compares the three deep-learning object detection methods to find the best combination of features and accuracy. The Faster R-CNN technique performed better than single-stage detectors such as YOLO and SSD in terms of accuracy, recall, precision, and loss.
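The accuracy, precision, and recall comparison above rests on matching predicted boxes to ground truth by overlap. A minimal sketch of the standard IoU criterion and a greedy precision/recall computation (illustrative, not the paper's evaluation code):

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(detections, gts, thresh=0.5):
    """Greedy one-to-one matching at a fixed IoU threshold."""
    matched, tp = set(), 0
    for det in detections:
        for j, gt in enumerate(gts):
            if j not in matched and iou(det, gt) >= thresh:
                matched.add(j); tp += 1
                break
    fp = len(detections) - tp
    fn = len(gts) - tp
    return tp / (tp + fp), tp / (tp + fn)

dets = [(0, 0, 2, 2), (10, 10, 12, 12)]
gts = [(0, 0, 2, 2), (20, 20, 22, 22)]
p, r = precision_recall(dets, gts)
```

Single-stage and two-stage detectors are then scored with exactly these quantities (often aggregated over thresholds as mAP), which is the footing for the comparison the abstract reports.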
ISBN:
(Digital) 9798350372748
ISBN:
(Print) 9798350372748
Image Caption Generation (ICG), situated at the confluence of computer vision and natural language processing, empowers machines to comprehend visual content and express it in human-like language. This research offers a comprehensive overview of key concepts, methodologies, and challenges in ICG. The process involves developing algorithms for the automatic generation of contextually relevant captions, utilizing deep neural networks for feature extraction and natural language processing techniques for coherent composition. Recent advancements, particularly in convolutional neural networks for image processing and recurrent neural networks for language modelling, have significantly elevated the performance of image captioning systems. The study delves into the core components of an ICG system, including pre-processing techniques for image data, feature extraction mechanisms, and the integration of language models. Attention mechanisms, a key innovation in this field, enable the model to focus on relevant image regions while generating captions, closely mirroring human attention patterns. Despite notable progress, ICG faces several challenges, such as handling diverse and complex visual scenes, ensuring cross-modal coherence between images and captions, and addressing biases present in training data. Ethical considerations, particularly in applications like automated content generation, are also discussed. The study concludes by highlighting potential future directions in ICG research, including the incorporation of multimodal learning approaches, enhancing the interpretability of generated captions, and addressing societal concerns related to bias and fairness. As ICG continues to evolve, it holds promise for applications ranging from accessibility for the visually impaired to improved content indexing and retrieval in multimedia databases. The research also underscores the significance of the accuracy attainments, showcasing the success of the pr...
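The attention mechanism described above, focusing on relevant image regions at each decoding step, can be sketched as Bahdanau-style additive attention. The weight matrices below are random stand-ins, not a trained captioning model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(hidden, regions, W_h, W_r, v):
    """Score each image region against the decoder hidden state,
    then build a weighted visual context for the next word.
    hidden: (d,), regions: (R, d) -- hypothetical shapes."""
    scores = np.tanh(hidden @ W_h + regions @ W_r) @ v   # (R,)
    alpha = softmax(scores)                              # focus per region
    context = alpha @ regions                            # (d,) context vector
    return context, alpha

rng = np.random.default_rng(3)
d, R, a = 32, 6, 16   # feature dim, region count, attention dim
hidden = rng.normal(size=d)
regions = rng.normal(size=(R, d))
ctx, alpha = additive_attention(hidden, regions,
                                rng.normal(size=(d, a)),
                                rng.normal(size=(d, a)),
                                rng.normal(size=a))
```

The weights `alpha` form a distribution over regions; inspecting them per generated word is also a common route to the interpretability the study calls for.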
This Human Action Recognition and Medical Image Segmentation study presents a novel framework that leverages advanced neural network architectures to improve Medical Image Segmentation and Human Action Recognition (HAR). Gated Recurrent Units (GRUs) are used in the HAR domain to efficiently capture complex temporal correlations in video sequences, yielding better accuracy, precision, recall, and F1 score than current models. In computer vision and medical imaging, the current research environment highlights the significance of advanced techniques, especially when addressing problems such as computational complexity, resilience, and noise in real-world applications. Improved medical image segmentation and HAR are of growing interest. While methods such as the V-Net architecture for medical image segmentation and Spatial-Temporal Graph Convolutional Networks (ST-GCNs) for HAR have shown promise, they are constrained by factors such as processing requirements and noise sensitivity. The proposed methods highlight the necessity of sophisticated neural network topologies and optimisation techniques for medical image segmentation and HAR, with further study focusing on transfer learning and attention mechanisms. A Python tool has been implemented to perform min-max normalization, utilize a GRU for human action recognition, employ V-Net for medical image segmentation, and optimize with the Adam optimizer, with performance evaluation metrics integrated for comprehensive analysis. This study provides an optimised GRU network strategy for Human Action Recognition with 92% accuracy, and a V-Net-based method for Medical Image Segmentation with 88% Intersection over Union and 92% Dice Coefficient.
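Two components named above, min-max normalization and the IoU/Dice segmentation metrics, are simple enough to sketch directly. This is a minimal illustration on binary masks, not the paper's actual Python tool:

```python
import numpy as np

def min_max_normalize(x, eps=1e-8):
    """Scale features to [0, 1], as in the preprocessing step."""
    return (x - x.min()) / (x.max() - x.min() + eps)

def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, target).sum()
    return 2 * inter / (pred.sum() + target.sum())

def iou_score(pred, target):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
# intersection = 2, |pred| = 3, |gt| = 3, union = 4
d = dice_coefficient(pred, gt)   # 4/6
j = iou_score(pred, gt)          # 2/4
```

Dice always exceeds IoU for partial overlaps (Dice = 2·IoU / (1 + IoU)), which is why the reported 92% Dice coexists with an 88% IoU on the same segmentations.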
Deep neural networks have been crucial in several recent developments in artificial intelligence and big data technology, including natural language processing, speech recognition, and computer vision. Given the numer...