With the growth of multi-source heterogeneous data, flexible retrieval across different modalities is an urgent demand in industrial applications. To allow users to control the retrieval results, a novel fabric image retrieval method based on multi-modal feature fusion is proposed in this paper. First, the image feature is extracted using a modified pre-trained convolutional neural network to separate macroscopic and fine-grained features, which are then selected and aggregated by a multi-layer perceptron. The feature of the modification text is extracted by a long short-term memory network. Subsequently, the two features are fused in a visual-semantic joint embedding space by gated and residual structures to control the selective expression of the separable image features. To validate the proposed scheme, a fabric image database for multi-modal retrieval is created as the benchmark. Qualitative and quantitative experiments indicate that the proposed method is practicable and effective and can be extended to other similar industrial fields, such as wood and wallpaper.
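A minimal sketch of the kind of gated, residual fusion of an image feature and a modification-text feature described above, assuming a CNN image embedding and an LSTM text encoder; the dimensions, layer choices, and class name are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class GatedResidualFusion(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, joint_dim), nn.ReLU())
        self.txt_encoder = nn.LSTM(txt_dim, joint_dim, batch_first=True)
        # Gate decides how strongly the text is allowed to modify each image feature.
        self.gate = nn.Sequential(nn.Linear(2 * joint_dim, joint_dim), nn.Sigmoid())
        self.residual = nn.Linear(2 * joint_dim, joint_dim)

    def forward(self, img_feat, txt_tokens):
        v = self.img_proj(img_feat)                   # (B, joint_dim)
        _, (h, _) = self.txt_encoder(txt_tokens)      # final LSTM hidden state
        t = h[-1]                                     # (B, joint_dim)
        joint = torch.cat([v, t], dim=-1)
        g = self.gate(joint)
        # Gated selection of image features plus a residual correction from the text.
        fused = g * v + (1 - g) * self.residual(joint)
        return nn.functional.normalize(fused, dim=-1)  # joint embedding for retrieval
```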
This paper presents a distributed acoustic sensing (DAS) system integrated with artificial intelligence to provide real-time monitoring for fence perimeter and buried-system applications. The DAS system is a Rayleigh backscatter based fibre optic sensing system that has been deployed in two real-world commercial applications to detect acoustic wave propagation and scattering along perimeter lines and to classify intrusions accurately. Three signal processing methods, which we believe to be novel, are proposed to train filters that automatically select frequency bands from the power spectrum and generate hyper-spectral images from the data gathered by the DAS system without expert knowledge. The hyper-spectral images are analyzed by a neural network based object detection model. The system achieves 81.8% accuracy on a fence perimeter installation and 60.4% accuracy on a buried system installation in detecting and classifying various intrusion events. The evaluation interval of the integrated DAS system framework between event sensing and detection does not exceed 5 s. (c) 2025 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
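A hedged illustration of the general idea of turning a DAS time series into a band-selected pseudo "hyper-spectral" image by ranking power-spectrum bands and keeping the most energetic ones; the window size, sampling rate, band count, and selection rule are assumptions and not the paper's trained filters.

```python
import numpy as np
from scipy import signal

def das_to_hyperspectral(trace, fs=1000, n_bands=16, nperseg=256):
    # Power spectrogram of one DAS channel: shape (frequency bins, time frames).
    f, t, sxx = signal.spectrogram(trace, fs=fs, nperseg=nperseg)
    band_energy = sxx.sum(axis=1)                 # total power per frequency bin
    keep = np.argsort(band_energy)[-n_bands:]     # automatically selected bands
    image = np.log1p(sxx[np.sort(keep), :])       # (n_bands, T) pseudo-image
    return image                                  # fed to an object detector downstream

trace = np.random.randn(10_000)                   # stand-in for a recorded DAS channel
print(das_to_hyperspectral(trace).shape)
```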
ISBN (print): 1577358872
This research investigates the generalization capabilities of neural networks in deep learning when applied to real-world scenarios where data often contains imperfections, focusing on their adaptability to both noisy and non-noisy settings for image retrieval tasks. Our study explores approaches that preserve all available data, regardless of quality, for diverse tasks. The evaluation of results varies per task, because the ultimate goal is to develop, for each specific task, a technique that extracts relevant information while disregarding noise in the final network design. The aim is to enhance the accessibility and efficiency of AI across diverse tasks, particularly for individuals or countries with limited resources and without access to high-quality data. This work is dedicated to fostering inclusivity and unlocking the potential of AI for widespread societal benefit.
ISBN (print): 9798350359329; 9798350359312
With the proliferation of social media data, Multimodal Named Entity Recognition (MNER) has received much attention; using different data modalities is crucial for the development of natural language processing and neural networks. However, existing methods suffer from two drawbacks: 1) text-image pairs in the data do not always correspond to each other, and the short-text nature of social media makes it impossible to rely on contextual information; 2) despite the introduction of visual information, heterogeneity gaps may arise in previous complex fusion methods, leading to misidentification. This paper proposes a new synthetic-image with selected graphic alignment network (SAMNER) to address these challenges and construct a matching relationship between external images and text. To solve the graphic mismatch problem, we use a stable diffusion model to generate images and perform entity labeling. Specifically, we use the stable diffusion model to generate the image with the highest match to the text, filter the generated images against an internal image set to select the best image, and then perform multimodal fusion to predict the entity labels. We design a simple and effective multimodal attentional alignment mechanism to obtain a better visual representation, and we conduct a large number of experiments. The experiments show that our model produces competitive results on two publicly available datasets.
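A minimal sketch of one plausible form of the "multimodal attentional alignment" step, assuming text tokens attend over visual region features from the selected image; the feature size, head count, and single-layer design are assumptions rather than SAMNER's actual architecture.

```python
import torch
import torch.nn as nn

class TextToImageAlignment(nn.Module):
    def __init__(self, dim=768, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, L, dim) token embeddings; image_feats: (B, R, dim) region features.
        aligned, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        # Residual connection keeps the textual signal dominant when the image is uninformative.
        return self.norm(text_feats + aligned)

aligner = TextToImageAlignment()
out = aligner(torch.randn(2, 24, 768), torch.randn(2, 49, 768))
print(out.shape)  # (2, 24, 768): text tokens enriched with aligned visual evidence
```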
ISBN (print): 9798350350661; 9798350350654
In the realm of information management, the digitization of handwritten documents is pivotal. This research introduces an advanced Handwritten Optical Character Recognition (HOCR) model, leveraging Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory networks (BiLSTM), and the Connectionist Temporal Classification (CTC) loss function. Together, the Convolutional RNN-based Bi-LSTM CTC (CRBC) model demonstrates a robust 94% accuracy and adapts seamlessly across various domains, presenting a scalable solution for enhanced handwritten document processing. This fusion of machine learning and natural language processing techniques contributes to improved efficiency in information management, with potential applications in diverse industries and fields.
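A short sketch of a CNN + BiLSTM + CTC recognizer in the style the abstract describes; the layer sizes, alphabet size, and image height are assumptions, and this is not the CRBC model itself.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes=80, img_h=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_h = img_h // 4
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, n_classes + 1)        # +1 for the CTC blank symbol

    def forward(self, x):                              # x: (B, 1, H, W)
        f = self.cnn(x)                                # (B, 128, H/4, W/4)
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one step per width slice
        out, _ = self.rnn(seq)
        return self.fc(out).log_softmax(-1)            # (B, T, n_classes + 1)

# CTC training step on dummy data: targets are label-index sequences with their lengths.
model = CRNN()
ctc = nn.CTCLoss(blank=80)                             # blank index matches the extra class
images = torch.randn(4, 1, 32, 128)
log_probs = model(images).permute(1, 0, 2)             # CTC expects (T, B, C)
targets = torch.randint(0, 80, (4, 10))
input_lens = torch.full((4,), 32, dtype=torch.long)
target_lens = torch.full((4,), 10, dtype=torch.long)
loss = ctc(log_probs, targets, input_lens, target_lens)
```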
Neural networks (NNs) have made significant progress in recent years and have been applied in a broad range of applications, including speech recognition, image classification, automatic driving, and natural language processing. The hardware implementation of NNs presents challenges, and research communities have explored various analog and digital neuronal and synaptic devices for resource-efficient implementation. However, these hardware NNs face several challenges, such as overheads imposed by peripheral circuitry, speed-area tradeoffs, non-idealities associated with memory devices, low on-off resistance ratios, sneak path issues, low weight precision, and power-inefficient converters. This article reviews different synaptic devices, discusses the challenges associated with implementing them in hardware along with corresponding solutions and applications, and prospects future research directions. Several categories of emerging synaptic devices, such as resistive random-access memory (RRAM), phase change memory (PCM), analog-to-digital hybrid volatile memory-based, ferroelectric field effect transistor (FeFET)-based, and spintronic spin-transfer, spin-orbit, magnetic domain wall (DW), and skyrmion synaptic devices, have been explored, and a comparison between them is presented. This study provides insights for researchers engaged in the field of hardware neural networks.
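A hedged illustration of the basic principle behind the crossbar-style synaptic devices the review surveys: weights are mapped to conductances and a vector-matrix multiply is carried out via Ohm's and Kirchhoff's laws, with the finite on-off resistance ratio mentioned above limiting the usable dynamic range. All numbers and the simple mapping rule are assumptions for illustration only, not a model of any specific device.

```python
import numpy as np

def weights_to_conductance(w, g_on=1e-4, on_off_ratio=100):
    # Finite on-off ratio: the lowest programmable conductance is g_on / ratio, not zero.
    g_off = g_on / on_off_ratio
    # Map normalized weights onto the device's conductance window [g_off, g_on].
    w_norm = (w - w.min()) / (w.max() - w.min() + 1e-12)
    return g_off + w_norm * (g_on - g_off)

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8))            # one layer's weight matrix
g = weights_to_conductance(weights)          # device conductances (siemens)
v = rng.uniform(0, 0.2, size=8)              # read voltages applied to the columns
currents = g @ v                             # summed row currents = analog multiply-accumulate
print(currents)
```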
This paper investigates advanced techniques in image recognition and classification by integrating deep learning and machine learning approaches to achieve higher accuracy. Through the implementation of sophisticated ...
ISBN (print): 9783031723582; 9783031723599
The development of deep learning (DL) models has dramatically improved marker-free human pose estimation, including the important task of hand tracking. However, for applications in real-time-critical and embedded systems, e.g. in robotics or augmented reality, hand tracking based on standard frame-based cameras is too slow and/or power hungry. The latency is already limited by the frame rate of the image sensor, and any subsequent DL processing further increases the latency gap while requiring substantial power for processing. Dynamic vision sensors, on the other hand, enable sub-millisecond time resolution and output sparse signals that can be processed with an efficient Sigma Delta Neural Network (SDNN) model that preserves the sparsity advantage in the neural network. This paper presents the training and evaluation of a small SDNN for hand detection, based on event data from the DHP19 dataset and deployed on Intel's Loihi 2 neuromorphic development board. We found it possible to deploy a hand detection model on the neuromorphic hardware backend without a notable performance difference from the original GPU implementation, at an estimated mean dynamic power consumption of approximately 7 mW for the network running on the chip.
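A minimal sketch of the sigma-delta idea the paper builds on: a layer transmits only the change in its activation (delta) above a threshold, and the receiving layer accumulates it (sigma), so slowly changing event-driven inputs produce very sparse traffic. The threshold and the plain-NumPy formulation are assumptions; this is not Intel's Lava/Loihi implementation.

```python
import numpy as np

class SigmaDeltaUnit:
    def __init__(self, size, threshold=0.05):
        self.prev = np.zeros(size)           # last transmitted activation
        self.threshold = threshold

    def encode(self, activation):
        delta = activation - self.prev
        # Only transmit deltas that exceed the threshold: the message is mostly zeros.
        mask = np.abs(delta) >= self.threshold
        message = np.where(mask, delta, 0.0)
        self.prev = self.prev + message
        return message

# Receiver side: state += message reconstructs the sender's activation up to threshold error.
unit = SigmaDeltaUnit(4)
print(unit.encode(np.array([0.2, 0.0, 0.01, -0.3])))   # sparse first message
print(unit.encode(np.array([0.21, 0.0, 0.01, -0.3])))  # almost nothing changes: near-zero traffic
```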
Recently, deep convolutional neural networks (CNNs) have achieved remarkable success in single-image super-resolution (SISR) tasks. However, these methods often suffer from high computational and memory requirements, limiting their practicality for real-world applications. To address this challenge, we propose a lightweight and efficient dual-branch information interaction network (DIIN) for SISR. DIIN adopts a dual-branch structure that differs from the typical serial network architectures. Specifically, we design the CNN branch and Transformer branch as parallel structures. In the CNN branch, we employ a symmetric dual-branch feature interaction module (DFIM) to extract valuable local feature information. Concurrently, the Transformer branch utilizes a recursive Transformer to capture long-term global information and enhance reconstructed image details. By simultaneously considering these two branches, our model effectively combines the strengths of the CNN in extracting local information and the Transformer in capturing global information. Recognizing the complementarity of these two branches in SISR, we further incorporate a coefficient learning scheme to enhance their information interaction and obtain more comprehensive feature information, thereby improving overall model performance. Extensive experiments demonstrate that our DIIN outperforms competitive methods while consuming fewer computational resources and memory.
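A hedged sketch of the parallel dual-branch idea with learned fusion coefficients: a convolutional branch and an attention branch process the same features and are mixed by trainable scalars. The block contents, sizes, and softmax weighting are placeholders, not the actual DIIN/DFIM modules.

```python
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    def __init__(self, channels=64, n_heads=4):
        super().__init__()
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        # Coefficient learning: one trainable mixing weight per branch.
        self.alpha = nn.Parameter(torch.ones(2))

    def forward(self, x):                              # x: (B, C, H, W)
        local = self.cnn_branch(x)                     # local features from the CNN branch
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C) for self-attention
        glob, _ = self.attn(tokens, tokens, tokens)    # global interactions
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        mix = torch.softmax(self.alpha, dim=0)
        return x + mix[0] * local + mix[1] * glob      # residual interaction of both branches

print(DualBranchBlock()(torch.randn(1, 64, 32, 32)).shape)
```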
Stereo camera self-calibration is a complex challenge in computer vision applications such as robotics, object tracking, surveillance, and 3D reconstruction. To address this, we propose an efficient, fully automated end-to-end AI-based system for automatic stereo camera self-calibration with varying intrinsic parameters, using only two images of any 3D scene. Our system combines deep convolutional neural networks (CNNs) with transfer learning and fine-tuning. First, our end-to-end optimized convolutional neural network model extracts matching points between a pair of stereo images. These matching points are then used, along with their 3D scene correspondences, to formulate a non-linear cost function. Direct optimization is subsequently performed to estimate the intrinsic camera parameters by minimizing this non-linear cost function. Following this initial optimization, a fine-tuning layer refines the intrinsic parameters for increased accuracy. Our hybrid approach is characterized by a specially optimized architecture that leverages the strengths of end-to-end CNNs for image feature extraction and processing, together with the non-linear cost function formulation and fine-tuning, to offer a robust and accurate method for stereo camera self-calibration. Extensive experiments on synthetic and real data demonstrate the superior performance of the proposed technique compared to traditional camera self-calibration methods in terms of precision and faster convergence.
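A hedged sketch of the optimization stage only: given matched points and their 3D correspondences (which the CNN stage would provide), the focal length and principal point are refined by minimizing reprojection error. The pinhole model, initial guess, and synthetic data are assumptions; the feature-extraction network and fine-tuning layer are not reproduced here.

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, pts_3d, pts_2d):
    # params = [f, cx, cy]: shared focal length and principal point of a pinhole camera.
    f, cx, cy = params
    K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
    proj = (K @ pts_3d.T).T
    proj = proj[:, :2] / proj[:, 2:3]          # perspective division to pixel coordinates
    return (proj - pts_2d).ravel()             # residuals to be minimized

# Synthetic stand-in data: 3D points in the camera frame and their observed pixels.
pts_3d = np.random.rand(50, 3) + np.array([0.0, 0.0, 2.0])
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
pix = (K_true @ pts_3d.T).T
pts_2d = pix[:, :2] / pix[:, 2:3]

result = least_squares(reprojection_residuals, x0=[500.0, 300.0, 200.0],
                       args=(pts_3d, pts_2d))
print(result.x)   # recovered [f, cx, cy], close to the true 800, 320, 240
```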