Cross-modal image-text retrieval has gained increasing attention due to its ability to combine computer vision with natural language processing. Previously, image and text features were extracted and concatenated to feed transformer-based retrieval networks. However, these approaches align the image and text modalities only implicitly, since the self-attention mechanism computes attention coefficients over all input features. In this paper, we propose a cross-modal Semantic Alignments Module (SAM) that establishes an explicit alignment by enhancing the inter-modal relationship. First, visual and textual representations are extracted from an image-text pair. Second, we construct a bipartite graph whose nodes are the image regions and the words in the sentence, and whose edges are the relationships between them. The proposed SAM then lets the model compute attention coefficients based on the edges of the graph, which explicitly aligns the two modalities. Finally, a binary classifier determines whether the given image-text pair is aligned. We report extensive experiments on the MS-COCO and Flickr30K test sets, showing that SAM captures a joint representation of the two modalities and can be applied to existing retrieval networks.
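The edge-restricted attention this abstract describes can be sketched in a few lines of numpy. Everything here is illustrative, not the paper's implementation: the shapes, the random features, and the `edge_masked_attention` name are assumptions; the only idea taken from the abstract is that attention coefficients are computed only along the edges of the region-word bipartite graph.

```python
import numpy as np

def edge_masked_attention(regions, words, edges):
    """Attention restricted to bipartite edges (illustrative sketch).

    regions: (R, d) image-region features; words: (W, d) word features;
    edges: (R, W) binary adjacency of the region-word bipartite graph.
    Coefficients are computed only where edges[i, j] == 1; non-edges
    receive zero weight, giving an explicit alignment.
    """
    scores = regions @ words.T / np.sqrt(regions.shape[1])  # (R, W) similarities
    scores = np.where(edges > 0, scores, -np.inf)           # mask out non-edges
    scores -= scores.max(axis=1, keepdims=True)             # stable row softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ words                                   # (R, d) word-aware region features

rng = np.random.default_rng(0)
R, W, d = 3, 5, 8
edges = rng.integers(0, 2, size=(R, W))
edges[edges.sum(axis=1) == 0, 0] = 1  # ensure every region has at least one edge
out = edge_masked_attention(rng.normal(size=(R, d)), rng.normal(size=(W, d)), edges)
print(out.shape)  # (3, 8)
```

The mask is what distinguishes this from ordinary self-attention: with `edges` all ones the sketch degrades to the implicit all-pairs alignment the paper argues against.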
With the rise of deep learning, the intersection of artificial intelligence and art, represented by image style transfer, has attracted considerable attention in the fields of graphic and image technology and art. Bas...
Purpose: With the help of basic physics, this review discusses the application of computer algorithms, in the form of recent advances such as machine learning and neural networks, in the textile industry. Scientists have linked the underlying structural or chemical science of textile materials and discovered several strategies for completing some of the most time-consuming tasks with ease and precision. Since the 1980s, computer algorithms and machine learning have been used to aid the majority of the textile testing process. With the rise in demand for automation, deep learning and neural networks now handle the majority of testing and quality-control operations in the form of image processing.
Design/methodology/approach: The state of the art of artificial intelligence (AI) applications in the textile sector is reviewed in this paper. The current literature is evaluated on the basis of several research problems and AI-based methods. The research issues are categorized into three categories based on the operational processes of the textile industry: yarn manufacturing, fabric manufacturing, and coloration.
Findings: AI-assisted automation has improved not only machine efficiency but also overall industry operations. AI's fundamental concepts have been examined against real-world challenges. Several scientists conducted the majority of the case studies, and they confirmed that image analysis, backpropagation, and neural networks may be used as testing techniques in textile material testing. AI can be used to automate processes in various circumstances.
Originality/value: This research conducts a thorough analysis of artificial neural network applications in the textile sector.
Emerging optoelectronic synapses hold immense potential for advancing neuromorphic computing systems. However, achieving precise control over selective responses in optoelectronic memory and clarifying tunable synaptic weights has remained challenging. This study reports an optoelectronic synapse utilizing oxygen plasma-assisted defect engineering in tellurene for artificial neural networks. Through DFT calculations and experimental analyses, we demonstrate that tellurene conductance can be modulated by controlling plasma-defined defect engineering, allowing a transition from short-term to long-term synaptic plasticity, largely determined by intrinsic large-lattice-relaxation effects. Our artificial synapses exhibit high linearity, a broad dynamic range, and tunable synaptic weights. Additionally, our optoelectronic synapses display selective sensitivity to multi-spectral light and achieve a pattern recognition accuracy of up to 96.7% across five typical datasets, surpassing even the ideal synapse. These tunable spectral responses, combined with high-performance neuromorphic applications using spike coding, establish a foundation for developments in brain-inspired machine learning, robotics, and real-time data processing.
Instrument tone recognition systems have over time had the highest application value and significance in information retrieval. Notably, the traditional systems and methods often rely on convolutional neural networks ...
By implementing neuromorphic paradigms in processing visual information, machine learning has become crucial in an ever-increasing number of applications in our everyday lives, ever more capable but also computationally demanding. While pre-processing the information passively in the optical domain, before optical-to-electronic conversion, can reduce the computational requirements of a machine learning task, a comprehensive analysis of the computational requirements of hybrid optical-digital neural networks has thus far been missing. In this work we critically compare and analyze the performance of different optical, digital, and hybrid neural network architectures with respect to their classification accuracy and computational requirements for analog classification tasks of different complexity. We show that certain hybrid architectures reduce computational requirements by a factor of more than 10 while maintaining their performance. This may inspire a new generation of co-designed optical-digital neural network architectures aimed at applications that require low power consumption, such as remote sensing devices.
Convolutional neural networks (CNNs) are often favored for their strong learning abilities in building automatic intelligent models. The classification of time series data streams spans many applications of intelligent systems. However, the scarcity of effective machine learning architectures for handling limited time-series data adversely affects the realization of some crucial applications. In particular, healthcare-related applications are inherently concerned with limited time series datasets. Indeed, building effective artificial intelligence (AI) models for rare diseases using conventional techniques can pose a significant challenge. Utilizing recent advances in deep learning and signal processing, this study introduces a new ensemble deep learning (DL) approach for time series categorization in the presence of limited datasets. Physiological data, such as ECG and voice, are used to demonstrate the functionality of the proposed DL architecture with data obtained from IoT and non-IoT devices. The proposed framework comprises a self-designed deep CNN-LSTM along with ResNet50 and MobileNet transfer learning approaches. The CNN-LSTM architecture includes an enhanced squeeze-and-excitation block that improves overall performance. The architecture processes time series data transformed into a 3-channel image structure via improved recurrence plot (RP), Gramian angular field (GAF), and fuzzy recurrence plot (FRP) methods. The proposed model demonstrated superior classification accuracy on the ECG5000 and TESS datasets compared to other state-of-the-art techniques, validating its efficacy for binary and multiclass classification.
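The series-to-image step this abstract relies on can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the standard GAF and RP formulas are used, and, lacking the paper's "improved" and fuzzy variants, a second recurrence plot with a different threshold stands in as the third channel.

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian angular summation field of a 1-D series: cos(phi_i + phi_j)."""
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1  # rescale to [-1, 1] for arccos
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])       # (n, n) image

def recurrence_plot(x, eps=0.1):
    """Binary recurrence plot: 1 where |x_i - x_j| < eps."""
    return (np.abs(x[:, None] - x[None, :]) < eps).astype(float)

x = np.sin(np.linspace(0, 4 * np.pi, 64))  # toy 64-sample series
channels = np.stack([
    gramian_angular_field(x),
    recurrence_plot(x),          # stand-in for the improved RP
    recurrence_plot(x, eps=0.3), # stand-in for the fuzzy RP channel
])
print(channels.shape)  # (3, 64, 64)
```

The resulting `(3, n, n)` tensor is shaped exactly like an RGB image, which is what lets ResNet50- and MobileNet-style backbones consume 1-D physiological signals.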
This study focuses on human footprint identification for high-security applications such as the safety of public places, crime scene investigation, impostor identification, biotech and blue-chip labs, and the identification of infants in hospitals. The paper proposes low-cost hardware to scan biometric human footprints that utilises image pre-processing and enhancement capabilities for obtaining the features. The algorithm enhances footprint matching performance by selecting three sets of local invariant feature detectors (histogram of oriented gradients, maximally stable extremal regions, and speeded-up robust features), the local binary pattern as a texture descriptor, a corner point detector, and PCA. Furthermore, descriptive statistics are generated from all the above-mentioned footprint features and concatenated to create the final feature vector. The proposed footprint biometric identification correctly identifies or classifies a person by training the system with patterns of the subjects of interest, using an artificial neural network model specially designed for this task. The proposed method achieves a very encouraging classification accuracy of 99.55%.
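The "descriptive statistics, then concatenate" step is the load-bearing idea here, and it can be sketched generically. The detector outputs below are random stand-ins (the real pipeline would use actual HOG, MSER, SURF, and LBP responses), and the choice of five statistics is an assumption for illustration.

```python
import numpy as np

def descriptive_stats(features):
    """Summarize a variable-length feature set into a fixed-length vector."""
    f = np.asarray(features, dtype=float).ravel()
    return np.array([f.mean(), f.std(), f.min(), f.max(), np.median(f)])

# random stand-ins for real detector outputs (HOG, LBP, SURF, ...)
rng = np.random.default_rng(1)
detector_outputs = {
    "hog":  rng.normal(size=3780),
    "lbp":  rng.normal(size=256),
    "surf": rng.normal(size=(40, 64)),
}
# one fixed-length block per detector, concatenated into the final vector
final_vector = np.concatenate([descriptive_stats(v) for v in detector_outputs.values()])
print(final_vector.shape)  # (15,)
```

The appeal of the summarization step is that detectors with variable-length outputs (e.g. a varying number of SURF keypoints per footprint) still yield a fixed-length vector suitable for a neural network classifier.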
In most practical applications, the feature spaces of the training datasets and the target-domain datasets are inconsistent, or the data distributions between them are inconsistent, which leads to the problem of data starvation and makes it difficult for terminal devices to obtain highly accurate results. To address the problems of limited terminal device resources, low accuracy of data processing results, and unsatisfactory processing speed, a Heterogeneous Multi-access Edge Computing (MEC) Framework based on Transfer Learning (TL), abbreviated HMECF-TL, is proposed. This framework adopts a cloud-edge-end three-layer architecture. It uses model transfer to optimize the convolutional neural network (CNN) model at each layer, improving data processing speed and accuracy. Furthermore, a multi-agent Deep Reinforcement Learning Algorithm with an Attention Mechanism (DRLAAM) is designed to further improve the timeliness of computation-intensive applications. The performance of the HMECF-TL framework is verified by simulation experiments: it not only reduces delay by more than 24.66% but also improves accuracy by more than 8.34%. The framework not only increases computing capacity, addressing the shortage of terminal device resources, but also improves the quality of data processing, addressing the problem of data starvation.
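The model-transfer idea in the cloud-edge-end architecture can be sketched abstractly: feature layers trained in the cloud are copied down, and only the task head is re-initialized for the edge/end device's own data. The dictionary-of-weights representation, layer names, and shapes below are all invented for illustration; they are not the HMECF-TL implementation.

```python
import numpy as np

def transfer_model(cloud_model, n_classes_edge):
    """Model-transfer sketch: reuse cloud feature layers, re-initialize the head."""
    # copy the pretrained convolutional (feature) layers unchanged
    edge_model = {k: v.copy() for k, v in cloud_model.items() if k.startswith("conv")}
    # the classifier head is replaced and re-trained on the downstream device's data
    feat_dim = cloud_model["fc"].shape[0]
    edge_model["fc"] = np.zeros((feat_dim, n_classes_edge))
    return edge_model

# toy cloud model: two conv layers and a 100-class head
cloud = {"conv1": np.ones((3, 3, 16)), "conv2": np.ones((3, 3, 32)), "fc": np.ones((32, 100))}
edge = transfer_model(cloud, n_classes_edge=10)
print(sorted(edge), edge["fc"].shape)
```

The same operation applied twice (cloud to edge, edge to end) gives the three-layer cascade the abstract describes, with each layer fine-tuning on progressively more local data.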
Human activity recognition (HAR) using radar technology is becoming increasingly valuable for applications in areas such as smart security systems, healthcare monitoring, and interactive computing. This study investigates the integration of convolutional neural networks (CNNs) with conventional radar signal processing methods to improve the accuracy and efficiency of HAR. Three distinct, two-dimensional radar processing techniques, specifically range-fast Fourier transform (FFT)-based time-range maps, time-Doppler-based short-time Fourier transform (STFT) maps, and smoothed pseudo-Wigner-Ville distribution (SPWVD) maps, are evaluated in combination with four state-of-the-art CNN architectures: VGG-16, VGG-19, ResNet-50, and MobileNetV2. This study positions radar-generated maps as a form of visual data, bridging radar signal processing and image representation domains while ensuring privacy in sensitive applications. In total, twelve CNN and preprocessing configurations are analyzed, focusing on the trade-offs between preprocessing complexity and recognition accuracy, all of which are essential for real-time applications. Among these results, MobileNetV2, combined with STFT preprocessing, showed an ideal balance, achieving high computational efficiency and an accuracy rate of 96.30%, with a spectrogram generation time of 220 ms and an inference time of 2.57 ms per sample. The comprehensive evaluation underscores the importance of interpretable visual features for resource-constrained environments, expanding the applicability of radar-based HAR systems to domains such as augmented reality, autonomous systems, and edge computing.
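The STFT-based time-Doppler map, the preprocessing option the study found best balanced with MobileNetV2, can be sketched with scipy. The synthetic chirp, sample rate, and window parameters below are arbitrary stand-ins for a real radar return; the point is only the signal-to-image conversion that lets an image CNN consume radar data.

```python
import numpy as np
from scipy.signal import stft

# synthetic stand-in for a radar return: a chirp mimicking a micro-Doppler sweep
fs = 1000                                  # sample rate, Hz (arbitrary)
t = np.arange(0, 2, 1 / fs)
signal = np.cos(2 * np.pi * (50 * t + 40 * t ** 2))

# time-Doppler map via short-time Fourier transform
f, tau, Z = stft(signal, fs=fs, nperseg=128, noverlap=96)
spectrogram = 20 * np.log10(np.abs(Z) + 1e-12)  # dB-magnitude "image" for the CNN
print(spectrogram.shape)
```

In a real pipeline this 2-D dB map would be resized and replicated (or stacked with other maps) to match the three-channel input that architectures such as MobileNetV2 expect.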