Face recognition is an area of computer vision and image processing that is quickly expanding, with many uses in security, surveillance, and biometric identity. The proposed model is to develop a criminal identificati...
Fine-tuning pre-trained vision-language models (VLMs) has shown remarkable capabilities in medical image and textual depiction synergy. Nevertheless, many pre-training datasets are restricted by patient privacy concer...
ISBN: (print) 9798891760608
Image and text retrieval is one of the foundational tasks in the vision and language domain, with multiple real-world applications. State-of-the-art contrastive approaches, e.g. CLIP (Radford et al., 2021) and ALIGN (Jia et al., 2021), represent images and texts as dense embeddings and calculate the similarity in the dense embedding space as the matching score. On the other hand, sparse semantic features like bag-of-words models are inherently more interpretable, but are believed to be less accurate than dense representations. In this work, we show that it is possible to build a sparse semantic representation that is as powerful as, or even better than, dense representations. We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space. Each token in the space is a (sub-)word in the vocabulary, which is not only interpretable but also easy to integrate with existing information retrieval systems. The STAIR model significantly outperforms a CLIP model, with +4.9% and +4.3% absolute Recall@1 improvement on COCO-5k text -> image and image -> text retrieval respectively. It also achieves better performance than CLIP on both ImageNet zero-shot classification and linear probing.
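To make the idea of matching in a sparse token space concrete, the following is a minimal sketch, not the STAIR implementation. It assumes each image and text has already been mapped to a dictionary of vocabulary tokens and non-negative weights; the names `sparse_dot`, `top_tokens`, and `recall_at_1`, and all toy weights, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumed representation: vocabulary token -> non-negative weight.
SparseVec = Dict[str, float]


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Matching score: dot product over the tokens both vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[t] for t, w in a.items() if t in b)


def top_tokens(vec: SparseVec, k: int = 3) -> List[str]:
    """The k highest-weighted tokens, i.e. a human-readable summary."""
    ranked = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in ranked[:k]]


def recall_at_1(queries: List[SparseVec], gallery: List[SparseVec],
                gold: List[int]) -> float:
    """Fraction of queries whose top-scoring gallery item is the gold one."""
    hits = 0
    for q, target in zip(queries, gold):
        scores = [sparse_dot(q, g) for g in gallery]
        best = max(range(len(gallery)), key=scores.__getitem__)
        hits += int(best == target)
    return hits / len(queries)


# Toy data (made up): two images and two captions in token space.
images = [
    {"dog": 1.2, "grass": 0.7, "ball": 0.4},
    {"cat": 1.1, "sofa": 0.9},
]
texts = [
    {"dog": 1.0, "ball": 0.8, "play": 0.3},
    {"cat": 0.9, "sleep": 0.5, "sofa": 0.6},
]

if __name__ == "__main__":
    print(top_tokens(images[0], 2))
    print(recall_at_1(texts, images, gold=[0, 1]))
```

Because the score only touches shared tokens, a representation of this shape can in principle be served from an inverted index, which is the kind of integration with existing retrieval systems the abstract refers to.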
The increasing deployment of Advanced Driver Assistance Systems (ADAS) alongside the continual rise in camera sensor resolution has led to high bandwidth, and generally high cost, computation, and intra-vehicle commun...
Similarity computation between images or image regions is a necessary precursor for several vision-based applications, such as retrieval, registration, change detection etc. A two-channel convolutional neural network ...
Distributed acoustic sensors (DAS) are effective apparatuses that are widely used in many application areas for recording signals of various events with very high spatial resolution along optical fibers. To properly detect and recognize the recorded events, advanced signal processing algorithms with high computational demands are crucial. Convolutional neural networks (CNNs) are highly capable tools for extracting spatial information and are suitable for event recognition applications in DAS. Long short-term memory (LSTM) is an effective instrument for processing sequential data. In this study, a two-stage feature extraction methodology that combines the capabilities of these neural network architectures with transfer learning is proposed to classify vibrations applied to an optical fiber by a piezoelectric transducer. First, the differential amplitude and phase information is extracted from the phase-sensitive optical time-domain reflectometer (Φ-OTDR) recordings and stored in a spatiotemporal data matrix. Then, a state-of-the-art pre-trained CNN without dense layers is used as a feature extractor in the first stage. In the second stage, LSTMs are used to further analyze the features extracted by the CNN. Finally, a dense layer is used to classify the extracted features. To observe the effect of different CNN architectures, the proposed model is tested with five state-of-the-art pre-trained models (VGG-16, ResNet-50, DenseNet-121, MobileNet, and Inception-v3). The results show that using the VGG-16 architecture in the proposed framework achieves 100% classification accuracy in 50 trainings and gives the best results on the Φ-OTDR dataset. The results of this study indicate that pre-trained CNNs combined with LSTM are well suited to analyzing differential amplitude and phase information represented in a spatiotemporal data matrix, which is promising for event recognition operations in DAS applications. © 2023 Optica Publishing Group
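The first step described above, building a spatiotemporal matrix of differential amplitude and phase, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how the differences are taken, so the sketch assumes complex backscatter traces indexed as `traces[time][position]` and differences consecutive traces in time. The name `differential_matrices` is hypothetical.

```python
import cmath
from typing import List, Tuple

Trace = List[complex]        # one recording: a complex sample per fiber position
Matrix = List[List[float]]   # rows = time steps, columns = fiber positions


def differential_matrices(traces: List[Trace]) -> Tuple[Matrix, Matrix]:
    """Return (differential amplitude, differential phase) matrices.

    Assumption: differences are taken between consecutive traces in time.
    The phase difference is computed as the angle of curr * conj(prev),
    which keeps it wrapped to (-pi, pi].
    """
    amp_rows: Matrix = []
    phase_rows: Matrix = []
    for prev, curr in zip(traces, traces[1:]):
        amp_rows.append([abs(c) - abs(p) for p, c in zip(prev, curr)])
        phase_rows.append(
            [cmath.phase(c * p.conjugate()) for p, c in zip(prev, curr)]
        )
    return amp_rows, phase_rows


if __name__ == "__main__":
    # Toy recording (made up): 3 time steps, 2 fiber positions.
    toy = [
        [1 + 0j, 2 + 0j],
        [0 + 1j, 2 + 0j],   # position 0 rotates by 90 degrees
        [0 + 3j, 2 + 0j],   # position 0 grows in amplitude
    ]
    amp, phase = differential_matrices(toy)
    print(amp)
    print(phase)
```

In a pipeline like the one described, matrices of this kind would then be resized or stacked into image-like inputs for the pre-trained CNN; that step depends on the chosen network and is not shown here.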
Autonomous terrain classification is an important problem in planetary navigation, whether the goal is to identify scientific sites of interest or to traverse treacherous areas safely. Past Martian rovers have relied ...
With the incorporation of artificial intelligence in businesses, particularly features like computer vision, it has become increasingly important to ensure the robustness of the models being used. A popular technique ...
Recent years have witnessed an increasingly broad application of artificial intelligence (AI) technologies such as speech recognition, computer vision, natural language processing, machine learning, algorithmic framew...
Industrial defect detection is crucial for ensuring product quality and production efficiency, playing a pivotal role in advancing smart manufacturing. This paper reviews defect detection technologies for various industrial products, including metals, textiles, and printed circuit boards, and introduces an innovative classification system. It also offers a detailed analysis of recent developments and practical applications of large models in industrial defect detection. First, the basic principles of industrial defect detection are outlined. The detection methods are then categorized into three main groups: traditional image processing, machine learning, and deep learning, with their principles, case studies, limitations, and future development directions analyzed. Traditional methods consist of image preprocessing, segmentation, and feature extraction. Machine learning methods are divided into point-distance-based, hyperplane-based, tree-based, and neural-network-based classification algorithms. Deep learning models are classified into two types: accuracy-oriented and efficiency-oriented. The paper organizes industrial defect datasets by type (multi-product and single-product), evaluates data quality and availability, and summarizes common evaluation metrics for accuracy and efficiency by task requirements. It also compares the latest methods on two public datasets to guide further research in defect detection. Real-world examples illustrate the end-to-end process, from data processing and hardware configuration to model training and deployment, while exploring the value and limitations of these technologies from the perspective of industry stakeholders. Finally, the paper presents a systematic analysis of the key challenges and corresponding solutions at the data and performance levels, and looks ahead to the future direction of technological development, highlighting innovative paths and application potential.