ISBN:
(Print) 9781450399531
The proceedings contain 28 papers. The topics discussed include: on-demand multiclass imaging for sample scarcity in industrial environments; a multistage framework for detection of very small objects; exploiting self-imposed constraints on RGB and LiDAR for unsupervised training; detection of fibrillatory episodes in atrial fibrillation rhythms via topology-informed machine learning; multi-scale feature enhancement network for face forgery detection; feature consistent point cloud registration in building information modeling; integrating user gaze with verbal instruction to reliably estimate robotic task parameters in a human-robot collaborative environment; recovering image information from speckle noise by image processing; road lane segmentation using vehicle trajectory tracking and lane demarcation lines; and digital holography vs. display holography - what are their differences and what do they have in common?
In the AI applications for natural language definitions, image captioning is a field that is expanding quickly. It attempts to capture meaningful interpretations of the interactions between the acquired picture data f...
Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is trained on large-scale data, providing a solid parameter initialization for a wide range of downstream applications. In contrast to earlier methods that use convolution and recurrent modules for feature extraction, BERT learns bidirectional encoder representations from Transformers, trained on large datasets as contextual language models. Similarly, the Generative Pretrained Transformer (GPT) method employs Transformers as feature extractors and is trained on large datasets using an autoregressive paradigm. Recently, ChatGPT has demonstrated significant success in large language models, utilizing autoregressive language models with zero-shot or few-shot prompting. The remarkable success of PFMs has driven significant breakthroughs in AI, leading to numerous studies proposing various methods, datasets, and evaluation metrics, which increases the demand for an updated survey. This study provides a comprehensive review of recent research advancements, challenges, and opportunities for PFMs in text, image, graph, and other data modalities. It covers the basic components and existing pretraining methods used in natural language processing, computer vision, and graph learning, while also exploring advanced PFMs for different data modalities and unified PFMs that address data quality and quantity. Additionally, the review discusses key aspects such as model efficiency, security, and privacy, and provides insights into future research directions and challenges in PFMs. Overall, this survey aims to shed light on the research of the PFMs on scalability, security, logical reasoning ability, cross-domain learning ability, and user-friendly interactive ability for artificial general intelligence.
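The contrast between GPT-style autoregressive pretraining and BERT-style bidirectional pretraining can be written as two training objectives. The following is a standard textbook formulation rather than notation taken from the survey itself; the symbols $x_t$, $T$, $\theta$, and the masked index set $M$ are introduced here for illustration only, and BERT's additional next-sentence-prediction term is omitted.

```latex
\mathcal{L}_{\mathrm{AR}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
\qquad
\mathcal{L}_{\mathrm{MLM}}(\theta) = -\sum_{t \in M} \log p_\theta\left(x_t \mid x_{\setminus M}\right)
```

In the first objective each token is predicted from its left context only, while in the second the masked tokens are predicted from the unmasked tokens on both sides.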
Author:
Lu, Yufan
Zhejiang Gongshang University Hangzhou China
This research aims to improve the visual target detection and recognition capabilities of shopping robots in various sales environments by optimizing and improving the YOLO algorithm, in order to improve accuracy and ...
We characterized manufacturing-induced defects in 316L stainless steels - fabricated by direct metal laser sintering (DMLS) - and investigated their roles in the fatigue behavior of steel parts. The primary defects ta...
ISBN:
(Print) 9781665473507
Industrial automation is undergoing tremendous change due to the proliferation of the Internet of Things (IoT), Cyber-Physical Systems (CPS), and the tactile internet, which enable the interconnection of factory-floor devices and enterprise networks on a wider and more fine-grained scale. Vision sensor deployments are gaining momentum in factories because they improve the quality and productivity of the systems being inspected. Smart vision sensors [1] remove the need for additional infrastructure to run image processing algorithms and vision applications: they run the vision logic directly on the device and control or monitor field parameters based on the image processing outputs. The Industrial Vision Sensor (IVIS) is an industrial smart camera with a CMOS image sensor [2] and a powerful on-board processing system capable of supporting machine vision applications, improving product and process quality and thereby yield and profit. IVIS can extract application-specific information from captured images and make decisions based on the image processing algorithms implemented on the system, realizing a stand-alone, intelligent, decision-making automation system. In this paper we present the design and development of IVIS, its application domains, and preliminary test results.
Intelligent optimization algorithm is an advanced computing technology, which simulates the biological evolution process in nature or the logical thinking of human beings to find a solution to the problem. In computer...
Today's computer vision industry makes extensive use of image recognition. A popular method of image recognition is digit recognition. The recognition of handwritten numbers is one of the most well-known difficult...
The increasing popularity of attention mechanisms in deep learning algorithms for computer vision and natural language processing has made these models attractive to other research domains. In healthcare, there is a strong need for tools that may improve the routines of clinicians and patients, so attention-based algorithms were readily adopted for medical applications. However, because healthcare is a domain that depends on high-stakes decisions, the scientific community must consider whether these high-performing algorithms fit the needs of medical applications. With this motivation, this paper extensively reviews the use of attention mechanisms in machine learning methods (including Transformers) for several medical applications, organized by the types of tasks that may be integrated into the work pipelines of the medical domain. This work distinguishes itself from its predecessors by proposing a critical analysis of the claims and potentialities of attention mechanisms presented in the literature through an experimental case study on medical image classification with three different use cases. These experiments focus on the process of integrating attention mechanisms into established deep learning architectures, the analysis of their predictive power, and a visual assessment of their saliency maps generated by post-hoc explanation methods. This paper concludes with a critical analysis of the claims and potentialities presented in the literature about attention mechanisms and proposes future research lines in medical applications that may benefit from these frameworks.
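For readers unfamiliar with the mechanism under review, a minimal sketch of scaled dot-product attention, the core operation in Transformers, is given below. It is a generic illustration in plain Python and not code from the paper; the function names and the toy inputs are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key, and value vectors.

    Each query is compared with every key; the resulting weights
    (one row per query, summing to 1) mix the value vectors.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output component at a time.
        out = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


if __name__ == "__main__":
    outs, attn = scaled_dot_product_attention(
        queries=[[1.0, 0.0]],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[1.0, 2.0], [3.0, 4.0]],
    )
    print(outs, attn)
```

The returned weights are the quantities that attention-based saliency visualizations typically inspect, although the paper's own assessment relies on post-hoc explanation methods.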
Aiming at the problems of low detection accuracy, high computational complexity and long-time consumption of visual perception model in a complex mining environment, this research designs a visual information perception system of coal mine comprehensive excavation working face for an edge computing terminal. Firstly, the C3-Fast feature extraction module, spatial pyramid pooling with cross-stage partial connection (SPPCSPC) pooling module, bi-directional feature pyramid network and lightweight decoupled detection head are used to optimize the YOLOv5s model, so as to construct the FSBD-YOLOv5s multi-object detection model. Secondly, the pruning and distillation algorithm is used to lighten the FSBD-YOLOv5s model, and the model complexity is greatly reduced while maintaining the model detection accuracy. Further, the lightweight FSBD-YOLOv5s model is migrated and deployed to the edge computing terminal platform and the TensorRT engine is used to accelerate model inference. Finally, experiments are carried out based on the data set of the coal mine comprehensive excavation working face. The experimental results show that on the edge computing terminal platform, the parameters and computational volume of the lightweight FSBD-YOLOv5s model are reduced by 50.8% and 34.0%, while its detection accuracy and speed reach 94.0% and 43.7 fps, which can fully satisfy the requirements of the accuracy and real-time for the coal mine engineering applications. In the complex operation scene of coal mine, due to adverse environmental factors such as uneven illumination, high dust and mixed man-machine multi-target, the speed and measurement accuracy of traditional visual perception model decrease sharply. In order to solve the above problems, this study proposes to build a visual information perception system for coal mine comprehensive excavation working face for edge computing terminal and combines channel pruning algorithm, knowledge extraction algorithm and TensorRT acceleration e
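The abstract mentions channel pruning but does not state which selection criterion is used. As a rough illustration only, the sketch below ranks the output channels of one convolutional layer by the L1 norm of their weights and keeps the strongest fraction. The L1 criterion, the function name `select_channels_to_keep`, and the `keep_ratio` parameter are assumptions made for this example, not details taken from the paper.

```python
def select_channels_to_keep(filters, keep_ratio):
    """Pick which output channels of a layer survive pruning.

    filters: list of channels, each a flat list of weights (hypothetical layout).
    keep_ratio: fraction of channels to keep, between 0 and 1.
    Returns the kept channel indices in ascending order.
    """
    # L1 norm of each channel's weights: an assumed importance score.
    norms = [sum(abs(w) for w in channel) for channel in filters]
    n_keep = max(1, int(len(filters) * keep_ratio))
    # Rank channels from largest to smallest norm and keep the top ones.
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    toy_filters = [[0.1, -0.1], [2.0, 1.0], [0.0, 0.05], [-1.5, 0.5]]
    print(select_channels_to_keep(toy_filters, keep_ratio=0.5))  # prints [1, 3]
```

In practice a pruned model is usually fine-tuned afterwards, for example with knowledge distillation from the unpruned model, to recover accuracy.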