The burgeoning fields of the Internet of Things (IoT) and artificial intelligence (AI) have escalated the demands on image sensing technologies, necessitating advancements in sensor efficiency and functionality. Traditional image sensors, structured on von Neumann architectures with discrete processing units, face challenges such as high power consumption, latency, and escalated hardware costs. In this work, we introduce a unique approach through the development of a quasi-one-dimensional nanowire Nb3Se12I-based double-ended photosensor. The sensor not only replicates the adaptive behavior of biological vision systems but also effectively manages the decreased sensitivity triggered by intense light stimuli. The integration of the photothermoelectric and bolometric effects allows the device to operate in a self-powered mode, offering broadband detectivity from the visible (405 nm) to the mid-wave infrared (4060 nm). Additionally, the quasi-one-dimensional structure enables an angle-dependent response to polarized light with a polarization ratio of 1.83. Our findings suggest that the biomimetic vision-adaptive sensor based on Nb3Se12I could effectively enhance the capabilities of smart optical sensors and machine vision systems.
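As a hedged illustration of the reported figure of merit: the polarization ratio is conventionally the ratio of the photoresponse under parallel versus perpendicular polarization. The sketch below assumes a simple two-lobe cos^2 angular model and made-up current values chosen so the ratio matches the reported 1.83; none of the numbers come from the paper's measurements.

import numpy as np

# Illustrative only: the abstract reports a polarization ratio of 1.83 for the
# quasi-1D Nb3Se12I photosensor; the angular model and current values below
# are assumptions, not measured data.
I_max = 1.83e-9   # hypothetical photocurrent (A), polarization along the nanowire axis
I_min = 1.00e-9   # hypothetical photocurrent (A), polarization perpendicular to the axis

polarization_ratio = I_max / I_min
print(f"polarization ratio = {polarization_ratio:.2f}")   # ~1.83

# Angle-dependent response under a common two-lobe (cos^2) model:
theta = np.linspace(0, 2 * np.pi, 361)
I_theta = I_min + (I_max - I_min) * np.cos(theta) ** 2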
Retinal fundus imaging plays a crucial role in the diagnosis of ophthalmic diseases such as glaucoma, a significant cause of vision loss worldwide. Accurate detection of glaucoma using image processing, machine learni...
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment costs have grown steadily with model size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost increases quadratically with the image resolution. Generally speaking, it is challenging to employ these architectures in real-world applications due to many hardware and environmental restrictions, such as processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies for retaining near-optimal estimation performance under such constraints. In more detail, four efficiency categories are analyzed: compact architectures, pruning, knowledge distillation, and quantization strategies. Moreover, a new metric called the Efficient Error Rate is introduced in order to normalize and compare model features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. In summary, this paper first mathematically defines the strategies used to make Vision Transformers efficient, describes and discusses state-of-the-art methodologies, and analyzes their performances over different application scenarios. Toward the end of the paper, we also discuss open challenges and promising research directions.
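To make the quadratic-cost remark concrete, the sketch below counts tokens and attention-matrix entries for a standard patch-based ViT. The patch size, embedding dimension, and resolutions are illustrative assumptions, not values taken from the survey.

# A minimal sketch of why self-attention cost grows quadratically with image
# resolution: token count N = (H/P) * (W/P), and the attention matrix is N x N.
# Patch size 16 and dim 768 are common defaults assumed here for illustration.
def attention_cost(height, width, patch=16, dim=768):
    n_tokens = (height // patch) * (width // patch)
    attn_entries = n_tokens ** 2            # one score per token pair (per head, per layer)
    flops_qk = 2 * n_tokens ** 2 * dim      # rough Q.K^T cost for one attention block
    return n_tokens, attn_entries, flops_qk

for res in (224, 448, 896):                 # doubling resolution ~4x tokens, ~16x attention cost
    print(res, attention_cost(res, res))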
The control of the froth flotation process in the mineral industry is a challenging task due to its multiple impacting parameters. Accurate and convenient examination of the concentrate grade is a crucial step in realizing effective and real-time control of the flotation process. The goal of this study is to employ image processing techniques and CNN-based feature extraction, combined with machine learning and deep learning, to predict the elemental composition of minerals in the flotation froth. A real-world dataset was collected and preprocessed from a differential flotation circuit at the industrial flotation site in Guemassa, Morocco. Using image-processing algorithms, the features extracted from the flotation froth include texture, bubble size, velocity, and color distribution. To predict the mineral concentrate grades, our study includes several supervised machine learning (ML) algorithms, artificial neural networks (ANN), and convolutional neural networks (CNN). The industrial experimental evaluations revealed strong performance, with accuracy up to 0.94. Furthermore, our proposed hybrid method was evaluated in a real flotation process for the Zn, Pb, Fe, and Cu concentrate grades, with a prediction error of less than 4.53. These results demonstrate the significant potential of our proposed online analyzer as an artificial intelligence application in the field of complex polymetallic flotation circuits (Pb, Fe, Cu, Zn).
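As a rough, hedged sketch of the general idea (hand-crafted froth descriptors feeding a regressor that predicts concentrate grades), the toy below uses simple texture, brightness, and color statistics with a random forest on synthetic data. The feature definitions, model choice, and data are illustrative assumptions, not the authors' pipeline.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def froth_features(frame_gray, frame_rgb):
    # Toy froth descriptors: gradient magnitude as a texture proxy, mean
    # brightness, and per-channel color means. These stand in for the texture,
    # bubble-size, velocity, and color features described in the abstract.
    d0, d1 = np.gradient(frame_gray.astype(float))
    texture = np.mean(np.hypot(d0, d1))
    brightness = frame_gray.mean()
    color_means = frame_rgb.reshape(-1, 3).mean(axis=0)
    return np.concatenate([[texture, brightness], color_means])

# Hypothetical data: one random "froth frame" per sample, grades as targets.
rng = np.random.default_rng(0)
frames = [(rng.integers(0, 255, (64, 64)), rng.integers(0, 255, (64, 64, 3)))
          for _ in range(50)]
X = np.stack([froth_features(gray, rgb) for gray, rgb in frames])
y = rng.uniform(0, 50, size=(50, 4))        # made-up Zn, Pb, Fe, Cu grades (%)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict(X[:1]))                 # predicted grades for one frame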
Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (ViT) have recently become a new state of the art for computer vision applications. Motivated by the success of ViTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of ViTs for the efficient classification of Alzheimer's Disease (AD). The framework consists of four vanilla ViTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested using two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models' efficacy under imbalanced and data-scarce conditions. The ViT ensembles saw an improvement of around 2% compared to the individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated an overall accuracy gain of 4.14% and 4.72% over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is made publicly available.
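The hard- and soft-voting ensembles can be summarized in a few lines; the sketch below uses made-up class probabilities from four hypothetical models for a binary AD/non-AD decision, and is not output from the paper's ViTs.

import numpy as np

# Four models' predicted probabilities [P(class 0), P(class 1)] for one scan.
# Values are invented to show how the two voting rules can disagree.
probs = np.array([
    [0.60, 0.40],
    [0.55, 0.45],
    [0.30, 0.70],
    [0.45, 0.55],
])

# Soft voting: average the class probabilities, then take the argmax.
soft_pred = probs.mean(axis=0).argmax()

# Hard voting: each model votes for its argmax class; the majority wins.
votes = probs.argmax(axis=1)
hard_pred = np.bincount(votes, minlength=2).argmax()

print("soft vote:", soft_pred, "hard vote:", hard_pred)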
Sketch-to-image is an important task for reducing the burden of creating a color image from scratch. Unlike previous sketch-to-image models, which synthesize the image in an end-to-end manner and often produce unnatural results, we propose a method that decomposes the problem into subproblems to generate more natural and plausible images. It first generates an intermediate output, a semantic mask map, from the input sketch through instance and semantic segmentation at two levels: background segmentation and foreground segmentation. The background segmentation is formed based on the context of the foreground objects. Then, the foreground segmentations are sequentially added to the created background segmentation. Finally, the generated mask map is fed into an image-to-image translation model to generate an image. Our proposed method works with 92 distinct classes. It outperforms state-of-the-art sketch-to-image models and generates better images.
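A minimal, runnable toy of the staged pipeline described above, assuming placeholder components: the segmentation functions and the mask-to-image "translator" here are trivial stand-ins for the paper's models, and the class ids and colors are arbitrary.

import numpy as np

def segment_foreground(sketch):
    # Stand-in for instance/semantic segmentation: pretend one foreground
    # object (class 2) was detected in the lower half of the sketch.
    mask = np.zeros(sketch.shape, dtype=bool)
    mask[32:, 16:48] = True
    return [(2, mask)]

def segment_background(sketch, foreground):
    # Stand-in for context-based background segmentation: class 1 everywhere.
    return np.ones(sketch.shape, dtype=np.uint8)

def translate_mask_to_image(mask_map):
    # Stand-in for a mask-to-image translation network: map class ids to colors.
    palette = {1: (135, 206, 235), 2: (34, 139, 34)}
    rgb = np.zeros(mask_map.shape + (3,), dtype=np.uint8)
    for cls, color in palette.items():
        rgb[mask_map == cls] = color
    return rgb

def sketch_to_image(sketch):
    foreground = segment_foreground(sketch)             # foreground instances
    mask_map = segment_background(sketch, foreground)   # background layout first
    for class_id, mask in foreground:                   # add foreground on top, in order
        mask_map[mask] = class_id
    return translate_mask_to_image(mask_map)            # mask map -> color image

image = sketch_to_image(np.zeros((64, 64), dtype=np.uint8))
print(image.shape)  # (64, 64, 3)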
Neural networks have become a cornerstone of computer vision applications, with tasks ranging from image classification to object detection. However, challenges such as hyperparameter optimization (HPO) and model compression remain critical for improving performance and deploying models on resource-constrained devices. In this work, we address these challenges using Tensor Network-based methods. For HPO, we propose and evaluate the TetraOpt algorithm against various optimization algorithms. These evaluations were conducted on subsets of the NATS-Bench dataset, including CIFAR-10, CIFAR-100, and ImageNet subsets. TetraOpt consistently demonstrated superior performance, effectively exploring the global optimization space and identifying configurations with higher accuracies. For model compression, we introduce a novel iterative method that combines CP, SVD, and Tucker tensor decompositions. Applied to ResNet-18 and ResNet-152, we evaluated our method on the CIFAR-10 and Tiny ImageNet datasets. Our method achieved compression ratios of up to 14.5x for ResNet-18 and 2.5x for ResNet-152. Additionally, the inference time for processing an image on a CPU remained largely unaffected, demonstrating the practicality of the method.
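As a hedged sketch of one ingredient of such compression, the snippet below applies a truncated SVD to a dense weight matrix and reports the parameter reduction and reconstruction error. The layer shape and rank are illustrative; the paper's iterative CP/SVD/Tucker scheme for convolutional layers is more involved.

import numpy as np

# Hypothetical dense layer weights; 512 x 2048 is an assumed shape, not one
# taken from ResNet-18 or ResNet-152.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 2048))

rank = 64
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]                  # 512 x 64 factor (columns scaled by singular values)
B = Vt[:rank, :]                            # 64 x 2048 factor

original_params = W.size
compressed_params = A.size + B.size
print("compression ratio:", original_params / compressed_params)          # ~6.4x here
print("relative error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))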
Authors: Zhou, Longfei; Zhang, Lin; Konz, Nicholas
MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139, USA
Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708, USA
Computer vision (CV) techniques have played an important role in promoting the informatization, digitization, and intelligence of industrial manufacturing systems. Considering the rapid development of CV techniques, we present a comprehensive review of the state of the art of these techniques and their applications in manufacturing industries. We survey the most common methods, including feature detection, recognition, segmentation, and three-dimensional modeling. A system framework of CV in the manufacturing environment is proposed, consisting of a lighting module, a manufacturing system, a sensing module, CV algorithms, a decision-making module, and an actuator. Applications of CV to different stages of the entire product life cycle are then explored, including product design, modeling and simulation, planning and scheduling, the production process, inspection and quality control, assembly, transportation, and disassembly. Challenges include algorithm implementation, data preprocessing, data labeling, and benchmarks. Future directions include building benchmarks, developing methods for non-annotated data processing, developing effective data preprocessing mechanisms, customizing CV models, and opportunities arising from 5G.
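A schematic, hypothetical encoding of the closed-loop framework listed above (lighting, sensing, CV algorithms, decision-making, actuator) might look as follows; the class and field names are placeholders invented for illustration, not structures defined in the review.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ManufacturingVisionLoop:
    # Hypothetical module slots mirroring the surveyed framework.
    set_lighting: Callable[[], None]          # lighting module
    capture: Callable[[], Any]                # sensing module (camera/sensor)
    analyze: Callable[[Any], dict]            # CV algorithms (detection, segmentation, ...)
    decide: Callable[[dict], str]             # decision-making module
    actuate: Callable[[str], None]            # actuator acting on the manufacturing system

    def step(self):
        self.set_lighting()
        frame = self.capture()
        findings = self.analyze(frame)
        action = self.decide(findings)
        self.actuate(action)

# Toy usage with stub callables:
loop = ManufacturingVisionLoop(
    set_lighting=lambda: None,
    capture=lambda: "frame",
    analyze=lambda f: {"defect": False},
    decide=lambda r: "pass" if not r["defect"] else "reject",
    actuate=lambda a: print("action:", a),
)
loop.step()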
Efficient point cloud coding has become increasingly critical for applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations may functionally make the difference. Deep learning has emerged as a powerful tool in this domain, offering advanced techniques for compressing point clouds more efficiently than conventional coding methods while also allowing effective computer vision tasks to be performed in the compressed domain, thus, for the first time, making available a common compressed visual representation effective for both humans and machines. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding (PCC) standard, offering efficient lossy coding of static point clouds and targeting both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded using the likewise learning-based JPEG AI standard. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state of the art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, achieving significant rate reductions. Color compression performance is less competitive, but this is offset by the power of a fully learning-based coding framework for both geometry and color and the associated effective compressed-domain processing.
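As a purely conceptual toy of the split described above (geometry kept in 3D, color packed into a 2D image for a 2D codec), the snippet below voxelizes random points and rasters their colors. It is not the JPEG PCC algorithm or its projection scheme, just a minimal analogue under simplifying assumptions.

import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 1, (1000, 3))                 # hypothetical point positions
colors = rng.integers(0, 256, (1000, 3), np.uint8)    # hypothetical per-point RGB

# Geometry: quantize to a sparse occupancy grid (the kind of representation a
# sparse 3D CNN would consume).
grid = 64
voxels = np.unique(np.floor(points * grid).astype(np.int32), axis=0)

# Color: pack per-point attributes into a 2D image that a 2D codec could encode.
side = int(np.ceil(np.sqrt(len(colors))))
color_image = np.zeros((side * side, 3), np.uint8)
color_image[: len(colors)] = colors
color_image = color_image.reshape(side, side, 3)

print("occupied voxels:", len(voxels), "color image:", color_image.shape)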
ISBN (Print): 9783031539657; 9783031539664
Diffusion models have become increasingly popular in recent years, and their applications span a wide range of fields. This survey focuses on the use of diffusion models in computer vision, especially in the branch of image transformations. The objective of this survey is to provide an overview of state-of-the-art applications of diffusion models in image transformations, including image inpainting, super-resolution, restoration, translation, and editing. This survey presents a selection of notable papers and repositories featuring practical applications of diffusion models for image transformations. The applications are presented in a practical and concise manner, facilitating the understanding of the concepts behind diffusion models and how they function. Additionally, it includes a curated collection of GitHub repositories featuring popular examples of these subjects.