Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become a new state of the art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer's Disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested using two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models' efficacy under imbalanced and data-scarce conditions. The VT ensemble improved on the individual models by around 2%. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated an overall accuracy gain of 4.14% and 4.72% over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is publicly available.
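The difference between the hard- and soft-voting ensembles mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's code: the function names `hard_vote` and `soft_vote`, the tie-breaking rule, and the probability values are all assumptions made for illustration.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over per-model class labels.

    Ties are broken in favor of the label that appears first
    (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def soft_vote(probabilities):
    """Average per-model class probabilities, then take the argmax.

    Returns the winning class index and the averaged probabilities.
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=lambda c: mean[c])
    return winner, mean


# Hypothetical outputs of four models for one scan, over three classes.
probs = [
    [0.50, 0.45, 0.05],
    [0.40, 0.55, 0.05],
    [0.10, 0.85, 0.05],
    [0.48, 0.42, 0.10],
]
# Each model's hard label is its most probable class.
labels = [max(range(3), key=lambda c: p[c]) for p in probs]

print("hard vote:", hard_vote(labels))
print("soft vote:", soft_vote(probs)[0])
```

In this made-up case the hard vote is a 2-2 tie between classes 0 and 1, while the soft vote picks class 1 because one model is far more confident than the others. Soft voting uses the confidence information that hard voting discards.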
Sketch-to-image is an important task that reduces the burden of creating a color image from scratch. Unlike previous sketch-to-image models, which synthesize the image end to end and tend to produce unnatural results, we propose a method that decomposes the problem into subproblems to generate a more natural and plausible image. The method first generates an intermediate output, a semantic mask map, from the input sketch through instance and semantic segmentation at two levels: background segmentation and foreground segmentation. The background segmentation is formed based on the context of the foreground objects. Then, the foreground segmentations are sequentially added to the created background segmentation. Finally, the generated mask map is fed into an image-to-image translation model to generate an image. Our proposed method works with 92 distinct classes. Compared to state-of-the-art sketch-to-image models, our proposed method outperforms previous methods and generates better images.
Neural networks have become a cornerstone of computer vision applications, with tasks ranging from image classification to object detection. However, challenges such as hyperparameter optimization (HPO) and model compression remain critical for improving performance and deploying models on resource-constrained devices. In this work, we address these challenges using Tensor Network-based methods. For HPO, we propose the TetraOpt algorithm and evaluate it against various optimization algorithms. These evaluations were conducted on subsets of the NATS-Bench dataset, including CIFAR-10, CIFAR-100, and ImageNet subsets. TetraOpt consistently demonstrated superior performance, effectively exploring the global optimization space and identifying configurations with higher accuracies. For model compression, we introduce a novel iterative method that combines CP, SVD, and Tucker tensor decompositions. We applied the method to ResNet-18 and ResNet-152 and evaluated it on the CIFAR-10 and Tiny ImageNet datasets. Our method achieved compression ratios of up to 14.5x for ResNet-18 and 2.5x for ResNet-152. Additionally, the inference time for processing an image on a CPU remained largely unaffected, demonstrating the practicality of the method.
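To make the notion of a compression ratio concrete, the sketch below counts parameters before and after two standard factorizations: a rank-r truncated SVD of a dense weight matrix and a Tucker-2 decomposition of a convolution kernel. It is only parameter-count arithmetic, not the iterative method described in the abstract. The layer shapes, the ranks, and the function names are assumptions chosen for illustration.

```python
def svd_compression_ratio(rows, cols, rank):
    """Ratio of parameters in a dense rows x cols matrix to those in
    its rank-`rank` factorization (rows x rank) @ (rank x cols)."""
    original = rows * cols
    factored = rank * (rows + cols)
    return original / factored


def tucker2_compression_ratio(c_out, c_in, k, rank_out, rank_in):
    """Ratio of parameters in a c_out x c_in x k x k convolution kernel to
    those in its Tucker-2 form: a 1x1 conv (c_in -> rank_in), a k x k core
    conv (rank_in -> rank_out), and a 1x1 conv (rank_out -> c_out)."""
    original = c_out * c_in * k * k
    factored = c_in * rank_in + rank_out * rank_in * k * k + c_out * rank_out
    return original / factored


# Hypothetical layer shapes and ranks, not taken from the paper.
print(svd_compression_ratio(512, 512, 32))             # 8.0
print(tucker2_compression_ratio(512, 512, 3, 128, 128))
```

Lower ranks give higher ratios but usually cost accuracy, so the ranks have to be traded off against the accuracy the compressed model retains.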
Authors: Zhou, Longfei; Zhang, Lin; Konz, Nicholas
MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139, USA
Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708, USA
Computer vision (CV) techniques have played an important role in promoting the informatization, digitization, and intelligence of industrial manufacturing systems. Considering the rapid development of CV techniques, we present a comprehensive review of the state of the art of these techniques and their applications in manufacturing industries. We survey the most common methods, including feature detection, recognition, segmentation, and three-dimensional modeling. A system framework of CV in the manufacturing environment is proposed, consisting of a lighting module, a manufacturing system, a sensing module, CV algorithms, a decision-making module, and an actuator. Applications of CV to different stages of the entire product life cycle are then explored, including product design, modeling and simulation, planning and scheduling, the production process, inspection and quality control, assembly, transportation, and disassembly. Challenges include algorithm implementation, data preprocessing, data labeling, and benchmarks. Future directions include building benchmarks, developing methods for processing non-annotated data, developing effective data preprocessing mechanisms, customizing CV models, and exploiting opportunities created by 5G.
This study addresses the critical challenge of distinguishing Unmanned Aerial Vehicles (UAVs) from birds in real time for airspace security in both military and civilian contexts. As UAVs become increasingly common, advanced systems must accurately identify them in dynamic environments to ensure operational safety. We evaluated several machine learning algorithms, including K-Nearest Neighbors (kNN), AdaBoost, CN2 Rule Induction, and Support Vector Machine (SVM), employing a comprehensive methodology that included data preprocessing steps such as image resizing, normalization, and augmentation to optimize training on the "Birds vs. Drone Dataset." The performance of each model was assessed using accuracy, precision, recall, F1 score, and Area Under the Curve (AUC) to determine its effectiveness in distinguishing UAVs from birds. Results demonstrate that kNN, AdaBoost, and CN2 Rule Induction are particularly effective, achieving high accuracy while minimizing false positives and false negatives. These models reduce operational risks and enhance surveillance efficiency, making them suitable for real-time security applications. The integration of these algorithms into existing surveillance systems offers robust classification capabilities and real-time decision-making under challenging conditions. Additionally, the study highlights future directions for research in computational performance optimization, algorithm development, and ethical considerations related to privacy and surveillance. The findings contribute to both the technical domain of machine learning in security and broader societal impacts, such as civil aviation safety and environmental monitoring.
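The evaluation metrics named in the abstract above (accuracy, precision, recall, and F1 score) all follow from the counts of true and false positives and negatives. The sketch below computes them for a binary drone-versus-bird labeling. The label encoding (1 for drone, 0 for bird), the function name, and the sample predictions are assumptions for illustration, not data from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions: 1 = drone, 0 = bird.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(binary_metrics(y_true, y_pred))
```

Here one missed drone (a false negative) and one bird flagged as a drone (a false positive) give an accuracy of 0.8 and a precision, recall, and F1 of 0.75 each. AUC is omitted because it requires ranked scores rather than hard labels.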
Efficient point cloud coding has become increasingly critical for applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations can make a functional difference. Deep learning has emerged as a powerful tool in this domain, offering advanced techniques for compressing point clouds more efficiently than conventional coding methods while also allowing computer vision tasks to be performed effectively in the compressed domain, thus making available, for the first time, a common compressed visual representation that is effective for both humans and machines. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding (PCC) standard, which offers efficient lossy coding of static point clouds and targets both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded using the JPEG AI standard, which is also learning-based. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state of the art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, achieving significant rate reductions. Color compression performance is less competitive, but this is offset by the advantages of a fully learning-based coding framework for both geometry and color and the associated effective compressed-domain processing.
ISBN: (print) 9783031539657; 9783031539664
Diffusion models have become increasingly popular in recent years, and their applications span a wide range of fields. This survey focuses on the use of diffusion models in computer vision, specifically in the branch of image transformations. The objective of this survey is to provide an overview of state-of-the-art applications of diffusion models in image transformations, including image inpainting, super-resolution, restoration, translation, and editing. This survey presents a selection of notable papers and repositories, including practical applications of diffusion models for image transformations. The applications are presented in a practical and concise manner, facilitating the understanding of the concepts behind diffusion models and how they function. Additionally, it includes a curated collection of GitHub repositories featuring popular examples of these subjects.
The process of cultivating soil for crop planting and domesticating animals is known as agriculture. A growing agriculture sector indicates an improving economy. Agriculture is considered the first pillar supporting global food safety. Additionally, it controls the majority of the global economy. Since we depend on agriculture for survival, it needs to be supervised regularly. In this global era of computerization, humans depend entirely on cyberspace material, as it is very fast and takes less time than human work. Hence, human vision can be replicated by computer vision. Visual data and information are processed and analyzed using computer hardware and software. Computer vision covers the procedures for gathering, sending, processing, filtering, storing, and comprehending visual data. The study of computational theory can direct computer vision research, and a variety of applications offer a solid foundation and research platform. The use of machine vision has recently increased in response to the growing need for fast and precise ways to track the production of fruit. Machine learning (ML) algorithms make it possible to swiftly and reliably analyze enormous amounts of data, regardless of complexity. ML is already widely used in many domains, such as credit analysis, fraud detection, defect sophisticated spam filters, picture recognition patterns, prediction models, and inspection of product features. But with so many options available, it is critical to understand the unique qualities of each approach and the optimal situation in which to apply it. In this review, we have discussed in detail the use of artificial intelligence (AI) in fruit production and summarized more than 110 research applications of AI in fruit production technology. As of now, this review is the first compilation work on the application and prospects of AI-based technology in fruit production systems. This review will provide a single-point comprehensive source of information for acad
Quick and reliable measurement of wood chip moisture content is a persistent problem for numerous forest-reliant industries such as biofuel, pulp and paper, and bio-refineries. Moisture content is a critical attribute of wood chips due to its direct relationship with the final product quality. Conventional techniques for determining moisture content, such as oven-drying, are time-consuming, can damage the sample, and are not feasible in real time. Alternative techniques, including NIR spectroscopy, electrical capacitance, X-rays, and microwaves, have demonstrated potential; nevertheless, they are still constrained by issues related to portability, precision, and the expense of the required equipment. Hence, there is a need for a moisture content determination method that is instant, portable, non-destructive, inexpensive, and precise. This study explores the use of deep learning and machine vision to predict moisture content classes from RGB images of wood chips. A large-scale image dataset comprising 1,600 RGB images of wood chips has been collected and annotated with ground truth labels, utilizing the results of the oven-drying technique. Two high-performing neural networks, MoistNetLite and MoistNetMax, have been developed leveraging Neural Architecture Search (NAS) and hyperparameter optimization. The developed models are evaluated and compared with state-of-the-art deep learning models. Results demonstrate that MoistNetLite achieves 87% accuracy with minimal computational overhead, while MoistNetMax exhibits exceptional precision with a 91% accuracy in wood chip moisture content class prediction. With improved accuracy (9.6% improvement in accuracy by MoistNetMax compared to the best baseline model ResNet152V2) and faster prediction speed (MoistNetLite being twice as fast as MobileNet), our proposed MoistNet models hold great promise for the wood chip processing industry to be efficiently deployed on p
Vitiligo, alopecia areata, atopic dermatitis, and stasis dermatitis are common skin conditions that pose diagnostic and assessment challenges. Skin image analysis is a promising noninvasive approach for objective and automated detection as well as quantitative assessment of skin diseases. This review provides a systematic literature search on computer vision techniques applied to these benign skin conditions, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. The review examines deep learning architectures and image processing algorithms for segmentation, feature extraction, and classification tasks employed for disease detection. It also focuses on practical applications, emphasizing quantitative disease assessment and the performance of various computer vision approaches for each condition while highlighting their strengths and limitations. Finally, the review points to the need for disease-specific datasets with curated annotations and suggests future directions toward unsupervised or self-supervised approaches. Additionally, the findings underscore the importance of developing accurate, automated tools for disease severity score calculation to improve ML-based monitoring and diagnosis in dermatology.