The burgeoning fields of the Internet of Things (IoT) and artificial intelligence (AI) have escalated the demands on image sensing technologies, necessitating advancements in sensor efficiency and functionality. Traditional image sensors, structured on von Neumann architectures with discrete processing units, face challenges such as high power consumption, latency, and escalated hardware costs. In this work, we introduce a unique approach through the development of a quasi-one-dimensional nanowire Nb3Se12I-based double-ended photosensor. The sensor not only replicates the adaptive behavior of biological vision systems but also effectively manages the decreased sensitivity triggered by intense light stimuli. The integration of the photothermoelectric and bolometric effects allows the device to operate in a self-powered mode, offering broadband detectivity from the visible (405 nm) to the mid-wave infrared (4060 nm). Additionally, the quasi-one-dimensional structure enables an angle-dependent response to polarized light with a polarization ratio of 1.83. Our findings suggest that the biomimetic vision-adaptive sensor based on Nb3Se12I could effectively enhance the capabilities of smart optical sensors and machine vision systems.
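As a hedged illustration of the reported figure of merit: the polarization ratio is conventionally the ratio of the photoresponse under parallel versus perpendicular polarization. The sketch below assumes a simple two-lobe cos^2 angular model and made-up current values chosen so the ratio matches the reported 1.83; none of the numbers come from the paper's measurements.

import numpy as np

# Illustrative only: the abstract reports a polarization ratio of 1.83 for the
# quasi-1D Nb3Se12I photosensor; the angular model and current values below
# are assumptions, not measured data.
I_max = 1.83e-9   # hypothetical photocurrent (A), polarization along the nanowire axis
I_min = 1.00e-9   # hypothetical photocurrent (A), polarization perpendicular to the axis

polarization_ratio = I_max / I_min
print(f"polarization ratio = {polarization_ratio:.2f}")   # ~1.83

# Angle-dependent response under a common two-lobe (cos^2) model:
theta = np.linspace(0, 2 * np.pi, 361)
I_theta = I_min + (I_max - I_min) * np.cos(theta) ** 2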
Retinal fundus imaging plays a crucial role in the diagnosis of ophthalmic diseases such as glaucoma, a significant cause of vision loss worldwide. Accurate detection of glaucoma using image processing, machine learni...
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment costs have grown steadily with model size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost increases quadratically with the image resolution. Generally speaking, it is challenging to employ these architectures in real-world applications due to many hardware and environmental restrictions, such as processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies for retaining near-optimal estimation performance under such constraints. In more detail, four efficiency categories are analyzed: compact architectures, pruning, knowledge distillation, and quantization strategies. Moreover, a new metric called the Efficient Error Rate is introduced in order to normalize and compare model features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. In summary, this paper first mathematically defines the strategies used to make Vision Transformers efficient, describes and discusses state-of-the-art methodologies, and analyzes their performances over different application scenarios. Toward the end of the paper, we also discuss open challenges and promising research directions.
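To make the quadratic-cost remark concrete, the sketch below counts tokens and attention-matrix entries for a standard patch-based ViT. The patch size, embedding dimension, and resolutions are illustrative assumptions, not values taken from the survey.

# A minimal sketch of why self-attention cost grows quadratically with image
# resolution: token count N = (H/P) * (W/P), and the attention matrix is N x N.
# Patch size 16 and dim 768 are common defaults assumed here for illustration.
def attention_cost(height, width, patch=16, dim=768):
    n_tokens = (height // patch) * (width // patch)
    attn_entries = n_tokens ** 2            # one score per token pair (per head, per layer)
    flops_qk = 2 * n_tokens ** 2 * dim      # rough Q.K^T cost for one attention block
    return n_tokens, attn_entries, flops_qk

for res in (224, 448, 896):                 # doubling resolution ~4x tokens, ~16x attention cost
    print(res, attention_cost(res, res))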
The control of the froth flotation process in the mineral industry is a challenging task due to its multiple impacting parameters. Accurate and convenient examination of the concentrate grade is a crucial step in realizing effective and real-time control of the flotation process. The goal of this study is to employ image processing techniques and CNN-based feature extraction, combined with machine learning and deep learning, to predict the elemental composition of minerals in the flotation froth. A real-world dataset was collected and preprocessed from a differential flotation circuit at the industrial flotation site in Guemassa, Morocco. Using image-processing algorithms, the features extracted from the flotation froth include texture, bubble size, velocity, and color distribution. To predict the mineral concentrate grades, our study includes several supervised machine learning (ML) algorithms, artificial neural networks (ANN), and convolutional neural networks (CNN). The industrial experimental evaluations revealed strong performance, with accuracy up to 0.94. Furthermore, our proposed hybrid method was evaluated in a real flotation process for the Zn, Pb, Fe, and Cu concentrate grades, with a prediction error of less than 4.53. These results demonstrate the significant potential of our proposed online analyzer as an artificial intelligence application in the field of complex polymetallic flotation circuits (Pb, Fe, Cu, Zn).
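As a rough, hedged sketch of the general idea (hand-crafted froth descriptors feeding a regressor that predicts concentrate grades), the toy below uses simple texture, brightness, and color statistics with a random forest on synthetic data. The feature definitions, model choice, and data are illustrative assumptions, not the authors' pipeline.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def froth_features(frame_gray, frame_rgb):
    # Toy froth descriptors: gradient magnitude as a texture proxy, mean
    # brightness, and per-channel color means. These stand in for the texture,
    # bubble-size, velocity, and color features described in the abstract.
    d0, d1 = np.gradient(frame_gray.astype(float))
    texture = np.mean(np.hypot(d0, d1))
    brightness = frame_gray.mean()
    color_means = frame_rgb.reshape(-1, 3).mean(axis=0)
    return np.concatenate([[texture, brightness], color_means])

# Hypothetical data: one random "froth frame" per sample, grades as targets.
rng = np.random.default_rng(0)
frames = [(rng.integers(0, 255, (64, 64)), rng.integers(0, 255, (64, 64, 3)))
          for _ in range(50)]
X = np.stack([froth_features(gray, rgb) for gray, rgb in frames])
y = rng.uniform(0, 50, size=(50, 4))        # made-up Zn, Pb, Fe, Cu grades (%)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict(X[:1]))                 # predicted grades for one frame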
Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (ViT) have recently become a new state of the art for computer vision applications. Motivated by the success of ViTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of ViTs for the efficient classification of Alzheimer's Disease (AD). The framework consists of four vanilla ViTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested using two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models' efficacy under imbalanced and data-scarce conditions. The ViT ensembles saw an improvement of around 2% compared to the individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated an overall accuracy gain of 4.14% and 4.72% over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is made publicly available.
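The hard- and soft-voting ensembles can be summarized in a few lines; the sketch below uses made-up class probabilities from four hypothetical models for a binary AD/non-AD decision, and is not output from the paper's ViTs.

import numpy as np

# Four models' predicted probabilities [P(class 0), P(class 1)] for one scan.
# Values are invented to show how the two voting rules can disagree.
probs = np.array([
    [0.60, 0.40],
    [0.55, 0.45],
    [0.30, 0.70],
    [0.45, 0.55],
])

# Soft voting: average the class probabilities, then take the argmax.
soft_pred = probs.mean(axis=0).argmax()

# Hard voting: each model votes for its argmax class; the majority wins.
votes = probs.argmax(axis=1)
hard_pred = np.bincount(votes, minlength=2).argmax()

print("soft vote:", soft_pred, "hard vote:", hard_pred)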
Sketch-to-image is an important task for reducing the burden of creating a color image from scratch. Unlike previous sketch-to-image models, which synthesize the image in an end-to-end manner and often produce unnatural results, we propose a method that decomposes the problem into subproblems to generate more natural and plausible images. It first generates an intermediate output, a semantic mask map, from the input sketch through instance and semantic segmentation at two levels: background segmentation and foreground segmentation. The background segmentation is formed based on the context of the foreground objects. Then, the foreground segmentations are sequentially added to the created background segmentation. Finally, the generated mask map is fed into an image-to-image translation model to generate an image. Our proposed method works with 92 distinct classes. It outperforms state-of-the-art sketch-to-image models and generates better images.
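A minimal, runnable toy of the staged pipeline described above, assuming placeholder components: the segmentation functions and the mask-to-image "translator" here are trivial stand-ins for the paper's models, and the class ids and colors are arbitrary.

import numpy as np

def segment_foreground(sketch):
    # Stand-in for instance/semantic segmentation: pretend one foreground
    # object (class 2) was detected in the lower half of the sketch.
    mask = np.zeros(sketch.shape, dtype=bool)
    mask[32:, 16:48] = True
    return [(2, mask)]

def segment_background(sketch, foreground):
    # Stand-in for context-based background segmentation: class 1 everywhere.
    return np.ones(sketch.shape, dtype=np.uint8)

def translate_mask_to_image(mask_map):
    # Stand-in for a mask-to-image translation network: map class ids to colors.
    palette = {1: (135, 206, 235), 2: (34, 139, 34)}
    rgb = np.zeros(mask_map.shape + (3,), dtype=np.uint8)
    for cls, color in palette.items():
        rgb[mask_map == cls] = color
    return rgb

def sketch_to_image(sketch):
    foreground = segment_foreground(sketch)             # foreground instances
    mask_map = segment_background(sketch, foreground)   # background layout first
    for class_id, mask in foreground:                   # add foreground on top, in order
        mask_map[mask] = class_id
    return translate_mask_to_image(mask_map)            # mask map -> color image

image = sketch_to_image(np.zeros((64, 64), dtype=np.uint8))
print(image.shape)  # (64, 64, 3)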
Neural networks have become a cornerstone of computer vision applications, with tasks ranging from image classification to object detection. However, challenges such as hyperparameter optimization (HPO) and model compression remain critical for improving performance and deploying models on resource-constrained devices. In this work, we address these challenges using Tensor Network-based methods. For HPO, we propose and evaluate the TetraOpt algorithm against various optimization algorithms. These evaluations were conducted on subsets of the NATS-Bench dataset, including CIFAR-10, CIFAR-100, and ImageNet subsets. TetraOpt consistently demonstrated superior performance, effectively exploring the global optimization space and identifying configurations with higher accuracies. For model compression, we introduce a novel iterative method that combines CP, SVD, and Tucker tensor decompositions. Applied to ResNet-18 and ResNet-152, we evaluated our method on the CIFAR-10 and Tiny ImageNet datasets. Our method achieved compression ratios of up to 14.5x for ResNet-18 and 2.5x for ResNet-152. Additionally, the inference time for processing an image on a CPU remained largely unaffected, demonstrating the practicality of the method.
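As a hedged sketch of one ingredient of such compression, the snippet below applies a truncated SVD to a dense weight matrix and reports the parameter reduction and reconstruction error. The layer shape and rank are illustrative; the paper's iterative CP/SVD/Tucker scheme for convolutional layers is more involved.

import numpy as np

# Hypothetical dense layer weights; 512 x 2048 is an assumed shape, not one
# taken from ResNet-18 or ResNet-152.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 2048))

rank = 64
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]                  # 512 x 64 factor (columns scaled by singular values)
B = Vt[:rank, :]                            # 64 x 2048 factor

original_params = W.size
compressed_params = A.size + B.size
print("compression ratio:", original_params / compressed_params)          # ~6.4x here
print("relative error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))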
Authors: Zhou, Longfei; Zhang, Lin; Konz, Nicholas
MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139, USA
Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708, USA
Computer vision (CV) techniques have played an important role in promoting the informatization, digitization, and intelligence of industrial manufacturing systems. Considering the rapid development of CV techniques, we present a comprehensive review of the state of the art of these techniques and their applications in manufacturing industries. We survey the most common methods, including feature detection, recognition, segmentation, and three-dimensional modeling. A system framework of CV in the manufacturing environment is proposed, consisting of a lighting module, a manufacturing system, a sensing module, CV algorithms, a decision-making module, and an actuator. Applications of CV to different stages of the entire product life cycle are then explored, including product design, modeling and simulation, planning and scheduling, the production process, inspection and quality control, assembly, transportation, and disassembly. Challenges include algorithm implementation, data preprocessing, data labeling, and benchmarks. Future directions include building benchmarks, developing methods for non-annotated data processing, developing effective data preprocessing mechanisms, customizing CV models, and opportunities arising from 5G.
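A schematic, hypothetical encoding of the closed-loop framework listed above (lighting, sensing, CV algorithms, decision-making, actuator) might look as follows; the class and field names are placeholders invented for illustration, not structures defined in the review.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ManufacturingVisionLoop:
    # Hypothetical module slots mirroring the surveyed framework.
    set_lighting: Callable[[], None]          # lighting module
    capture: Callable[[], Any]                # sensing module (camera/sensor)
    analyze: Callable[[Any], dict]            # CV algorithms (detection, segmentation, ...)
    decide: Callable[[dict], str]             # decision-making module
    actuate: Callable[[str], None]            # actuator acting on the manufacturing system

    def step(self):
        self.set_lighting()
        frame = self.capture()
        findings = self.analyze(frame)
        action = self.decide(findings)
        self.actuate(action)

# Toy usage with stub callables:
loop = ManufacturingVisionLoop(
    set_lighting=lambda: None,
    capture=lambda: "frame",
    analyze=lambda f: {"defect": False},
    decide=lambda r: "pass" if not r["defect"] else "reject",
    actuate=lambda a: print("action:", a),
)
loop.step()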
Efficient point cloud coding has become increasingly critical for applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations may functionally make the difference. Deep learning has emerged as a powerful tool in this domain, offering advanced techniques for compressing point clouds more efficiently than conventional coding methods while also allowing effective computer vision tasks to be performed in the compressed domain, thus, for the first time, making available a common compressed visual representation effective for both humans and machines. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding (PCC) standard, offering efficient lossy coding of static point clouds and targeting both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded using the likewise learning-based JPEG AI standard. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state of the art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, achieving significant rate reductions. Color compression performance is less competitive, but this is offset by the power of a fully learning-based coding framework for both geometry and color and the associated effective compressed-domain processing.
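As a purely conceptual toy of the split described above (geometry kept in 3D, color packed into a 2D image for a 2D codec), the snippet below voxelizes random points and rasters their colors. It is not the JPEG PCC algorithm or its projection scheme, just a minimal analogue under simplifying assumptions.

import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 1, (1000, 3))                 # hypothetical point positions
colors = rng.integers(0, 256, (1000, 3), np.uint8)    # hypothetical per-point RGB

# Geometry: quantize to a sparse occupancy grid (the kind of representation a
# sparse 3D CNN would consume).
grid = 64
voxels = np.unique(np.floor(points * grid).astype(np.int32), axis=0)

# Color: pack per-point attributes into a 2D image that a 2D codec could encode.
side = int(np.ceil(np.sqrt(len(colors))))
color_image = np.zeros((side * side, 3), np.uint8)
color_image[: len(colors)] = colors
color_image = color_image.reshape(side, side, 3)

print("occupied voxels:", len(voxels), "color image:", color_image.shape)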
ISBN (Print): 9783031539657; 9783031539664
Diffusion models have become increasingly popular in recent years, and their applications span a wide range of fields. This survey focuses on the use of diffusion models in computer vision, especially in the branch of image transformations. The objective of this survey is to provide an overview of state-of-the-art applications of diffusion models in image transformations, including image inpainting, super-resolution, restoration, translation, and editing. This survey presents a selection of notable papers and repositories featuring practical applications of diffusion models for image transformations. The applications are presented in a practical and concise manner, facilitating the understanding of the concepts behind diffusion models and how they function. Additionally, it includes a curated collection of GitHub repositories featuring popular examples of these subjects.