During machine visual perception, the optical signal from a scene is transferred into the electronic domain by detectors in the form of image data, which are then processed to extract visual information. In noisy environments, however, such as thermal imaging systems, neural performance faces a significant bottleneck due to the inherent degradation of data quality upon noisy detection. Here, we propose a concept of optical signal processing before detection to address this issue. We demonstrate that spatially redistributing optical signals through a properly designed linear transformer can enhance the detection noise resilience of visual perception, as benchmarked with MNIST classification. A quantitative analysis of the relationship between signal concentration and noise robustness supports our idea, along with its practical implementation in an incoherent imaging system. This compute-first detection scheme can advance infrared machine vision technologies for industrial and defense applications.
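The signal-concentration argument can be illustrated numerically: for a fixed per-pixel detector read noise, redistributing the same optical energy into fewer pixels raises the peak-pixel SNR. A minimal numpy sketch (the energy values, pixel count, and additive-noise model are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def peak_pixel_snr(signal, read_noise_sigma=1.0):
    """Peak-pixel SNR for a detector with additive read noise of fixed sigma."""
    return float(np.max(signal) ** 2 / read_noise_sigma ** 2)

total_energy = 100.0   # same total optical energy in both cases (arbitrary units)
n_pixels = 64

# Case 1: energy spread uniformly over all pixels.
spread = np.full(n_pixels, np.sqrt(total_energy / n_pixels))

# Case 2: a linear optical transform concentrates the energy into one pixel.
concentrated = np.zeros(n_pixels)
concentrated[0] = np.sqrt(total_energy)

snr_spread = peak_pixel_snr(spread)              # 100/64, about 1.56
snr_concentrated = peak_pixel_snr(concentrated)  # 100.0
```

Because the read noise is fixed per pixel while the signal power scales quadratically with per-pixel amplitude, concentration improves the noise margin available to any downstream classifier.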
Hand gesture recognition (HGR) is a convenient and natural form of human-computer interaction, suitable for a wide range of applications. Much research has already focused on wearable-device-based HGR. By contrast, this paper gives an overview of device-free HGR, i.e., systems that do not require the user to wear something like a data glove or hold a device. HGR systems are explored with regard to technology, hardware, and algorithms. We demonstrate how timing and power requirements are interconnected with hardware, pre-processing algorithms, classification, and technology, and how these permit more or less granularity, accuracy, and number of gestures. The sensor modalities evaluated are Wi-Fi, vision, radar, mobile networks, and ultrasound. The pre-processing techniques explored are stereo vision, multiple-input multiple-output (MIMO), spectrograms, phased arrays, range-Doppler maps, range-angle maps, Doppler-angle maps, and multilateration. Classification approaches with and without ML are studied. Among those with ML, the assessed algorithms range from simple tree structures to transformers. All applications are evaluated with their level of integration in mind: whether the application is suitable for edge integration, its real-time capability, whether continuous learning is implemented, the robustness achieved, whether ML is applied, and the accuracy level. Our survey aims to provide a thorough understanding of the current state of the art in device-free HGR, on edge devices and in general. Finally, on the basis of present-day challenges and opportunities in this field, we outline the further research we suggest for HGR improvement. Our goal is to promote the development of efficient and accurate gesture recognition systems.
Optimizers play an important role in enhancing the performance of a deep network. A study of different optimizers is necessary to understand their effect on the performance of a deep network for a given target task, such as image classification. Several attempts have been made to investigate the effect of optimizers on the performance of CNNs. However, such experiments have not been carried out on vision transformers (ViT), despite the recent success of ViT in various image-processing tasks. In this paper, we conduct exhaustive experiments with ViT using different optimizers. In our experiments, we found that weight decoupling and weight decay in optimizers play important roles in training ViT. We focused on the concept of weight decoupling and tried different variations of it to investigate to what extent weight decoupling is beneficial for a ViT. We propose two techniques that provide better results than weight-decoupled optimizers: (i) the weight decoupling step in optimizers involves a linear update of the parameter with weight decay as the scaling factor; we propose a quadratic update of the parameter, which uses a linear as well as a squared parameter update with the weight decay as the scaling factor. (ii) We propose using different weight decay values for different parameters depending on the gradient value of the loss function with respect to that parameter: a smaller weight decay is used for parameters with a higher gradient value, and vice versa. Image classification experiments are conducted on the CIFAR-100 and TinyImageNet datasets to compare the proposed methods with state-of-the-art optimizers such as Adam, RAdam, and AdaBelief. The code is available at https://***/Hemanth-Boyapati/Adaptive-weight-decay-optimizers.
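The two proposed modifications can be sketched on top of a standard AdamW-style decoupled update. The abstract describes them only in prose, so the exact formulas below (squared term via |θ|·θ, and the 1/(1+|g|) decay schedule) are one plausible reading, not the authors' code:

```python
import numpy as np

def adamw_variant_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                       eps=1e-8, base_wd=1e-2):
    """One AdamW-style step with (i) a decoupled decay using linear plus
    squared-parameter terms and (ii) per-parameter weight decay that shrinks
    where |grad| is large. Both forms are assumptions made for illustration."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                    # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    wd = base_wd / (1.0 + np.abs(grad))             # (ii) adaptive decay (assumed rule)
    decay = wd * theta + wd * np.abs(theta) * theta  # (i) linear + quadratic terms
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + decay)
    return theta, m, v
```

With a zero gradient the step reduces to pure decay, so weight magnitudes shrink; the quadratic term makes large weights decay proportionally faster than in plain AdamW.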
Facial expression is an essential aspect of human communication, and hence facial emotion recognition (FER) has become the basis for many machine vision applications. Many deep learning based FER models have been developed and have shown good results on emotion recognition. However, FER using deep learning still suffers from illumination conditions, noise around the face such as hair and background, and other ambience conditions. To mitigate such issues and improve the performance of FER, we propose the Enhanced Face Localization augmented Light Convolutional Neural Network (EFL-LCNN). EFL-LCNN incorporates three-phase pre-processing and a Light CNN, a trimmed VGG16 model. The three-phase pre-processing includes face detection, enhanced face-region cropping for ambience noise removal, and image enhancement using CLAHE to address illumination problems. It is followed by the Light CNN, which improves FER performance at reduced complexity. The EFL-LCNN is rigorously tested on four publicly available benchmark datasets: JAFFE, CK, MUG, and KDEF. The empirical results show that the EFL-LCNN boosts recognition accuracies significantly compared with the state of the art.
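The CLAHE step targets illumination variation by stretching local contrast. As a self-contained illustration, here is plain global histogram equalization, the core operation that CLAHE extends with tiling and a clip limit (the paper uses CLAHE itself, commonly available via OpenCV; this numpy stand-in is only a sketch of the idea):

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization of an 8-bit grayscale image: map each
    gray level through the normalized cumulative histogram. Assumes the
    image is not perfectly constant."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)         # gray-level lookup table
    return lut[img]
```

A low-contrast face crop (gray levels packed into a narrow band) comes out spanning a much wider range, which is what makes the downstream CNN less sensitive to lighting.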
The automatic generation of meaningful, detailed textual descriptions for supplied images is a difficult task in the fields of computer vision and natural language processing, so an AI-powered image caption generator can be incredibly useful for producing captions. In this study, we present a method for generating image captions using an attention mechanism that concentrates on pertinent areas of the image while it creates captions. Our model, which uses deep neural networks to extract image features and produce captions, obtains state-of-the-art results on benchmark datasets, confirming the effectiveness of the attention mechanism in raising the quality of the generated captions. We also offer a thorough evaluation of the performance of our approach and discuss potential future directions for enhancing image caption generation.
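The attention mechanism at the heart of such caption generators reduces to a few lines: region features are scored against the decoder state, softmax-normalized, and combined into a context vector. A minimal numpy sketch (the dot-product scoring and the feature shapes are illustrative assumptions; real models use learned score functions):

```python
import numpy as np

def attention_context(features, query):
    """Soft attention over image regions.
    features: (R, D) array of R region features; query: (D,) decoder state.
    Returns the attention-weighted context vector of shape (D,)."""
    scores = features @ query                   # one relevance score per region
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ features                   # convex combination of regions
```

At each decoding step the decoder supplies a new query, so the context vector shifts toward the image regions most relevant to the next word.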
Segmentation has long been a rooted area of research with diverse dimensions. Image segmentation and its associated techniques have supported computer vision, pattern recognition, and image processing, and hold variegated applications in crucial domains. This study compiles the vast literature on machine learning and deep learning-based segmentation techniques and offers statistical, comprehensive, semi-automated, and application-specific analysis that can contribute to ongoing research. 16,674 studies were filtered out of a pool of 22,088 studies collected by executing a search string on the Scopus database. These studies are analyzed for their meta-data and comprehensive content, and reviewed to identify key research areas using a topic modeling-based method (LDA). The role of segmentation in mathematical expression recognition is also examined. IEEE is a ubiquitous name in terms of renowned publisher, reputed journal (IEEE Access), and most cited affiliation (#10,472). Three of the five topic solutions extracted by the LDA model evidence streaming research areas in image segmentation. Medical image processing, machine vision, and object identification are the accentuated domains in this context. The comprehensive analysis puts forth neural network-based approaches as a trend. An inquiry into segmentation techniques for mathematical expressions identifies neural-based segmentation techniques (CNN, RNN, LSTM) as preeminent and geometrical features as the focused features of the process. To sum up, the purpose of the current study is to summarize the best available research on image segmentation after synthesizing the results of an assorted set of studies.
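The LDA topic-extraction step of such a survey can be reproduced with scikit-learn: build a document-term matrix from the abstracts, then fit Latent Dirichlet Allocation with the desired number of topics. The toy corpus below stands in for the 16,674 filtered abstracts and is purely illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for the survey's abstract collection (illustrative).
docs = [
    "medical image segmentation cnn network",
    "object identification machine vision camera",
    "lstm rnn segmentation mathematical expression",
    "medical imaging cnn tumor segmentation",
    "machine vision object detection camera",
    "rnn lstm handwriting expression recognition",
]

X = CountVectorizer().fit_transform(docs)           # document-term counts
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(X)                    # per-document topic weights
```

Each row of `doc_topic` is a probability distribution over topics, and the top-weighted terms per topic (from `lda.components_`) are what the survey reads off as research areas.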
Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
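The local neighborhood aggregation that graph Transformers seek to go beyond is itself simple to state: each GNN layer averages a node's neighborhood features and applies a learned map. A minimal GCN-flavored sketch (mean aggregation and ReLU are one common choice among many):

```python
import numpy as np

def gnn_layer(A, X, W):
    """One mean-aggregation message-passing layer: add self-loops, average
    each node's neighborhood features, then apply a linear map and ReLU.
    A: (N, N) adjacency, X: (N, F) node features, W: (F, F') weights."""
    A_hat = A + np.eye(A.shape[0])            # self-loops so a node keeps itself
    deg = A_hat.sum(axis=1, keepdims=True)    # neighborhood sizes
    H = (A_hat / deg) @ X @ W                 # mean-aggregate, then project
    return np.maximum(H, 0.0)                 # ReLU
```

Because information propagates only one hop per layer, many layers are needed for long-range interactions; a graph Transformer instead lets every node attend to every other node in a single layer, which is the limitation-versus-bias trade-off the review discusses.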
ISBN: (print) 9798350363272; 9798350363265
As a key factor in the milling process, the wear status of the milling cutter has a significant impact on the machining quality of the workpiece. To detect wear on a milling machine efficiently and precisely, this paper presents the development of a milling-machine wear detection system based on machine vision and digital image processing. The system, which includes link mechanisms and an industrial camera, is designed for auxiliary localization and collection of on-machine images of the milling cutter status. An image pre-processing method based on automatic threshold segmentation and the Canny edge detection operator is proposed to identify the edge of cutter wear. A maximum-connected-domain algorithm is used to screen the wear area of the milling cutter, and the amount of wear is obtained via a calibrated scaling method. Experimental results show that the proposed system is suitable for industrial use due to its rapid detection speed and high recognition accuracy, which are desirable for engineering applications.
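The core measurement chain — automatic thresholding, largest-connected-region selection, and calibrated pixel-to-area conversion — can be sketched with numpy and scipy. This is a stand-in for the system's pipeline, not its actual code; the `mm_per_px` calibration value is an assumed placeholder:

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(img):
    """Automatic threshold selection (Otsu) on an 8-bit grayscale image:
    pick the gray level maximizing between-class variance."""
    p = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p /= p.sum()
    omega = np.cumsum(p)                    # class-0 probability
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean mass
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

def largest_region_area_mm2(mask, mm_per_px=0.05):
    """Area of the largest connected component in a binary mask, converted
    with a calibrated pixel scale (mm_per_px is an assumed calibration)."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return 0.0
    areas = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    return float(np.max(areas)) * mm_per_px ** 2
```

Screening by the maximum connected domain rejects small bright speckles so that only the contiguous wear zone contributes to the measured area.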
The use of high-altitude remote sensing (RS) data from aerial and satellite platforms presents considerable challenges for agricultural monitoring and crop yield estimation due to noise caused by atmospheric interference, sensor anomalies, and outlier pixel values. This paper introduces a "Quartile Clean image" pre-processing technique that addresses these data issues by analyzing quartile pixel values in local neighborhoods to identify and adjust outliers. Applying this technique to 20,946 Moderate Resolution Imaging Spectroradiometer (MODIS) images from 2002 to 2015 improved the mean peak signal-to-noise ratio (PSNR) to 40.91 dB. Integrating Quartile Clean data with Convolutional Neural Network (CNN) models using exponential-decay learning-rate scheduling achieved RMSE improvements of up to 5.88% for soybeans and 21.85% for corn, while Long Short-Term Memory (LSTM) models demonstrated RMSE reductions of up to 11.52% for soybeans and 29.92% for corn, also using exponential-decay learning rates. To compare the proposed method with the state of the art, we introduce the Vision Transformer (ViT) model for crop yield estimation. The ViT model, applied to the same dataset, achieves remarkable performance without explicit pre-processing, with R2 scores ranging from 0.9752 to 0.9875 for soybean and 0.9540 to 0.9888 for corn yield estimation. The RMSE values range from 7.75086 to 9.76838 for soybean and 26.25265 to 34.20382 for corn, demonstrating the ViT model's robustness. This research contributes by (1) introducing the Quartile Clean image method for enhancing RS data quality and improving crop yield estimation accuracy, and (2) comparing it with the state-of-the-art ViT model. The results demonstrate the effectiveness of the proposed approach and highlight the potential of the ViT model for crop yield estimation, representing a valuable advancement in processing high-altitude imagery for precision agriculture applications.
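A quartile-based local outlier cleaner of the kind described can be sketched with scipy's rank filters: flag pixels outside the neighborhood's interquartile fence and replace them with the local median. The fence rule (Q1 − 1.5·IQR, Q3 + 1.5·IQR) and the median replacement are one standard reading; the paper's exact rule may differ:

```python
import numpy as np
from scipy import ndimage

def quartile_clean(img, size=3, k=1.5):
    """Replace pixels lying outside [Q1 - k*IQR, Q3 + k*IQR] of their local
    neighborhood with the local median (assumed form of 'Quartile Clean')."""
    q1 = ndimage.percentile_filter(img, 25, size=size)
    q3 = ndimage.percentile_filter(img, 75, size=size)
    med = ndimage.median_filter(img, size=size)
    iqr = q3 - q1
    bad = (img < q1 - k * iqr) | (img > q3 + k * iqr)  # local outliers
    out = img.copy()
    out[bad] = med[bad]
    return out

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB, the paper's quality metric."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

On an image with isolated saturated pixels (e.g. sensor anomalies), the cleaned result scores a higher PSNR against the true scene than the noisy input, which is exactly the improvement the paper reports on MODIS data.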
Facial expression recognition (FER) is a challenging task in image processing for complex real-world scenarios captured under different lighting conditions, facial obstructions, and a diverse range of facial orientations. To address this issue, a novel Twinned attention network (Twinned-Att) is proposed in this paper for efficient FER in occluded images. The proposed Twinned-Att network is designed as two separate modules: a holistic module (HM) and a landmark-centric module (LCM). The holistic module comprises a dual coordinate attention block (Dual-CA) and a cross-convolution block (Cross-conv). The Dual-CA block is essential for learning positional, spatial, and contextual information by highlighting the most prominent characteristics in the facial regions. The Cross-conv block learns spatial inter-dependencies and correlations to identify complex relationships between various facial regions. The LCM emphasizes smaller and distinct local regions while maintaining resilience against occlusions. Vigorous experiments have been undertaken to evaluate the efficacy of the proposed Twinned-Att. The Twinned-Att achieves accuracies of 86.92%, 85.64%, 78.40%, 69.82%, 64.71%, 85.52%, and 85.83% on the RAF DB, FER PLUS, FER 2013, FED RO, SFEW 2.0, occluded RAF DB, and occluded FER Plus datasets, respectively. The proposed Twinned-Att network is tested with various backbone networks, including ResNet-18, ResNet-50, and ResNet-152, where it consistently performs well, highlighting its prowess in addressing the challenges of robust FER in images captured in complex real-world environments.
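The coordinate-attention idea behind the Dual-CA block — encode position by pooling along each spatial axis separately, then gate the feature map — can be shown in a few lines of numpy. The learned convolutions of the real block are omitted, so this is only a structural sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coord_attention(x):
    """Direction-aware reweighting in the spirit of coordinate attention.
    x: feature map of shape (C, H, W). Pool along width and along height to
    get per-row and per-column descriptors, gate with sigmoids, rescale."""
    h_desc = x.mean(axis=2, keepdims=True)    # (C, H, 1): one value per row
    w_desc = x.mean(axis=1, keepdims=True)    # (C, 1, W): one value per column
    gate = sigmoid(h_desc) * sigmoid(w_desc)  # (C, H, W) via broadcasting
    return x * gate
```

Because the gate factorizes into a row term and a column term, each output position is modulated by statistics of its entire row and column, which is how positional context enters without global self-attention.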