ISBN (Print): 9798350350920
Human Action Recognition (HAR) is one of the most widely applied research directions in computer vision, used in human-computer interaction, Augmented Reality (AR), security monitoring, and other scenarios. However, due to the complexity of human action gestures, existing HAR methods fall short when dealing with variable human gestures and action information, and their accuracy needs to be improved. To improve accuracy, we propose a multi-dimensional network model based on SC-LSTM (Skip-Connection + LSTM). First, a Temporal Feature Extraction Module is designed based on SC-LSTM, and a Spatial Feature Extraction Module is designed based on a CNN and a Multi-Attention Mechanism, to extract latent human action features from the temporal and spatial dimensions, respectively. Then, a separate SC-LSTM classification network processes these spatio-temporal features to obtain the final HAR results. Experimental results show that, compared with other algorithms, the proposed model makes fuller use of information in the temporal dimension and thus achieves better HAR accuracy.
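To make the skip-connection idea concrete, here is a minimal PyTorch sketch of an SC-LSTM-style block, assuming it is simply a residual path added around a standard LSTM layer; the dimensions, the stacking, and the fusion with the CNN/attention branch are illustrative guesses, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SCLSTMBlock(nn.Module):
    """LSTM layer with a skip (residual) connection around it."""
    def __init__(self, dim: int):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); adding the input back onto the LSTM
        # output eases gradient flow across long action sequences.
        out, _ = self.lstm(x)
        return out + x

# Toy usage: 8 clips of 30 frames with hypothetical 64-dim pose features.
feats = torch.randn(8, 30, 64)
temporal = nn.Sequential(SCLSTMBlock(64), SCLSTMBlock(64))
print(temporal(feats).shape)  # torch.Size([8, 30, 64])
```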
ISBN (Print): 9798400700958
With the rapid advances of deep learning-based computer vision (CV) technology, digital images are increasingly consumed not by humans but by downstream CV algorithms. However, capturing high-fidelity, high-resolution images is energy-intensive. It not only dominates the energy consumption of the sensor itself (i.e., in low-power edge devices), but also contributes significant memory burdens and performance bottlenecks in the later storage, processing, and communication stages. In this paper, we systematically explore a new paradigm of in-sensor processing, termed "learned compressive acquisition" (LeCA). Targeting machine vision applications on the edge, the LeCA framework jointly learns a sensor autoencoder structure with the downstream CV algorithms to effectively compress the original image into low-dimensional features with adaptive bit depth. We employ column-parallel analog-domain processing directly inside the image sensor to perform the compressive encoding of the raw image, resulting in meaningful hardware savings and energy efficiency improvements. Evaluated within a modern machine vision processing pipeline, LeCA achieves 4x, 6x, and 8x compression ratios prior to any digital compression, with minimal accuracy losses of 0.97%, 0.98%, and 2.01% on ImageNet, outperforming existing methods. Compared with a conventional full-resolution image sensor and the state-of-the-art compressive sensing sensor, our LeCA sensor is 6.3x and 2.2x more energy-efficient, respectively, while reaching a 2x higher compression ratio.
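The core joint-learning loop can be sketched as follows, with a toy convolutional autoencoder standing in for the in-sensor analog encoder and a placeholder classifier for the downstream CV model; the adaptive bit depth and column-parallel analog circuitry are beyond what software can show, and all shapes here are assumptions.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(  # stand-in for the "sensor-side" compressive encoder
    nn.Conv2d(3, 8, 4, stride=4), nn.ReLU(),
    nn.Conv2d(8, 4, 2, stride=2),          # 3x224x224 -> 4x28x28 features
)
dec = nn.Sequential(  # lightweight decoder feeding the downstream model
    nn.ConvTranspose2d(4, 8, 2, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(8, 3, 4, stride=4),
)
cls = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))  # placeholder task head

opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(), *cls.parameters()], lr=1e-4)
x = torch.randn(2, 3, 224, 224)
y = torch.randint(0, 1000, (2,))
logits = cls(dec(enc(x)))                      # encode -> decode -> classify, end to end
loss = nn.functional.cross_entropy(logits, y)  # the task loss drives the compression
loss.backward()                                # to keep only features the task needs
opt.step()
```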
Face recognition systems have always been in great demand for various security-based applications, authentication being one of them. Due to the continuous upsurge of the COVID-19 pandemic and emerging variants of coronaviruses, wearing a face mask has been made mandatory in many countries, especially in crowded places. This situation poses significant challenges to face recognition systems in recognizing a person's identity under face mask-based partial occlusion. Therefore, traditional face recognition systems need an update to ascertain whether the person is wearing a mask. This manuscript offers a novel Fiducial Point-based Non-local Means De-Noising (FP-NMDN) method for data pre-processing. It also proposes two comprehensive feature extraction mechanisms: transfer learning-based models and a customized Convolutional Neural Network (CNN) model. The experiment is conducted on five popular baseline architectures, viz. Visual Geometry Group (VGG16), Residual Network (ResNet50), MobileNetV2, InceptionV3, and EfficientNetB0, with fine-tuning of hyperparameters, as well as a customized CNN architecture. A modified dense network with a new classification layer is introduced to obtain high classification accuracy at low inference time. The datasets are collected from four valid sources (Kaggle Medical Masked Face, Real-world Masked Face, Face Mask, and open-source datasets): those re-synthesized based on predefined experimental criteria form Dataset-I, and other existing datasets form Dataset-II. The experimental results reveal that our optimized transfer learning-based ResNet50 model achieves the best accuracy of 99.68% and 99.67% on Dataset-I and Dataset-II, respectively. Moreover, our customized CNN model outperforms other recent methods in terms of overhead and inference time.
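As a rough illustration of the transfer-learning setup (the freezing policy and head sizes are assumptions, not the authors' exact configuration), the ResNet50 variant with a modified dense network and new classification layer could look like:

```python
import torch.nn as nn
from torchvision import models

# Pretrained backbone; only the new dense head is fine-tuned here.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False

# Modified dense network with a new classification layer.
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 2),  # masked vs. unmasked
)
```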
Complexity intensifies when gesticulations span various scales. Traditional scale-invariant object recognition methods often falter when confronted with case-sensitive characters in the English alphabet. The literature underscores a notable gap: the absence of an open-source, multi-scale, un-instructional gesture database featuring a comprehensive dictionary. In response, we have created the NITS (gesture scale) database, which encompasses isolated mid-air gesticulations of ninety-five alphanumeric characters. In this research, we present a scale-centric framework that addresses three critical aspects: (1) detection of smaller gesture objects: our framework excels at detecting smaller gesture objects, such as a red color marker; (2) removal of redundant self-co-articulated strokes: we propose an effective approach to eliminate the redundant self-co-articulated strokes often present in gesture trajectories; (3) a scale-variant approach for recognition: to tackle the scale vs. size ambiguity in recognition, we introduce a novel scale-variant methodology. Our experimental results reveal a substantial improvement of approximately 16% over existing state-of-the-art mid-air gesture recognition models. These outcomes demonstrate that our proposed approach successfully emulates the perceptibility of the human visual system, even when utilizing data from monophthalmic vision. Furthermore, our findings underscore the imperative need for comprehensive studies encompassing scale variations in gesture recognition.
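For the first aspect, a plausible (purely illustrative, not the authors' pipeline) way to detect a small red marker is HSV thresholding; note that red wraps around the hue axis, so two ranges are combined.

```python
import cv2
import numpy as np

def detect_red_marker(frame_bgr: np.ndarray):
    """Return (x, y, radius) of the largest red blob, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lo = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))     # low-hue reds
    hi = cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))  # high-hue reds
    mask = lo | hi
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return None
    (x, y), r = cv2.minEnclosingCircle(max(cnts, key=cv2.contourArea))
    return int(x), int(y), int(r)
```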
ISBN (Print): 9783031581731; 9783031581748
Object-based image analysis (OBIA) is extensively used for the classification of High-Resolution Satellite Imagery (HRSI). Various attributes of the image segments, such as spectral, spatial, and textural attributes, can be generated for analysis and classification purposes. However, using all of these attributes may not yield high classification accuracy. Experiments have shown that a suitable subset of these features needs to be identified for faster and more accurate classification of the imagery. Filter-based methods like Chi-Square, Information-Gain, and ReliefF are widely used to identify and rank the best set of parameters. The random tree-based Boruta machine learning feature-ranking method is also used alongside the above algorithms. Subsequently, a learner is fused with a filter, and the resulting receiver operating characteristic (ROC) plot of the model is used to identify the best accuracy and the minimal set of attributes for an individual class such as roads, trees, grass, buildings, and shadow. The best set of parameters for a class is identified by the best ROC plot; the best parameters are also identified from the Boruta feature analysis. The results indicate that the identified smaller feature set helps enhance classification accuracy.
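A minimal sketch of the filter-plus-learner procedure, using scikit-learn's chi-square filter as a stand-in for the Chi-Square/Information-Gain/ReliefF rankers and the ROC AUC of a random forest to score growing attribute subsets; the data here are synthetic placeholders, not HRSI segments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X = X - X.min(axis=0)             # chi2 requires non-negative features
scores, _ = chi2(X, y)
order = np.argsort(scores)[::-1]  # attributes ranked best-first

for k in (3, 5, 10, 20):          # grow the subset and watch the ROC AUC
    auc = cross_val_score(RandomForestClassifier(random_state=0),
                          X[:, order[:k]], y, scoring="roc_auc", cv=5).mean()
    print(f"top-{k} attributes: AUC = {auc:.3f}")
```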
ISBN (Print): 9798350318920; 9798350318937
Open-Set Object Detection (OSOD) has emerged as a contemporary research direction addressing the detection of unknown objects. Recently, a few works have achieved remarkable performance on the OSOD task by employing contrastive clustering to separate unknown classes. In contrast, we propose a new semantic clustering-based approach to facilitate a meaningful alignment of clusters in semantic space, and introduce a class decorrelation module to enhance inter-cluster separation. Our approach further incorporates an object focus module to predict objectness scores, which enhances the detection of unknown objects. Further, we employ (i) an evaluation technique that penalizes low-confidence outputs to mitigate the risk of misclassifying unknown objects, and (ii) a new metric called HMP that combines known and unknown precision using the harmonic mean. Our extensive experiments demonstrate that the proposed model achieves significant improvements on the MS-COCO and PASCAL VOC datasets for the OSOD task.
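Reading HMP as the harmonic mean of known-class and unknown-class precision (our interpretation of the abstract; the paper's exact definition may differ), the metric is a one-liner:

```python
def hmp(p_known: float, p_unknown: float) -> float:
    """Harmonic mean of known and unknown precision."""
    if p_known + p_unknown == 0:
        return 0.0
    return 2 * p_known * p_unknown / (p_known + p_unknown)

print(hmp(0.80, 0.40))  # 0.533... -- poor unknown precision drags HMP down
```

Like any harmonic mean, HMP rewards models that balance both precisions rather than excelling at only one.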
Convolution is a fundamental operation in image processing and machine learning. Aimed primarily at maintaining image size, padding is a key ingredient of convolution, which, however, can introduce undesirable boundar...
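For reference, the standard output-size relation explains why padding maintains image size: with input side n, kernel k, padding p, and stride s, the output side is floor((n + 2p - k) / s) + 1.

```python
def conv_out(n: int, k: int, p: int, s: int = 1) -> int:
    """Output side length of a convolution (floor division)."""
    return (n + 2 * p - k) // s + 1

print(conv_out(224, k=3, p=1))  # 224 -- "same" padding preserves size
print(conv_out(224, k=3, p=0))  # 222 -- without padding the image shrinks
```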
Background: Scanning electron microscope (SEM) images acquired by e-beam tools for inspection and metrology applications are usually degraded by blurring and additive noise. Blurring sources include the intrinsic point spread function of the optics, lens aberration, and potential motion blur caused by wafer stage movements during image acquisition. Noise sources include shot noise, quantization noise, and electronic read-out noise. Image degradation caused by blurring and noise usually leads to noisy, inaccurate metrology results. For low-dosage metrology applications, metrology algorithms often fail to obtain successful measurements due to elevated levels of blurring and noise. Image restoration and enhancement are therefore necessary preprocessing steps to obtain meaningful metrology results. Initial success was obtained by applying a neural network-based framework to drastically improve image quality and metrology precision, as demonstrated in previous work. Aim: We aim to provide more details on the neural network model architecture, model regularization, and training dynamics to better understand the model's behavior. We also analyze the effect of image restoration on key metrology performance measures such as line edge roughness and mean critical dimension. Approach: Non-machine-learning image quality enhancement methods fail to restore low-quality SEM images to a satisfactory degree. More recent convolutional neural network and vision transformer-based supervised deep learning models have achieved superior performance in various low-level image processing and computer vision tasks. Nevertheless, they require a huge amount of training data containing high-quality ground truth images. Unfortunately, high-quality ground truth images for low-dosage SEM images do not exist. Instead, we use a self-supervised U-Net combined with a fully connected network (FCN) to recover low-dosage images without the need for ground truth training images.
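A highly simplified sketch of the self-supervised idea, using a Noise2Noise-style surrogate (train the network to map one noisy acquisition onto a second, independently noisy acquisition of the same field of view); the paper's actual U-Net + FCN scheme is more elaborate, and this toy network is not a U-Net.

```python
import torch
import torch.nn as nn

net = nn.Sequential(  # toy stand-in for the restoration network
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
clean = torch.rand(4, 1, 64, 64)                 # unknown in practice
noisy_a = clean + 0.1 * torch.randn_like(clean)  # two independent noisy
noisy_b = clean + 0.1 * torch.randn_like(clean)  # acquisitions of one scene
loss = nn.functional.mse_loss(net(noisy_a), noisy_b)  # no ground truth used
loss.backward()
```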
ISBN (Print): 9781510673892; 9781510673885
Synthetically-generated imagery holds the promise of being a panacea for the challenges of real-world datasets. Yet it is frequently observed that deep learning models perform worse when trained with synthetic data than with real measured imagery. In this study we present analyses and illustrations of several statistical metrics, measures, and visualization tools based on the distance and similarity between the empirical distributions of real and synthetic data in the latent feature embedding space, which provide a quantitative understanding of the image-domain distribution discrepancies hampering the generation of performant simulated datasets. We also demonstrate the practical application of these tools and techniques in a novel study comparing the latent-space embedding vector distributions of real imagery, pristine synthetic imagery, and synthetic imagery modified by physics-based degradation models. The results may assist deep learning practitioners and synthetic imagery modelers in evaluating latent-space embedding distributional dissimilarity and improving model performance when using simulation tools to generate synthetic training imagery.
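One concrete instance of such a distributional distance is the Fréchet distance between Gaussian fits of the two embedding clouds (the construction behind FID); whether this matches any of the paper's specific metrics is an assumption made for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real: np.ndarray, synth: np.ndarray) -> float:
    """Frechet distance between Gaussian fits of two embedding sets."""
    mu_r, mu_s = real.mean(0), synth.mean(0)
    cov_r = np.cov(real, rowvar=False)
    cov_s = np.cov(synth, rowvar=False)
    covmean = sqrtm(cov_r @ cov_s).real  # drop tiny imaginary residue
    return float(((mu_r - mu_s) ** 2).sum()
                 + np.trace(cov_r + cov_s - 2 * covmean))

real = np.random.randn(1000, 64)         # real-image embeddings
synth = np.random.randn(1000, 64) + 0.5  # shifted synthetic embeddings
print(frechet_distance(real, synth))     # grows with the domain gap
```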
Local feature detection and description play a crucial role in various computer vision tasks, including image matching. Variations in illumination conditions significantly affect the accuracy of these applications, yet existing methods address this issue inadequately. In this paper, a novel algorithm based on an illumination auxiliary learning module (IALM) is introduced. First, a new local feature extractor named Illumination Auxiliary SuperPoint (IA-SuperPoint) is established by integrating IALM with SuperPoint. Second, illumination-aware auxiliary training captures the effects of illumination variations during feature extraction through tailored loss functions and a joint learning mechanism. Last, to evaluate the illumination robustness of local features, a metric is proposed that simulates various illumination disturbances. Experiments on the HPatches and RDNIM datasets demonstrate that our method greatly improves local feature extraction: compared to the baseline, the proposed method improves both mean matching accuracy and homography estimation.
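The robustness-metric idea can be sketched as follows: apply simulated illumination disturbances (here simple gamma and gain changes, chosen as assumptions) and measure how far a descriptor drifts; lower drift means more illumination-robust features. The stand-in descriptor below is deliberately trivial.

```python
import numpy as np

def perturb_illumination(img: np.ndarray, gamma: float, gain: float) -> np.ndarray:
    """Simulate an illumination change via a gamma curve and a gain."""
    out = gain * (img.astype(np.float32) / 255.0) ** gamma
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

def descriptor_stability(img: np.ndarray, describe) -> float:
    """Mean descriptor drift over a small set of disturbances."""
    base = describe(img)
    dists = [np.linalg.norm(base - describe(perturb_illumination(img, g, k)))
             for g in (0.5, 1.5) for k in (0.7, 1.3)]
    return float(np.mean(dists))

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
describe = lambda im: im.astype(np.float32).ravel() - im.mean()
print(descriptor_stability(img, describe))
```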