Block-based compressive imaging (BCI) is based on the compressive sensing principle: it uses a spatial light modulator and a low-resolution detector to perform parallel high-speed sampling, followed by a super-resolution algorithm that reconstructs the target image. Compared with traditional compressive imaging, BCI reduces the computational effort but introduces block artifacts. This paper proposes a data-driven deep neural network based on the Swin Transformer, called SwinBCI, which introduces local attention and shifted-window mechanisms to improve the reconstruction quality of the target image. By training the model on a dataset to obtain prior knowledge and performing graphics processing unit-accelerated computation, the computation time is greatly reduced, realizing real-time BCI. We achieve better reconstruction performance with cake-cutting Hadamard matrix sampling than with Bernoulli matrix sampling. Comparisons with three other classical compressed sensing reconstruction methods on four common image datasets, as well as on images acquired experimentally with the actual BCI system, show that SwinBCI achieves faster high-quality reconstruction at every sampling rate.
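As a hedged illustration of the sampling stage described above (not the paper's exact pipeline), the sketch below measures each image block with the leading rows of a Sylvester-constructed Hadamard matrix. The block size, sampling rate, and function names are assumptions made for the example.

```python
import numpy as np

def sylvester_hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def block_compressive_sample(image, block=8, rate=0.25):
    """Sample each non-overlapping block x (flattened to length block*block)
    with the first m rows of a Hadamard matrix: y = Phi @ x."""
    n = block * block
    m = max(1, int(round(rate * n)))
    Phi = sylvester_hadamard(n)[:m]          # m x n sensing matrix
    h, w = image.shape
    measurements = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            x = image[i:i + block, j:j + block].reshape(-1)
            measurements.append(Phi @ x)
    return np.stack(measurements), Phi
```

At a 0.25 sampling rate, each 8×8 block (64 pixels) is compressed to 16 measurements per block; the reconstruction network would take over from there.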
Ultrasound images are widespread in the medical diagnosis of musculoskeletal, cardiac, and obstetric diseases, due to the efficiency and non-invasiveness of the acquisition methodology. However, ultrasound acquisition introduces noise into the signal, which corrupts the resulting image and affects further processing steps, e.g. segmentation and quantitative analysis. We define a novel deep learning framework for the real-time denoising of ultrasound images. Firstly, we compare state-of-the-art denoising methods (e.g. spectral and low-rank methods) and select WNNM (Weighted Nuclear Norm Minimisation) as the best performer in terms of accuracy, preservation of anatomical features, and edge enhancement. Then, we propose a tuned version of WNNM (tuned-WNNM) that improves the quality of the denoised images and extends its applicability to ultrasound images. Through a deep learning framework, the tuned-WNNM qualitatively and quantitatively replicates WNNM results in real time. Finally, our approach is general in terms of its building blocks and the parameters of the deep learning and high-performance computing framework; in fact, we can select different denoising algorithms and deep learning architectures.
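The core proximal step of WNNM can be sketched as weighted singular-value thresholding, where each weight is inversely proportional to its singular value so that dominant structure is shrunk less. The constant `c` and this simplified one-step form are assumptions for illustration, not the tuned-WNNM of the paper.

```python
import numpy as np

def weighted_svt(Y, c=1.0, eps=1e-8):
    """One weighted singular-value thresholding step, the core operation of
    WNNM: larger singular values (more signal) receive smaller weights and
    are therefore shrunk less than small, noise-dominated ones."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    w = c / (s + eps)                  # weights inversely proportional to magnitude
    s_shrunk = np.maximum(s - w, 0.0)  # soft-threshold each singular value
    return U @ np.diag(s_shrunk) @ Vt
```

In a full pipeline this step is applied to groups of similar patches, driving small singular values (noise) to zero while nearly preserving large ones (anatomy).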
This study addresses the challenge of semantically sorting complex scenes in a mobile environment by processing multimodal visual inputs to create detailed landscape representations. Central to the approach is a streamlined multi-layer hierarchical model that mimics human attention dynamics, using the BING objectness metric to quickly identify significant areas by recognizing objects across different scales and contexts. To enhance feature extraction, time-sensitive and manifold-guided selectors are employed to prioritize high-quality visual features, while a low-rank active learning (LAL) algorithm simulates human-like focus on key visual zones, specifically in sports scenes. The model generates a Gaze Shift Path (GSP), which directs the collection of composite CNN features, ultimately classifying the scenes into distinct landscape types using a support vector machine (SVM). Experimental results on seven scene image sets show that our method outperforms the others by 2% to 5%. Additionally, our calculated deep GSP features can greatly facilitate image clustering. Last but not least, our visualized GSPs are over 90% consistent with real-world human gaze behavior, which explains the competitiveness of our method.
This work details the design and development of a microscopy image-based vegetable quality assessment system (prototype) that adopts a deep learning (DL) technique on an edge device. Current automated machine learning methods primarily utilize outer-surface images of vegetables/fruits, often lacking precise quantification of nutrient content such as carbohydrates, minerals, vitamins, etc. Such nutrient ingredients can instead be assessed by examining micro-level cell attributes of microscopy images in a DL framework. However, vegetable quality detection based on microscopy and DL on resource-constrained edge devices poses significant challenges. To address these problems, a portable, cost-effective, efficient, and real-time prototype has been realized. It involves configuring a microscopy image generation module using a low-cost Foldscope lens coupled with a smartphone, and on-device analysis by designing a new lightweight DL architecture and segmentation algorithm. The analysis is executed via a smartphone application, ensuring advantages such as bandwidth and energy efficiency, user privacy, and local processing without external servers. For system validation, a pilot study has been conducted on the widely consumed potato tuber, focusing on the assessment of starch presence as a key quality metric. The system successfully assesses cell attributes, i.e., a starch quantity of 10-25%, in approximately 24 s with high consistency. In a comparative study, the network outperforms existing state-of-the-art lightweight networks, achieving the highest recognition accuracy of up to 88.8% and an F1-score of 85.83 with fewer parameters (1.5M) and FLOPs (118M). Thus, the study demonstrates its applicability for vegetable quality assessment in an easy, affordable, and effective way. Further, the proposed idea can be extended to other vegetables/fruits.
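The final quantification step — reporting starch as a percentage of the imaged area — can be illustrated with a toy threshold segmentation. The fixed threshold and function name are assumptions; the paper's actual segmentation algorithm is a learned, lightweight one.

```python
import numpy as np

def segment_and_quantify(gray, threshold=0.5):
    """Toy pipeline: threshold a normalized grayscale microscopy patch into a
    binary starch mask, then report starch area as a percentage of the patch
    (the abstract's 10-25% quality metric is a percentage of this kind)."""
    mask = np.asarray(gray) >= threshold   # bright pixels treated as starch
    return 100.0 * mask.sum() / mask.size
```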
When computer vision techniques are used to identify humans in public places, the presence of cartoon characters can often result in false detections as humans, complicating the task of human recognition and hindering the application of such technology in public. This paper aims to minimize the false detection rate by retraining pretrained human detection models using transfer learning. The retraining process utilizes a dataset consisting of two classes, humans and cartoon characters, with 11,000 images per class. The instances in the dataset are carefully labeled before splitting into training, validation, and testing sets. Each selected model is retrained, evaluated, and compared to the commonly used pretrained human detection models. The results reveal that the retrained YOLOv8n model performs best for real-time application; it achieves 96.97% accuracy, 99.52% precision, 97.42% recall, a 98.46% F1 score, and a false detection rate of 8.16%, yet has a small model size of only 6.09 MB. In addition, it outperforms all the pretrained models in terms of accuracy (by 5.38%) and F1 score (by 2.85%) in reducing the false detection rate of cartoon characters as humans. This has great implications for human counting and customer analytics. However, false detections of cartoons as humans still exist in both the pretrained and retrained models. More sophisticated models such as the Vision Transformer will be studied in the future to minimize or completely eliminate these false detections, since a human being can make this distinction easily.
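The reported metrics follow directly from confusion-matrix counts. The sketch below uses hypothetical counts and the conventional definitions, with the false detection rate taken as false positives over all cartoon (negative) instances — an assumption about how the paper defines it.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1, and false detection rate (the
    fraction of cartoon instances wrongly detected as human) from the four
    confusion-matrix counts of a two-class evaluation."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_detection_rate = fp / (fp + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fdr": false_detection_rate}
```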
Real-time and accurate detection of overhead cables violating street-level regulations is crucial for smart city management. Existing methods face challenges such as the slender nature of the targets, occlusion, multi-scale variability, and high inter-class similarity. This paper presents the IOA-YOLO model. It incorporates a Line-Target Enhancement Module (LTEM) for better slender-object feature extraction, a Global-Local Dual Perception Module (GDPM) to boost robustness against occlusion, and a Hybrid Iterative Detection Head (HIDH) for multi-scale feature extraction using intra- and inter-layer information. An uncertainty-aware loss function (UAL) is introduced to suppress background interference and reduce the impact of inter-class similarity. Experiments on a custom dataset show that IOA-YOLO outperforms existing methods, achieving 93.94% precision and 88.17% recall, with a good balance between accuracy and efficiency. It also adapts well to various urban environments and lighting conditions, demonstrating robust stability and great real-world deployment potential.
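The abstract does not give the exact form of the UAL; a common uncertainty-aware weighting (in the style of Kendall et al.'s homoscedastic-uncertainty loss, an assumption here, not the paper's formula) scales each loss term by a learned log-variance so that high-uncertainty, background-cluttered terms are down-weighted:

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Generic uncertainty-aware combination: each loss term l_i is scaled
    by exp(-s_i) and the learned log-variance s_i is added as a regularizer,
    so terms the model is uncertain about contribute less to the gradient."""
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))
```

With all log-variances at zero this reduces to a plain sum; raising a term's log-variance shrinks that term's influence at the cost of the additive penalty.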
The advancement of artificial intelligence (AI) has brought many advances to human society as a whole. By integrating AI technology into daily activities, we can gain access to knowledge we could only begin to imagine. The objective of human action recognition (HAR) is to process photos and videos to discern whether a human is present, map the classified subject, and finally determine the action being carried out. Achieving this takes several steps and a careful approach, along with extensive research, troubleshooting, and experimentation. The AI architecture has to learn from a collected dataset in order to identify actions properly. HAR is implemented in Python using a real-time webcam feed. The MediaPipe Pose Detection library detects human anatomy from the input through joint key-points; the MediaPipe algorithm extracts features along the x, y, and z axes together with visibility (four variables), and the extracted data is used to train and test a CNN-LSTM classifier model. The output, an RGB skeleton and an action label on the detected subject (standing, waving, walking, or sitting), has yielded good results.
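A minimal sketch of the data shaping implied above: MediaPipe Pose yields 33 landmarks with x, y, z, and visibility (132 values per frame), which are sliced into fixed-length overlapping windows for a CNN-LSTM classifier. The window length, stride, and function name are assumptions for the example.

```python
import numpy as np

N_LANDMARKS = 33                  # MediaPipe Pose landmark count
N_FEATURES = N_LANDMARKS * 4      # x, y, z, visibility per landmark

def frames_to_windows(frames, window=30, stride=10):
    """Slice a (num_frames, 132) pose-feature stream into overlapping
    fixed-length windows, the (batch, time, features) shape a CNN-LSTM
    sequence classifier consumes."""
    frames = np.asarray(frames, dtype=np.float32)
    assert frames.shape[1] == N_FEATURES
    windows = [frames[i:i + window]
               for i in range(0, len(frames) - window + 1, stride)]
    return np.stack(windows)      # (num_windows, window, 132)
```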
ISBN (print): 9798350343557
Wide-field interferometric microscopy (WIM) has been utilized for the visualization of individual biological nanoparticles with high sensitivity. However, the image quality is highly affected by the focusing of the image. Hence, focus detection has been an active research field within the scope of imaging and microscopy. To tackle this issue, we propose a novel convolution- and transformer-based deep learning technique to detect focus in WIM. The method is compared with other focus detection techniques and obtains higher precision with fewer parameters. Furthermore, the model achieves real-time focus detection thanks to its low inference time.
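As a classical point of comparison for a learned focus detector, a variance-of-the-Laplacian sharpness score is a common baseline; this is an illustration, not necessarily one of the techniques compared in the paper.

```python
import numpy as np

def laplacian_focus_score(img):
    """Classical focus measure: variance of the discrete 5-point Laplacian.
    In-focus images have stronger high-frequency content, hence a higher
    score; defocus blurs edges and drives the score toward zero."""
    img = np.asarray(img, dtype=float)
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()
```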
The performance of visual processing is commonly constrained by extreme outdoor weather such as heavy rain. Rain streaks may substantially damage image optical quality and impact image processing in many scenarios. Thus, researching single-image rain removal has practical application value. However, removing rain streaks from a single image is a challenging task. Although end-to-end learning approaches based on convolutional neural networks have lately made significant progress on this task, most existing methods still cannot perform deraining well: they fail to process the details of the background layer, resulting in the loss of certain information. To address this issue, we propose a single-image deraining network named the twin-stage Unet-like network (TUNet). Specifically, a reconstitution residual block (RRB) is presented as the basic structure of the encoder-decoder to obtain more spatial contextual information for extracting rain components. Then, a residual sampling module (RSM) is introduced to perform downsampling and upsampling operations, preserving residual properties in the structure while obtaining deeper image features. Finally, the convolutional block attention module (CBAM) is adopted to fuse shallow and deep features of the same size in the model. Extensive experiments on five public synthetic datasets and a real-world dataset demonstrate that our proposed TUNet model outperforms state-of-the-art deraining approaches. The average PSNR value of TUNet is 0.41 dB higher than that of the state-of-the-art method (OSAM-Net) on the synthetic datasets.
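The PSNR figure quoted above is the standard peak signal-to-noise ratio; for 8-bit images it can be computed as follows (the function name is an assumption):

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image and a
    derained result: PSNR = 10 * log10(peak^2 / MSE). Higher is better;
    identical images give infinity."""
    clean = np.asarray(clean, dtype=float)
    restored = np.asarray(restored, dtype=float)
    mse = np.mean((clean - restored) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```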
This model enables an individual to input an image and obtain a description of it as output. The research paper makes use of deep learning and NLP (Natural Language Processing). Image Caption Gener...