This paper investigates video frame extrapolation, which predicts future frames from current and past frames. Although there have been many studies on video frame extrapolation in recent years, most suffer from unsatisfactory image quality in the predicted frames, such as severe blurring, because it is difficult to predict the movement of future pixels for multi-modal video frames, especially when frames change rapidly. Additional processing such as frame alignment or recurrent prediction can improve the quality of the predicted frames, but it hinders real-time extrapolation. Motivated by the significant progress in video frame interpolation using deep-learning-based flow estimation, a simplified video frame extrapolation scheme using deep-learning-based uni-directional flow estimation is proposed to reduce the processing time compared to conventional video frame extrapolation schemes without compromising the image quality of the predicted frames. In the proposed scheme, the uni-directional flow is first estimated from the current and past frames through a flow network consisting of four flow blocks, and the current frame is forward-warped through the estimated flow to predict a future frame. The proposed flow network is trained and evaluated on the Vimeo-90K triplet dataset. The performance of the proposed scheme is analyzed using the trained flow network in terms of prediction time as well as the similarity between predicted and ground-truth frames, measured by the structural similarity index measure (SSIM) and the mean absolute error of pixels, and compared with state-of-the-art schemes such as the Iterative and CycleGAN schemes. Extensive experiments show that the proposed scheme improves prediction quality by 2.1% and reduces prediction time by 99.7% compared to the state-of-the-art scheme.
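The forward-warping step described above can be sketched independently of the flow network. The snippet below is a minimal illustration, not the paper's implementation: each pixel of the current frame is splatted to the location given by a per-pixel flow field, with out-of-bounds pixels dropped and collisions resolved naively by last write.

```python
import numpy as np

def forward_warp(frame, flow):
    """Forward-warp a frame by a per-pixel flow field (naive splatting).

    frame: (H, W) grayscale image; flow: (H, W, 2) displacements (dx, dy).
    Pixels that land outside the frame are dropped; collisions keep the
    last-written value -- a deliberate simplification of real splatting.
    """
    H, W = frame.shape
    warped = np.zeros_like(frame)
    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < W) & (yt >= 0) & (yt < H)
    warped[yt[valid], xt[valid]] = frame[ys[valid], xs[valid]]
    return warped

# Shift a small gradient image one pixel to the right.
frame = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0  # dx = +1 everywhere
out = forward_warp(frame, flow)
```

A real extrapolator would additionally fill the holes the warp leaves behind (here, the zeroed first column) and blend colliding pixels by flow confidence.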
This paper introduces Singular-value Gain Compensation (SGC), a robust preprocessing method for Ground Penetrating Radar (GPR) that integrates Singular Value Decomposition (SVD) and Time Gain Compensation (TGC). SGC effectively enhances the signal-to-noise ratio while maintaining weak-signal integrity, facilitating the application of pretrained zero-shot segmentation models. In extensive evaluations on simulated and real-world data, SGC demonstrates superior image quality and segmentation accuracy compared with traditional methods, with improvements of +3.1 dB in PSNR and 23% in segmentation IoU in complex simulated scenarios, and 20% and 14% improvements in pipe and void segmentation, respectively, on real-world data. Additionally, SGC is computationally efficient, reducing both time and memory requirements and making it practical for large-scale infrastructure assessments. The method's ability to enhance GPR image analysis without extensive computational resources marks a significant advance in GPR preprocessing and opens possibilities for future research on downstream tasks combined with recent deep learning models.
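The two ingredients SGC combines can be sketched on a raw B-scan matrix. The snippet below is only an illustration of the building blocks, assuming the common convention of fast-time samples along rows and traces along columns; the actual SGC method integrates SVD and gain compensation differently than this naive sequential version.

```python
import numpy as np

def sgc_sketch(bscan, k=1, alpha=0.02):
    """Illustrative SVD + time-gain preprocessing of a GPR B-scan.

    bscan: (samples, traces) array. Zeroing the k strongest singular
    components suppresses dominant background / direct-wave energy;
    an exponential gain along fast time then amplifies late, weak echoes.
    """
    U, s, Vt = np.linalg.svd(bscan, full_matrices=False)
    s[:k] = 0.0                       # drop dominant singular components
    cleaned = (U * s) @ Vt            # low-rank background removed
    t = np.arange(bscan.shape[0])[:, None]
    gain = np.exp(alpha * t)          # deeper (later) samples gain more
    return cleaned * gain
```

By construction, a B-scan consisting of pure horizontal background (a rank-1 matrix) is removed entirely by the SVD step with k = 1.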
Colorectal cancer (CRC) is one of the most common and deadly cancers in the world, and most cases arise from polyps. Colonoscopy is a widely recognized and effective method for polyp diagnosis. However, clinical diagnosis has a high rate of missed polyps. Although deep learning methods can raise the detection rate by extracting diverse polyp features, their real-time performance, error rate, and misidentification ratio in actual clinical diagnosis have yet to meet the criteria for practical use. Here, we propose an improved structure for accurate polyp detection that enhances the YOLOv8 algorithm to overcome these obstacles. First, we introduce an enhanced Reverse Attention Mechanism Channel (RA-S) module to improve detection performance by fusing global feature information with local image details. Then, we integrate an attention mechanism into the Path Aggregation Network (PANet) to improve the algorithm's ability to fuse multiscale features and adapt to variations in polyps. Finally, the proposed method was validated on the public ETIS-LARIB dataset, which was not part of the training data, achieving high precision (92.1%), recall (84.5%), and F1 (88.1%), showcasing its robust detection performance and generalization ability.
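The reported F1 is the harmonic mean of precision and recall, which can be checked directly from the two other figures:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The reported precision/recall on ETIS-LARIB reproduce the reported F1.
f1 = f1_score(0.921, 0.845)  # ~0.881
```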
Breast cancer is commonly recognized as the second most frequent malignancy in women worldwide. Breast cancer therapy includes surgery, radiation therapy, and medication, which can be highly successful, with survival rates of 90% or higher, especially when the condition is discovered early. This work is one such approach to early detection of breast cancer, relying on the BI-RADS score. A computer-aided-diagnosis system based on a bespoke Digital Mammogram Diagnostic Convolutional Neural Network (DMD-CNN) model is proposed to aid in the categorization of mammogram breast lesions. Furthermore, PYNQ-based acceleration on an Artix-7 FPGA is employed to deploy the DMD-CNN model on a hardware-acceleration platform, the first of its kind for breast cancer. Yielding a performance accuracy of 98.2%, the proposed model exceeds the state-of-the-art approach. The comparative analysis in the study shows that the proposed method achieves a 4% increase in accuracy and a good recognition rate of 96% compared with the existing model. K-fold cross-validation was used to test and assess the integrated system (for k = 5, 7, and 9, the reported accuracy scores are 96.2%, 97.5%, and 98.1%, respectively). Extensive testing on mammography datasets was carried out to confirm the improved performance of the suggested approach. Experiments reveal that, compared to accelerating the DMD-CNN model on a GPU, the suggested solution not only optimizes resource utilization but also decreases power consumption to 3.12 W. With FPGA hardware acceleration, nearly 91 images are processed and analyzed per second, compared with a single image on a CPU.
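The k-fold cross-validation protocol used for evaluation can be sketched generically. This is a minimal stand-alone version of the splitting logic, not the authors' code:

```python
import numpy as np

def kfold_splits(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Samples are shuffled once, split into k near-equal folds, and each
    fold serves as the validation set exactly once.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, val
```

Per-fold accuracies are then averaged, which is how single scores such as the 96.2% (k = 5) above are typically reported.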
Micro-expressions, fleeting and subtle facial expressions, possess significant application potential. However, their brief duration, low intensity, and localized motion pose challenges for traditional detection method...
The power of deep learning in image classification has become very popular and applicable in many areas, including the medical sciences. Some medical applications are real-time and may be implemented on embedded devices. In these cases, achieving the highest accuracy is not the only concern; computation runtime and power consumption are also among the most important performance indicators, and these parameters are mainly evaluated in the hardware design phase. In this research, an energy-efficient deep learning accelerator for endoscopic image classification (DLA-E) is proposed. This accelerator can be implemented in future endoscopic imaging equipment to help medical specialists make faster and more accurate decisions during endoscopy or colonoscopy. The proposed DLA-E consists of 256 processing elements with 1000 bps network-on-chip bandwidth. Based on the simulation results of this research, the best dataflow for this accelerator running MobileNet v2 is kcp_ws from the weight-stationary (WS) family. The total energy consumption and total runtime of this accelerator on the investigated dataset are 4.56 × 10^9 MAC (multiplier-accumulator) energy units and 1.73 × 10^7 cycles, respectively, the best result in comparison with other combinations of CNNs and dataflows.
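Accelerator costs of this kind are usually derived from per-layer MAC counts. The sketch below, with a hypothetical layer shape (not one from the paper), shows how a dense convolution's MAC count and an ideal cycle lower bound over 256 parallel processing elements are computed:

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate count of one dense 2-D convolution layer."""
    return h_out * w_out * c_out * c_in * k * k

# Hypothetical layer: 56x56 output, 32 -> 64 channels, 3x3 kernel.
macs = conv_macs(56, 56, 32, 64, 3)   # 57,802,752 MACs
ideal_cycles = -(-macs // 256)        # ceil-divide: one MAC per PE per cycle
```

Real dataflows such as kcp_ws fall short of this bound because of network-on-chip stalls and buffer refills, which is exactly what dataflow simulators measure.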
In recent years, the rapid development of computer vision and artificial intelligence has significantly advanced agricultural applications, particularly in the quality detection and grading of navel oranges. This revi...
In recent years, meaningful visual image encryption schemes, in which the plain image is compressed, encrypted, and then hidden in a carrier image, have received increasing attention. This paper proposes a new meaningful visual image encryption scheme consisting of three stages: compression (a compression network), encryption (a 2D-SLC hyperchaotic map), and hiding (matrix encoding). First, the advantages of deep learning are exploited: the compression network can simultaneously compress the width, height, channels, and pixel values of the plain image. Second, a new 2D-SLC hyperchaotic map is designed to ensure security; it has a larger chaotic space and better randomness. Finally, to obtain a high-quality cipher image, the secured secret image is hidden in the grey carrier image by matrix encoding. The scheme can compress and encrypt a grey or colour plain image and then hide it in a grey carrier image. In addition, the theoretical peak signal-to-noise ratio (PSNR) between the cipher image and the carrier image is improved from 40.9292 dB to 42.1785 dB. The total running time is only about 0.35 s, 0.87 s, and 3.1 s for a 256 × 256, 512 × 512, and 1024 × 1024 grey or colour plain image, respectively.
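The PSNR figures quoted above measure how little the hiding stage disturbs the carrier. The metric itself is standard and can be sketched as follows; the example images are hypothetical, not from the paper:

```python
import numpy as np

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio between two same-sized images, in dB."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical embedding: a quarter of the pixels change by 1 LSB.
carrier = np.zeros((10, 10), dtype=np.uint8)
stego = carrier.copy()
stego.flat[:25] += 1          # 25 of 100 pixels differ by 1 -> MSE = 0.25
```

Matrix encoding raises PSNR precisely by reducing the fraction of carrier pixels that must change per embedded bit, which lowers the MSE in the formula above.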
ISBN: (Print) 9798331509927; 9798331509910
This research presents an innovative approach to dormitory surveillance at Surawiwat School by employing an unmanned aerial vehicle (UAV) for autonomous monitoring. The UAV is used for aerial reconnaissance, allowing efficient surveillance of the dormitory's perimeter to enhance security. By capturing high-resolution aerial images, the system aims to identify and track potential intruders, specifically those attempting to climb the dormitory fence. The images captured by the UAV are processed using advanced machine learning techniques, with a focus on object detection through deep learning. The system is built around a Convolutional Neural Network (CNN) and leverages the YOLOv8 (You Only Look Once) algorithm. YOLOv8 is recognized for its high accuracy and real-time processing capabilities, making it an ideal choice for real-time surveillance and detection tasks. The CNN-based model is trained to accurately detect human figures and identify unusual activities within the captured images. When the system detects an intruder, it sends an immediate alert, along with the captured aerial image, through the Line application to designated personnel. This instant notification enhances response times, allowing school security to address potential threats proactively. Overall, this research demonstrates a sophisticated, AI-driven surveillance solution that combines UAV capabilities with state-of-the-art object detection, contributing to enhanced safety and security for school dormitories.
Accurate intravenous (IV) fluid monitoring is critical in healthcare to prevent infusion errors and ensure patient safety. Traditional monitoring methods often depend on dedicated hardware, such as weight sensors or optical systems, which can be costly, complex, and challenging to scale across diverse clinical settings. This study introduces a software-defined sensing approach that leverages semantic segmentation using the pyramid scene parsing network (PSPNet) to estimate the remaining IV fluid volumes directly from images captured by standard smartphones. The system identifies the IV container (vessel) and its fluid content (liquid) using pixel-level segmentation and estimates the remaining fluid volume without requiring physical sensors. Trained on a custom IV-specific image dataset, the proposed model achieved high accuracy with mean intersection over union (mIoU) scores of 0.94 for the vessel and 0.92 for the fluid regions. Comparative analysis with the segment anything model (SAM) demonstrated that the PSPNet-based system significantly outperformed the SAM, particularly in segmenting transparent fluids without requiring manual threshold tuning. This approach provides a scalable, cost-effective alternative to hardware-dependent monitoring systems and opens the door to AI-powered fluid sensing in smart healthcare environments. Preliminary benchmarking demonstrated that the system achieves near-real-time inference on mobile devices such as the iPhone 12, confirming its suitability for bedside and point-of-care use.
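Both the mIoU metric and the hardware-free volume estimate described above reduce to simple mask arithmetic. The sketch below shows the generic computations on boolean masks; the tiny example masks are hypothetical, not drawn from the paper's dataset:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def fluid_fraction(liquid_mask, vessel_mask):
    """Remaining-volume proxy: liquid pixels over vessel pixels."""
    vessel = vessel_mask.sum()
    return liquid_mask.sum() / vessel if vessel else 0.0
```

Averaging `iou` over the vessel and liquid classes yields the mIoU figures reported (0.94 and 0.92), and `fluid_fraction` is the pixel-ratio idea behind sensor-free volume estimation, before any calibration for container shape.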