This paper investigates video frame extrapolation, which predicts future frames from current and past frames. Although there have been many studies on video frame extrapolation in recent years, most suffer from unsatisfactory image quality in the predicted frames, such as severe blurring, because it is difficult to predict the movement of future pixels for multi-modal video frames, especially when frames change rapidly. Additional processing such as frame alignment or recurrent prediction can improve the quality of the predicted frames, but it hinders real-time extrapolation. Motivated by the significant progress in video frame interpolation using deep-learning-based flow estimation, a simplified video frame extrapolation scheme using deep-learning-based uni-directional flow estimation is proposed to reduce the processing time compared to conventional video frame extrapolation schemes without compromising the image quality of the predicted frames. In the proposed scheme, the uni-directional flow is first estimated from the current and past frames through a flow network consisting of four flow blocks, and the current frame is forward-warped through the estimated flow to predict a future frame. The proposed flow network is trained and evaluated on the Vimeo-90K triplet dataset. The performance of the proposed scheme is analyzed using the trained flow network in terms of prediction time as well as the similarity between predicted and ground-truth frames, measured by the structural similarity index measure (SSIM) and the mean absolute error of pixels, and compared with state-of-the-art schemes such as the Iterative and CycleGAN schemes. Extensive experiments show that the proposed scheme improves prediction quality by 2.1% and reduces prediction time by 99.7% compared to the state-of-the-art scheme.
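The forward-warping step described above can be sketched independently of the flow network. The snippet below is a minimal illustration, not the paper's implementation: each pixel of the current frame is splatted to the location given by a per-pixel flow field, with out-of-bounds pixels dropped and collisions resolved naively by last write.

```python
import numpy as np

def forward_warp(frame, flow):
    """Forward-warp a frame by a per-pixel flow field (naive splatting).

    frame: (H, W) grayscale image; flow: (H, W, 2) displacements (dx, dy).
    Pixels that land outside the frame are dropped; collisions keep the
    last-written value -- a deliberate simplification of real splatting.
    """
    H, W = frame.shape
    warped = np.zeros_like(frame)
    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < W) & (yt >= 0) & (yt < H)
    warped[yt[valid], xt[valid]] = frame[ys[valid], xs[valid]]
    return warped

# Shift a small gradient image one pixel to the right.
frame = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0  # dx = +1 everywhere
out = forward_warp(frame, flow)
```

A real extrapolator would additionally fill the holes the warp leaves behind (here, the zeroed first column) and blend colliding pixels by flow confidence.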
This paper introduces Singular-value Gain Compensation (SGC), a robust preprocessing method for Ground Penetrating Radar (GPR) that integrates Singular Value Decomposition (SVD) and Time Gain Compensation (TGC). SGC effectively enhances the signal-to-noise ratio while maintaining weak-signal integrity, facilitating the application of pretrained zero-shot segmentation models. In extensive evaluations on simulated and real-world data, SGC demonstrates superior image quality and segmentation accuracy compared with traditional methods, with improvements of +3.1 dB in PSNR and 23% in segmentation IoU in complex simulated scenarios, and 20% and 14% improvements in pipe and void segmentation, respectively, on real-world data. Additionally, SGC is computationally efficient, reducing both time and memory requirements and making it practical for large-scale infrastructure assessments. The method's ability to enhance GPR image analysis without extensive computational resources marks a significant advance in GPR preprocessing and opens possibilities for future research on downstream tasks combined with recent deep learning models.
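The two ingredients SGC combines can be sketched on a raw B-scan matrix. The snippet below is only an illustration of the building blocks, assuming the common convention of fast-time samples along rows and traces along columns; the actual SGC method integrates SVD and gain compensation differently than this naive sequential version.

```python
import numpy as np

def sgc_sketch(bscan, k=1, alpha=0.02):
    """Illustrative SVD + time-gain preprocessing of a GPR B-scan.

    bscan: (samples, traces) array. Zeroing the k strongest singular
    components suppresses dominant background / direct-wave energy;
    an exponential gain along fast time then amplifies late, weak echoes.
    """
    U, s, Vt = np.linalg.svd(bscan, full_matrices=False)
    s[:k] = 0.0                       # drop dominant singular components
    cleaned = (U * s) @ Vt            # low-rank background removed
    t = np.arange(bscan.shape[0])[:, None]
    gain = np.exp(alpha * t)          # deeper (later) samples gain more
    return cleaned * gain
```

By construction, a B-scan consisting of pure horizontal background (a rank-1 matrix) is removed entirely by the SVD step with k = 1.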
Colorectal cancer (CRC) is one of the most common and deadly cancers in the world, and most cases arise from polyps. Colonoscopy is a widely recognized and effective method for polyp diagnosis. However, clinical diagnosis has a high rate of missed polyps. Although deep learning methods can raise the detection rate by extracting diverse polyp features, their real-time performance, error rate, and misidentification ratio in actual clinical diagnosis have yet to meet the criteria for practical use. Here, we propose an improved structure for accurate polyp detection that enhances the YOLOv8 algorithm to overcome these obstacles. First, we introduce an enhanced Reverse Attention Mechanism Channel (RA-S) module to improve detection performance by fusing global feature information with local image details. Then, we integrate an attention mechanism into the Path Aggregation Network (PANet) to improve the algorithm's ability to fuse multiscale features and adapt to variations in polyps. Finally, the proposed method was validated on the public ETIS-LARIB dataset, which was not part of the training data, achieving high precision (92.1%), recall (84.5%), and F1 (88.1%), showcasing its robust detection performance and generalization ability.
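The reported F1 is the harmonic mean of precision and recall, which can be checked directly from the two other figures:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The reported precision/recall on ETIS-LARIB reproduce the reported F1.
f1 = f1_score(0.921, 0.845)  # ~0.881
```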
Breast cancer is commonly recognized as the second most frequent malignancy in women worldwide. Breast cancer therapy includes surgery, radiation therapy, and medication, which can be highly successful, with survival rates of 90% or higher, especially when the condition is discovered early. This work is one such approach to early detection of breast cancer, relying on the BI-RADS score. A computer-aided-diagnosis system based on a bespoke Digital Mammogram Diagnostic Convolutional Neural Network (DMD-CNN) model is proposed to aid in the categorization of mammogram breast lesions. Furthermore, PYNQ-based acceleration on an Artix-7 FPGA is employed to deploy the DMD-CNN model on a hardware-acceleration platform, the first of its kind for breast cancer. Yielding a performance accuracy of 98.2%, the proposed model exceeds the state-of-the-art approach. The comparative analysis in the study shows that the proposed method achieves a 4% increase in accuracy and a good recognition rate of 96% compared with the existing model. K-fold cross-validation was used to test and assess the integrated system (for k = 5, 7, and 9, the reported accuracy scores are 96.2%, 97.5%, and 98.1%, respectively). Extensive testing on mammography datasets was carried out to confirm the improved performance of the suggested approach. Experiments reveal that, compared to accelerating the DMD-CNN model on a GPU, the suggested solution not only optimizes resource utilization but also decreases power consumption to 3.12 W. With FPGA hardware acceleration, nearly 91 images are processed and analyzed per second, compared with a single image on a CPU.
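The k-fold cross-validation protocol used for evaluation can be sketched generically. This is a minimal stand-alone version of the splitting logic, not the authors' code:

```python
import numpy as np

def kfold_splits(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Samples are shuffled once, split into k near-equal folds, and each
    fold serves as the validation set exactly once.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, val
```

Per-fold accuracies are then averaged, which is how single scores such as the 96.2% (k = 5) above are typically reported.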
Micro-expressions, fleeting and subtle facial expressions, possess significant application potential. However, their brief duration, low intensity, and localized motion pose challenges for traditional detection method...
The power of deep learning in image classification has become very popular and applicable in many areas, including the medical sciences. Some medical applications are real-time and may be implemented on embedded devices. In these cases, achieving the highest accuracy is not the only concern; computation runtime and power consumption are also among the most important performance indicators, and these parameters are mainly evaluated in the hardware design phase. In this research, an energy-efficient deep learning accelerator for endoscopic image classification (DLA-E) is proposed. This accelerator can be implemented in future endoscopic imaging equipment to help medical specialists make faster and more accurate decisions during endoscopy or colonoscopy. The proposed DLA-E consists of 256 processing elements with 1000 bps network-on-chip bandwidth. Based on the simulation results of this research, the best dataflow for this accelerator running MobileNet v2 is kcp_ws from the weight-stationary (WS) family. The total energy consumption and total runtime of this accelerator on the investigated dataset are 4.56 × 10^9 MAC (multiplier-accumulator) energy units and 1.73 × 10^7 cycles, respectively, the best result in comparison with other combinations of CNNs and dataflows.
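Accelerator costs of this kind are usually derived from per-layer MAC counts. The sketch below, with a hypothetical layer shape (not one from the paper), shows how a dense convolution's MAC count and an ideal cycle lower bound over 256 parallel processing elements are computed:

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate count of one dense 2-D convolution layer."""
    return h_out * w_out * c_out * c_in * k * k

# Hypothetical layer: 56x56 output, 32 -> 64 channels, 3x3 kernel.
macs = conv_macs(56, 56, 32, 64, 3)   # 57,802,752 MACs
ideal_cycles = -(-macs // 256)        # ceil-divide: one MAC per PE per cycle
```

Real dataflows such as kcp_ws fall short of this bound because of network-on-chip stalls and buffer refills, which is exactly what dataflow simulators measure.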
In recent years, the rapid development of computer vision and artificial intelligence has significantly advanced agricultural applications, particularly in the quality detection and grading of navel oranges. This revi...
In recent years, meaningful visual image encryption schemes, in which the plain image is compressed, encrypted, and then hidden in a carrier image, have received increasing attention. This paper proposes a new meaningful visual image encryption scheme consisting of three stages: compression (a compression network), encryption (a 2D-SLC hyperchaotic map), and hiding (matrix encoding). First, the advantages of deep learning are exploited: the compression network can simultaneously compress the width, height, channels, and pixel values of the plain image. Second, a new 2D-SLC hyperchaotic map is designed to ensure security; it has a larger chaotic space and better randomness. Finally, to obtain a high-quality cipher image, the secured secret image is hidden in the grey carrier image by matrix encoding. The scheme can compress and encrypt a grey or colour plain image and then hide it in a grey carrier image. In addition, the theoretical peak signal-to-noise ratio (PSNR) between the cipher image and the carrier image is improved from 40.9292 dB to 42.1785 dB. The total running time is only about 0.35 s, 0.87 s, and 3.1 s for a 256 × 256, 512 × 512, and 1024 × 1024 grey or colour plain image, respectively.
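The PSNR figures quoted above measure how little the hiding stage disturbs the carrier. The metric itself is standard and can be sketched as follows; the example images are hypothetical, not from the paper:

```python
import numpy as np

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio between two same-sized images, in dB."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical embedding: a quarter of the pixels change by 1 LSB.
carrier = np.zeros((10, 10), dtype=np.uint8)
stego = carrier.copy()
stego.flat[:25] += 1          # 25 of 100 pixels differ by 1 -> MSE = 0.25
```

Matrix encoding raises PSNR precisely by reducing the fraction of carrier pixels that must change per embedded bit, which lowers the MSE in the formula above.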
ISBN: (Print) 9798331509927; 9798331509910
This research presents an innovative approach to dormitory surveillance at Surawiwat School by employing an unmanned aerial vehicle (UAV) for autonomous monitoring. The UAV is used for aerial reconnaissance, allowing efficient surveillance of the dormitory's perimeter to enhance security. By capturing high-resolution aerial images, the system aims to identify and track potential intruders, specifically those attempting to climb the dormitory fence. The images captured by the UAV are processed using advanced machine learning techniques, with a focus on object detection through deep learning. The system is built around a Convolutional Neural Network (CNN) and leverages the YOLOv8 (You Only Look Once) algorithm. YOLOv8 is recognized for its high accuracy and real-time processing capabilities, making it an ideal choice for real-time surveillance and detection tasks. The CNN-based model is trained to accurately detect human figures and identify unusual activities within the captured images. When the system detects an intruder, it sends an immediate alert, along with the captured aerial image, through the Line application to designated personnel. This instant notification enhances response times, allowing school security to address potential threats proactively. Overall, this research demonstrates a sophisticated, AI-driven surveillance solution that combines UAV capabilities with state-of-the-art object detection, contributing to enhanced safety and security for school dormitories.
Accurate intravenous (IV) fluid monitoring is critical in healthcare to prevent infusion errors and ensure patient safety. Traditional monitoring methods often depend on dedicated hardware, such as weight sensors or optical systems, which can be costly, complex, and challenging to scale across diverse clinical settings. This study introduces a software-defined sensing approach that leverages semantic segmentation using the pyramid scene parsing network (PSPNet) to estimate the remaining IV fluid volumes directly from images captured by standard smartphones. The system identifies the IV container (vessel) and its fluid content (liquid) using pixel-level segmentation and estimates the remaining fluid volume without requiring physical sensors. Trained on a custom IV-specific image dataset, the proposed model achieved high accuracy with mean intersection over union (mIoU) scores of 0.94 for the vessel and 0.92 for the fluid regions. Comparative analysis with the segment anything model (SAM) demonstrated that the PSPNet-based system significantly outperformed the SAM, particularly in segmenting transparent fluids without requiring manual threshold tuning. This approach provides a scalable, cost-effective alternative to hardware-dependent monitoring systems and opens the door to AI-powered fluid sensing in smart healthcare environments. Preliminary benchmarking demonstrated that the system achieves near-real-time inference on mobile devices such as the iPhone 12, confirming its suitability for bedside and point-of-care use.
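Both the mIoU metric and the hardware-free volume estimate described above reduce to simple mask arithmetic. The sketch below shows the generic computations on boolean masks; the tiny example masks are hypothetical, not drawn from the paper's dataset:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def fluid_fraction(liquid_mask, vessel_mask):
    """Remaining-volume proxy: liquid pixels over vessel pixels."""
    vessel = vessel_mask.sum()
    return liquid_mask.sum() / vessel if vessel else 0.0
```

Averaging `iou` over the vessel and liquid classes yields the mIoU figures reported (0.94 and 0.92), and `fluid_fraction` is the pixel-ratio idea behind sensor-free volume estimation, before any calibration for container shape.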