Block-based compressive imaging (BCI) is based on the compressive sensing principle: it uses a spatial light modulator and a low-resolution detector to perform parallel high-speed sampling, followed by a super-resolution algorithm that reconstructs the target image. Compared with traditional compressive imaging, BCI reduces the computational effort but introduces block artifacts. This paper proposes a data-driven deep neural network based on the Swin Transformer, called SwinBCI, which introduces local attention and shifted-window mechanisms to improve the reconstruction quality of the target image. By training the model on a dataset to obtain prior knowledge and performing graphics processing unit-accelerated computation, the computation time is greatly reduced, realizing real-time BCI. We achieve better reconstruction performance with cake-cutting Hadamard matrix sampling than with Bernoulli matrix sampling. Comparisons with three other classical compressed sensing reconstruction methods on four common image datasets, as well as on images acquired experimentally with the actual BCI system, show that SwinBCI achieves faster high-quality reconstruction at every sampling rate.
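As a hedged illustration of the sampling stage described above (not the paper's exact pipeline), the sketch below measures each image block with the leading rows of a Sylvester-constructed Hadamard matrix. The block size, sampling rate, and function names are assumptions made for the example.

```python
import numpy as np

def sylvester_hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def block_compressive_sample(image, block=8, rate=0.25):
    """Sample each non-overlapping block x (flattened to length block*block)
    with the first m rows of a Hadamard matrix: y = Phi @ x."""
    n = block * block
    m = max(1, int(round(rate * n)))
    Phi = sylvester_hadamard(n)[:m]          # m x n sensing matrix
    h, w = image.shape
    measurements = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            x = image[i:i + block, j:j + block].reshape(-1)
            measurements.append(Phi @ x)
    return np.stack(measurements), Phi
```

At a 0.25 sampling rate, each 8×8 block (64 pixels) is compressed to 16 measurements per block; the reconstruction network would take over from there.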
Ultrasound images are widespread in the medical diagnosis of musculoskeletal, cardiac, and obstetric diseases, due to the efficiency and non-invasiveness of the acquisition methodology. However, ultrasound acquisition introduces noise into the signal, which corrupts the resulting image and affects further processing steps, e.g. segmentation and quantitative analysis. We define a novel deep learning framework for the real-time denoising of ultrasound images. Firstly, we compare state-of-the-art denoising methods (e.g. spectral and low-rank methods) and select WNNM (Weighted Nuclear Norm Minimisation) as the best performer in terms of accuracy, preservation of anatomical features, and edge enhancement. Then, we propose a tuned version of WNNM (tuned-WNNM) that improves the quality of the denoised images and extends its applicability to ultrasound images. Through a deep learning framework, the tuned-WNNM qualitatively and quantitatively replicates WNNM results in real time. Finally, our approach is general in terms of its building blocks and the parameters of the deep learning and high-performance computing framework; in fact, we can select different denoising algorithms and deep learning architectures.
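The core proximal step of WNNM can be sketched as weighted singular-value thresholding, where each weight is inversely proportional to its singular value so that dominant structure is shrunk less. The constant `c` and this simplified one-step form are assumptions for illustration, not the tuned-WNNM of the paper.

```python
import numpy as np

def weighted_svt(Y, c=1.0, eps=1e-8):
    """One weighted singular-value thresholding step, the core operation of
    WNNM: larger singular values (more signal) receive smaller weights and
    are therefore shrunk less than small, noise-dominated ones."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    w = c / (s + eps)                  # weights inversely proportional to magnitude
    s_shrunk = np.maximum(s - w, 0.0)  # soft-threshold each singular value
    return U @ np.diag(s_shrunk) @ Vt
```

In a full pipeline this step is applied to groups of similar patches, driving small singular values (noise) to zero while nearly preserving large ones (anatomy).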
This study addresses the challenge of semantically sorting complex scenes in a mobile environment by processing multimodal visual inputs to create detailed landscape representations. Central to the approach is a streamlined multi-layer hierarchical model that mimics human attention dynamics, using the BING objectness metric to quickly identify significant areas by recognizing objects across different scales and contexts. To enhance feature extraction, time-sensitive and manifold-guided selectors are employed to prioritize high-quality visual features, while a low-rank active learning (LAL) algorithm simulates human-like focus on key visual zones, specifically in sports scenes. The model generates a Gaze Shift Path (GSP), which directs the collection of composite CNN features, ultimately classifying the scenes into distinct landscape types using a support vector machine (SVM). Experimental results on seven scene image sets show that our method outperforms the others by 2% to 5%. Additionally, our calculated deep GSP features can greatly facilitate image clustering. Last but not least, our visualized GSPs are over 90% consistent with real-world human gaze behavior, which explains the competitiveness of our method.
This work details the design and development of a microscopy image-based vegetable quality assessment system (prototype) that adopts a deep learning (DL) technique on an edge device. Current automated machine learning methods primarily utilize outer-surface images of vegetables/fruits, often lacking precise quantification of nutrient content such as carbohydrates, minerals, vitamins, etc. Such nutrient ingredients can instead be assessed by examining micro-level cell attributes of microscopy images in a DL framework. However, vegetable quality detection based on microscopy and DL on resource-constrained edge devices poses significant challenges. To address these problems, a portable, cost-effective, efficient, and real-time prototype has been realized. It involves configuring a microscopy image generation module using a low-cost Foldscope lens coupled with a smartphone, and on-device analysis by designing a new lightweight DL architecture and segmentation algorithm. The analysis is executed via a smartphone application, ensuring advantages such as bandwidth and energy efficiency, user privacy, and local processing without external servers. For system validation, a pilot study has been conducted on the widely consumed potato tuber, focusing on the assessment of starch presence as a key quality metric. The system successfully assesses cell attributes, i.e., a starch quantity of 10-25%, in approximately 24 s with high consistency. In a comparative study, the network outperforms existing state-of-the-art lightweight networks, achieving the highest recognition accuracy of up to 88.8% and an F1-score of 85.83 with fewer parameters (1.5M) and FLOPs (118M). Thus, the study demonstrates its applicability for vegetable quality assessment in an easy, affordable, and effective way. Further, the proposed idea can be extended to other vegetables/fruits.
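The final quantification step — reporting starch as a percentage of the imaged area — can be illustrated with a toy threshold segmentation. The fixed threshold and function name are assumptions; the paper's actual segmentation algorithm is a learned, lightweight one.

```python
import numpy as np

def segment_and_quantify(gray, threshold=0.5):
    """Toy pipeline: threshold a normalized grayscale microscopy patch into a
    binary starch mask, then report starch area as a percentage of the patch
    (the abstract's 10-25% quality metric is a percentage of this kind)."""
    mask = np.asarray(gray) >= threshold   # bright pixels treated as starch
    return 100.0 * mask.sum() / mask.size
```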
When computer vision techniques are used to identify humans in public places, the presence of cartoon characters can often result in false detections as humans, complicating the task of human recognition and hindering the application of such technology in public. This paper aims to minimize the false detection rate by retraining pretrained human detection models using transfer learning. The retraining process utilizes a dataset consisting of two classes, humans and cartoon characters, with 11,000 images per class. The instances in the dataset are carefully labeled before splitting into training, validation, and testing sets. Each selected model is retrained, evaluated, and compared to the commonly used pretrained human detection models. The results reveal that the retrained YOLOv8n model performs best for real-time application; it achieves 96.97% accuracy, 99.52% precision, 97.42% recall, a 98.46% F1 score, and a false detection rate of 8.16%, yet has a small model size of only 6.09 MB. In addition, it outperforms all the pretrained models in terms of accuracy (by 5.38%) and F1 score (by 2.85%) in reducing the false detection rate of cartoon characters as humans. This has great implications for human counting and customer analytics. However, false detections of cartoons as humans still exist in both the pretrained and retrained models. More sophisticated models such as the Vision Transformer will be studied in the future to minimize or completely eliminate these false detections, since a human being can make this distinction easily.
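The reported metrics follow directly from confusion-matrix counts. The sketch below uses hypothetical counts and the conventional definitions, with the false detection rate taken as false positives over all cartoon (negative) instances — an assumption about how the paper defines it.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1, and false detection rate (the
    fraction of cartoon instances wrongly detected as human) from the four
    confusion-matrix counts of a two-class evaluation."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_detection_rate = fp / (fp + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fdr": false_detection_rate}
```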
Real-time and accurate detection of overhead cables violating street-level regulations is crucial for smart city management. Existing methods face challenges such as the slender nature of the targets, occlusion, multi-scale variability, and high inter-class similarity. This paper presents the IOA-YOLO model. It incorporates a Line-Target Enhancement Module (LTEM) for better slender-object feature extraction, a Global-Local Dual Perception Module (GDPM) to boost robustness against occlusion, and a Hybrid Iterative Detection Head (HIDH) for multi-scale feature extraction using intra- and inter-layer information. An uncertainty-aware loss function (UAL) is introduced to suppress background interference and reduce the impact of inter-class similarity. Experiments on a custom dataset show that IOA-YOLO outperforms existing methods, achieving 93.94% precision and 88.17% recall, with a good balance between accuracy and efficiency. It also adapts well to various urban environments and lighting conditions, demonstrating robust stability and great real-world deployment potential.
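The abstract does not give the exact form of the UAL; a common uncertainty-aware weighting (in the style of Kendall et al.'s homoscedastic-uncertainty loss, an assumption here, not the paper's formula) scales each loss term by a learned log-variance so that high-uncertainty, background-cluttered terms are down-weighted:

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Generic uncertainty-aware combination: each loss term l_i is scaled
    by exp(-s_i) and the learned log-variance s_i is added as a regularizer,
    so terms the model is uncertain about contribute less to the gradient."""
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))
```

With all log-variances at zero this reduces to a plain sum; raising a term's log-variance shrinks that term's influence at the cost of the additive penalty.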
The advancement of artificial intelligence (AI) has brought many advances to human society as a whole. By integrating AI technology into daily activities, we can gain access to knowledge we could only begin to imagine. The objective of human action recognition (HAR) is to process photos and videos to discern whether a human is present, map the classified subject, and finally determine the action being carried out. Achieving this takes several steps and a careful approach, along with extensive research, troubleshooting, and experimentation. The AI architecture has to learn from a collected dataset in order to identify actions properly. HAR is implemented in Python using a real-time webcam feed. The MediaPipe Pose Detection library detects human anatomy from the input through joint key-points; the MediaPipe algorithm extracts features along the x, y, and z axes together with visibility (four variables), and the extracted data is used to train and test a CNN-LSTM classifier model. The output, an RGB skeleton and an action label on the detected subject (standing, waving, walking, or sitting), has yielded good results.
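A minimal sketch of the data shaping implied above: MediaPipe Pose yields 33 landmarks with x, y, z, and visibility (132 values per frame), which are sliced into fixed-length overlapping windows for a CNN-LSTM classifier. The window length, stride, and function name are assumptions for the example.

```python
import numpy as np

N_LANDMARKS = 33                  # MediaPipe Pose landmark count
N_FEATURES = N_LANDMARKS * 4      # x, y, z, visibility per landmark

def frames_to_windows(frames, window=30, stride=10):
    """Slice a (num_frames, 132) pose-feature stream into overlapping
    fixed-length windows, the (batch, time, features) shape a CNN-LSTM
    sequence classifier consumes."""
    frames = np.asarray(frames, dtype=np.float32)
    assert frames.shape[1] == N_FEATURES
    windows = [frames[i:i + window]
               for i in range(0, len(frames) - window + 1, stride)]
    return np.stack(windows)      # (num_windows, window, 132)
```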
ISBN (print): 9798350343557
Wide-field interferometric microscopy (WIM) has been utilized for the visualization of individual biological nanoparticles with high sensitivity. However, the image quality is highly affected by the focusing of the image. Hence, focus detection has been an active research field within the scope of imaging and microscopy. To tackle this issue, we propose a novel convolution- and transformer-based deep learning technique to detect focus in WIM. The method is compared with other focus detection techniques and obtains higher precision with fewer parameters. Furthermore, the model achieves real-time focus detection thanks to its low inference time.
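As a classical point of comparison for a learned focus detector, a variance-of-the-Laplacian sharpness score is a common baseline; this is an illustration, not necessarily one of the techniques compared in the paper.

```python
import numpy as np

def laplacian_focus_score(img):
    """Classical focus measure: variance of the discrete 5-point Laplacian.
    In-focus images have stronger high-frequency content, hence a higher
    score; defocus blurs edges and drives the score toward zero."""
    img = np.asarray(img, dtype=float)
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()
```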
The performance of visual processing is commonly constrained by extreme outdoor weather such as heavy rain. Rain streaks may substantially damage image optical quality and impact image processing in many scenarios. Thus, researching single-image rain removal has practical application value. However, removing rain streaks from a single image is a challenging task. Although end-to-end learning approaches based on convolutional neural networks have lately made significant progress on this task, most existing methods still cannot perform deraining well: they fail to process the details of the background layer, resulting in the loss of certain information. To address this issue, we propose a single-image deraining network named the twin-stage Unet-like network (TUNet). Specifically, a reconstitution residual block (RRB) is presented as the basic structure of the encoder-decoder to obtain more spatial contextual information for extracting rain components. Then, a residual sampling module (RSM) is introduced to perform downsampling and upsampling operations, preserving residual properties in the structure while obtaining deeper image features. Finally, the convolutional block attention module (CBAM) is adopted to fuse shallow and deep features of the same size in the model. Extensive experiments on five public synthetic datasets and a real-world dataset demonstrate that our proposed TUNet model outperforms state-of-the-art deraining approaches. The average PSNR value of TUNet is 0.41 dB higher than that of the state-of-the-art method (OSAM-Net) on the synthetic datasets.
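The PSNR figure quoted above is the standard peak signal-to-noise ratio; for 8-bit images it can be computed as follows (the function name is an assumption):

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image and a
    derained result: PSNR = 10 * log10(peak^2 / MSE). Higher is better;
    identical images give infinity."""
    clean = np.asarray(clean, dtype=float)
    restored = np.asarray(restored, dtype=float)
    mse = np.mean((clean - restored) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```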
This model enables an individual to input an image and obtain a description of it as output. The research paper makes use of deep learning and NLP (Natural Language Processing). Image Caption Gener...