ISBN (Print): 9798350355291; 9798350355284
In the advanced field of image processing and computer vision (IP/CV), there is a trend toward utilising parallel processing in computer architectures for enhanced efficiency, striking a balance between general-purpose capabilities and hardware-specific processes. The RISC-V standard, now backed by a wide array of compilers, frameworks, and operating systems, is paving the way for innovative cores. Our introduction of a Multi-Processor System on Chip (MPSoC), MPRISC-V, is a testament to this evolution. This system incorporates a Network on Chip (NoC) for robust intra-chip communication. The Processing System (PS) seamlessly integrates and manages it through a user-friendly API crafted to simplify the development cycle. To ascertain its effectiveness, we tested it on a Zynq UltraScale+ MPSoC device, deploying a Sobel-based application benchmark. By evaluating its efficiency in terms of cycles/pixel, our findings underscore its potential and spotlight areas ripe for further enhancement.
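As an illustration of the benchmark described above (not the authors' code), the sketch below shows a plain-Python Sobel gradient-magnitude filter and the cycles/pixel efficiency metric; the function names and the pure-software formulation are our own assumptions.

```python
def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel kernels on a 2-D list of numbers.
    Only the valid interior region is returned (no border padding)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            gx = sum(img[i + u][j + v] * kx[u][v] for u in range(3) for v in range(3))
            gy = sum(img[i + u][j + v] * ky[u][v] for u in range(3) for v in range(3))
            row.append((gx * gx + gy * gy) ** 0.5)
        out.append(row)
    return out

def cycles_per_pixel(cycle_count, height, width):
    """Normalise a raw cycle count by image size (the paper's efficiency metric)."""
    return cycle_count / (height * width)
```

A vertical step edge produces the expected maximal horizontal gradient, and the metric simply divides the measured cycle count by the pixel count.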
ISBN (Print): 9798331541859; 9798331541842
Facial expression generation in computer vision is essential for improving human-computer interaction by enabling machines to interpret and respond to human emotions effectively. This area has attracted considerable research interest. In this context, we introduce a new approach for generating facial expressions from a single neutral image and a target expression label. Our method, referred to as the Motion-Oriented Diffusion Model (MODM), leverages latent diffusion techniques, which are known for their ability to learn complex latent spaces and integrate controlled stochasticity to diversify generated content. The main idea of MODM is separating the embedding space into identity and motion domains, and applying diffusion to the motion latent space only. This strategy enhances our model's capability to generate various facial expressions while ensuring that the identity details remain consistent across different expressions. To assess the effectiveness of MODM, we perform qualitative and quantitative evaluations using the MUG facial expression database. The preliminary results demonstrate that MODM can generate realistic videos of the six basic facial expressions, preserving the identity of the input subject while accurately representing different emotional states. Additionally, our study highlights promising directions for potential future research and improvements.
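The core idea of diffusing only the motion part of the latent space can be sketched as follows; this is a minimal, hypothetical illustration (the split point `id_dim`, the additive-noise schedule, and the function names are our assumptions, not MODM's actual implementation).

```python
import random

def split_latent(z, id_dim):
    """Partition a latent vector into identity and motion components."""
    return z[:id_dim], z[id_dim:]

def noise_motion_only(z, id_dim, t, sigma=1.0):
    """One illustrative forward-diffusion step: Gaussian noise is added to
    the motion part only; the identity part passes through untouched, so
    identity details remain fixed while expressions are diversified."""
    ident, motion = split_latent(z, id_dim)
    noised = [m + sigma * t * random.gauss(0.0, 1.0) for m in motion]
    return ident + noised
```

Because noise never touches the first `id_dim` coordinates, identity is preserved by construction regardless of the noise level `t`.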
Image classification is one of the main areas of computer vision, which is important in applications like self-driving automotive/vehicle systems. While working with image/video data, it requires huge amounts of resources...
Depth information is useful in many image processing and computer vision applications, but in photography, depth information is lost in the process of projecting a real-world scene onto a 2D plane. Extracting depth in...
ISBN (Print): 9781665493468
Machine learning-based algorithms using fully convolutional networks (FCNs) have been a promising option for medical image segmentation. However, such deep networks silently fail if input samples are drawn far from the training data distribution, thus causing critical problems in automatic data-processing pipelines. To overcome such out-of-distribution (OoD) problems, we propose a novel OoD score formulation and its regularization strategy by applying an auxiliary add-on classifier to an intermediate layer of an FCN, where the auxiliary module is helpful for analyzing the encoder output features by taking their class information into account. Our regularization strategy trains the module along with the FCN via the principle of outlier exposure, so that our model can be trained to distinguish OoD samples from normal ones without modifying the original network architecture. Our extensive experimental results demonstrate that the proposed approach can successfully conduct effective OoD detection without loss of segmentation performance. In addition, our module can provide reasonable explanation maps along with OoD scores, which can enable users to analyze the reliability of predictions.
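The paper proposes its own OoD score formulation; as an illustrative stand-in, the common maximum-softmax-probability baseline over the auxiliary classifier's logits can be sketched as follows (the function names are ours, and the actual formulation in the paper differs).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ood_score(aux_logits):
    """Generic OoD score from an auxiliary classifier head: negative maximum
    softmax probability, so higher scores flag more anomalous inputs."""
    return -max(softmax(aux_logits))
```

A confident prediction (one dominant logit) yields a score near -1, while a uniform, uncertain prediction scores higher, flagging the sample as more likely out-of-distribution.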
Recent technological advancements have significantly improved indoor autonomous vision systems (IAvSs), underscoring the critical need to enhance their capability to interpret real-world environments in a manner similar to human perception. In response to this challenge, this paper introduces DEADFL-UNet, a groundbreaking framework that enhances the existing EADFL-UNet architecture. EADFL-UNet utilized the EfficientNetB3 model, supplemented by a new Super Attention Block and CBW-FL Loss Function, to tackle the significant data imbalance found in the NYUv2 dataset. Our enhancement focuses on using the MobileNetv2 model in conjunction with several fine-tuning techniques to maximize depth characteristics in tandem with RGB ones inside the prior architecture. By applying the proposed techniques, we achieved an improvement of approximately 6% in mIoU (Mean Intersection over Union) compared to the original EADFL-UNet model, which was previously published. Furthermore, the difference between the fine-tuned and non-fine-tuned versions is 1.91% in mIoU, demonstrating the significant effectiveness of the fine-tuning technique. To confirm the real-time FPS (Frames Per Second) performance of each model, this technique has undergone extensive testing and assessment using standard metrics, not only on pre-existing datasets but also in a ROS2 (Robot Operating System 2) simulation environment. These proven techniques have potential for various applications in autonomous systems, such as robotic vision, GPS (Global Positioning System) position tracking, autonomous vehicles, and security, improving accuracy and efficiency.
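The mIoU metric used in the comparison above can be computed as follows; this is the standard textbook definition over flattened label lists, not code from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over flattened label lists; classes
    absent from both prediction and ground truth are skipped so they do
    not distort the average."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

Per-class IoU is intersection over union of the predicted and ground-truth masks; mIoU averages these, so a class mispredicted on a few pixels still drags the mean down noticeably on imbalanced datasets like NYUv2.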
Point source detection algorithms play a pivotal role across diverse applications, influencing fields such as astronomy, biomedical imaging, environmental monitoring, and beyond. This article reviews the algorithms used for space imaging applications from ground and space telescopes. The main difficulties in detection arise from the incomplete knowledge of the impulse function of the imaging system, which depends on the aperture, atmospheric turbulence (for ground-based telescopes), and other factors, some of which are time-dependent. Incomplete knowledge of the impulse function decreases the effectiveness of the algorithms. In recent years, deep learning techniques have been employed to mitigate this problem and have the potential to outperform more traditional approaches. The success of deep learning techniques in object detection has been observed in many fields, and recent developments can further improve the accuracy. However, deep learning methods are still in the early stages of adoption and are used less frequently than traditional approaches. In this review, we discuss the main challenges of point source detection, as well as the latest developments, covering both traditional and current deep learning methods. In addition, we present a comparison between the two approaches to better demonstrate the advantages of each methodology.
The rapid development of machine vision applications demands hardware that can sense and process visual information in a single monolithic unit to avoid redundant data transfer. Here, we design and demonstrate a monolithic vision enhancement chip with light-sensing, memory, digital-to-analog conversion, and processing functions by implementing 619 pixels with 8582 transistors and physical dimensions of 10 mm by 10 mm based on a wafer-scale two-dimensional (2D) monolayer molybdenum disulfide (MoS2). The light-sensing function with analog MoS2 transistor circuits offers low noise and high photosensitivity. Furthermore, we adopt a MoS2 analog processing circuit to dynamically adjust the photocurrent of individual imaging sensors, which yields a high dynamic light-sensing range greater than 90 decibels. The vision chip enables applications such as contrast enhancement and noise reduction in image processing. This large-scale monolithic chip based on 2D semiconductors shows multiple functions with light sensing, memory, and processing for artificial machine vision applications, exhibiting the potential of 2D semiconductors for future electronics.
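As a software analogy (not the chip's analog circuitry), a linear contrast stretch and the decibel dynamic-range figure quoted above can be sketched as follows; the function names and the simple min/max stretch are our assumptions.

```python
import math

def stretch_contrast(pixels, out_max=255):
    """Linear contrast stretch: map the input range [min, max] onto
    [0, out_max]. A digital analogue of per-pixel gain adjustment."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0 for _ in pixels]   # flat input: nothing to stretch
    return [round((p - lo) * out_max / (hi - lo)) for p in pixels]

def dynamic_range_db(i_max, i_min):
    """Dynamic range in decibels from the ratio of the largest to the
    smallest detectable photocurrent: 20 * log10(i_max / i_min)."""
    return 20.0 * math.log10(i_max / i_min)
```

A >90 dB range corresponds to a max/min photocurrent ratio above roughly 3 x 10^4, which is why per-sensor gain adjustment is needed to keep bright and dark regions simultaneously usable.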
ISBN (Print): 9798350318920; 9798350318937
Monocular depth estimation is an important step in many downstream tasks in machine vision. We address the topic of estimating monocular depth from defocus blur, which can yield more accurate results than semantics-based depth estimation methods. Existing monocular depth-from-defocus techniques are sensitive to the particular camera that the images are taken with. We show how several camera-related parameters affect the defocus blur using optical-physics equations and how they make the defocus blur depend on these parameters. The simple correction procedure we propose can alleviate this problem and does not require any retraining of the original model. We created a synthetic dataset which can be used to test the camera-independent performance of depth-from-defocus models. We evaluate our model on both synthetic and real datasets (DDFF12 and NYU Depth v2) obtained with different cameras and show that our methods are significantly more robust to changes of camera. Code: https://github.com/sleekEagle/defocus_***
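The camera dependence of defocus blur can be illustrated with the standard thin-lens circle-of-confusion formula; this is textbook optics, and whether the paper uses exactly this parameterization is our assumption.

```python
def coc_diameter(f, n, s_focus, s_obj):
    """Circle-of-confusion diameter from the thin-lens model:
        c = f^2 * |s_obj - s_focus| / (n * s_obj * (s_focus - f))
    f: focal length, n: f-number, s_focus: focus distance,
    s_obj: object distance (all lengths in the same units).
    The blur depends on f and n, which is exactly the camera dependence
    that a depth-from-defocus model must be corrected for."""
    return (f * f * abs(s_obj - s_focus)) / (n * s_obj * (s_focus - f))
```

An object at the focus distance is perfectly sharp (c = 0), and opening the aperture (smaller f-number) enlarges the blur for the same scene, so two cameras produce different blur-to-depth mappings.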
ISBN (Print): 9798350318920; 9798350318937
Within (semi-)automated visual industrial inspection, learning-based approaches for assessing visual defects, including deep neural networks, enable the processing of otherwise small, pixel-sized defect patterns on high-resolution imagery. The emergence of these often rarely occurring defect patterns explains the general need for labeled data corpora. To alleviate this issue and advance the current state of the art in unsupervised visual inspection, this work proposes a DifferNet-based solution enhanced with attention modules: AttentDifferNet. It improves image-level detection and classification capabilities on three visual anomaly detection datasets for industrial inspection: InsPLAD-fault, MVTec AD, and Semiconductor Wafer. In comparison to the state of the art, AttentDifferNet achieves improved results, which are, in turn, highlighted throughout our qualitative and quantitative study. Our quantitative evaluation shows an average improvement over DifferNet of 1.77 +/- 0.25 percentage points in overall AUROC across all three datasets, reaching SOTA results on InsPLAD-fault, an industrial inspection in-the-wild dataset. As our variants of AttentDifferNet show great prospects in the context of currently investigated approaches, a baseline is formulated, emphasizing the importance of attention for industrial anomaly detection both in the wild and in controlled environments.
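The AUROC figure reported above can be computed with the standard rank-based (Mann-Whitney) estimator; this is a generic sketch, not the authors' evaluation code.

```python
def auroc(scores_anomalous, scores_normal):
    """Rank-based AUROC: the probability that a randomly chosen anomalous
    sample scores higher than a randomly chosen normal one, with ties
    counted as 0.5. Equivalent to the Mann-Whitney U statistic, normalised."""
    wins = 0.0
    for a in scores_anomalous:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_anomalous) * len(scores_normal))
```

Perfect separation of anomalous and normal scores yields 1.0, and a percentage-point improvement in AUROC, as reported, directly measures better ranking of defective over defect-free images.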