To enhance the appeal of residential real estate listings and captivate online customers, clean and visually convincing indoor scenes are highly desirable. In this research, we introduce an innovative image inpainting model designed to seamlessly replace undesirable elements within images of indoor residential spaces with realistic and coherent alternatives. While Generative Adversarial Networks (GANs) have demonstrated remarkable potential for removing unwanted objects, they can be resource-intensive and face difficulties in consistently producing high-quality outcomes, particularly when unwanted objects are scattered throughout the images. To empower small- and medium-sized businesses with a competitive edge, we present a novel GAN model that is resource-efficient and requires minimal training time using arbitrary mask generation and a novel half-perceptual loss function. Our GAN model achieves compelling results in removing unwanted elements from indoor scenes, demonstrating the capability to train within a single day using a single GPU, all while minimizing the need for extensive post-processing.
Quantifying atmospheric turbulence intensity is a challenging task, particularly when assessing real-world scenarios. In this paper, we propose a deep learning method for quantifying atmospheric turbulence intensity based on the space-time domain analysis from videos depicting different turbulence levels. We capture videos of a static image under controlled air turbulence intensities using an inexpensive camera, and then, by slicing these videos in the space-time domain, we extract spatio-temporal representations of the turbulence dynamics. These representations are then fed into a Convolutional Neural Network for classification. This network effectively learns to discriminate between different turbulence regimes based on the spatio-temporal features extracted from a real-world experiment captured in video slices.
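The space-time slicing step can be illustrated with a minimal sketch. It assumes a grayscale video stored as a NumPy array shaped (frames, height, width); the function name `space_time_slice`, the choice of a fixed image row, and the synthetic input are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def space_time_slice(video: np.ndarray, row: int) -> np.ndarray:
    """Return a (T, W) image: one fixed pixel row followed across all frames."""
    if video.ndim != 3:
        raise ValueError("expected a grayscale video shaped (T, H, W)")
    return video[:, row, :]

# Synthetic stand-in for a captured clip: 16 frames of 32x48 pixels.
rng = np.random.default_rng(0)
video = rng.random((16, 32, 48))

# Each line of the slice is the same image row at a later time step, so
# turbulence-induced jitter shows up as wavy vertical structure.
xt_slice = space_time_slice(video, row=10)
```

A slice like `xt_slice` is a 2-D image, so it could be passed to an ordinary image classifier without any video-specific architecture.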
To avoid the time-consuming and often monotonous task of manual inspection of crystallization plates, a Python-based program to automatically detect crystals in crystallization wells employing deep learning techniques was developed. The program uses manually scored crystallization trials deposited in a database of an in-house crystallization robot as a training set. Since the success rate of such a system is able to catch up with manual inspection by trained persons, it will become an important tool for crystallographers working on biological samples. Four network architectures were compared and the SqueezeNet architecture performed best. In detecting crystals AlexNet accomplished a better result, but with a lower threshold the mean value for crystal detection was improved for SqueezeNet. Two assumptions were made about the imaging rate. With these two extremes it was found that an image processing rate of at least two times, but up to 58 times in the worst case, would be needed to reach the maximum imaging rate according to the deep learning network architecture employed for real-time classification. To avoid high workloads for the control computer of the CrystalMation system, the computing is distributed over several workstations, participating voluntarily, by the grid programming system from the Berkeley Open Infrastructure for Network Computing (BOINC). The outcome of the program is redistributed into the database as automatic real-time scores (ARTscore). These are immediately visible as colored frames around each crystallization well image of the inspection program. In addition, regions of droplets with the highest scoring probability found by the system are also available as images.
Although deep learning-based continuous sign language translation (CSLT) models have made great progress in recent years, they still face various difficulties and limitations when applied to practical scenarios. To better apply deep learning technology, we propose the adaptive route sign transformer framework for CSLT. The adaptive routing strategy addresses the problem that the accuracy of a deep learning model trained in a laboratory scene drops sharply when it is applied to a real scene, and the back-end part of the model adopts a transformer-style decoder architecture to translate sentences in real time from the spatiotemporal context around the signer. By means of network layer visualization, we demonstrate that the attention mechanism of the model captures the hand and face regions of signers, which are often crucial for semantic analysis of video sign language. In this paper, we also introduce a Chinese sign language corpus of business scenes, which shows sign language communication in a bank, a station, etc., and may provide impetus for further research on video sign language translation. Experiments are carried out on PHOENIX-Weather 2014T (RWTH Aachen University, Germany); the proposed model outperforms the state of the art in inference time and accuracy using only raw RGB as input.
Hyperspectral imaging can be conceptualized as a three-dimensional dataset of spectral information related to a particular landscape. Generally speaking, these are aerial photographs captured by Earth observation satellites. A useful analogy for a hyperspectral image is one of a cube formed with the image acquired along the X and Y axes and a third dimension of spectral bands of varying wavelengths. Given the wealth of data contained within these images, they have been employed in both civilian and military applications such as terrain recognition, urban development supervision, recognition of rare minerals, and various other objectives. The increased utilization of these images has garnered the interest of researchers striving to create solutions that may enable faster processing of the images via parallel processing. In this context, FPGA technology is an option capable of facilitating the implementation of such a system for observation satellites. This research is situated within this framework and aims to develop an FPGA-synthesized hardware accelerator to facilitate real-time hyperspectral image categorization. By taking this approach, hardware-specific solutions can be implemented for embedded applications that process hyperspectral images and can also be integrated with further image processing steps. The proposed accelerator was constructed based on an advanced algorithmic model, resulting in outcomes consistent with those generated by the software-based solution. The experimental results demonstrate that the engineered accelerator can attain a pixel classification time equal to or less than the pixel acquisition time, thus conforming to the real-time processing criteria concerning classification time. Further, the manufactured accelerator exhibits scalability that can classify distinct datasets with varying classes concurrently while maintaining a uniform logic resource utilization.
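The cube analogy and the idea of per-pixel classification can be made concrete with a small software sketch. The abstract does not name the classification algorithm, so the nearest-mean-spectrum rule below is only a stand-in; `classify_cube`, `class_means`, and the toy data are hypothetical names and values chosen for illustration.

```python
import numpy as np

def classify_cube(cube: np.ndarray, class_means: np.ndarray) -> np.ndarray:
    """Label every pixel of an (H, W, bands) cube by its nearest class mean.

    Nearest-mean is a stand-in classifier, not the paper's method.
    """
    h, w, bands = cube.shape
    pixels = cube.reshape(-1, bands)  # one spectrum per row
    # Squared Euclidean distance from each spectrum to each class mean.
    d2 = ((pixels[:, None, :] - class_means[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1).reshape(h, w)

# Toy cube: 2x3 pixels, 4 spectral bands; the left column is "dark".
cube = np.zeros((2, 3, 4))
cube[:, 1:, :] = 1.0
class_means = np.array([[0.0] * 4, [1.0] * 4])
label_map = classify_cube(cube, class_means)
```

Because each pixel's spectrum is classified independently of its neighbours, this kind of workload parallelizes naturally, which is one reason per-pixel classifiers are attractive targets for hardware acceleration.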
Defect detection in chip packaging is a crucial step to ensure product quality and reliability. Traditional methods typically employ image-processing techniques for defect detection during the chip manufacturing process. However, these solutions require manual feature extraction and have limited adaptability to complex scenarios. Thus, deep-learning (DL)-based methods have received widespread attention. Nevertheless, they may fail to meet the requirements of real-time operation and high accuracy, and effective datasets are still missing. In this article, we construct a new chip package surface defect detection dataset, which contains 2919 images and four common defect types. To our knowledge, it is the only dataset for simultaneous detection of multiple chips. Also, we propose a real-time chip package surface defect detection method based on the you only look once version 7 (YOLOv7) model to solve the challenge of detecting small targets. In particular, we utilize k-means++ to recluster the anchor frames, merge the convolutional block attention module (CBAM) attention mechanism and receptive field block (RFB) structure, and replace traditional nonmaximum suppression (NMS) with our newly proposed confidence propagation cluster (CP-Cluster) to further increase detection accuracy and result confidence. Finally, we evaluate our method by performing many ablation experiments on the dataset we created. The experimental results demonstrate that, compared to the original YOLOv7, the proposed method improves the mean average precision@0.5 (mAP@0.5) by 1.39% and the speed of detection by 21.6%, while reducing the amount of computation by 17.7% and the number of parameters by 66.4%. This proves the superiority and practicality of the proposed method.
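The anchor reclustering step can be sketched as follows. This is a minimal illustration under stated assumptions: it clusters (width, height) pairs with k-means++ seeding followed by plain Lloyd iterations and a Euclidean distance, whereas YOLO-style anchor clustering often uses an IoU-based distance instead. The function names and the toy box sizes are hypothetical and do not come from the article.

```python
import numpy as np

def kmeans_pp_seeds(sizes: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Pick k initial centres from (width, height) rows via k-means++ seeding."""
    centres = [sizes[rng.integers(len(sizes))]]
    for _ in range(k - 1):
        # Squared distance from every box size to its nearest chosen centre.
        diffs = sizes[:, None, :] - np.array(centres)[None, :, :]
        d2 = (diffs ** 2).sum(axis=-1).min(axis=1)
        # Far-away sizes are proportionally more likely to become centres.
        centres.append(sizes[rng.choice(len(sizes), p=d2 / d2.sum())])
    return np.array(centres, dtype=float)

def cluster_anchors(sizes: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Return k anchor sizes sorted by area (Euclidean distance is an assumption)."""
    rng = np.random.default_rng(seed)
    centres = kmeans_pp_seeds(sizes, k, rng)
    for _ in range(iters):
        d2 = ((sizes[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centres[j] = sizes[labels == j].mean(axis=0)
    return centres[np.argsort(centres.prod(axis=1))]

# Toy ground-truth box sizes: two small defects and two large ones.
sizes = np.array([[10.0, 10.0], [12.0, 12.0], [100.0, 100.0], [102.0, 102.0]])
anchors = cluster_anchors(sizes, k=2)
```

Reclustering on the target dataset matters most when object sizes differ from those of the dataset the default anchors were tuned on, as is likely for small chip-surface defects.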
The rapid adoption of Advanced Driver Assistance Systems (ADAS) in modern vehicles, aiming to elevate driving safety and experience, necessitates the real-time processing of high-definition video data. This requirement brings about considerable computational complexity and memory demands, highlighting a critical research void for a design integrating high FPS throughput with optimal Mean Average Precision (mAP) and Mean Intersection over Union (mIoU). Performance improvement at lower costs, multi-tasking ability on a single hardware platform, and flawless incorporation into memory-constrained devices are also essential for boosting ADAS performance. Addressing these challenges, this study proposes an ADAS multi-task learning hardware-software co-design approach underpinned by the Kria KV260 Multi-Processor System-on-Chip Field Programmable Gate Array (MPSoC-FPGA) platform. The approach facilitates efficient real-time execution of deep learning algorithms specific to ADAS applications. Utilizing the BDD100K+Waymo, KITTI, and CityScapes datasets, our ADAS multi-task learning system endeavours to provide accurate and efficient multi-object detection, segmentation, and lane and drivable area detection in road images. The system deploys a segmentation-based object detection strategy, using a ResNet-18 backbone encoder and a Single Shot Detector architecture, coupled with quantization-aware training to augment inference performance without compromising accuracy. The ADAS multi-task learning offers customization options for various ADAS applications and can be further optimized for increased precision and reduced memory usage. Experimental results showcase the system's capability to perform real-time multi-class object detection, segmentation, line detection, and drivable area detection on road images at approximately 25.4 FPS using a 1920 x 1080p Full HD camera. Impressively, the quantized model has demonstrated a 51% mAP for object detection, 56.62% mIoU for image segmen
In this paper, a deep learning-based underwater positioning scheme is proposed to achieve robust feature tracking of an autonomous underwater vehicle (AUV) in sonar images during dynamic docking. Distorted features and acoustic noise make it significantly difficult to detect and track the AUV in acoustic images during dynamic docking. To address this, first, a pre-trained You Only Look Once (YOLO) network is applied to detect both the body and head features of the AUV. Second, we introduce an Intersection over Union (IoU) match-based backend, which preliminarily filters erroneous detections of the AUV head based on the rigid relationship between the body and head of the AUV. Subsequently, Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT) is used to match tracks across all detection results, including erroneous detections and the real target. Moreover, a scoring mechanism is presented to further remove the remaining erroneous detections based on the motion tendency of the detection tracks. Experimental results show that the proposed scheme enables real-time and robust feature tracking of the AUV under interference from feature distortion, reverberation, and environmental noise.
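The IoU match-based filter can be sketched in a few lines. The abstract does not give the exact matching rule, so the version below simply assumes that a head detection is kept only when it overlaps some body detection by at least a small IoU; the threshold `min_iou`, the function names, and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_heads(heads, bodies, min_iou=0.05):
    """Keep head boxes that overlap at least one body box (assumed rule)."""
    return [h for h in heads if any(iou(h, b) >= min_iou for b in bodies)]

bodies = [(10, 10, 50, 30)]
heads = [
    (40, 12, 52, 28),  # sits at the front of the body box, so it is kept
    (80, 80, 90, 90),  # far from any body box, so it is treated as an error
]
kept = filter_heads(heads, bodies)
```

The threshold is deliberately low in this sketch because a head box is much smaller than the body box it belongs to, so even a correct pairing yields a modest IoU.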
Real-world images captured in remote sensing, image or video retrieval, and outdoor surveillance are often degraded due to poor weather conditions, such as rain and mist. These conditions introduce artifacts that make visual analysis challenging and limit the performance of high-level computer vision methods. In time-critical applications, it is vital to develop algorithms that automatically remove rain without compromising the quality of the image contents. This article proposes a novel approach called QSAM-Net, a quaternion multi-stage multiscale neural network with a self-attention module. The algorithm requires 3.98 times fewer parameters than its real-valued counterpart and state-of-the-art methods while improving the visual quality of the images. The extensive evaluation and benchmarking on synthetic and real-world rainy images demonstrate the effectiveness of QSAM-Net. Its small parameter count makes the network suitable for edge devices and applications requiring near real-time performance. Furthermore, the experiments show that the improved visual quality of images also leads to better object detection accuracy and training speed.
ISBN: (print) 9783031581809; 9783031581816
A deep learning-based face anti-spoofing system has been proposed here. This work has been implemented in four segments. Firstly, an image preprocessing task is performed to extract the facial region. Then, the texture analysis of the facial region is performed to compute discriminant features. For this, a robust deep learning approach is needed, starting with defining some convolutional neural network (CNN) architectures for feature computation, followed by the classification of genuine vs. imposter face liveness. The motivation of this work is to find both software- and hardware-based solutions to access biometric-based real-time systems through robust and vigorous face-liveness detection techniques. The recognition system's performance is further improved by addressing challenging image-acquisition issues and by applying image augmentation, fine-tuning, transfer learning, and the fusion of various trained CNN models. Finally, the above steps have been embedded in Raspberry Pi devices to build the system for real-time applications. Experiments on two benchmark databases, NUAA and CASIA Replay-Attack, and a comparison with some well-known methods in the same area show the proposed system's superiority.