ISBN (print): 9781665445092
Image composition plays a common but important role in photo editing. To acquire photo-realistic composite images, one must adjust the appearance and visual style of the foreground to be compatible with the background. Existing deep learning methods for harmonizing composite images directly learn an image mapping network from the composite image to the real one, without explicitly exploring visual style consistency between the background and the foreground. To ensure this consistency, in this paper we treat image harmonization as a style transfer problem. In particular, we propose a simple yet effective Region-aware Adaptive Instance Normalization (RAIN) module, which explicitly formulates the visual style from the background and adaptively applies it to the foreground. With our settings, the RAIN module can be used as a drop-in module for existing image harmonization networks and brings significant improvements. Extensive experiments on existing image harmonization benchmark datasets show the superior capability of the proposed method. Code is available at https://***/junleen/RainNet.
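The idea of taking style statistics from the background and applying them to the foreground can be illustrated with a small sketch. This is a simplified, assumed reading of the abstract, not the authors' RainNet code: it works on one flattened feature channel, uses mean and standard deviation as the "style", and omits any learned parameters. The names `region_stats` and `rain_channel` are hypothetical.

```python
def region_stats(values, eps=1e-5):
    """Mean and (eps-stabilised) standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, (var + eps) ** 0.5


def rain_channel(features, mask, eps=1e-5):
    """Region-aware normalisation of one flattened feature channel.

    features: list of floats (one channel, flattened)
    mask:     list of 0/1 flags, 1 = foreground, 0 = background
    Foreground values are normalised with foreground statistics and then
    re-scaled with background statistics; background values are unchanged.
    (Assumption: no learned scale/shift, unlike a trained module.)
    """
    fg = [f for f, m in zip(features, mask) if m]
    bg = [f for f, m in zip(features, mask) if not m]
    if not fg or not bg:
        # Nothing to harmonise if one of the regions is empty.
        return list(features)
    fg_mean, fg_std = region_stats(fg, eps)
    bg_mean, bg_std = region_stats(bg, eps)
    return [
        (f - fg_mean) / fg_std * bg_std + bg_mean if m else f
        for f, m in zip(features, mask)
    ]


if __name__ == "__main__":
    feats = [10.0, 12.0, 1.0, 3.0]
    mask = [1, 1, 0, 0]
    print(rain_channel(feats, mask))
```

In this toy case the foreground values are shifted so that their mean and spread match those of the background region, which is the sense in which the foreground "adopts" the background style.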
Intraoperative adverse events (iAEs) increase rates of postoperative mortality and morbidity. Identifying iAEs is important to quality assurance and postoperative care, but it requires expertise and is time-consuming and expensive. Automated or partially automated techniques are therefore desirable. Previous work showed that conventional image processing does not work well with real-world laparoscopic videos. We present a novel modular deep learning system that can partially automate the process of iAE screening using videos of laparoscopic procedures. The system consists of a stabilizer to reduce camera motion, a spatiotemporal feature extractor, and a multi-stage temporal convolutional neural network to detect adverse events. We apply a novel focal-uncertainty smoothing loss to handle class imbalance and to address multi-task uncertainty. The system is evaluated using 5-fold cross-validation on a large (228 hours) dataset of laparoscopic videos, and we perform ablation studies to investigate the effects of stabilization and the focal-uncertainty loss. Our system achieves an AUROC of 0.952 and an average precision (AP) of 0.626 in thermal injury detection, and an AUROC of 0.823 and an AP of 0.336 in bleeding detection. Our novel modular deep learning system outperforms conventional deep learning baselines. The model can be used as a screening tool to search for high-risk events and to provide feedback for operation quality improvements and postoperative care. Source code is available on GitHub: https://***/ICSSresearch/IAE-video.
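The abstract does not define the focal-uncertainty smoothing loss, so the following sketch shows only the standard binary focal loss, which is a common way to handle class imbalance and is presumably one ingredient of the authors' loss. The function name and default values (`gamma=2.0`, `alpha=0.25`) are assumptions for illustration, not values taken from the paper.

```python
import math


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25, eps=1e-7):
    """Standard binary focal loss for one prediction.

    prob:  predicted probability of the positive class (0..1)
    label: 1 for positive (e.g. an adverse-event frame), 0 for negative
    gamma: focusing parameter; larger values down-weight easy examples
    alpha: weight of the positive class (1 - alpha for the negative class)
    """
    p = min(max(prob, eps), 1.0 - eps)       # avoid log(0)
    p_t = p if label == 1 else 1.0 - p       # probability of the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # A confidently correct prediction contributes far less than a wrong one.
    print(binary_focal_loss(0.95, 1))
    print(binary_focal_loss(0.05, 1))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which makes the effect of the focusing term easy to check.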
ISBN (digital): 9798350372977
ISBN (print): 9798350372984
The precise detection of plant centres is important for growth monitoring, enabling the continuous tracking of plant development to discern the influence of diverse factors. It is also significant for automated systems such as robotic harvesting, helping machines locate and engage with plants. In this paper, we explore the YOLOv4 (You Only Look Once) real-time neural network detector for plant centre detection. Our dataset, comprising over 12,000 images from 151 Arabidopsis thaliana accessions, is used to fine-tune the model. Evaluation on the dataset reveals the model's proficiency in centre detection across various accessions, with an mAP of 99.79% at a 50% IoU threshold. The model demonstrates real-time processing capabilities, achieving a frame rate of approximately 50 FPS. This outcome underscores its rapid and efficient analysis of video or image data, showing practical utility in time-sensitive applications.
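For readers unfamiliar with the metric, the 50% IoU threshold means a predicted box counts as a correct detection only if its intersection-over-union with a ground-truth box is at least 0.5. A minimal sketch follows; it assumes boxes in `(x1, y1, x2, y2)` corner format, and the helper names are hypothetical rather than taken from any YOLOv4 implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box if IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(is_true_positive((0, 0, 10, 10), (0, 0, 10, 6)))
```

mAP at this threshold is then the mean, over classes, of the area under the precision-recall curve built from such true-positive decisions.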
Traditional text classifiers are limited to predicting over a fixed set of labels. However, in many real-world applications the label set is frequently changing. For example, in intent classification, new intents may ...
The COVID-19 pandemic is spreading continuously, causing serious health problems. Wearing a face mask is one of the prominent precautions people can easily follow. In this paper, we have built a model for face-mask detection...
The novel coronavirus (nCoV-19) was first detected in December 2019. It spread worldwide and was declared a coronavirus disease (COVID-19) pandemic by March 2020. Patients presented with a wide range of symptoms aff...
A significant number of modern films depict some form of tobacco use, but rarely depict its real-life consequences such as addiction, illness and death. As per [1], anti-tobacco health warnings are mandatory for scenes ...
Measurement and analysis of Partial Discharge (PD) patterns have emerged as a field for assessing insulation failure in high-voltage apparatus. This paper uses a PD signal combined with the deep convolution-optimized learning machine classifier (DC-OLMC) to predict the location of water droplets in 11 kV polymer insulators subjected to alternating currents. There are two major challenges in applying the proposed algorithm: i) contamination is a significant issue in PD signal measurement, which reduces the recognition rate (RR), and ii) high-level feature extraction and recognition must be achieved with minimal computing time. Traditional condition monitoring methods for insulators concentrated on extracting fewer priority features from the input patterns. In the current work, to address this problem, an AlexNet with a Bacterial Foraging Algorithm (BFO)-based optimized kernel parameter classifier is employed, together with a Translation Invariant Wavelet Transform (TIWT) to remove interference from PD signals. The analysis demonstrates that the suggested technique, with an identification rate of 99.17%, is a valuable tool for locating water droplets in high-voltage insulators.
The agriculture industry faces huge economic losses from bacterial, viral or fungal infections in crops, because of which farmers lose 15 to 20% of their total profit every year. India is the second largest producer ...
ISBN (digital): 9798350352689
ISBN (print): 9798350352696
In a world where technology is developing more quickly than ever before, many of these advances have been aimed at making people's lives easier. To aid these efforts, we are creating an integrated image processor for visually impaired people. The main forms of human communication nowadays are speech and text. A person must have vision to view text-based information; nonetheless, people without the ability to see can still learn by listening. The integrated image processor is an assistive text-reading tool that uses a camera to help the blind read text on labels, printed notes, and merchandise. It entails text extraction from an image using optical character recognition (OCR) and text-to-speech (TTS) conversion to turn it into speech. This method aids the blind in reading the text and serves as the foundation for a prototype that will enable the blind to identify objects in the real world. Using Jetson Nano, the text from product descriptions is retrieved and rendered as speech, with mobility as the primary consideration. Using Raspberry Pi, the text from product descriptions is retrieved and rendered as speech, with mobility as the primary consideration. Including a battery backup makes the device portable, so the user can use it at any time and in any place, and the design could be used in future technology. The project also has a feature that uses OpenCV and TensorFlow to recognize cash notes. Additionally, this system incorporates deep learning-based object identification, which uses MobileNets and SSD methods for object detection, to recognize various items for recognizing barriers in front of