ISBN: 9798350307184 (print)
Coreference resolution, a core task in natural language processing, aims to identify words and phrases that refer to the same entity in a text. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. First, we introduce a new dataset with annotated coreference chains and their bounding boxes, as most existing image-text datasets only contain short sentences without coreferring expressions or labeled chains. We propose a new technique that learns to identify coreference chains using weak supervision, only from image-text pairs and a regularization based on prior linguistic knowledge. Our model yields large performance gains over several strong baselines in resolving coreferences. We also show that coreference resolution helps improve grounding narratives in images.
ISBN: 9783031581816 (digital)
ISBN: 9783031581809; 9783031581816 (print)
Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline method that highlights the image regions most critical to a deep learning model's decision, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available at (https://***/ beasthunter758/GradEML).
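As a rough illustration of how Grad-CAM builds its heatmap, the sketch below follows the standard formulation: each feature map is weighted by the spatial average of the class-score gradient with respect to it, the weighted maps are summed, and a ReLU keeps only positive evidence. It is not taken from the GradEML code. The function name `grad_cam`, the nested-list inputs, and the final normalisation to [0, 1] are assumptions made for this example; in practice the activations and gradients would come from a convolutional layer of a trained network.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    gradients[k][i][j] is d(class score) / d(activations[k][i][j]).
    Both inputs are assumed to be precomputed (hypothetical toy data below).
    """
    k = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])
    # Channel weights: global average of the gradients over all positions.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [[max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)] for i in range(h)]
    # Assumption: scale to [0, 1] for display; skipped if the map is all zeros.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy input: two 2 x 2 feature maps with constant gradients.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

In this toy case the second channel has a negative weight, so the positions where it is active are suppressed by the ReLU and only the first channel's activations remain in the heatmap.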
Image captioning is a challenging task that lies at the intersection of computer vision and natural language processing. A large body of work generates meaningful and realistic descriptions of images. Re...
ISBN: 9798350307184 (print)
We introduce COOL-CHIC, a Coordinate-based Low Complexity Hierarchical Image Codec. It is a learned alternative to autoencoders with 629 parameters and 680 multiplications per decoded pixel. COOL-CHIC offers compression performance close to modern conventional MPEG codecs such as HEVC and is competitive with popular autoencoder-based systems. The method is inspired by Coordinate-based Neural Representations, where an image is represented as a learned function that maps pixel coordinates to RGB values. The parameters of the mapping function are then sent using entropy coding. At the receiver side, the image is reconstructed by evaluating the mapping function at all pixel coordinates. The COOL-CHIC implementation is open source(1).
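The decoding idea described above, evaluating a small function at every pixel coordinate, can be sketched as follows. This is a generic coordinate-based representation and not the COOL-CHIC architecture, which the abstract does not detail. The helper names (`make_mlp`, `evaluate`, `decode`), the layer sizes, the ReLU and sigmoid choices, and the random weights standing in for learned, entropy-decoded parameters are all assumptions for illustration.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Build a tiny MLP; random weights stand in for learned parameters."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def count_parameters(layers):
    """Number of weights and biases the sender would have to transmit."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def evaluate(layers, x, y):
    """Map one normalised pixel coordinate (x, y) to an RGB triple in (0, 1)."""
    values = [x, y]
    for index, (weights, biases) in enumerate(layers):
        values = [sum(w * v for w, v in zip(row, values)) + b
                  for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            values = [max(0.0, v) for v in values]  # ReLU on hidden layers
    return [1.0 / (1.0 + math.exp(-v)) for v in values]  # sigmoid output


def decode(layers, height, width):
    """Reconstruct the image by evaluating the function at every pixel."""
    return [[evaluate(layers, col / max(width - 1, 1), row / max(height - 1, 1))
             for col in range(width)] for row in range(height)]


mlp = make_mlp([2, 8, 3])  # 2 coordinates in, 8 hidden units, 3 colour channels out
image = decode(mlp, height=4, width=6)
```

The point of the sketch is the cost structure: the receiver only needs the function's parameters, and the per-pixel work is fixed by the size of the function rather than by the image content.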
Traditional remote sensing image processing cannot provide timely information for near real-time applications because of delays in satellite-ground communication and low processing efficiency. On-bo...
ISBN: 9781728198354 (print)
This research presents a new approach for blind single-image transparency separation, a significant challenge in image processing. The proposed framework divides the task into two parallel processes: feature separation and image reconstruction. The feature separation task leverages two deep image prior (DIP) networks to recover two distinct layers. An exclusion loss and a deep feature separation loss are used to decompose features. For the image reconstruction task, we minimize the difference between the mixed image and the re-mixed image while also incorporating a regularizer to impose natural priors on each layer. Our results indicate that our method performs comparably to or outperforms state-of-the-art approaches when tested on various image datasets.
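The image reconstruction objective described above can be sketched in a few lines. The abstract does not state the mixing model or the regularizer, so this example assumes additive mixing of two grayscale layers and uses total variation as a stand-in for the natural-image prior. The function names and the `weight` value are hypothetical.

```python
def remix_loss(mixed, layer_a, layer_b):
    """Mean squared error between the mixed image and layer_a + layer_b.

    Assumption: the observed image is an additive mix of the two layers.
    """
    total = 0.0
    count = 0
    for row_m, row_a, row_b in zip(mixed, layer_a, layer_b):
        for m, a, b in zip(row_m, row_a, row_b):
            total += (m - (a + b)) ** 2
            count += 1
    return total / count


def total_variation(layer):
    """Sum of absolute differences between vertical and horizontal neighbours.

    Used here as a stand-in smoothness prior; the paper's regularizer may differ.
    """
    h, w = len(layer), len(layer[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(layer[i + 1][j] - layer[i][j])
            if j + 1 < w:
                tv += abs(layer[i][j + 1] - layer[i][j])
    return tv


def reconstruction_objective(mixed, layer_a, layer_b, weight=0.1):
    """Re-mix error plus a weighted prior term on each recovered layer."""
    prior = total_variation(layer_a) + total_variation(layer_b)
    return remix_loss(mixed, layer_a, layer_b) + weight * prior


# Toy 2 x 2 example in which the two layers sum exactly to the mixed image.
mixed = [[0.5, 0.75], [0.25, 1.0]]
layer_a = [[0.25, 0.25], [0.25, 0.25]]
layer_b = [[0.25, 0.5], [0.0, 0.75]]
objective = reconstruction_objective(mixed, layer_a, layer_b)
```

Because the re-mix term alone is satisfied by many layer pairs, the prior term is what separates plausible decompositions from arbitrary ones; here the flat `layer_a` contributes nothing to the prior while the textured `layer_b` does.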
Existing deep learning-based methods repair high-frequency image information poorly, and traditional convolutional methods have a small receptive field. A two-stage im...
Among AI applications of natural language description, image captioning is a rapidly expanding field. It attempts to capture meaningful interpretations of the interactions between the acquired picture data f...
The complex underwater environment and the absorption and scattering of light in the water lead to color degradation and loss of detail during the underwater imaging process. To address these problems, we propose a si...
ISBN: 9783031581816 (digital)
ISBN: 9783031581809; 9783031581816 (print)
Near-Infrared (NIR) images are widely used in a variety of low-light situations for security and safety applications. A colorized version of an NIR image provides better image understanding and interpretation of features. Because paired NIR-RGB datasets are limited and often unavailable, a method to convert a given NIR image to an RGB image is highly desirable. The present work proposes an unsupervised image-to-image translation technique for generating colorized images (UGCI) that transforms an input NIR image into an RGB image. UGCI outperforms present NIR-RGB colorizing models and has shown approximately 57% improvement in terms of Fréchet inception distance (FID) with reduced training time and less memory usage. Finally, a thorough comparative study based on different datasets is carried out to confirm superiority over leading colorization approaches in qualitative and quantitative assessments.
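For readers unfamiliar with the metric, FID compares two Gaussians fitted to image features: FID = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)), where lower is better. The sketch below computes it under the simplifying assumption that both covariance matrices are diagonal, which avoids a matrix square root. The function name `fid_diagonal` and the toy statistics are hypothetical. A real evaluation uses full covariances of Inception-v3 features, and the paper's own evaluation code is not shown here.

```python
import math


def fid_diagonal(mean_a, var_a, mean_b, var_b):
    """Frechet distance between two Gaussians with diagonal covariances.

    mean_*: per-dimension feature means; var_*: per-dimension variances.
    Assumption: off-diagonal covariance terms are ignored.
    """
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    # For diagonal covariances the trace term reduces to a per-dimension sum.
    cov_term = sum(va + vb - 2.0 * math.sqrt(va * vb)
                   for va, vb in zip(var_a, var_b))
    return mean_term + cov_term


# Toy two-dimensional feature statistics (not real Inception features).
score = fid_diagonal([0.0, 0.0], [1.0, 4.0], [1.0, 2.0], [4.0, 4.0])
same = fid_diagonal([0.0, 0.0], [1.0, 4.0], [0.0, 0.0], [1.0, 4.0])
```

Identical statistics give a distance of zero, so an improvement in FID means the generated images' feature statistics moved closer to those of the real RGB images.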