ISBN (print): 9798400710759
Given the substantial memory and data-rate requirements of uncompressed seismic image transmission, effective compression algorithms are essential. Seismic data is crucial for two main analytical tasks: fault line detection and salt trace identification. Fault line detection aids in understanding subsurface geology, assessing seismic hazards, and identifying potential oil and gas reservoirs. Salt trace identification is important for detecting salt bodies, mapping hydrocarbon traps, and assessing drilling hazards. While significant effort has been directed toward designing compression algorithms for natural images, these methods often produce visual artifacts when applied to seismic images, degrading performance on both analytical tasks. The unique statistical signatures and patterns in seismic images are critical for accurate analysis, making it essential to preserve semantic information during compression. In this paper, we propose a deep learning-based compression model for seismic images that improves task accuracy. The model preserves these statistical signatures and patterns by training jointly on a rate loss and a task loss. Our framework, "Compressive Domain Analytics for Seismic images," performs analytical tasks such as fault line detection and salt trace identification directly in the compressed domain. We demonstrate that our model achieves better rate-distortion performance for both tasks than JPEG and WebP, significantly outperforming current state-of-the-art methods. Since our model operates directly in the compressed domain, it not only achieves state-of-the-art accuracy but also reduces computational time compared to JPEG.
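As an illustration of the joint training objective described above, the sketch below combines an entropy-based rate term with a task (e.g., fault-mask) loss. The weighting `lam` and the exact forms of both losses are assumptions for illustration, not the paper's specification.

```python
import math

def rate_term(probabilities):
    # Expected code length in bits of latent symbols under an entropy
    # model (illustrative stand-in for the paper's rate loss).
    return -sum(math.log2(max(p, 1e-12)) for p in probabilities)

def task_term(pred, target):
    # Mean binary cross-entropy for a per-pixel fault/salt mask
    # (illustrative stand-in for the paper's task loss).
    eps = 1e-12
    return -sum(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
                for p, t in zip(pred, target)) / len(pred)

def joint_loss(probabilities, pred, target, lam=0.1):
    # Total objective: rate loss plus lam-weighted task loss,
    # in the spirit of joint rate/task training.
    return rate_term(probabilities) + lam * task_term(pred, target)
```

Training against both terms at once pushes the codec to spend bits on exactly the structures the downstream task needs.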
ISBN (print): 9798400710759
Air quality estimation through sensor-based methods is widely used. Nevertheless, their frequent failures and maintenance challenges constrain the scalability of air pollution monitoring efforts. Recently, it has been demonstrated that air quality estimation can be performed using image-based methods, which offer several advantages including ease of use, scalability, and low cost. However, the accuracy of these methods hinges significantly on the diversity and size of the dataset used. Progress in air quality estimation through image analysis has been limited by the lack of available datasets. Addressing this gap, we present TRAQID - Traffic-Related Air Quality Image Dataset, a novel dataset of 26,678 front- and rear-view traffic images alongside co-located weather parameters, multiple Particulate Matter (PM) levels, and Air Quality Index (AQI) values. Spanning multiple seasons, with over 70 hours of data collection in the twin cities of Hyderabad and Secunderabad, India, TRAQID offers diverse day and night imagery amid unstructured traffic conditions, encompassing six AQI categories ranging from "Good" to "Severe". State-of-the-art air quality estimation techniques, trained on smaller and less diverse datasets, performed poorly on the dataset presented in this paper. TRAQID captures various sources of uncertainty, including seasonal changes, unstructured traffic patterns, and lighting conditions. The information from the two views (front and rear) of the traffic can be combined to improve estimation performance in such challenging conditions. As such, TRAQID serves as a benchmark for image-based air quality estimation and AQI prediction, given its diversity and size.
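The two-view combination mentioned above can be illustrated with a simple late-fusion sketch: average the per-AQI-category probabilities predicted from the front and rear views and pick the top category. The averaging scheme and function names are assumptions for illustration, not the paper's actual fusion method.

```python
def fuse_predictions(front_probs, rear_probs, w_front=0.5):
    # Weighted average of per-category probabilities from the two views
    # (a generic late-fusion strategy, assumed for illustration).
    return [w_front * f + (1 - w_front) * r
            for f, r in zip(front_probs, rear_probs)]

def predict_category(probs, categories):
    # Return the AQI category with the highest fused probability.
    return categories[max(range(len(probs)), key=probs.__getitem__)]
```

When one view is occluded or poorly lit, the other view's probabilities dominate the fused score, which is the intuition behind combining the two cameras.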
ISBN (print): 9798400710759
Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images, typically requiring training data for all objects to be available from the start. In dynamic environments, it is impractical to gather data for all objects at once; data becomes available in phases, with restrictions on access to past data. The model must therefore reconstruct new objects while retaining the ability to reconstruct previous objects without accessing prior data. Additionally, existing 3D reconstruction methods in continual learning fail to reproduce previous shapes accurately, as they are not designed to manage changing shape information in dynamic scenes. To this end, we propose a continual learning-based 3D reconstruction method. Our goal is to design a model that can accurately reconstruct previously seen classes even after training on new ones, ensuring faithful reconstruction of both current and previous objects. To achieve this, we propose using variational distributions from the latent space, which represent abstract shapes and effectively retain shape information within a simplified code structure that requires minimal memory. Additionally, saliency maps preserve object attributes, capturing both minor local shape details and the overall shape structure. We employ experience replay to leverage these saliency maps effectively. Together, these methods ensure that shapes are faithfully reconstructed, preserving all minor details from the previous dataset. This is vital given the resource constraints of storing extensive training data. Thorough experiments show competitive results compared to established methods, both quantitatively and qualitatively.
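The experience-replay component described above can be sketched as a fixed-size reservoir buffer that keeps a bounded sample of (latent code, saliency map) pairs from earlier phases; the class and method names here are assumptions for illustration, not the paper's implementation.

```python
import random

class ReplayBuffer:
    """Fixed-capacity reservoir buffer for replaying stored items
    (e.g. latent codes and saliency maps) from earlier sessions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.seen = 0

    def add(self, item):
        # Reservoir sampling: every item seen so far has an equal
        # chance of residing in the buffer, under a fixed memory budget.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        # Draw up to k stored items to mix into the current batch.
        return random.sample(self.items, min(k, len(self.items)))
```

The fixed capacity is what makes replay compatible with the memory constraints the abstract emphasizes.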
Semantic segmentation plays an increasingly important role in the field of computer vision. Current semantic segmentation algorithms are mainly based on fully convolutional network downsampling for feature extracti...
ISBN (print): 9798400710759
We propose a Bayesian neural network-based continual learning algorithm using Variational Inference, aiming to overcome several drawbacks of existing methods. In continual learning scenarios, storing network parameters at each step to retain knowledge poses challenges. This is compounded by the crucial need to mitigate catastrophic forgetting, particularly given the limited access to past datasets, which complicates maintaining correspondence between network parameters and datasets across all sessions. Current methods using Variational Inference with KL divergence risk catastrophic forgetting through uncertain node updates and coupled disruptions of certain nodes. To address these challenges, we propose the following strategies. To reduce storage of the dense-layer parameters, we introduce a parameter distribution learning method that significantly lowers storage requirements. Within the variational-inference continual learning framework, we introduce a regularization term that specifically targets the dynamics and population of the mean and variance of the parameters, retaining the benefits of KL divergence while addressing its drawbacks. To ensure proper correspondence between network parameters and the data, our method introduces an importance-weighted Evidence Lower Bound term to capture data-parameter correlations, enabling storage of common and distinctive parameter hyperspace bases. The proposed method partitions the parameter space into common and distinctive subspaces, with conditions for effective backward and forward knowledge transfer, elucidating the network-parameter-dataset correspondence. Experimental results demonstrate the effectiveness of our method across diverse datasets and various combinations of sequential datasets, yielding superior performance compared to existing approaches.
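For context, the KL-divergence regularizer between diagonal Gaussian posteriors that the variational continual learning setup builds on can be written as below. This is the standard anchor term that the paper's mean/variance regularizer refines, not the paper's own formulation.

```python
import math

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for diagonal Gaussians over network parameters:
    q is the current session's posterior, p the previous session's.
    Penalizing this term discourages catastrophic forgetting."""
    kl = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        kl += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return kl
```

The term is zero exactly when the new posterior matches the old one, and grows as means drift or variances collapse, which is the failure mode the abstract's regularizer is designed to control.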
ISBN (print): 9798400710759
Accurately segmenting brain tumors and predicting survival are crucial for diagnosis and personalized treatment planning. Recently, convolutional neural networks (CNNs) based on the U-Net architecture have been extensively used to segment brain tumors. It is essential to capture both local and global dependencies to segment brain tumors accurately. Therefore, several studies have considered U-Net variants combining a CNN and a Transformer. However, the skip connections in these Transformer-based U-Net variants do not incorporate the features from multiple scales available at the encoder. To address this issue, we propose a new 3D U-Net variant called Multi-Scale Transformer-CNN Network (MTC-Net) that incorporates a Multi-Scale Transformer Convolution (MTC) block into the skip connections. The MTC block extracts local multi-scale features from the CNN encoder blocks and global features using Swin Transformer blocks to handle the wide variability in tumor size and enhance the feature representation capability. The segmentation mask predicted by the proposed MTC-Net is subsequently used to predict a patient's overall survival time. Most studies on overall survival prediction have used handcrafted radiomic features that lack the ability to fully model complex tumor patterns. In contrast, deep features are specifically adapted to brain tumors. We observed that combining a CNN with a Transformer, rather than using a CNN alone, produces deep features that yield more accurate predictions. For this purpose, we extract deep features using our Transformer-based network MTC-Net to construct a regression model for predicting overall survival. Comprehensive experimentation on the BraTS 2020 and 2021 benchmark datasets demonstrated the efficacy of the proposed components. MTC-Net outperformed the CNN- and Transformer-based state-of-the-art segmentation networks. Moreover, our approach outperformed the state-of-the-art survival prediction systems.
ISBN (print): 9789819752119; 9789819752126
Rice is a popular staple food in India, and demand for it has recently increased. Thanjavur, located in the Cauvery Delta region, is known as the rice granary of South India. Due to recent technological advancements, digital farming and globalization have significantly impacted the agricultural industry. It is crucial to differentiate between rice grain varieties to prevent fraudulent labeling during import and export. To this end, we collected a dataset, the "TaPaSe Dataset", comprising five rice varieties mainly cultivated in Thanjavur: MTU 1010, MTU 1290, Narmadha, Pacha Ponni, and Sonna Masur. We designed an image acquisition system to capture these varieties in real time. The captured paddy rice images are highly challenging because they were captured under varying illumination and scale. We evaluated existing deep learning models to assess their ability to classify paddy seed varieties. Existing pre-trained models attain remarkable recognition rates on the proposed paddy seed variety dataset.
ISBN (print): 9798400710759
Video Moment Retrieval (VMR) is the task of linking a query with a relevant moment in a video. Although there has recently been work on the VMR task where a query is linked to a single moment, the corresponding task where the query must be linked to multiple moments has been understudied. In this paper, we address the VMR task primarily by leveraging chapters of YouTube videos, i.e., video segments. YouTube chapters provide a meaningful segmentation of videos annotated by content authors; these annotated segments help viewers navigate to the specific part of a long video they are interested in. We present the CHAPVIDMR (Chapter-based Video Moment Retrieval) dataset, containing 10.8K user queries (obtained using GPT4) formed from multiple chapter names and other metadata extracted from videos via the YouTube API. Furthermore, we benchmark the proposed dataset on two VMR tasks: chapter classification-based VMR and segmentation-based VMR. In the chapter classification-based VMR task, the model classifies which of the chapters associated with a video are most relevant to the query. We represent a chapter using text (subtitles), audio, and visual (captions, video) modalities with state-of-the-art feature representation techniques and perform an exhaustive ablation for each modality. In segmentation-based VMR, the video is divided into segments, and the segments most likely to answer the query are identified and returned. We benchmark our dataset against state-of-the-art methods for segmentation tasks. We find that for the chapter classification-based VMR task, Sentence-BERT with subtitles and visual captions yields the best results, while for segmentation-based VMR, UniVTG is the most accurate. We make our code and data publicly available.
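The chapter classification-based VMR setup above can be illustrated by ranking chapter embeddings against a query embedding by cosine similarity. The toy vectors below are placeholders for Sentence-BERT outputs; the ranking scheme is a generic sketch, not the paper's exact pipeline.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_chapters(query_vec, chapter_vecs):
    # Return chapter indices sorted from most to least relevant,
    # scoring each chapter embedding against the query embedding.
    scores = [(cosine(query_vec, c), i) for i, c in enumerate(chapter_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]
```

In the actual benchmark, the chapter vectors would come from encoding subtitles and visual captions, and the top-ranked chapters are the retrieved moments.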
ISBN (print): 9781510666313; 9781510666320
In this paper, we propose a method for line segment matching based on point-line invariants. We use ORB and EDLines, two efficient and stable methods, to extract point and line features, respectively. We then describe the implementation details of our matching algorithm, which exploits the affine invariance of the ratio of the distances from two coplanar points to a line, together with a pairwise constraint on the geometric relationship between two lines. To eliminate mismatches, we apply a series of optimizations to the algorithm's output: we set up a scoring mechanism for candidate matches, and the final matches are obtained by evaluating a voting matrix. Performance is evaluated through extensive experiments. The results show that our proposed method outperforms mainstream methods and is robust to rotation, scale, blur, and other transformations.
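The affine-invariant distance ratio at the heart of the matching algorithm can be sketched as follows. This is a minimal geometric illustration (function names are assumptions): for two coplanar points and a line, the ratio of their perpendicular distances to that line is preserved under affine transformations.

```python
import math

def point_line_distance(p, a, b):
    # Perpendicular distance from point p to the line through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    den = math.hypot(bx - ax, by - ay)
    return num / den

def distance_ratio(p1, p2, a, b):
    # Ratio of the two points' distances to the line (a, b); this
    # ratio is invariant under affine maps, which is the point-line
    # invariant the matching algorithm exploits.
    return point_line_distance(p1, a, b) / point_line_distance(p2, a, b)
```

For example, scaling and shearing the plane moves both points and the line, but both perpendicular distances scale by the same factor, so the ratio used for matching is unchanged.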
This paper introduces an effective algorithm for classifying maize leaf diseases using a publicly available image dataset. While the dataset itself, consisting of maize leaf disease images, was developed by Indian res...