ISBN (digital): 9781665427920
ISBN (print): 9781665427920
The limited availability of training data in remote sensing is a recurring problem in deep learning, as deep architectures require a large number of training samples for proper training. In this paper, we present a technique for data augmentation based on a spectral indexed generative adversarial network (GAN) to train deep convolutional neural networks. This technique uses the spectral characteristics of multispectral (MS) images to support data augmentation, generating realistic training samples for each land-use and land-cover class. The impact of the multispectral remote sensing data generated by the spectral indexed GAN is evaluated through classification experiments. Experimental results on the classification of the Sentinel-2 EuroSAT all-band dataset show that data augmentation through the spectral indexed GAN improves the main accuracy metrics.
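To make the idea concrete, the following is a minimal sketch of how spectral indices derived from multispectral bands could condition a GAN's latent input. The band positions, index choices (NDVI/NDWI), and mean pooling are illustrative assumptions, not the paper's exact design.

```python
# Sketch: derive spectral indices from a Sentinel-2 patch and use their
# statistics to condition a GAN generator's latent vector (assumption:
# band layout B3=green at index 2, B4=red at 3, B8=NIR at 7).
import numpy as np

def spectral_indices(ms: np.ndarray) -> np.ndarray:
    """Compute NDVI and NDWI from a (H, W, 13) multispectral patch."""
    green, red, nir = ms[..., 2], ms[..., 3], ms[..., 7]
    eps = 1e-6                                  # avoid division by zero
    ndvi = (nir - red) / (nir + red + eps)      # vegetation index
    ndwi = (green - nir) / (green + nir + eps)  # water index
    return np.stack([ndvi, ndwi], axis=-1)

def conditioned_noise(ms: np.ndarray, z_dim: int = 128) -> np.ndarray:
    """Append pooled index statistics to the latent vector, steering the
    generator toward samples consistent with the class's spectrum."""
    idx = spectral_indices(ms)
    cond = idx.reshape(-1, idx.shape[-1]).mean(axis=0)  # per-index mean
    z = np.random.randn(z_dim)
    return np.concatenate([z, cond])

patch = np.random.rand(64, 64, 13).astype(np.float32)  # stand-in MS patch
print(conditioned_noise(patch).shape)  # (130,)
```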
ISBN (digital): 9798350393804
ISBN (print): 9798350393811
Recently, online forums have emerged as a stage for consumers to comment on and share their reviews. These user comments serve as a valuable data source for marketing professionals and analysts. Nevertheless, conventional user interfaces often present an overwhelming volume of comments in a linear list, significantly impeding the efficiency with which marketing professionals can analyze the feedback. In response to this challenge, we introduce CommentVis, an interactive visualization tool that helps users grasp the overall semantic distribution of comments. Using the tool, analysts gain detailed information about comments along with deeper insights. The tool leverages state-of-the-art large language models to improve both the speed and the depth of analyzing substantial volumes of text that would otherwise take marketers a long time. To illustrate the practical application of CommentVis, we present a usage scenario that demonstrates its effectiveness in real-world marketing analysis. The tool's impact and utility were further validated through a user study involving three marketing professionals at a global manufacturing company.
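As a rough illustration of the underlying idea, the sketch below groups comments by semantic similarity so they can be presented as structured clusters rather than a flat list. CommentVis itself relies on large language models; TF-IDF with k-means is substituted here only to keep the example self-contained.

```python
# Sketch: cluster user comments into semantic groups for a structured
# overview (TF-IDF + k-means as a lightweight stand-in for LLM embeddings).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

comments = [
    "Battery life is great, lasts two days.",
    "The battery drains too fast for me.",
    "Shipping was quick and well packaged.",
    "Arrived late and the box was damaged.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(comments)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for label in sorted(set(km.labels_)):
    group = [c for c, l in zip(comments, km.labels_) if l == label]
    print(f"cluster {label}: {group}")
```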
ISBN (print): 9781665499569
The recent advances in DNA sequencing technology have triggered next-generation sequencing (NGS) research at full scale. Big data (BD) is becoming the main driver in analyzing these large-scale bioinformatics data. However, this complicated process has become the system bottleneck, requiring an amalgamation of scalable approaches to deliver the needed performance and hide the deployment complexity. Utilizing cutting-edge scientific workflows can robustly address these challenges. This paper presents a Spark-based alignment workflow called SparkFlow for massive NGS analysis over Singularity containers. SparkFlow is highly scalable, reproducible, and capable of parallelizing computation by utilizing data-level parallelism and load-balancing techniques in HPC and Cloud environments. The proposed workflow capitalizes on benchmarking two state-of-the-art NGS workflows, i.e., BaseRecalibrator and ApplyBQSR. SparkFlow accelerates large-scale cancer genomic analysis by scaling vertically (HyperThreading) and horizontally (on-demand provisioning). Our results demonstrate an inevitable trade-off between the targeted applications and the processor architecture. SparkFlow achieves a decisive improvement in NGS computation performance, throughput, and scalability while keeping deployment complexity manageable. The paper's findings aim to pave the way for a wide range of revolutionary enhancements and future trends within the High-Performance Data Analytics (HPDA) genome analysis realm.
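The data-level parallelism at the heart of such a workflow can be sketched as follows; `align_chunk` is a hypothetical placeholder for invoking a real aligner and downstream GATK steps, not SparkFlow's actual implementation.

```python
# Sketch: shard sequencing reads across Spark partitions and run an
# alignment step per shard to illustrate data-level parallelism.
from pyspark.sql import SparkSession

def align_chunk(reads):
    # Placeholder: a real workflow would shell out to an aligner here and
    # emit records for downstream steps (e.g., BaseRecalibrator, ApplyBQSR).
    for read in reads:
        yield (read, "aligned")

spark = SparkSession.builder.appName("sparkflow-sketch").getOrCreate()
sc = spark.sparkContext

reads = [f"read_{i}" for i in range(1000)]   # stand-in for FASTQ records
rdd = sc.parallelize(reads, numSlices=8)     # 8 partitions for load balancing
aligned = rdd.mapPartitions(align_chunk).collect()
print(len(aligned))                          # 1000
spark.stop()
```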
ISBN (print): 9798350315943
Log-based anomaly detection is becoming increasingly important for maintaining the availability of modern microservice systems. Existing supervised/semi-supervised log anomaly detection models require a large amount of human-labeled logs for training, which are hard to collect in real-world systems. Unsupervised models often perform poorly without explicit anomaly labels. To improve the performance of unsupervised models, in this paper we first conduct an empirical study of existing unsupervised models to investigate why they often produce unsatisfactory results. We find that the anomaly detection results produced by existing unsupervised models are significantly affected by two key problems: the Not-Cover (NC) problem and the Suspicious-Noise (SN) problem. To solve these problems, we propose a novel augmentation framework called AFALog. AFALog leverages the idea of active learning to incorporate human knowledge and thereby improve data quality. It can support almost all existing unsupervised models and improve their performance. Our experiments on two open datasets and one dataset collected from a real-world microservice system demonstrate that AFALog improves the F1-score by an average of 6.61%, with only 5.9% of the training data labeled.
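The active-learning idea can be sketched as below: the unsupervised detector's most ambiguous samples are queried for human labels under a small budget. The uncertainty rule and budget are illustrative assumptions rather than AFALog's exact criteria.

```python
# Sketch: query human labels only for the samples an unsupervised detector
# is least certain about, then use them to correct the training pool.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(1000)            # stand-in anomaly scores in [0, 1]

budget = 59                          # e.g., label ~5.9% of 1000 samples
# Samples whose scores sit nearest the decision boundary (0.5) are the
# most ambiguous and hence the most informative to label.
uncertainty = -np.abs(scores - 0.5)
query_idx = np.argsort(uncertainty)[-budget:]

labels = (scores[query_idx] > 0.5).astype(int)   # oracle stand-in
print(f"queried {len(query_idx)} labels, {labels.sum()} marked anomalous")
```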
In order to plan collision-free and shortest routes for vehicles in large-scale off-road scenarios, it is necessary to have a sufficient understanding and utilization of scene information. This paper proposes an off-r...
ISBN (print): 9781665406017
A crucial step in remedying faults within network infrastructure is determining their root cause. However, the large-scale, complex, and dynamic nature of modern architectures makes root cause analysis challenging. Statistical approaches to causal inference are promising; however, their deployment has historically been limited by their high time complexity. In this paper, we propose a general framework that leverages the concept of functional connectivity to reduce the computational overhead of causal inference algorithms. We demonstrate on synthetic data that our approach can achieve substantial speedups when combined with state-of-the-art causal discovery algorithms, at only a small cost in terms of lost causal information in some cases.
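A minimal sketch of the pruning idea follows: pairwise correlations serve as a stand-in measure of functional connectivity, and only strongly connected variable pairs are passed on to a more expensive causal discovery routine. The correlation measure and threshold are illustrative choices, not the paper's.

```python
# Sketch: estimate functional connectivity from pairwise correlations and
# restrict costly causal tests to the strongly connected variable pairs.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_vars = 500, 20
X = rng.standard_normal((n_samples, n_vars))
X[:, 1] += 0.8 * X[:, 0]             # inject one dependency as a demo

corr = np.corrcoef(X, rowvar=False)
threshold = 0.3
candidates = [
    (i, j)
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
    if abs(corr[i, j]) > threshold   # keep only functionally connected pairs
]
total = n_vars * (n_vars - 1) // 2
print(f"causal tests reduced from {total} pairs to {len(candidates)}")
```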
Deep learning based models have excelled in many computer vision tasks and appear to surpass human performance. However, these models require an avalanche of expensive human-labeled training data and many iterations to train their large number of parameters. This severely limits their scalability to real-world long-tail distributed categories, some of which have a large number of instances but only a few manual annotations. Learning from such extremely limited labeled examples is known as Few-Shot Learning (FSL). Different from prior works that leverage meta-learning or data augmentation strategies to alleviate this extremely data-scarce problem, this paper presents a statistical approach, dubbed Instance Credibility Inference (ICI), to exploit the support of unlabeled instances for few-shot visual recognition. Specifically, we repurpose the self-taught learning paradigm to predict pseudo-labels of unlabeled instances with an initial classifier trained on the few shots, and then select the most confident ones to augment the training set and re-train the classifier. This is achieved by constructing a (Generalized) Linear Model (LM/GLM) with incidental parameters to model the mapping from (un-)labeled features to their (pseudo-)labels, in which the sparsity of the incidental parameters indicates the credibility of the corresponding pseudo-labeled instances. We rank the credibility of pseudo-labeled instances along the regularization path of their corresponding incidental parameters, and the most trustworthy pseudo-labeled examples are preserved as augmented labeled instances. This process is repeated until all the unlabeled samples are included in the expanded training set. Theoretically, under the conditions of restricted eigenvalue, irrepresentability, and large error, our approach is guaranteed to collect all the correctly predicted pseudo-labeled instances from the noisy pseudo-labeled set. Extensive experiments under two few-shot settings show the effectiveness of our approach.
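The credibility-ranking step can be sketched as follows: each pseudo-labeled instance gets one incidental parameter, and instances whose parameter enters the lasso path earliest (at the largest penalty) are treated as least credible. Penalizing the feature weights together with the incidental parameters is a simplification of the paper's formulation.

```python
# Sketch: rank pseudo-labeled instances by where their incidental
# parameter enters along the lasso regularization path (later = more
# credible). Corrupted labels should surface as early-entering parameters.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, d = 40, 5
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)
y[:5] += 3.0                         # corrupt 5 pseudo-labels (low credibility)

A = np.hstack([X, np.eye(n)])        # [features | incidental parameters]
alphas, coefs, _ = lasso_path(A, y, n_alphas=100)
gamma = coefs[d:]                    # (n, n_alphas) incidental coefficients

# alphas are returned in decreasing order, so a small first-nonzero index
# means the parameter entered at a large penalty, i.e. low credibility.
entry = np.array([
    np.argmax(np.abs(g) > 1e-8) if np.any(np.abs(g) > 1e-8) else len(alphas)
    for g in gamma
])
print("least credible instances:", np.argsort(entry)[:5])
```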
ISBN (print): 9781665406017
Due to the rapid adoption of 5G networks and the increasing number of devices and base stations (gNBs) connected to them, manually identifying the malfunctioning machines or devices that cause part of the network to fail is becoming more challenging. Furthermore, the data collected from the networks are not always sufficient. To overcome these two issues, we propose a novel root cause analysis (RCA) framework that integrates graph neural networks (GNNs) with graph structure learning (GSL) to infer hidden dependencies from the available data. The learned dependencies form the graph structure used to predict the root-cause machines or devices. We found that, although the data are often incomplete, the GSL model can infer fairly accurate hidden dependencies from data with a large number of nodes and generate informative graph representations for GNNs to identify the root cause. Our experimental results show that higher accuracy in identifying root-cause and victim nodes can be achieved as the number of nodes in an environment increases.
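A minimal sketch of the graph structure learning step: a sparse dependency graph is inferred from node feature similarity and could then be consumed by a GNN for root-cause scoring. Cosine similarity with top-k sparsification is an illustrative choice, not the paper's exact model.

```python
# Sketch: infer a sparse dependency graph among devices from feature
# similarity; the resulting adjacency would feed a GNN for RCA.
import numpy as np

rng = np.random.default_rng(2)
n_nodes, d = 30, 16
feats = rng.standard_normal((n_nodes, d))       # per-device telemetry features

norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
sim = norm @ norm.T                              # cosine similarity
np.fill_diagonal(sim, -np.inf)                   # no self-loops

k = 3
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    adj[i, np.argsort(sim[i])[-k:]] = 1.0        # keep k strongest neighbors

adj = np.maximum(adj, adj.T)                     # symmetrize learned edges
print("learned edges:", int(adj.sum()) // 2)
```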
ISBN (digital): 9781665427920
ISBN (print): 9781665427920
Object detection is a core task for image analysis and interpretation and is broadly applied in applications relying on space- and airborne imagery. Like all supervised deep learning methods, training an object detector generally requires a large amount of representative annotated data, which can be hard to acquire in practice. To overcome this challenge, generating synthetic data can be an option to alleviate the lack of real-world annotated data. One key factor influencing the quality of synthetic data is the background. We show that the detector's classifier in particular depends severely on the background, which has a large impact on detection precision. Using real backgrounds is a natural option; however, we show that this naive approach has drawbacks, such as a significant drop in recall. In this paper, we demonstrate that by using style transfer to match the synthetic foreground to the real background, the detector can mitigate these drawbacks and achieve a more balanced result in terms of precision and recall.
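The foreground-background harmonization step can be sketched as below, with per-channel mean/std matching (Reinhard-style color transfer) standing in for the full style transfer model used in the paper; the arrays are placeholders for real imagery.

```python
# Sketch: harmonize a synthetic foreground with a real background via
# per-channel statistics matching, then composite it into the scene.
import numpy as np

rng = np.random.default_rng(3)
background = rng.uniform(0.3, 0.6, (128, 128, 3))   # real background patch
foreground = rng.uniform(0.0, 1.0, (32, 32, 3))     # rendered synthetic object
mask = np.ones((32, 32), dtype=bool)                # object silhouette

# Match foreground channel statistics to the background's "style".
fg = (foreground - foreground.mean((0, 1))) / (foreground.std((0, 1)) + 1e-6)
fg = fg * background.std((0, 1)) + background.mean((0, 1))
fg = np.clip(fg, 0.0, 1.0)

# Composite the harmonized foreground at a fixed location.
y, x = 48, 48
composite = background.copy()
region = composite[y:y + 32, x:x + 32]
region[mask] = fg[mask]
print(composite.shape, composite.min() >= 0.0, composite.max() <= 1.0)
```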