The Palomar Transient Factory (PTF) is a comprehensive detection system for identifying and classifying transient astrophysical objects. In this paper, we make two significant contributions to the PTF pipeline. First, we present an experimental study that evaluates a novel implementation of the real-time classifier in GLADE, a parallel data processing system that combines the efficiency of a database with the extensibility of map-reduce. We show how each stage in the classifier maps optimally into GLADE tasks by taking advantage of the unique features of the system: range-based data partitioning, columnar storage, multi-query execution, and in-database support for complex aggregate computation. Second, we introduce a novel parallel similarity join algorithm for advanced transient classification. We implement this algorithm in GLADE and execute it on a massive supercomputer with more than 3,000 threads, achieving an improvement of more than three orders of magnitude over the PostgreSQL solution.
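As a rough illustration of what a similarity join computes and how range-style partitioning can limit the number of comparisons, the sketch below joins 2D points that lie within a distance `eps` of each other. It is not the algorithm from the paper and does not use GLADE. The grid partitioning, the function name `similarity_join`, and the sample data are all assumptions made for this example.

```python
import math
from collections import defaultdict
from itertools import product


def similarity_join(points, eps):
    """Return sorted index pairs (i, j), i < j, whose points are within eps.

    Illustrative assumption: points are 2D tuples and similarity is
    Euclidean distance. Each point is assigned to a square grid cell of
    side eps, so a matching partner can only be in the same cell or in
    one of the eight neighbouring cells.
    """
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(idx)

    pairs = set()
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get() avoids creating empty cells while iterating over the grid
            neighbours = grid.get((cx + dx, cy + dy), ())
            for i in members:
                for j in neighbours:
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        pairs.add((i, j))
    return sorted(pairs)


# Hypothetical sample data: two close pairs and one isolated point.
sample = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.1), (10.0, 10.0)]
close_pairs = similarity_join(sample, eps=1.0)
```

Each cell only consults its neighbours, so in a parallel setting the cells could be distributed across workers with limited data exchange. That is the general idea behind partition-based joins, stated here without any claim about how the paper implements it.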
With the rapid increase in the number of applications and in data sizes, multi-query processing on the MapReduce framework has gained much attention. Meanwhile, there has been much interest in skyline query processing due to its power in multi-criteria decision making and analysis. Recently, there have been attempts to optimize multi-query processing in MapReduce. However, they are not suited to processing multiple skyline queries efficiently, and they also require modifications to the Hadoop internals. In this paper, we propose an efficient method for processing multi-skyline queries with MapReduce without any modification of the Hadoop internals. Through various experiments, we show that our approach outperforms previous studies by orders of magnitude.
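For readers unfamiliar with skyline queries, the following minimal sketch shows what a single skyline is and what evaluating several skyline queries over the same data means. It is a naive in-memory version, not the MapReduce method proposed in the paper. The function names, the "smaller is better" convention, and the hotel data are assumptions for illustration only.

```python
def dominates(a, b, dims):
    """True if a dominates b on dims: no worse everywhere, strictly better somewhere.

    Assumption for this sketch: smaller values are better on every dimension.
    """
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)


def skyline(points, dims):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


def multi_skyline(points, queries):
    """Evaluate several skyline queries, each over its own subset of dimensions."""
    return {name: skyline(points, dims) for name, dims in queries.items()}


# Hypothetical data: (price, distance, noise) for five hotels.
hotels = [(50, 8, 3), (60, 2, 4), (55, 9, 1), (40, 10, 5), (70, 1, 2)]
answers = multi_skyline(
    hotels,
    {"price_distance": (0, 1), "price_noise": (0, 2)},
)
```

Here each query scans the data independently. A multi-query method would aim to share that work across queries, which is the kind of saving the abstract refers to.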
In a stream environment, unlike in traditional databases, data arrive continuously, unindexed, and potentially unbounded, while queries must be evaluated to produce results on the fly. In this article, we propose two new algorithms (called SLCAStream and ELCAStream) for processing multiple keyword queries over XML streams. Both algorithms process keyword-based queries that require minimal or no schema knowledge to be formulated, follow the lowest common ancestor (LCA) semantics, and provide optimized methods to improve the overall performance. Moreover, SLCAStream, which implements the smallest LCA (SLCA) semantics, outperforms the state of the art, with up to 49% reduction in response time and 36% in memory usage. In turn, ELCAStream is the first to explore the exclusive LCA (ELCA) semantics over XML streams. A comprehensive set of experiments evaluates several aspects related to the performance and scalability of both algorithms, showing that they are effective alternatives for search services over XML streams. © 2016 Elsevier B.V. All rights reserved.
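The SLCA semantics mentioned above can be illustrated on a small in-memory document: a node is an SLCA answer if its subtree contains every query keyword and no descendant's subtree does. The sketch below is not SLCAStream and does not process a stream. It only shows the semantics, and the matching rule (a keyword matches a tag name or a whitespace-separated word of a node's text), the function name `slca`, and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET


def slca(root, keywords):
    """Return the elements that are smallest LCAs for the given keywords."""
    wanted = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Keywords matched directly at this node (tag name or text words).
        words = set((node.text or "").lower().split())
        words.add(node.tag.lower())
        found = words & wanted
        child_has_all = False
        for child in node:
            child_found, child_full = visit(child)
            found |= child_found
            child_has_all = child_has_all or child_full
        has_all = found == wanted
        # Smallest: covers all keywords, and no child subtree already does.
        if has_all and not child_has_all:
            results.append(node)
        return found, has_all

    visit(root)
    return results


# Hypothetical sample document.
doc = ET.fromstring(
    "<bib>"
    "<book id='b1'><title>XML streams</title><author>Silva</author></book>"
    "<book id='b2'><title>Databases</title><author>Silva</author></book>"
    "<article id='a1'><title>XML keyword search</title><author>Lee</author></article>"
    "</bib>"
)
hits = [e.get("id") for e in slca(doc, ["XML", "Silva"])]
```

For the query `["XML", "Silva"]` only the first book qualifies. The root also contains both keywords, but it is excluded because one of its descendants already covers them.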
ISBN: (print) 9783319122069; 9783319122052
In this paper, we tackle the problem of processing various keyword-based queries over XML streams in a scalable way, improving on recent multi-query processing approaches. We propose a customized algorithm, called MKStream, that relies on parsing stacks designed for simultaneously matching several queries. In particular, it explores the possibility of adjusting the number of parsing stacks for a better trade-off between processing time and memory usage. A comprehensive set of experiments evaluates its performance and scalability against the state of the art, and shows that MKStream is the most efficient algorithm for keyword search services over XML streams.