ISBN: (Print) 9783319181196
The proceedings contain 36 papers. The special focus in this conference is on Data Mining, Data Streams, Database Storage and Spatio-Temporal Data. The topics include: leveraging homomorphisms and bitmaps to enable the mining of embedded patterns from large data trees; cold-start expert finding in community question answering via graph regularization; mining itemset-based distinguishing sequential patterns with gap constraint; adaptive grid-based k-median clustering of streaming data with accuracy guarantee; grouping methods for pattern matching in probabilistic data streams; fast similarity search of multi-dimensional time series via segment rotation; measuring the influence from user-generated content to news via cross-dependence topic modeling; a high-performance key-value store design for massive hybrid storage; an efficient design and implementation of multi-level cache for database systems; detecting hotspots from trajectory data in indoor spaces; on efficient passenger assignment for group transportation; effective and efficient predictive density queries for indoor moving objects; efficient trip planning for maximizing user satisfaction; accelerating search of protein sequence databases using CUDA-enabled GPU; fast subgraph matching on large graphs using graphics processors; process-driven configuration of federated cloud resources; an integrated tag recommendation algorithm towards Weibo user profiling; a comparative study of team formation in social networks; inferring diffusion networks with sparse cascades by structure transfer; repairing functional dependency violations in distributed data; provenance-aware entity resolution: leveraging provenance to improve quality; and privacy-preserving top-k spatial keyword queries over outsourced database.
ISBN: (Print) 9783319181226
The proceedings contain 74 papers. The special focus in this conference is on Outlier and Imbalanced Data Analysis, Probabilistic and Uncertain Data, Data Mining II and Spatio-temporal Data II. The topics include: fast and scalable outlier detection with approximate nearest neighbor ensembles; efficient queries evaluation on block independent disjoint probabilistic databases; tracing errors in probabilistic databases based on the Bayesian network; mining frequent spatial-textual sequence patterns; retaining rough diamonds; spatial keyword range search on trajectories; an efficient skyline algorithm for all seasons; towards order-preserving submatrix search and indexing; tree contraction for compressed suffix arrays on modern processors; pricing strategies for maximizing viral advertising in social networks; boosting financial trend prediction with Twitter mood based on selective hidden Markov models; interactive, flexible, and generic what-if analyses using in-memory column stores; invariant event tracking on social networks; mining correlations on massive bursty time series collections; grouping methods for pattern matching in probabilistic data streams; the Gaussian Bloom filter; detecting hotspots from trajectory data in indoor spaces; effective and efficient predictive density queries for indoor moving objects; accelerating search of protein sequence databases using CUDA-enabled GPU; an efficient approach of overlapping communities search; scalable inclusion dependency discovery; minimizing user effort with diversity awareness; bichromatic reverse nearest neighbor query without information leakage; and authentication of reverse k nearest neighbor query.
ISBN: (Print) 9783319181233; 9783319181226
In this paper, we develop a Throughput-Oriented Framework (TOF) for efficient processing of spatiotemporal queries in multicore environments. Traditional approaches to spatial query processing focused on reducing query latency. In the real world, however, most location-based service (LBS) applications emphasize throughput rather than query latency. TOF is designed to achieve maximum throughput. Instead of resorting to complex indexes, TOF executes a batch of queries in each run, so that it can maximize data locality and parallelism on multicore platforms. Using TOF, we designed algorithms for processing range queries and kNN queries. An experimental study shows that these algorithms significantly outperform existing approaches in terms of throughput.
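To illustrate why batching improves data locality, the sketch below answers a whole batch of rectangular range queries over a uniform grid: queries are first grouped by the grid cells they overlap, so each cell's points are scanned once per batch instead of once per query. This is a minimal, single-threaded sketch assuming a simple uniform grid (the helper names `build_grid` and `batch_range_queries` and the cell size are hypothetical); it is not TOF's actual algorithm, and the multicore parallelism is omitted.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]


def build_grid(points: List[Point], cell: float) -> DefaultDict[Cell, List[int]]:
    """Hypothetical uniform grid: map each cell to the indices of the points inside it."""
    grid: DefaultDict[Cell, List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid


def batch_range_queries(points: List[Point], queries: List[Rect], cell: float) -> List[List[int]]:
    grid = build_grid(points, cell)

    # Step 1: group the batch by cell, so all queries touching a cell are handled together.
    cell_to_queries: DefaultDict[Cell, List[int]] = defaultdict(list)
    for qi, (x0, y0, x1, y1) in enumerate(queries):
        for cx in range(int(x0 // cell), int(x1 // cell) + 1):
            for cy in range(int(y0 // cell), int(y1 // cell) + 1):
                cell_to_queries[(cx, cy)].append(qi)

    # Step 2: scan each relevant cell's points once and test them against every query in its group.
    results: List[List[int]] = [[] for _ in queries]
    for c, qids in cell_to_queries.items():
        for pi in grid.get(c, []):
            x, y = points[pi]
            for qi in qids:
                x0, y0, x1, y1 = queries[qi]
                if x0 <= x <= x1 and y0 <= y <= y1:
                    results[qi].append(pi)
    return [sorted(r) for r in results]
```

For example, `batch_range_queries([(0.5, 0.5), (1.5, 1.5), (3.0, 3.0)], [(0, 0, 2, 2), (2.5, 2.5, 3.5, 3.5)], 1.0)` returns `[[0, 1], [2]]`. In a multicore setting, the per-cell groups from step 1 would be natural units of parallel work.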
ISBN: (Print) 9783319223247; 9783319223230
The infamous Wikileaks cables are a large-scale resource for analyzing international relationships. We use sentiment analysis on this dataset to extract opinion polarities in the international scenario. We use an unsupervised approach, based on a standard sentiment lexicon with modifiers, to mine opinion polarities in the cables sent to and from US embassies and consulates. Sharp changes in opinion polarity are mapped to international events occurring around the time of the cable at the location of the embassy/consulate, and a positive/negative correlation is drawn. The dataset consists of 232,410 cables from 1966 up to October 2009 concerning 272 embassies and consulates across the world. The top 28 spikes/dips in polarity coming from 20 embassies/consulates are then evaluated. Our results show a strong correlation (76%) between our findings and the sentiments surrounding actual events. For example, our study correctly identified suicide terrorist attacks outside the American embassy in Casablanca. It also highlighted a cable that referred to a terrorist who was later arrested in New Delhi possessing secret documents related to the Indian Army.
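A lexicon-based polarity score with modifiers, followed by detection of sharp changes, can be sketched as follows. The tiny lexicon, the intensifier weights, the negation words, and the change threshold are all hypothetical placeholders; the abstract does not specify the lexicon or the modifier rules actually used, so this only illustrates the general unsupervised technique.

```python
from typing import Dict, List

# Hypothetical toy lexicon and modifiers; the study uses a standard lexicon not reproduced here.
LEXICON: Dict[str, float] = {"good": 1.0, "stable": 0.5, "attack": -1.0, "crisis": -1.0}
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "extremely": 2.0}
NEGATIONS = {"not", "no", "never"}


def polarity(text: str) -> float:
    """Sum lexicon scores; a preceding intensifier scales a term, a preceding negation flips it."""
    score = 0.0
    scale = 1.0
    negate = False
    for raw in text.lower().split():
        tok = raw.strip(".,;:!?")
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            scale *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            value = LEXICON[tok] * scale
            score += -value if negate else value
            scale, negate = 1.0, False  # modifiers apply only to the next sentiment term
    return score


def sharp_changes(series: List[float], threshold: float) -> List[int]:
    """Indices i where polarity jumps by at least `threshold` relative to position i - 1."""
    return [i for i in range(1, len(series)) if abs(series[i] - series[i - 1]) >= threshold]
```

For instance, `polarity("The situation is very good.")` gives `1.5` and `polarity("not stable")` gives `-0.5`. Applied to a per-embassy time series of cable polarities, `sharp_changes` flags the spikes and dips that would then be compared against real-world events.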
ISBN: (Print) 9783319223247; 9783319223230
With the increasing development of indoor positioning technologies such as Wi-Fi and RFID, indoor location-based services (LBSs) have been a hot topic in recent years. Unlike GPS-based outdoor LBSs, indoor LBSs lack sufficient indoor maps, which are their foundation. In this paper, we present a database approach that extracts indoor spatial objects, e.g., rooms and doors, from CAD models and then transforms them into an indoor moving-object database. With this mechanism, we are able to efficiently generate indoor maps and support indoor-space queries. In addition, we implement a prototype system to demonstrate the feasibility of our proposal. The results show that our approach extracts indoor spatial objects with high precision and supports indoor spatial queries effectively.
ISBN: (Print) 9783319181233; 9783319181226
Data in probabilistic databases may not be absolutely correct and, worse, may be erroneous. Many existing data cleaning methods can detect errors in traditional databases, but they fall short of guiding us to errors in probabilistic databases, especially databases with complex correlations among data. In this paper, we propose a method for tracing errors in probabilistic databases by adopting a Bayesian network (BN) as the framework for representing correlations among data. We first develop techniques to construct an augmented Bayesian network (ABN) for an anomalous query, representing the correlations among input data, intermediate data, and output data in the query execution. Inspired by the notion of blame in causal models, we then define a notion of blame for ranking candidate errors. Next, we provide an efficient method for computing the degree of blame of each candidate error based on probabilistic inference over the ABN. Experimental results show the effectiveness and efficiency of our method.
ISBN: (Print) 9783319181202; 9783319181196
Modern databases tailored to highly distributed, fault-tolerant management of information for big data applications exploit a classical data structure for reducing disk and network I/O as well as for managing data distribution: the Bloom filter. This data structure allows small sets of elements, typically the keys in a key-value store, to be encoded into a small, constant-size data structure. In exchange for its low memory consumption, this data structure admits false positives, which lead to additional I/O operations and are therefore harmful only with respect to performance. With this paper, we propose an extension to the classical Bloom filter construction that makes use of floating-point coprocessors and GPUs, or of additional main memory, in order to reduce false positives. The proposed data structure is compatible with the classical construction in the sense that the classical Bloom filter can be extracted in time linear in the size of the data structure, and that the Bloom filter is a special case of our construction. We show that the approach provides a relevant reduction of the false positive rate. Implementations for Apache Cassandra, C++, and NVIDIA CUDA are given and support the feasibility and results of the approach.
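For reference, the classical construction that the abstract extends can be sketched as below: an m-bit array, k positions per key, no false negatives, and a false positive rate approximated by the standard formula (1 - e^{-kn/m})^k. This is a minimal sketch of the ordinary Bloom filter only, not the proposed floating-point extension; deriving the k positions by double hashing over a SHA-256 digest is an illustrative choice, not something the source specifies.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    """Classical Bloom filter: an m-bit array plus k hash positions per key."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability rather than compactness

    def _positions(self, key: str) -> List[int]:
        # Double hashing: derive k positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # "| 1" keeps the step non-zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


def expected_fp_rate(m: int, k: int, n: int) -> float:
    """Standard approximation of the false positive rate after inserting n keys."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

With m = 1000 bits, k = 7, and n = 100 keys, `expected_fp_rate` gives roughly 0.8%. In a key-value store, each such false positive translates into a lookup on disk or over the network that finds nothing, which is the performance cost the abstract refers to.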
ISBN: (Print) 9783319058108; 9783319058092
The most energy-efficient configuration of a single-server DBMS is the highest-performing one, if we focus exclusively on specific applications where the DBMS can steadily run in the peak-performance range. However, typical DBMS activity levels (i.e., their average system utilization) are much lower, and their energy use is far from energy-proportional. Built of commodity hardware, WattDB, a distributed DBMS, runs on a cluster of computing nodes where energy proportionality is approached by dynamically adapting the cluster size. In this work, we combine our previous findings on energy-proportional storage layers and query processing into a single, transactional DBMS. We verify our vision with a series of benchmarks running OLTP and OLAP queries with varying degrees of parallelism. These experiments illustrate that WattDB dynamically adjusts to the present workload and reconfigures itself to satisfy performance demands while keeping its energy consumption at a minimum.
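The benefit of adapting the cluster size can be illustrated with a simple arithmetic sketch. It assumes a linear per-node power model, P = P_idle + (P_peak - P_idle) * u, a common simplification that is not taken from the abstract; the function names, the target utilization, and all wattages below are hypothetical. The point is that a few well-utilized nodes draw far less power than many nearly idle ones serving the same load.

```python
import math


def nodes_needed(load: float, node_capacity: float, target_util: float, max_nodes: int) -> int:
    """Hypothetical policy: fewest active nodes keeping per-node utilization <= target_util."""
    n = math.ceil(load / (node_capacity * target_util))
    return max(1, min(n, max_nodes))


def cluster_power(load: float, node_capacity: float, nodes: int, p_idle: float, p_peak: float) -> float:
    """Total power of `nodes` active nodes sharing `load` evenly, under a linear power model."""
    util = min(1.0, load / (node_capacity * nodes))
    return nodes * (p_idle + (p_peak - p_idle) * util)
```

For example, with a load of 150 units, a capacity of 100 units per node, a 0.8 utilization target, and at most 10 nodes, `nodes_needed` returns 2. Assuming 20 W idle and 40 W peak per node, those 2 nodes draw about 70 W, whereas keeping all 10 nodes powered draws about 230 W for the same work, because the idle power of every active node dominates at low utilization.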