It is desirable to find unusual data objects by Ramaswamy et al.'s distance-based outlier definition, because only a metric distance function between two objects is required. This definition does not need any neig...
详细信息
It is desirable to find unusual data objects by Ramaswamy et al.'s distance-based outlier definition, because only a metric distance function between two objects is required. This definition does not need any neighborhood distance threshold required by many existing algorithmsbased on the definition of Knorr and Ng. Bay and Schwabacher proposed an efficient algorithm ORCA, which can give near linear time performance, for this task. To further reduce the running time, we propose in this paper two algorithms RC and RS using the following two techniques, respectively: (i) faster cutoff update, and (ii) space utilization after pruning. We tested RC, RS, and RCS (a hybrid approach combining both RC and RS) on several large and high-dimensional real data sets with millions of objects. The experiments show that the speed of RCS is as fast as 1.4-2.3 times that of ORCA, and the improvement of RCS is relatively insensitive to the increase in the data size. (C) 2010 Elsevier B.V. All rights reserved.
In this paper we present data structures and distributed algorithms for CSL model checking-based performance and dependability evaluation. We show that all the necessary computations are composed of series or sums of ...
详细信息
In this paper we present data structures and distributed algorithms for CSL model checking-based performance and dependability evaluation. We show that all the necessary computations are composed of series or sums of matrix-vector products. We discuss sparse storage structures for the required matrices and present efficient sequential and distributed disk-based algorithms for performing these matrix-vector products. We illustrate the effectivity of our approach in a number of case studies in which continuous-time Markov chains (generated in a distributed way from stochastic Petri net specifications) with several hundreds of millions of states are solved on a workstation cluster with 26 dual-processor nodes. We show details about the memory consumption, the solution times, and the speedup. The distributed message-passing algorithms have been implemented in a tool called PARSECS, that also takes care of the distributed Markov chain generation and that can also be used for distributed CTL model checking of Petri nets.
We investigate the problem of how to evaluate efficiently a collection of shortest path queries on massive graphs that are too big to fit in the main memory. To evaluate a shortest path query efficiently, we introduce...
详细信息
We investigate the problem of how to evaluate efficiently a collection of shortest path queries on massive graphs that are too big to fit in the main memory. To evaluate a shortest path query efficiently, we introduce two pruning algorithms. These algorithms differ on the extent of materialization of shortest path cost and on how the search space is pruned. By grouping shortest path queries properly, batch processing improves the performance of shortest path query evaluation. Extensive study is also done on fragment sizes, cache sizes and query types that we show that affect the performance of a disk-based shortest path algorithm. The performance and scalability of proposed techniques are evaluated with large road systems in the Eastern United States. To demonstrate that the proposed disk-based algorithms are viable, we show that their search times are significant better than that of main-memory Dijkstra's algorithm.
暂无评论