ISBN: 9781612843957 (print)
The paper introduces a 2-stage k-means algorithm that is faster than the standard 1-stage k-means algorithm. The main idea of the 2-stage approach is that the first (fast) stage moves the cluster centers closer to their final locations, using only a small part of the data to speed up the calculation. The second (slow) stage then starts from the centers found during the first stage. Different initial cluster locations were used when testing the algorithms. The results show that the 2-stage clustering method achieves greater speed-up on bigger datasets.
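A minimal sketch of the two-stage idea, assuming plain Lloyd iterations for both stages and a uniform random sample for the fast stage. The function names and the `sample_fraction` and `seed` parameters are hypothetical; the abstract does not say how large the sample is or how it is drawn.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centers, max_iter=100):
    """Standard (1-stage) Lloyd k-means starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: _dist2(p, centers[j]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster,
        # keeping the old center if its cluster is empty.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # assignments stable: converged
            break
        centers = new_centers
    return centers

def two_stage_kmeans(points, initial_centers, sample_fraction=0.1, seed=0):
    """Stage 1 (fast): k-means on a random sample of the data.
    Stage 2 (slow): k-means on all points, starting from the stage-1 centers."""
    rng = random.Random(seed)
    n = max(len(initial_centers), int(len(points) * sample_fraction))
    sample = rng.sample(points, n)
    rough_centers = kmeans(sample, initial_centers)
    return kmeans(points, rough_centers)
```

Because stage 2 starts close to the final centers, it usually needs fewer passes over the full dataset, which is where the speed-up described above comes from.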
The demand for so-called living or real-time data warehouses is increasing in many application areas such as manufacturing, event monitoring and telecommunications. In these fields, users normally expect short response times for their queries and high freshness for the requested data. However, meeting these fundamental requirements is challenging due to the high loads and the continuous flow of write-only updates and read-only queries, which may conflict with each other. Therefore, we present the concept of workload balancing by election (WINE), which allows users to express their individual demands on the quality of service and the quality of data, respectively. WINE exploits this information to balance and prioritize both types of transactions (queries and updates) according to the varying user needs. A simulation study shows that our proposed algorithm outperforms competing baseline algorithms over the entire spectrum of workloads and user requirements. (C) 2008 Elsevier B.V. All rights reserved.
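The abstract does not describe the election procedure itself, so the following is only a hypothetical illustration of the general idea: pending queries "vote", weighted by their users' stated quality-of-service (QoS) and quality-of-data (QoD) preferences, on whether the next unit of work should be a query or an update. The names `Query`, `elect_next`, `run`, `qos_weight` and `qod_weight` are invented for this sketch and do not describe WINE's actual algorithm.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Query:
    name: str
    qos_weight: float  # hypothetical: how much the user values fast response
    qod_weight: float  # hypothetical: how much the user values fresh data

def elect_next(queries, updates):
    """Hypothetical election step. Returns 'query', 'update' or None."""
    if not queries and not updates:
        return None
    if not updates:
        return "query"
    if not queries:
        return "update"
    # Each pending query votes for QoS (run queries now) or QoD
    # (apply pending updates first), weighted by its user's preferences.
    qos_votes = sum(q.qos_weight for q in queries)
    qod_votes = sum(q.qod_weight for q in queries)
    return "update" if qod_votes > qos_votes else "query"

def run(queries, updates):
    """Drain both queues and record the order in which work is executed."""
    queries, updates = deque(queries), deque(updates)
    log = []
    while (choice := elect_next(queries, updates)) is not None:
        if choice == "update":
            log.append(("update", updates.popleft()))
        else:
            # Serve the query whose user most strongly prefers low latency.
            q = max(queries, key=lambda x: x.qos_weight)
            queries.remove(q)
            log.append(("query", q.name))
    return log
```

For example, with a latency-sensitive dashboard query and a freshness-sensitive report query pending, the dashboard runs first, then the pending updates are applied before the report is answered.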
ISBN: 9798400706202 (print)
We present work assisting, a novel scheduling strategy for mixing data parallelism (loop parallelism) with task parallelism, in which threads share their current data-parallel activity through a shared array so that other threads can assist. In contrast to most existing work in this space, our algorithm aims to preserve the structure of data parallelism instead of implementing all parallelism as task parallelism. This enables the use of self-scheduling for data parallelism, as required by certain data-parallel algorithms, and exploits data parallelism only if task parallelism is not sufficient. It provides full flexibility: neither the number of threads for a data-parallel loop nor the distribution over threads needs to be fixed before the loop starts. We present benchmarks demonstrating that our scheduling algorithm, depending on the problem, behaves similarly to or outperforms schedulers based purely on task parallelism.
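A simplified sketch of the assisting mechanism, under several assumptions: a lock-protected counter stands in for the atomic fetch-and-add a real implementation would use, each thread owns one slot of the shared array, and the task-parallel part of the scheduler is omitted entirely. `Activity`, `parallel_for` and `assist` are hypothetical names.

```python
import threading
import time

class Activity:
    """A data-parallel loop over range(n) that any thread may help execute.
    Chunks are claimed through a shared counter (self-scheduling)."""

    def __init__(self, n, body, chunk=4):
        self.n, self.body, self.chunk = n, body, chunk
        self._next = 0          # first unclaimed iteration
        self._done = 0          # iterations completed so far
        self._lock = threading.Lock()
        self.finished = threading.Event()
        if n == 0:
            self.finished.set()

    def work(self):
        """Claim and execute chunks until no unclaimed iterations remain."""
        while True:
            with self._lock:
                start = self._next
                if start >= self.n:
                    return
                end = min(start + self.chunk, self.n)
                self._next = end
            for i in range(start, end):
                self.body(i)
            with self._lock:
                self._done += end - start
                if self._done == self.n:
                    self.finished.set()

def parallel_for(slots, tid, n, body):
    """Run a data-parallel loop from thread `tid`, letting others assist."""
    act = Activity(n, body)
    slots[tid] = act          # publish the activity in this thread's slot
    act.work()                # the owner works on its own loop too
    act.finished.wait()       # wait for chunks still running on helpers
    slots[tid] = None         # retract the activity

def assist(slots, stop):
    """Idle-thread loop: scan the shared array and help any published loop."""
    while not stop.is_set():
        found = False
        for act in list(slots):
            if act is not None:
                found = True
                act.work()
        if not found:
            time.sleep(0.001)  # nothing to assist with; back off briefly
```

Because iterations are claimed chunk by chunk at run time, the set of threads that end up executing a loop is not fixed when the loop starts.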
ISBN: 0769521185 (print)
Overlays offer a convenient way to host an infrastructure that can scale to the size of the Internet and yet remain manageable. Current proposals, however, do not offer support for structuring data beyond assuming a distributed hash table. In reality, both applications and users typically organize data in a structured form. One such popular structure is the tree, as employed in file systems and databases. A naive approach such as hashing the pathname not only ignores locality in important operations such as file/directory lookup, but also results in uncontrollable, massive object relocations when a path component is renamed. In this paper, we investigate policies and strategies that place a tree onto the flat storage space of P2P systems. We found that, in general, there is a tradeoff between lookup performance and balanced storage utilization, and attempts to balance these two requirements call for intelligent placement decisions.
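The relocation problem with naive pathname hashing can be illustrated in a few lines. This sketch assumes each object is placed on peer `hash(path) mod num_nodes`, a simplification of a real DHT; the helper names are hypothetical.

```python
import hashlib

def node_for(path, num_nodes):
    """Naive placement: hash the full pathname onto one of num_nodes peers."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def rename(paths, old_prefix, new_prefix):
    """Rename a directory: every path at or under old_prefix is rewritten."""
    out = []
    for p in paths:
        if p == old_prefix or p.startswith(old_prefix + "/"):
            p = new_prefix + p[len(old_prefix):]
        out.append(p)
    return out

def objects_to_move(paths, old_prefix, new_prefix, num_nodes):
    """Count objects whose hosting peer changes because of the rename."""
    renamed = rename(paths, old_prefix, new_prefix)
    return sum(node_for(a, num_nodes) != node_for(b, num_nodes)
               for a, b in zip(paths, renamed))
```

Renaming a single directory changes the key of every descendant, so with many peers almost all of them land on a different peer and must be moved, even though none of their contents changed.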
ISBN: 9780898716535 (print)
We consider the problem of succinctly encoding a static map to support approximate queries. We derive upper and lower bounds on the space requirements in terms of the error rate and the entropy of the distribution of values over keys; our bounds differ by a small constant factor. For the upper bound we introduce a novel data structure, the Bloom map, generalising the Bloom filter to this problem. The lower bound follows from an information-theoretic argument.
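To make "generalising the Bloom filter to a map" concrete, here is a deliberately naive approximate map. It is not the Bloom map construction from the paper: it simply inserts each (key, value) pair into one shared Bloom filter and tests every candidate value on lookup. The class name and the parameters `m` (bits) and `k` (hash functions) are assumptions of this sketch.

```python
import hashlib

class SimpleBloomMap:
    """Naive approximate map over a small, known set of values.
    A stored key's true value is always returned (no false negatives);
    extra, spurious values may appear with a small probability."""

    def __init__(self, values, m=1 << 14, k=4):
        self.values = list(values)
        self.m, self.k = m, k
        self.bits = bytearray(m // 8 + 1)

    def _positions(self, key, value):
        """k bit positions derived from the (key, value) pair."""
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{key}:{value}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.m

    def put(self, key, value):
        for pos in self._positions(key, value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def get(self, key):
        """Return every candidate value whose k bits are all set for key."""
        return {v for v in self.values
                if all((self.bits[p // 8] >> (p % 8)) & 1
                       for p in self._positions(key, v))}
```

Unlike this sketch, whose space cost ignores how values are distributed, the bounds described above depend on the entropy of the value distribution over keys.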
ISBN: 3642152937 (print)
The proceedings contain 30 papers. The topics discussed include: a worst-case and practical speedup for the RNA co-folding problem using the Four-Russians idea; sparse estimation for structural variability; data structures for accelerating Tanimoto queries on real valued vectors; sparsification of RNA structure prediction including pseudoknots; reconstruction of ancestral genome subject to whole genome duplication, speciation, rearrangement and loss; listing all sorting reversals in quadratic time; discovering kinship through small subsets; fixed-parameter algorithm for haplotype inferences on general pedigrees with small number of sites; haplotypes versus genotypes on pedigrees; haplotype inference on pedigrees with recombinations and mutations; identifying rare cell populations in comparative flow cytometry; design of an efficient out-of-core read alignment algorithm; and enumerating chemical organizations in consistent metabolic networks: complexity and algorithms.
A multidimensional data acquisition, processing and visualization system for analyzing experimental data in nuclear physics is described. It includes a large number of sophisticated algorithms for multidimensional spectra processing, including background elimination, deconvolution, peak searching and fitting. (c) 2005 Elsevier B.V. All rights reserved.
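As a rough illustration of what background elimination plus peak searching means, here is a crude one-dimensional sketch. It is not one of the system's algorithms: the background is simply the minimum of a local window, and the `threshold` and `half_width` parameters are hypothetical.

```python
def find_peaks(counts, threshold, half_width=3):
    """Return channel indices that are the maximum of their local window and
    rise at least `threshold` counts above a crude local background."""
    peaks = []
    for i in range(1, len(counts) - 1):
        lo, hi = max(0, i - half_width), min(len(counts), i + half_width + 1)
        window = counts[lo:hi]
        background = min(window)  # crude local background estimate
        # Require a strict rise from the left so a flat top is reported once.
        is_local_max = counts[i] == max(window) and counts[i] > counts[i - 1]
        if is_local_max and counts[i] - background >= threshold:
            peaks.append(i)
    return peaks
```

For example, `find_peaks([5, 5, 6, 20, 6, 5, 5, 5, 7, 30, 8, 5, 5], threshold=10)` reports channels 3 and 9.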
ISBN: 097273581X (print)
In this paper we explore unsupervised learning in the presence of non-ignorable missing data with an unknown missing-data mechanism. We discuss several classes of missing-data mechanisms for categorical data and develop learning and inference methods for two specific models. We present empirical results on synthetic data showing that these algorithms can recover both the unknown selection-model parameters and the underlying data-model parameters to a high degree of accuracy. We also apply the algorithms to real data from the domain of collaborative filtering and report initial results.
ISBN: 0769514952 (print)
Architectural patterns and styles represent important design decisions and are therefore valuable abstractions for architecture recovery. Recognizing them is challenging because styles and patterns span several architectural elements and can be implemented in various ways, depending on the problem domain and the implementation variants. Our approach uses source code structures as patterns and introduces an iterative and interactive architecture recovery approach built upon such lower-level patterns extracted from source code. Associations arise between extracted pattern instances and architectural elements such as modules, resulting in new, higher-level views of the software system. These pattern views provide information for the successive refinement of pattern definitions, which aggregate and abstract the lower-level patterns into higher-level patterns that finally enable the description of the software system's architecture.
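A toy version of "source code structures as patterns" can be built on Python's `ast` module. In this hypothetical sketch a pattern definition is just a set of method names a class must define; matched instances are then aggregated per module into a higher-level view. The pattern names, `find_pattern_instances` and `pattern_view` are invented for illustration and are not the authors' pattern language.

```python
import ast
from collections import defaultdict

# Hypothetical pattern definitions: a class matches a pattern if it defines
# all listed method names. Refining a definition means editing a set.
PATTERNS = {
    "observer-subject": {"attach", "detach", "notify"},
    "repository": {"get", "add", "remove"},
}

def find_pattern_instances(module_name, source, patterns=PATTERNS):
    """Return (module, class, pattern) triples found in one source file."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            methods = {n.name for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))}
            for pattern, required in patterns.items():
                if required <= methods:
                    found.append((module_name, node.name, pattern))
    return found

def pattern_view(sources, patterns=PATTERNS):
    """Associate pattern instances with modules: a higher-level view."""
    view = defaultdict(set)
    for module_name, source in sources.items():
        for module, _cls, pattern in find_pattern_instances(module_name, source, patterns):
            view[module].add(pattern)
    return dict(view)
```

In an iterative workflow, a user would inspect the resulting view, adjust `PATTERNS`, and rerun the extraction.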
ISBN: 9783540742074 (print)
The proceedings contain 44 papers. The topics discussed include: approximation algorithms and hardness for domination with propagation; a knapsack secretary problem with applications; an optimal bifactor approximation algorithm for the metric uncapacitated facility location problem; improved approximation algorithm for the spanning star forest problem; two randomized mechanisms for combinatorial auctions; improved approximation ratios for travelling salesperson tours and paths in directed graphs; stochastic Steiner tree with non-uniform inflation; on the approximation resistance of a random predicate; optimal resource augmentations for online knapsack; soft edge coloring; hardness of embedding metric spaces of equal size; maximum gradient embeddings and monotone clustering; on estimating frequency moments of data streams; on the randomness complexity of property testing; and on finding frequent elements in a data stream.