A parallel processing system is being developed. Unlike conventional parallel architectures, the parallelism is at the level of an algorithm. A sorting algorithm for this machine is presented. The architecture of the parallel processing system consists of an elevated single instruction multiple data (SIMD) model that works at the level of a basic algorithm. Viewed from another angle, the architecture is of the multiple instruction multiple data (MIMD) type, since synchronization is not at the instruction level. In reality, it is midway between SIMD and MIMD, which the author calls an SAMD (same algorithm multiple data) architecture. The philosophy is to divide the problem into a number of algorithms.
ISBN: 9781728140421 (print)
Searching for frequent itemsets in large, diverse databases is one of the most important data mining problems, and existing algorithms lack mechanisms for automatic parallelization, fault tolerance, and data distribution. To address this, we design an algorithm using the MapReduce programming model. The overarching aim is to enhance the performance of parallel frequent itemset mining on Hadoop. We incorporate ultrametric trees to improve the efficiency of frequent itemset mining, and we compare the Apriori and FP-Growth algorithms on several parameters. We implement the algorithm on a Market Basket Analytics dataset.
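As a rough illustration of how itemset counting fits the MapReduce model, the sketch below runs a map phase and a reduce phase in a single Python process. It is not the authors' algorithm: it uses no ultrametric trees, Apriori pruning, or FP-Growth, and the names `map_phase`, `reduce_phase`, `frequent_itemsets`, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    items = sorted(set(transaction))
    return [(itemset, 1) for itemset in combinations(items, k)]


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those meeting the threshold."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def frequent_itemsets(transactions, k, min_support):
    # On a cluster the map calls would run on separate data splits;
    # here they run one after another in a single process.
    pairs = [pair for t in transactions for pair in map_phase(t, k)]
    return reduce_phase(pairs, min_support)


# Hypothetical market-basket data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["eggs", "milk"],
]
result = frequent_itemsets(baskets, k=2, min_support=2)
```

Because each map call touches only one transaction, the work can be split across data partitions without coordination; only the reduce phase needs to see all pairs that share a key.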
ISBN: 9780769543284 (print)
We focus on agent-based simulations where a large number of agents move in space, obeying some simple rules. Since such simulations are computationally intensive, it is challenging, in such a context, to let the number of agents grow and to increase the quality of the simulation. An attractive way to meet this need is to exploit parallel architectures. In this paper, we present a novel distributed load balancing schema for a parallel implementation of such simulations. The purpose of this schema is to achieve high scalability. Our approach to load balancing is designed to be lightweight and totally distributed: the calculations for the balancing take place at each computational step and influence the successive step. To the best of our knowledge, our approach is the first distributed load balancing schema in this context. We present both the design and the implementation, which allowed us to perform a number of experiments with up to 1,000,000 agents. Tests show that, even though the load balancing algorithm is local, the workload distribution is balanced while the communication overhead is negligible.
ISBN: 0818691948 (print)
Emerging trends in computer design attempt to include specific solutions for handling images in general-purpose computers as well, because of the current spread of multimedia, image processing, and computer graphics applications. In this context, this paper proposes hardware pre-fetching techniques specific to caching images: our main claim is that most algorithms working on images exhibit a 2D spatial locality that is not taken into account in current cache organization and data access strategies. To this end, we propose adaptive local pre-fetching for the image data type; this technique, mirroring the two-dimensional spatial locality of image processing algorithms, proves more efficient than other approaches, such as sequential pre-fetching and adaptive pre-fetching. Performance is evaluated on different classes of image processing algorithms, namely raster-scan and propagative algorithms, common in computer vision and multimedia applications.
ISBN: 9783540729044 (print)
There are only three real "dimensions" to processor performance increases beyond Moore's law: clock frequency, superscalar instruction issue, and multiprocessing. The first two have been pushed to their logical limits, and we must focus on multiprocessing. SMT (simultaneous multithreading) [1] and CMP (chip multiprocessing) [2] are two architectural approaches to exploiting thread-level parallelism using available on-chip resources. SMT processors execute instructions from different threads in the same cycle, which gives them the unique ability to exploit ILP (instruction-level parallelism) and TLP (thread-level parallelism) simultaneously. EPIC (explicitly parallel instruction computing) emphasizes the importance of the synergy between compiler and hardware. In this paper, we present our efforts to design and implement a parallel environment, which includes an optimizing, portable parallel compiler, OpenUH, and an SMT architecture, EDSMT, based on IA-64. The performance is evaluated using the NAS parallel benchmarks.
ISBN: 9781450364942 (print)
Graphics processing units (GPUs) show very high performance when executing many parallel programs; however, their use in solving linear recurrence equations is considered difficult because of the sequential nature of the problem. Previously developed parallel algorithms, such as recursive doubling and multi-block processing, do not show high efficiency on GPUs because of poor scalability with the number of threads. In this work, we have developed a highly efficient GPU-based algorithm for recurrences using a thread-level parallel (TLP) approach, instead of conventional thread-block level parallel (TBLP) methods. The proposed TLP method executes all of the threads as independently as possible to improve the computational efficiency and employs a hierarchical structure for inter-thread communication. Both constant and time-varying coefficient recurrence equations are implemented on NVIDIA GTX285, GTX580, and GTX TITAN X GPUs, and the performance is compared with results on single-core and multi-core SIMD CPU-based PCs.
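To show why a first-order linear recurrence can be split across independent workers at all, the sketch below evaluates x[i] = a[i] * x[i-1] + b[i] in three phases. It is a plain Python illustration under assumptions, not the paper's GPU kernel: the chunking scheme and the names `combine`, `solve_sequential`, and `solve_chunked` are hypothetical, and the loops that would run in parallel are executed sequentially here.

```python
def combine(first, second):
    """Compose two affine steps x -> a*x + b, applying `first` then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def solve_sequential(a, b, x0):
    """Reference solution: x[i] = a[i] * x[i-1] + b[i], starting from x0."""
    out, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        out.append(x)
    return out


def solve_chunked(a, b, x0, chunk):
    """Three-phase evaluation; phases 1 and 3 are independent per chunk."""
    n = len(a)
    starts = range(0, n, chunk)
    # Phase 1: each chunk folds its own steps into a single affine map.
    summaries = []
    for s in starts:
        acc = (1, 0)  # identity map x -> x
        for i in range(s, min(s + chunk, n)):
            acc = combine(acc, (a[i], b[i]))
        summaries.append(acc)
    # Phase 2: a short pass over the chunk summaries yields each carry-in.
    carries, x = [], x0
    for sa, sb in summaries:
        carries.append(x)
        x = sa * x + sb
    # Phase 3: each chunk replays its steps from its own carry-in value.
    out = []
    for s, carry in zip(starts, carries):
        x = carry
        for i in range(s, min(s + chunk, n)):
            x = a[i] * x + b[i]
            out.append(x)
    return out


# Hypothetical time-varying coefficients, for illustration only.
a = [2, 1, 3, 1, 2]
b = [1, 0, 2, 5, 1]
chunked = solve_chunked(a, b, x0=1, chunk=2)
```

The key property is that `combine` is associative, so chunk summaries can be built in any grouping; only the short second phase carries a true sequential dependency.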
ISBN: 9781479987481 (print)
A novel architecture for the H.264/AVC deblocking filter is proposed. It includes three filtering units to filter the luma and chroma components in parallel. Also, a proper two-dimensional filtering order for the luma edges and a one-dimensional filtering order for the chroma edges are presented. The architecture achieves 750 MHz and 560 MHz in 90 nm and 130 nm, respectively, and requires 76 cycles to filter each MB. It outperforms existing architectures in frequency and throughput, and it is the only one that achieves over 60 fps at 8K-UHD resolution.
ISBN: 9781450308625 (print)
The proceedings contain 4 papers. The topics discussed include: cache size in a cost model for heterogeneous skeletons; an efficient skew-insensitive algorithm for join processing on grid architectures; formally specifying and analyzing a parallel virtual machine for lazy functional languages using Maude; and a type system for the safe execution of parallel programs in BSML.
ISBN: 9783642131189 (print)
In this paper we report on recent progress in computing bivariate polynomial resultants on Graphics Processing Units (GPUs). Given two polynomials in Z[x, y], our algorithm first maps the polynomials to a prime field. Then, each modular image is processed individually. The GPU evaluates the polynomials at a number of points and computes univariate modular resultants in parallel. The remaining "combine" stage of the algorithm is executed sequentially on the host machine. Porting this stage to the graphics hardware is the subject of ongoing research. Our algorithm is based on an efficient modular arithmetic from [1]. With the theory of displacement structure, we have been able to parallelize the resultant algorithm down to a very fine scale suitable for realization on the GPU. Our benchmarks show a substantial speed-up over a host-based resultant algorithm [2] from CGAL (***).
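The modular "map, process each image, combine" pattern can be sketched on a much simpler problem. The Python below computes the resultant of two univariate integer polynomials by taking a Sylvester determinant modulo several primes and recombining with the Chinese remainder theorem. This is a simplification, not the paper's method: it skips the bivariate evaluation and interpolation step and the displacement-structure algorithm, and it assumes the product of the chosen primes exceeds twice the absolute value of the true resultant. All function names are hypothetical, and `pow(x, -1, p)` needs Python 3.8 or later.

```python
def sylvester(f, g):
    """Sylvester matrix of f and g (coefficient lists, highest degree first)."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = []
    for i in range(n):  # n shifted copies of f
        rows.append([0] * i + f + [0] * (size - m - 1 - i))
    for i in range(m):  # m shifted copies of g
        rows.append([0] * i + g + [0] * (size - n - 1 - i))
    return rows


def det_mod(matrix, p):
    """Determinant modulo a prime p by Gaussian elimination."""
    a = [[x % p for x in row] for row in matrix]
    n, det = len(a), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def crt(residues, primes):
    """Combine residues into the symmetric range around zero."""
    x, modulus = 0, 1
    for r, p in zip(residues, primes):
        # Solve x + modulus * t == r (mod p) for t.
        t = (r - x) * pow(modulus, -1, p) % p
        x += modulus * t
        modulus *= p
    return x - modulus if x > modulus // 2 else x


def resultant(f, g, primes):
    s = sylvester(f, g)
    images = [det_mod(s, p) for p in primes]  # independent per prime
    return crt(images, primes)  # sequential combine stage


primes = [7, 11, 13]
res_a = resultant([1, 0, -1], [1, -2], primes)  # x^2 - 1 and x - 2
res_b = resultant([1, 0, -1], [1, 0], primes)   # x^2 - 1 and x
```

Each call to `det_mod` depends only on its own prime, which is what makes the per-image stage a natural fit for parallel hardware, while `crt` mirrors the sequential combine stage described in the abstract.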
ISBN: 9783540695004 (print)
In this paper we present a novel and complete approach to encapsulating parallelism for relational database query execution that strives for maximum resource utilization for both CPU and disk activities. Its simple and robust design is capable of modeling intra- and inter-operator parallelism for one or more parallel queries in a natural way. In addition, encapsulation guarantees that the bulk of relational operators can remain unmodified, as long as their implementation is thread-safe. We show that, with this approach, the problem of scheduling parallel tasks is generalized so that it can be safely entrusted to the underlying operating system (OS) without any performance penalty. On the contrary, relocating all scheduling decisions from the DBMS to the OS guarantees a centralized and therefore near-optimal resource allocation (depending on the OS's abilities) for the complete system that hosts the database server as one of its tasks. Moreover, with this proposal, query parallelization is fully transparent at the SQL interface of the database system. The DB administrator can configure the system for effective parallel query execution by setting two descriptive tuning parameters. A prototype implementation has been integrated into the Transbase® relational DBMS engine.