In this work, we propose some techniques for overlapping the communication by the computation in the parallelization of the one-sided Jacobi method for computing the eigenvalues and the eigenvectors of a real and symm...
High volumes of data pose a challenge to the scalability of data mining algorithms. Dividing this data into equal partitions and processing it in parallel naturally becomes a choice. Peer-to-peer computing exposes a b...
ISBN: 354040788X (print)
Sensor data sets are usually collected in a centralized sensor database system, or replicated and cached in a distributed system, to speed up query evaluation. However, a high data refresh rate rules out traditional replication approaches with their strong consistency property. Instead, we propose combining grid computing technology with sensor database systems. Each node holds cached data of other grid members. Since cached information may become stale quickly, access to outdated data may sometimes be acceptable, provided the user knows the degree of inconsistency that arises when unsynchronized data are combined. The contribution of this paper is the presentation and discussion of a model for describing inconsistencies in grid-organized sensor database systems.
ISBN: 1557527555 (print)
Over the past few years, we have developed a photorefractive ultrasonic sensor. This sensor is used for non-destructive testing of materials and parts. It works with one measurement point, and the laser beam scans the surface of the object to be characterized. One way to increase the evaluation speed is to develop a multichannel sensor. We present the experimental implementation and characterization of such a sensor, which, owing to the principle of holography, derives directly from the single-point sensor simply by using a line or matrix of detectors.
This paper presents the design, implementation, and application of ParaProf, a portable, extensible, and scalable tool for parallel performance profile analysis. ParaProf attempts to offer "best of breed" ca...
ISBN: 354040788X (print)
We discuss a parallel implementation of an agent-based simulation. Our approach allows a sequential simulator to be adapted for large-scale simulation on a cluster of workstations. We target discrete-time simulation models that capture the behavior of the WWW. The real-world phenomenon of emergent aggregate behavior of the Internet population is studied. The system distributes data among workstations, which enables large-scale simulations that are infeasible on a stand-alone computer. The model properties cause traffic between workstations proportional to partition sizes. Network latency is hidden by concurrent simulation of multiple users. The system is implemented in Mozart, which provides multithreading, dataflow variables, component-based software development, and network transparency. Currently we can simulate up to 10^6 Web users on 10^4 Web sites using a cluster of 16 computers, which takes a few seconds per simulation step; for a problem of the same size, parallel simulation offers speedups between 11 and 14.
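To put the reported speedups in context, the short Python sketch below converts them to parallel efficiency, using the standard definition efficiency = speedup / number of workers. It assumes that all 16 computers of the cluster take part in the parallel run; the function name `parallel_efficiency` is illustrative and does not come from the paper.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return parallel efficiency, defined as speedup divided by worker count."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures taken from the abstract: speedups of 11 to 14 on a 16-computer cluster.
# Assumption: all 16 computers participate in the parallel simulation.
CLUSTER_SIZE = 16
low = parallel_efficiency(11, CLUSTER_SIZE)
high = parallel_efficiency(14, CLUSTER_SIZE)
print(f"efficiency between {low:.4f} and {high:.4f}")
```

Under that assumption, the reported speedups correspond to an efficiency of roughly 69% to 88%.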
Speculative parallelism refers to searching in parallel for a solution, such as finding a pattern in a data base, where finding the first solution terminates the whole parallel process. Different performance predictio...
Parallel/distributed application development is a very difficult task for non-expert programmers, and support tools are therefore needed for all phases of the development cycle of these kinds of application. This stud...
Jobs that run on parallel systems that use gang scheduling for multiprogramming may interact with each other in various ways. These interactions are affected by system parameters such as the level of multiprogramming ...
ISBN: 0769518710 (print)
Inherent within complex instruction set architectures such as x86 are inefficiencies that do not exist in simpler ISAs. Modern x86 implementations decode instructions into one or more micro-operations in order to deal with the complexity of the ISA. Since these micro-operations are not visible to the compiler, the stream of micro-operations can contain redundancies even in statically optimized x86 code. Within a processor implementation, however, barriers at the ISA level do not apply, and these redundancies can be removed by optimizing the micro-operation stream. In this paper we explore the opportunities to optimize code at the micro-operation granularity. We execute these micro-operation optimizations using the rePLay Framework as a microarchitectural substrate. Using a simple set of seven optimizations, including two that aggressively and speculatively attempt to remove redundant load instructions, we examine the effects of dynamic optimization of micro-operations using a trace-driven simulation environment. Simulation reveals that, across a sampling of SPECint 2000 and real x86 applications, rePLay is able to reduce micro-operation count by 21% and, in particular, load micro-operation count by 22%. These reductions correspond to a 17% boost in observed instruction-level parallelism on an 8-wide optimizing rePLay processor over a non-optimizing configuration.
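As a rough illustration of what removing a redundant load from a micro-operation stream can look like, the Python sketch below replaces a load with a register move when an earlier micro-operation in the same trace already holds the value at that address. It is not the rePLay implementation: the `MicroOp` record, its fields, and the `remove_redundant_loads` function are hypothetical, and the pass is deliberately conservative (any store discards all remembered loads) rather than speculative as in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical micro-operation record (not the rePLay format)."""
    kind: str          # "load", "store", "alu", or "mov"
    dest: str = ""     # register written (unused for stores)
    base: str = ""     # base address register for loads and stores
    offset: int = 0    # address offset for loads and stores
    src: str = ""      # source register for stores, moves, and ALU ops


def _invalidate(available: Dict[Tuple[str, int], str], reg: str) -> None:
    """Forget entries whose address or cached value depends on `reg`."""
    stale = [k for k, holder in available.items() if holder == reg or k[0] == reg]
    for key in stale:
        del available[key]


def remove_redundant_loads(trace: List[MicroOp]) -> List[MicroOp]:
    """Replace a load with a register move when the value is already in a register."""
    available: Dict[Tuple[str, int], str] = {}  # (base, offset) -> register holding it
    out: List[MicroOp] = []
    for op in trace:
        if op.kind == "load":
            key = (op.base, op.offset)
            holder = available.get(key)
            if holder is not None:
                out.append(MicroOp("mov", dest=op.dest, src=holder))
            else:
                out.append(op)
            _invalidate(available, op.dest)      # dest is overwritten
            if op.base != op.dest:               # address must stay computable
                available[key] = op.dest
        elif op.kind == "store":
            out.append(op)
            available.clear()                    # conservative: the store may alias anything
            available[(op.base, op.offset)] = op.src
        else:
            out.append(op)
            _invalidate(available, op.dest)      # any other op overwrites its dest
    return out


trace = [
    MicroOp("load", dest="r1", base="rbp", offset=8),
    MicroOp("alu", dest="r2", src="r1"),
    MicroOp("load", dest="r3", base="rbp", offset=8),  # same address, no store between
]
optimized = remove_redundant_loads(trace)
print(sum(op.kind == "load" for op in optimized), "load(s) remain")
```

In this sketch a removed load becomes a move, so the load count drops while the total micro-operation count stays the same; a further pass such as copy propagation would be needed to eliminate the move itself.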