ISBN (print): 0769524869
We have developed a matrix distribution library for sparse matrix solvers. There are few libraries for matrix distribution and reordering, mainly because the data structure of a large sparse matrix cannot be tied to a single matrix format. The present paper therefore assumes the distributed Compressed Row Storage (CRS) format, which is used in many sparse linear solvers, and the library is built on this format. The library's input and output share the same matrix format, so it can be used for repeated matrix distribution and reordering. The paper introduces the matrix format and the implementation of the matrix distribution library, and discusses the efficiency of various matrix distribution methods for sparse matrix solvers.
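As a rough illustration of the CRS layout the abstract refers to, the sketch below builds the three CRS arrays from a small dense matrix and splits them into contiguous row blocks, one per process. The function names `to_crs` and `split_rows` and the block-row partitioning are assumptions made for illustration; they are not the library's interface.

```python
def to_crs(dense):
    """Convert a dense row-major matrix (list of lists) to CRS arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr


def split_rows(values, col_idx, row_ptr, nparts):
    """Split a CRS matrix into contiguous row blocks (hypothetical scheme).

    Each block is itself a valid CRS triple with global column indices,
    so a block can be fed back into the same routine.
    """
    nrows = len(row_ptr) - 1
    parts = []
    for p in range(nparts):
        lo = p * nrows // nparts
        hi = (p + 1) * nrows // nparts
        start, end = row_ptr[lo], row_ptr[hi]
        local_ptr = [k - start for k in row_ptr[lo:hi + 1]]
        parts.append((values[start:end], col_idx[start:end], local_ptr))
    return parts
```

Because every block comes back in the same CRS form as the input, a block can be redistributed again, which mirrors the "same input and output format" property described in the abstract.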
ISBN (print): 0769524206
In this paper we present a multiple classifier system for script identification. Applying a Gabor filter analysis of textures at the word level, our system identifies Latin and non-Latin words in bilingual printed documents. The classifier system comprises four different architectures based on nearest neighbors, weighted Euclidean distances, Gaussian mixture models, and support vector machines. We report results for Arabic, Chinese, Hindi, and Korean script. Moreover, we show that combining informational confidence values using the sum rule can consistently outperform the best single recognition rate.
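The sum rule mentioned above can be sketched in a few lines: each classifier reports a confidence per class, the confidences are added per class, and the class with the largest total wins. The function name `sum_rule` and the example scores are hypothetical, and the sketch assumes the confidences are already on a comparable scale; it does not reproduce the paper's informational confidence computation.

```python
def sum_rule(confidences):
    """Combine per-classifier confidences with the sum rule.

    confidences: list of dicts mapping class label -> confidence,
    one dict per classifier (assumed to be on a comparable scale).
    Returns the label with the largest summed confidence.
    """
    totals = {}
    for scores in confidences:
        for label, c in scores.items():
            totals[label] = totals.get(label, 0.0) + c
    return max(totals, key=totals.get)


# Hypothetical scores from three classifiers for one word image.
example = [
    {"latin": 0.60, "non-latin": 0.40},
    {"latin": 0.30, "non-latin": 0.70},
    {"latin": 0.45, "non-latin": 0.55},
]
decision = sum_rule(example)
```

In this made-up case the first classifier alone would vote for "latin", while the summed confidences favor "non-latin".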
ISBN (print): 1846000025
Load balancing is a key technique for boosting performance and improving scalability in parallel computer supported collaborative work (CSCW) systems, such as parallel database systems and P2P systems. To reduce total cost of ownership (TCO), adaptive and self-tuning administration techniques are increasingly expected in cyberspace. In parallel database systems, adaptive load balancing techniques have been proposed to cope with changes in data storage patterns and access patterns in a dynamic real environment. The techniques used in both shared-nothing and shared-disk parallel database systems are discussed, and a general, flexible framework based on collaborative agents is studied to support these techniques in both architectures. The framework supports two kinds of load balancing: passively executing query statements in a balanced way, and proactively adjusting the data placement and task execution scheme, by means of data and task migration, whenever load imbalance is detected. Three categories of agents (scheduling agents, monitoring agents, and task agents) are identified in the framework. The collaboration protocols and scheduling algorithms that support adaptive load balancing are described. The framework also applies to other parallel systems, such as P2P systems and shared file processing systems, because of their underlying commonality.
ISBN (print): 3540273247
We present an efficient and practical lock-free implementation of a concurrent deque that supports parallelism for disjoint accesses and uses atomic primitives that are available in modern computer systems. Previously known lock-free algorithms for deques are either based on unavailable atomic synchronization primitives, implement only a subset of the functionality, or are not designed for disjoint accesses. Our algorithm is based on a general lock-free doubly linked list and requires only single-word compare-and-swap atomic primitives. It also allows pointers with full precision, and thus supports dynamic deque sizes. We have performed an empirical study using full implementations of the most efficient known algorithms for lock-free deques. For systems with low concurrency, the algorithm by Michael shows the best performance. However, as our algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and non-uniform memory architecture. In addition, the proposed solution also implements a general doubly linked list, the first lock-free implementation that needs only the single-word compare-and-swap atomic primitive.
Within the parallel computing domain, field programmable gate arrays (FPGAs) are no longer restricted to their traditional role as substitutes for application-specific integrated circuits, as hardware "hidden" from the end user. Several high performance computing vendors offer parallel reconfigurable computers employing user-programmable FPGAs. These new architectures allow end users to, in effect, create reconfigurable coprocessors targeting the computationally intensive parts of each problem. The increased capability of contemporary FPGAs, coupled with the embarrassingly parallel nature of the Jacobi iterative method, makes the Jacobi method an ideal candidate for hardware acceleration. This paper introduces a parameterized design for a deeply pipelined, highly parallelized IEEE 64-bit floating-point version of the Jacobi method. A Jacobi circuit is implemented using a Xilinx Virtex-II Pro as the target FPGA device. Implementation statistics and performance estimates are presented.
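For readers unfamiliar with why the Jacobi method parallelizes so readily, here is a minimal software sketch (not the paper's FPGA design). Each new component depends only on the previous iterate, so all components can be computed independently. The function name `jacobi`, the fixed iteration count, and the test system are assumptions chosen for illustration.

```python
def jacobi(A, b, iterations=50):
    """Plain Jacobi iteration for A x = b.

    Assumes A is square with a nonzero diagonal; convergence is
    guaranteed, for example, when A is strictly diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            # Only the previous iterate x is read here, so every i
            # could be evaluated in parallel.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i][i]
        x = x_new
    return x


# Small diagonally dominant system whose exact solution is x = [1, 2].
solution = jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0])
```

The inner loop over `i` is the part a hardware pipeline or a set of parallel units would evaluate concurrently.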
In this paper, we propose a technique that combines loop distribution with loop fusion to improve timing performance without increasing the code size of the transformed loops. We first develop loop distribution theorems that state the conditions for distributing any two-level nested loop in the maximum way. Based on these theorems, we design an algorithm to conduct maximum loop distribution. Then we propose a technique of maximum loop distribution with direct loop fusion, which performs maximum loop distribution followed by direct loop fusion. The experimental results show that the execution time of the loops transformed by our technique is reduced by 41.9% on average compared to the original loops, without any increase in code size.
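The two transformations named in the abstract can be illustrated on a toy single-level example; this is not the paper's algorithm, which targets two-level nested loops. Distribution splits a loop body into one loop per statement, and fusion merges loops back together in a grouping that the data dependences allow. The function names and statements below are hypothetical.

```python
def original(a):
    n = len(a)
    b, c, d = [0] * n, [0] * n, [0] * n
    for i in range(n):
        b[i] = a[i] + 1   # S1
        d[i] = a[i] * 3   # S2
    for i in range(n):
        c[i] = b[i] * 2   # S3 reads b[i], written by S1
    return b, c, d


def distributed(a):
    # Loop distribution: one loop per statement.
    n = len(a)
    b, c, d = [0] * n, [0] * n, [0] * n
    for i in range(n):
        b[i] = a[i] + 1   # S1
    for i in range(n):
        d[i] = a[i] * 3   # S2
    for i in range(n):
        c[i] = b[i] * 2   # S3
    return b, c, d


def fused(a):
    # Loop fusion: all three statements share one loop. This is legal
    # because S3 reads b[i] only after S1 wrote it in the same iteration.
    n = len(a)
    b, c, d = [0] * n, [0] * n, [0] * n
    for i in range(n):
        b[i] = a[i] + 1   # S1
        c[i] = b[i] * 2   # S3
        d[i] = a[i] * 3   # S2
    return b, c, d
```

All three versions compute the same arrays; they differ only in how many loops run and how the statements are grouped.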
Many key features of file transfer mechanisms, such as reliable and parallel file transfer, are developed as part of the service. This makes it very hard to reuse the same code for different systems...
This paper examines data fusion and target tracking issues within Net Centric Publish and Subscribe architectures with respect to the Quality of Information (QOI) provided to the end user. These architectures...
The model of moldable tasks (MT) was introduced some years ago and has proven to be an efficient way of implementing parallel applications. It considers a target application at a larger level of granularity than ...
ISBN (print): 0889865280
Most PDE-based image segmentation algorithms employ an explicit scheme to solve the system equations. However, an explicit scheme has a time step constraint, and when it is parallelized, data dependency is unavoidable at the boundary of the region assigned to each processor, which requires communication between neighboring processors to share the boundary information. Additive Operator Splitting (AOS) is a semi-implicit scheme that effectively decomposes a multidimensional system into a series of independent one-dimensional systems, each composed of multiple tridiagonal systems. This decomposition makes functional parallelism possible, and within each one-dimensional processing step, data parallelism is achieved by solving the independent tridiagonal systems, resulting in nested parallelism. Thus, implementing parallelism is straightforward, and the parallel program is subject to less communication overhead than explicit schemes. In this paper, we employ the AOS scheme for a level set formulation of the segmentation problem, and OpenMP on a shared memory machine for its parallelization. Test results show that parallelization with OpenMP on a shared memory system with 8 processors improves computational time, with a speedup of over 3 for 2-phase image segmentation.
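The structure of an AOS step can be sketched on a much simpler problem than the paper's level set formulation: linear diffusion with constant diffusivity on a 2D grid. Every row and every column yields an independent tridiagonal system, solved here with the Thomas algorithm, and the two directional results are averaged. The function names, unit grid spacing, reflecting boundaries, and constant diffusivity are all assumptions for illustration, and each line is assumed to have at least two pixels.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system (lower[0] and upper[-1] must be 0)."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def diffuse_line(u, tau, m=2):
    """Solve (I - m*tau*A) v = u along one line, A = 1D Laplacian."""
    n = len(u)
    k = m * tau
    lower, upper = [-k] * n, [-k] * n
    diag = [1 + 2 * k] * n
    diag[0] = diag[-1] = 1 + k  # reflecting boundary: one neighbor only
    lower[0] = upper[-1] = 0.0
    return thomas(lower, diag, upper, u)


def aos_step(img, tau):
    """One AOS step in 2D: average of the row-wise and column-wise solves."""
    h, w = len(img), len(img[0])
    # Each list comprehension iterates over independent systems; these are
    # the loops one would hand to parallel workers.
    rows = [diffuse_line(list(r), tau) for r in img]
    cols = [diffuse_line(list(c), tau) for c in zip(*img)]
    return [[0.5 * (rows[i][j] + cols[j][i]) for j in range(w)]
            for i in range(h)]
```

The two comprehensions in `aos_step` correspond to the nested parallelism described above: the two directions are independent of each other, and within each direction every line is an independent tridiagonal solve.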