This paper analyzes the parallel computing environment overhead of the OpenMP for loop on multi-core processors, including the case of a data race. Different solutions to the data race are discussed in the present paper, such a...
ISBN: 9780889868205 (print)
Many attempts have been made to optimize the median filter through both software and hardware approaches. An architectural design of hardware capable of performing real-time median filtering is presented. The architecture uses the histogram approach to calculate the median, while optimizing the sliding window method to reuse all its calculations. Data is output row by row and every input pixel is processed only once. The design is independent of window size and image size, and supports adding more processing elements to handle wider images. The control unit design is minimized to enable self-adjustment of plug-and-play processing elements. The architecture is implemented in VHDL and synthesized for a Virtex-2 Pro FPGA. The architecture's performance and operation are compared to previous work.
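The histogram approach with a sliding window can be illustrated in software. The following is only a minimal sketch, not the VHDL architecture from the abstract: it assumes 8-bit pixels, an odd window size k, and no border padding, and it reuses the histogram only while sliding along a row (the hardware design is described as reusing all calculations). The function names are hypothetical.

```python
def median_from_histogram(hist, count):
    """Return the median of `count` values stored as a 256-bin histogram."""
    target = count // 2          # 0-based rank of the median for odd count
    running = 0
    for value, n in enumerate(hist):
        running += n
        if running > target:
            return value
    raise ValueError("histogram holds fewer values than expected")


def median_filter(image, k):
    """k x k median filter over the valid region of a list-of-lists image."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        # Build the histogram for the first window of this row band.
        hist = [0] * 256
        for dr in range(k):
            for dc in range(k):
                hist[image[r + dr][dc]] += 1
        row_out = [median_from_histogram(hist, k * k)]
        # Slide right: drop the leaving column, add the entering column.
        for c in range(1, cols - k + 1):
            for dr in range(k):
                hist[image[r + dr][c - 1]] -= 1
                hist[image[r + dr][c + k - 1]] += 1
            row_out.append(median_from_histogram(hist, k * k))
        out.append(row_out)
    return out
```

Each slide touches only 2k pixels instead of k*k, which is the saving the sliding-window histogram is meant to provide.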
ISBN: 9780769542652 (print)
Markov clustering is becoming a key algorithm within bioinformatics for determining clusters in networks. For instance, clustering protein interaction networks is helping find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, Graphics Processing Unit (GPU) computing, which uses the massively parallel computing environment of the GPU card, is becoming a very powerful, efficient and low-cost option to achieve substantial performance gains over CPU approaches. This paper introduces a very fast Markov clustering algorithm (MCL) based on massively parallel computing on the GPU. We use the Compute Unified Device Architecture (CUDA) to allow the GPU to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov Clustering (CUDAMCL) was utilising the ELLPACK-R sparse data format to allow effective, fine-grained, massively parallel processing that copes with the sparse nature of interaction network datasets in bioinformatics applications. CUDA also allows us to use on-chip memory on the GPU efficiently to lower latency, thus circumventing a major issue in other parallel computing environments, such as Message Passing Interface (MPI). Here we describe the GPU algorithm and its application to several real-world problems as well as to artificial datasets. We find that the principal factor causing variation in the performance of the GPU approach is the relative sparseness of the networks. Comparing GPU computation times against a modern quad-core CPU on the published (relatively sparse) standard BIOGRID protein interaction networks with 5156 and 23175 nodes, speedups of 4 and 9 times were obtained, respectively. On the Human Protein Reference Database, the
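The matrix products and column normalizations mentioned in the abstract correspond to the expansion and inflation steps of Markov clustering. The sketch below is a small dense, CPU-only illustration of those two steps, not the CUDA or ELLPACK-R implementation the paper describes. The inflation value, the added self-loops, the convergence tolerance, and all function names are assumptions made for the example.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (dense product, for clarity only)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Iterate expansion and inflation until the matrix stops changing."""
    n = len(adjacency)
    # Self-loops are added here as an assumption; they are a common MCL choice.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        nxt = inflate(expand(m), inflation)
        diff = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if diff < tol:
            break
    return m


def clusters(m, eps=1e-6):
    """Read clusters off the limit matrix: nonzero columns of each nonzero row."""
    found = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > eps)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)
```

On a graph made of two disconnected triangles, `clusters(mcl(adj))` returns `[[0, 1, 2], [3, 4, 5]]`. A sparse format matters in practice because both steps cost O(n^3) in this dense form.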
ISBN: 9780889868205 (print)
This paper proposes the Message Flow Simulator (MFS), which evaluates communication algorithms for the interconnection networks of large-scale parallel computers. MFS calculates communication time from the amount of message flow on communication links. To show the characteristics of MFS, we present the run time of MFS and the estimated virtual communication time for all-to-all communication on fat-tree networks of up to 3456 nodes. We compared MFS with Booksim 2.0, an existing flit-level simulator developed at Stanford University. The ratio of the network throughput estimated by MFS differs from that estimated by Booksim (500 flits/packet) by 2.1% on average (3.6% at maximum and 1.2% at minimum). The simulation results of Booksim were close to the results of MFS when a packet consisted of many flits. We conclude that MFS provides simulation results that reflect the characteristics of the communication algorithms.
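The idea of deriving communication time from per-link message flow can be sketched as follows. This is an illustrative model only and is not taken from MFS itself: it assumes every message follows a fixed path given as a list of links, that all links share one bandwidth, and that the most heavily loaded link determines the estimated time. All names are hypothetical.

```python
from collections import defaultdict


def link_loads(messages):
    """Sum the bytes that cross each link.

    `messages` is an iterable of (path, size) pairs, where `path` is a
    sequence of hashable link identifiers and `size` is in bytes.
    """
    loads = defaultdict(int)
    for path, size in messages:
        for link in path:
            loads[link] += size
    return dict(loads)


def estimated_time(messages, bandwidth):
    """Estimate time as the load of the busiest link divided by bandwidth."""
    loads = link_loads(messages)
    return max(loads.values()) / bandwidth if loads else 0.0
```

For example, two messages of 100 and 50 bytes that share one link of bandwidth 10 bytes per time unit give an estimate of 15.0 time units. A flow-level estimate like this ignores per-flit effects, which is consistent with the abstract's remark that agreement with Booksim improves when packets consist of many flits.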
ISBN: 9780889868205 (print)
In two-sided channel routing on a VLSI chip it is often convenient to represent signal nets by trapezoids. In this representation the four corners of a trapezoid are the rightmost and leftmost terminals on the upper side and lower side of the channel, respectively. The maximum set of nonintersecting trapezoids is of particular interest, since the corresponding signal nets can be safely assigned to the same layer in the channel routing. Although a sequential algorithm to compute a maximum independent set of trapezoids is known, the sweep-line approach employed by the sequential algorithm is incremental in nature and does not lend itself to a parallel solution. In this paper we use three new ideas to find the maximum independent set in parallel. First, for every comparable pair of trapezoids we introduce a new unique 'in-between' trapezoid. Next, the trapezoids are mapped to their canonical box representation. Finally, a new parallel operation called 'corner stitching' is applied on boxes to construct chains of boxes which define the independent set. The algorithm presented here is deterministic and is designed to run on a Concurrent Read Concurrent Write parallel random access machine (CRCW-PRAM). The algorithm runs in O(log n) time with O(n^2) processors.
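To make the problem concrete, here is a simple sequential baseline. It is neither the sweep-line algorithm nor the parallel algorithm from the abstract, just a quadratic dynamic program. It assumes each trapezoid is a tuple (top_left, top_right, bottom_left, bottom_right) with left <= right on both sides, and that two trapezoids are nonintersecting exactly when one lies strictly to the left of the other on both sides of the channel. A maximum independent set is then a longest chain under this "left of" order. The function names are hypothetical.

```python
def left_of(a, b):
    """True if trapezoid a lies strictly left of b on both channel sides."""
    return a[1] < b[0] and a[3] < b[2]


def max_independent_trapezoids(traps):
    """Return indices of a largest set of pairwise nonintersecting trapezoids."""
    if not traps:
        return []
    # Sorting by top-left corner is a valid linear extension of `left_of`.
    order = sorted(range(len(traps)), key=lambda i: traps[i][0])
    best = {}   # best[i]: length of the longest chain ending at trapezoid i
    prev = {}   # prev[i]: predecessor of i in that chain, or None
    for pos, i in enumerate(order):
        best[i], prev[i] = 1, None
        for j in order[:pos]:
            if left_of(traps[j], traps[i]) and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    # Walk back from the end of the longest chain.
    end = max(order, key=lambda i: best[i])
    chain = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    return chain[::-1]
```

The inner loop over all earlier trapezoids is the incremental dependency that makes this kind of approach hard to parallelize directly.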
ISBN: 9780889868205 (print)
We have developed a network (called TPNET) which is adaptable to any parallel processing system. It consists of several core processors and a router. A processing element in a parallel processing system is a processor called TPCORE2, which has been developed by the authors' group. Since this core processor can execute the full transputer instruction set, we can describe a software system using the parallel processing language occam. Occam is theoretically based on a model called Communicating Sequential Processes (CSP). If a parallel system can be described in occam and works correctly, it is regarded as free from the deadlocks and livelocks that can be intrinsically hidden in a parallel system. In this way, we can simply construct a secure parallel processing system. Each processor can be connected to a router, and we can achieve a dynamic configuration of the network topology by controlling the router. The basic communication protocol in TPNET is IEEE 1355. An assured and efficient network can be constructed despite the structural simplicity of the protocol. With the characteristics discussed above and an efficient interrupt processing system in TPCORE2, we propose TPNET as a basic framework for high-performance embedded systems used widely in various industrial fields.
ISBN: 9781424477425 (print)
High-level context recognition and situation detection are enabling technologies for unobtrusive mobile computing systems. Significant progress has been made in processing and managing context information, leading to sophisticated frameworks, middleware, and algorithms. Despite great improvements, context-aware systems still require significantly higher recognition accuracy for high-level context information derived from uncertain sensor data to enable the robust execution of context-aware applications. Recently, Adaptable Pervasive Workflows (APFs) have been presented as an innovative programming paradigm for mobile context-aware applications. We propose a novel Flow Context System (FlowCon) that builds upon APFs. FlowCon uses structural information from the APF to increase the accuracy of uncertain high-level context information by up to 49%. In this way we take an important step toward enabling robust execution of mobile context-aware applications.
ISBN: 9781607505839 (print)
Many medical applications utilise distributed/parallel computing in order to cope with demands for large data or computing power. In this paper, we present a new version of the XtremWeb-CH (XWCH) platform, and demonstrate two medical applications that run on XWCH. The platform is versatile in that it supports direct communication between tasks. When tasks cannot communicate directly, warehouses are used as intermediary nodes between "producer" and "consumer" tasks. New features have been developed to provide improved support for writing powerful distributed applications using an easy API.
In this paper, we propose an algorithm for parallel sorting on Recursive Dual-Net with an m-cube (Q_m) as its base network. The Recursive Dual-Net RDN^k(Q_m) for k > 0 has 2^(2^k m + 2^k - 1) nodes and m + k links per node....
ISBN: 9780889868205 (print)
A typical group mutual exclusion algorithm among m groups makes use of an m-group coterie, which determines the performance of the algorithm. There are two main performance measures: the availability is the probability that an algorithm tolerates process crash failures, and the concurrency is the number of processes that it allows to access the resources simultaneously. Since non-dominated (ND, for short) m-group coteries (locally) maximize the availability and their degrees roughly correspond to the concurrency, methods to construct ND m-group coteries with large degrees are sought. Nevertheless, only a few naive methods have been proposed. This paper presents three methods to construct desirable m-group coteries. The first method constructs an ND m-group coterie from a dominated one using the transversal composition. The second constructs an ND (m - 1)-group coterie from an ND m-group coterie. The last uses the coterie join operation to produce an ND m-group coterie from an ND coterie and another ND m-group coterie. These methods preserve the degrees of the original m-group coteries.