Concentrators and generalized-concentrators are interconnection networks that provide respectively pairwise vertex-disjoint directed paths and trees to satisfy interconnection requests. An interconnection network is n...
详细信息
Concentrators and generalized-concentrators are interconnection networks that provide respectively pairwise vertex-disjoint directed paths and trees to satisfy interconnection requests. An interconnection network is non-blocking in the strict sense if every compatible interconnection request can be satisfied by a path regardless of any existing interconnections. We present an interconnection property equivalent to the generalized-concentration with constrained network capacity and request multiplicity in the strictly non-blocking context, and show a polynomial-time computational complexity result for deciding the strictly non-blocking generalized-concentration properties with constrained network parameters, by using b-matching techniques.
Many naive parallel processing schemes were not successful as many researchers thought, because of the heavy cost of communication and synchronization resulting from parallelization. In this paper, we will identify th...
详细信息
Many naive parallel processing schemes were not successful as many researchers thought, because of the heavy cost of communication and synchronization resulting from parallelization. In this paper, we will identify the reasons for the poor performance and the compiler requirements for performance improvement. We realized that the decisions for parallelizing should be derived by the overhead information. We added this idea to the automatic parallelizing compiler, SUIF. We substitute the original backend of SUIF with our backend using MPI, and gave it the capability of validating of parallelization decisions based on overhead parameters. This backend converts shared-memory based parallel program into distributed-memory based parallel program with MPI function calls without excessive parallelization that causes performance degradation.
This paper presents a randomized parallel algorithm for the Maximal Independent Set problem. Our algorithm uses a BSP-like computer with p processors and requires that n+m/p = Ω(p) for a graph with n vertices and m e...
详细信息
This paper presents a randomized parallel algorithm for the Maximal Independent Set problem. Our algorithm uses a BSP-like computer with p processors and requires that n+m/p = Ω(p) for a graph with n vertices and m edges. Under this scalability assumption, and after a preprocessing phase, it computes a maximal independent set after O(log p) communication rounds, with high probability, each round requiring linear computation time O(n+m/p). The preprocessing phase is deterministic and important in order to ensure that degree computations can be implemented efficiently. For this, we give an optimal parallel BSP/CGM algorithm to the p-quantiles search problem, which runs in O(m log p/p) time and a constant number of communication rounds, and could be of interest in its own right, as shown in the text.
The main problem of fractal image compression is the long search time of the domain pool. For this reason, the dedicated ASIC architecture for fractal image coding is needed. In this paper, we propose an efficient par...
详细信息
The main problem of fractal image compression is the long search time of the domain pool. For this reason, the dedicated ASIC architecture for fractal image coding is needed. In this paper, we propose an efficient parallel architecture for fractal image coding which is based on fixed-size full-search algorithm. One of the main features of this architecture is that it uses only local communication such that each processor has a range and a domain block which is shifted to the next processor. Another main feature is that it has very regular interconnections and data flow. Domain blocks are formed from the range blocks in processors and the encoding procedure is performed by the regular data flow of domain blocks into the other processors. Each processor performs the fast isometric transformations which are calculated by one full rotation around the center.
We present two parallel algorithms for computing the nearest neighbors of an n/spl times/n binary image on the Bulk-Synchronous parallel (BSP) model. The first algorithm is for weighted distance, and the second algori...
详细信息
We present two parallel algorithms for computing the nearest neighbors of an n/spl times/n binary image on the Bulk-Synchronous parallel (BSP) model. The first algorithm is for weighted distance, and the second algorithm is for L/sub p/ distance. Both algorithms run in O(n/sup 2//p+L) computation time and O(g/sup n///spl radic/p+L) communication time using p (1/spl les/p/spl les/n) processors and in O(n/sup 2//p+(d+L)log p/n/log(d+1)) computation time and in O(gn//spl radic/p+(gd+L)log p/n/log(d+1)) communication time using p (n
Constrained reconfigurable meshes are one type of parallel computing model which takes the reconfigurability of hardware into account. With these meshes, a practical assumption is given on the communication power such...
详细信息
Constrained reconfigurable meshes are one type of parallel computing model which takes the reconfigurability of hardware into account. With these meshes, a practical assumption is given on the communication power such that a signal is propagated through a constant number of processing elements (PEs) say k PEs, at one unit of time. We present algorithms for the fundamental problem of computing the sum of multiple integers. For the problem of summing n binary values, two show an optimal O(n/k)-time algorithm on a constrained reconfigurable mesh of size m/spl times/n, where m=/spl Theta/(log/sup 2/ k/log log k). For the problem of summing n d-bit integers, we present an O((d+/spl radic/dmm)/k) time algorithm on a constrained reconfigurable mesh of size /spl radic/dmn/spl times//spl times//spl radic/dmn.
The main problem of fractal image compression is the long search time of the domain pool. For this reason, the dedicated ASIC architecture for fractal image coding is needed. In this paper, we propose an efficient par...
详细信息
The main problem of fractal image compression is the long search time of the domain pool. For this reason, the dedicated ASIC architecture for fractal image coding is needed. In this paper, we propose an efficient parallel architecture for fractal image coding which is based on fixed-size full-search algorithm. One of the main features of this architecture is that it uses only local communication such that each processor has a range and a domain block which is shifted to the next processor. Another main feature is that it has very regular interconnections and data flow. Domain blocks are formed from the range blocks in processors and the encoding procedure is performed by the regular data flow of domain blocks into the other processors. Each processor performs the fast isometric transformations which are calculated by one full rotation around the center.
We propose an adaptive parallel grouping algorithm that may reduce the number of collisions and increase overall throughput of B-WLL (Broadband Wireless Local Loop) MAC (Media Access Control) protocol. Also, we presen...
详细信息
We propose an adaptive parallel grouping algorithm that may reduce the number of collisions and increase overall throughput of B-WLL (Broadband Wireless Local Loop) MAC (Media Access Control) protocol. Also, we present a method for independently executing the adaptive parallel grouping algorithm to reduce the signaling traffic overheads.
Recursive Diagonal Torus (RDT) is a class of interconnection network consisting of recursively overlaid two-dimensional square diagonal tori for massively parallel computers with up to 216 nodes. Connection structures...
详细信息
Recursive Diagonal Torus (RDT) is a class of interconnection network consisting of recursively overlaid two-dimensional square diagonal tori for massively parallel computers with up to 216 nodes. Connection structures of the RDT vary according to the assignment of upper rank diagonal tori into a node. Although traditional simple assignment called RDT(2, 4, 1)/α shows enough performance under the uniform traffic, the congestion of low rank tori degrades the performance when local communication is dominant. In this paper, RDT(2, 4, 1)/β torus assignment is proposed, focusing on improving the performance for local communication. With a simplified simulation algorithm, results shows that RDT(2, 4, 1)/β improves the average distance compared with RDT(2, 4, 1)/α assignment when considering local area.
In this paper, a new network structure called generalized Hierarchical Completely-Connected networks (HCC) is proposed, and its properties and features are evaluated. A set of the HCCs constructed by the proposed meth...
详细信息
In this paper, a new network structure called generalized Hierarchical Completely-Connected networks (HCC) is proposed, and its properties and features are evaluated. A set of the HCCs constructed by the proposed method includes some conventional hierarchical networks, then it is called generalized one. The construction of an HCC is started from a basic block (a level-1 block) which consists of n nodes with a constant degree. Then a level-h (h ≥ 2) block is constructed recursively by interconnecting any pair of macro nodes (n level-(h - 1) blocks) completely. An HCC has the constant node degree regardless of increasing its size (the number of nodes). Furthermore, since an HCC has the hierarchically structured character and the feature of uniformity, a wide variety of inter-cluster connections are possible.
暂无评论