Results are presented on the effectiveness of several algorithms for partitioning graphs with weighted nodes and edges. Such graphs can represent a number of interesting program structures, such as processes using message passing, dataflow graphs and other distributed software. The graph partitions can be used in software tools to control the allocation of program units to distributed processes in ways that minimize communication cost, completion time or utilization. It is shown that, in situations where a good partition is crucial, simulated annealing is the most effective algorithm. It is, however, expensive to run, and when run time matters, good results can still be obtained with a variant of a greedy algorithm. A number of other algorithms that can be useful in particular situations are also described. These can be of practical use to distributed system designers.
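The simulated-annealing approach above can be sketched for the simplest case, a 2-way partition of a graph with weighted nodes and edges. This is a minimal sketch under stated assumptions. The cost model (cut-edge weight plus a penalty on node-weight imbalance), the function names and the cooling parameters (`t0`, `alpha`, `moves_per_t`, `t_min`) are all hypothetical illustrations, not the paper's settings. The greedy variant mentioned in the abstract is not shown.

```python
import math
import random


def partition_cost(assign, node_w, edges, balance_penalty=1.0):
    """Hypothetical cost: weight of cut edges plus a penalty on load imbalance."""
    cut = sum(w for (u, v), w in edges.items() if assign[u] != assign[v])
    load = [0.0, 0.0]
    for node, side in assign.items():
        load[side] += node_w[node]
    return cut + balance_penalty * abs(load[0] - load[1])


def anneal(node_w, edges, t0=10.0, alpha=0.95, moves_per_t=50, t_min=0.01, seed=0):
    """Simulated annealing over 2-way partitions; returns the best partition seen."""
    rng = random.Random(seed)
    nodes = sorted(node_w)
    assign = {n: rng.randint(0, 1) for n in nodes}  # random initial partition
    current = partition_cost(assign, node_w, edges)
    best, best_cost = dict(assign), current
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            n = rng.choice(nodes)
            assign[n] ^= 1  # move one node to the other part
            new = partition_cost(assign, node_w, edges)
            delta = new - current
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = new  # accept: always if not worse, sometimes if worse
                if current < best_cost:
                    best, best_cost = dict(assign), current
            else:
                assign[n] ^= 1  # reject: undo the move
        t *= alpha  # geometric cooling schedule (an assumption)
    return best, best_cost
```

Accepting some uphill moves at high temperature is what lets annealing escape local minima. One example is "all nodes on one side", where every single move first raises the cut. That escape is also why it costs many more cost evaluations than a greedy pass.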
ISBN (print): 0780384032
Circuit partitioning is a key phase in VLSI design, so the choice of partitioning algorithm is of great importance. Two styles of genetic algorithms for circuit partitioning, based on different encoding strategies, are presented. The first uses 0-1 encoding, and the second uses integer encoding based on module numbers. A corresponding fitness function and genetic operators are designed for each method. Both algorithms are then tested on standard benchmark circuits. Compared with the traditional F-M algorithm, both genetic algorithms produce markedly better partitions.
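The 0-1 encoding described above can be illustrated with a small bipartitioning GA. Gene `i` is 0 or 1 depending on which side module `i` is placed. This is a minimal sketch, not the paper's design. The fitness function (reciprocal of cut nets plus a size-imbalance penalty), binary tournament selection, one-point crossover, bit-flip mutation, elitism and all parameter values are assumptions chosen for illustration.

```python
import random


def cut_size(chrom, nets):
    """Nets (lists of module indices) whose modules span both parts."""
    return sum(1 for net in nets if len({chrom[m] for m in net}) > 1)


def fitness(chrom, nets, penalty=2.0):
    """Hypothetical fitness, higher is better: penalise cut nets and unequal sizes."""
    imbalance = abs(2 * sum(chrom) - len(chrom))  # |#ones - #zeros|
    return 1.0 / (1.0 + cut_size(chrom, nets) + penalty * imbalance)


def tournament(pop, nets, rng):
    a, b = rng.sample(pop, 2)  # binary tournament selection
    return a if fitness(a, nets) >= fitness(b, nets) else b


def crossover(a, b, rng):
    point = rng.randrange(1, len(a))  # one-point crossover (needs >= 2 modules)
    return a[:point] + b[point:]


def mutate(chrom, rate, rng):
    return [bit ^ 1 if rng.random() < rate else bit for bit in chrom]  # bit-flip


def genetic_bipartition(n_modules, nets, pop_size=30, generations=100,
                        mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_modules)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=lambda c: fitness(c, nets))
        children = [elite]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, nets, rng),
                              tournament(pop, nets, rng), rng)
            children.append(mutate(child, mutation_rate, rng))
        pop = children
    return max(pop, key=lambda c: fitness(c, nets))
```

The integer (module-number) encoding from the abstract would need different crossover and mutation operators that keep each module assigned exactly once. That variant is not shown here.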
Summary form only given. Many applications, especially in image processing, have execution times that vary with the nature of the data being processed. Low-level image processing (for example, contour detection and labeling) in embedded architectures is often implemented on specialized systems. We consider an architecture composed of a RISC processor linked to a dynamically reconfigurable circuit through a memory interface. To help distribute processing with variable execution time on this architecture, we present a partitioning approach based on a genetic algorithm.
This work presents Workload Partitioning and Scheduling (WPS), a novel algorithm for evenly partitioning the computational workload of large, implicitly defined, work-list-based applications on distributed/shared-memory systems. In WPS, a stratified sampling technique estimates the number of work items that will be processed in each step of the target application. WPS then uses this estimate to evenly partition and distribute the computational workload. An empirical evaluation on large applications, namely Iterative-Deepening A* (IDA*) applied to the (4 × 4)- and (5 × 5)-Sliding-Tile Puzzles, Delaunay Mesh Generation, and Delaunay Mesh Refinement, shows that WPS is applicable to a range of applications. Coordinating WPS with existing work-stealing schedulers for intra-node load balancing yields additional speedups of 18% to 40% compared with the work-stealing schedulers alone. This coordination also outperforms an existing workload-partitioning scheme designed specifically for IDA* by 17% to 36%.
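The general idea of estimating work by stratified sampling and then partitioning evenly can be sketched generically. This is not WPS itself, whose estimator and distribution scheme are not given in the abstract. The sketch makes these assumptions:

- Work items are already grouped into strata, for example by search depth.
- A small random sample per stratum estimates that stratum's mean cost.
- Every item in the stratum is assigned that estimate.
- Items are then spread over workers with a largest-first greedy (LPT) heuristic.

All function names are hypothetical.

```python
import heapq
import random


def estimate_stratum_means(strata, cost, sample_size, seed=0):
    """Estimate each stratum's mean per-item cost from a small random sample."""
    rng = random.Random(seed)
    means = []
    for items in strata:
        k = min(sample_size, len(items))
        sample = rng.sample(items, k)  # simple random sample within the stratum
        means.append(sum(cost(x) for x in sample) / k if k else 0.0)
    return means


def partition_evenly(strata, means, workers):
    """Greedy LPT: give each item (largest estimate first) to the least-loaded worker."""
    jobs = [(m, item) for items, m in zip(strata, means) for item in items]
    jobs.sort(key=lambda job: job[0], reverse=True)
    heap = [(0.0, w) for w in range(workers)]  # (estimated load, worker id)
    parts = [[] for _ in range(workers)]
    for est, item in jobs:
        load, w = heapq.heappop(heap)
        parts[w].append(item)
        heapq.heappush(heap, (load + est, w))
    return parts
```

The estimated total work is `sum(len(s) * m for s, m in zip(strata, means))`. Sampling only a few items per stratum keeps the estimation cheap relative to processing the whole work list.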
ISBN (print): 9781424400546
Graph partitioning is an enabling technology for parallel processing, as it allows the effective decomposition of unstructured computations whose data dependencies correspond to a large, sparse and irregular graph. Although the problem of computing high-quality partitionings of graphs arising in scientific computations is to a large extent well understood, this is far from true for emerging HPC applications whose underlying computations involve graphs with power-law degree distributions. This paper presents new multilevel graph partitioning algorithms specifically designed for partitioning such graphs. It presents new clustering-based coarsening schemes that identify and collapse together groups of highly connected vertices. An experimental evaluation of these schemes on 10 different graphs shows that the proposed algorithms consistently and significantly outperform existing state-of-the-art approaches.
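One level of clustering-based coarsening, as described above, can be sketched as follows. Groups of strongly connected vertices are collapsed into single coarse vertices, with vertex weights summed and parallel edges merged. This is a minimal sketch, not the paper's scheme. The greedy rule used here (each unassigned vertex absorbs its unassigned neighbours, heaviest edges first, up to a hypothetical `max_cluster` size) is an assumption for illustration.

```python
from collections import defaultdict


def coarsen(adj, vweight, max_cluster=3):
    """One coarsening level.

    adj: symmetric {v: {u: edge_weight}}; vweight: {v: weight}.
    Returns (cluster_of, coarse vertex weights, coarse adjacency).
    """
    cluster_of, clusters = {}, []
    for v in sorted(adj):
        if v in cluster_of:
            continue
        cid = len(clusters)
        members = [v]
        cluster_of[v] = cid
        # absorb unassigned neighbours, most strongly connected first
        for u, _ in sorted(adj[v].items(), key=lambda kv: -kv[1]):
            if len(members) >= max_cluster:
                break
            if u not in cluster_of:
                cluster_of[u] = cid
                members.append(u)
        clusters.append(members)
    coarse_w = [sum(vweight[v] for v in members) for members in clusters]
    coarse_adj = defaultdict(lambda: defaultdict(float))
    for v, nbrs in adj.items():
        for u, w in nbrs.items():
            a, b = cluster_of[v], cluster_of[u]
            if a != b:  # edges inside a cluster disappear
                coarse_adj[a][b] += w  # parallel edges between clusters are merged
    return cluster_of, coarse_w, {a: dict(n) for a, n in coarse_adj.items()}
```

In a multilevel partitioner this step is repeated until the graph is small. The coarsest graph is then partitioned, and the partition is projected back and refined level by level.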
One of the most important and difficult tasks in multi-FPGA system design is partitioning. The main problems relate to the I/O pins and logic capacity of the FPGAs. The number of available pins is a critical constraint, because FPGA devices have very few pins relative to their logic capacity. In addition, some pins must be reserved to interconnect parts of the circuit placed on non-adjacent FPGAs. Most previous work has been adapted from other VLSI areas and hence disregards the specific features of this kind of circuit. A new method for solving the partitioning and placement problem in multi-FPGA systems is presented. We use graph theory to describe the circuit and then apply a classical genetic algorithm (GA) with a problem-specific encoding. The algorithm preserves the original structure of the circuit and, by means of a fuzzy technique, evaluates the I/O-pin consumption due to direct and indirect connections between FPGAs. We used the partitioning93 benchmarks described in the Xilinx Netlist Format (XNF). The results show that genetic algorithms can successfully accomplish the partitioning and placement tasks while respecting the board constraints.
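The pin constraint discussed above can be made concrete with a simple check. Given a placement of cells onto FPGAs, it counts the I/O pins each device needs for direct connections. This is a crisp, simplified sketch, not the paper's fuzzy evaluation, and it does not model indirect routing through intermediate FPGAs. It assumes that a net spanning more than one FPGA uses one pin on each FPGA it touches. The `reserved` parameter stands in for the pins set aside for non-adjacent connections. All names are hypothetical.

```python
from collections import defaultdict


def pin_usage(nets, placement):
    """Count I/O pins each FPGA needs for direct connections.

    nets: list of lists of cell ids; placement: {cell: fpga}.
    Assumption: a net spanning k > 1 FPGAs uses one pin on each of them.
    """
    pins = defaultdict(int)
    for net in nets:
        fpgas = {placement[c] for c in net}
        if len(fpgas) > 1:
            for f in fpgas:
                pins[f] += 1
    return dict(pins)


def violations(pins, pin_limit, reserved=0):
    """FPGAs whose pin use exceeds the limit minus pins reserved for routing."""
    return sorted(f for f, used in pins.items() if used > pin_limit - reserved)
```

In a GA, a check like this would typically feed a penalty term in the fitness function, so that placements exceeding the board's pin budget are ranked lower.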
An enterprise data warehouse system maintains large amounts of data to enable business analysis. Queries posed on such a system involve complex joins, aggregations and filter operations. Hence, to improve query performance, the data warehouse must be tuned with optimization techniques such as partitioning. Referential horizontal partitioning, in which the fact table is partitioned based on the partitioning of the dimension tables, performs better for data warehouses. However, horizontal partitioning may generate too many fragments (partitions) for the underlying database to manage. A few fragmentation selection algorithms have been proposed in the literature to choose an optimal set of fragments. In this paper we summarize different fragmentation selection algorithms and provide a comparative analysis of them.
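The fragment-explosion problem above can be illustrated as follows. In a common formulation, partitioning on several attributes yields as many fragments as the product of each attribute's number of partitions, so a selection algorithm must cap that product. This is a brute-force illustration, not any of the surveyed algorithms. The per-attribute `benefit` scores and the `max_fragments` threshold are hypothetical inputs; real algorithms derive benefit from a query workload and use heuristics rather than exhaustive search.

```python
from itertools import combinations
from math import prod


def best_fragmentation(attrs, max_fragments):
    """Pick the attribute subset with the highest total benefit whose fragment
    count (product of per-attribute partition counts) stays within max_fragments.

    attrs: {name: (n_partitions, benefit)}; benefit scores are hypothetical.
    """
    best, best_benefit = (), 0.0
    names = sorted(attrs)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            fragments = prod(attrs[a][0] for a in subset)
            benefit = sum(attrs[a][1] for a in subset)
            if fragments <= max_fragments and benefit > best_benefit:
                best, best_benefit = subset, benefit
    return best, best_benefit
```

For example, partitioning on attributes with 4, 12 and 5 partitions at once would yield 4 × 12 × 5 = 240 fragments. A threshold of 20 admits only smaller combinations.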
State-of-the-art video compression technologies, such as MPEG-4 AVC/H.264 or the new HEVC standard being developed by ISO MPEG and ITU-T VCEG, use tree-structured block partitioning for motion compensation. Such motion partitioning captures only horizontal and vertical motion boundaries. To better match actual motion boundaries, geometry-adaptive block partitioning (GEO) has been explored for several years. GEO allows a block to be split along a line that need not be horizontal or vertical. Although noticeable coding-efficiency gains can be obtained, GEO significantly increases the number of modes to be tested and therefore has a high impact on encoding complexity. This paper presents fast algorithms that aim to control this complexity while preserving the coding-efficiency gains of GEO. Experimental results are provided on top of the HEVC standard, first demonstrating the efficiency of the GEO tool. Simplified versions offering noticeable complexity reduction with limited rate-distortion performance loss are also demonstrated and compared.
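The geometric split described above can be sketched by describing the dividing line in polar form relative to the block centre. Each pixel is labelled by which side of the line it falls on. This is a minimal sketch assuming an (angle, offset) parametrisation of the line. The exact parametrisation, quantisation and candidate set used in the paper are not given in the abstract, and the function name is hypothetical.

```python
import math


def geo_mask(size, angle_deg, rho):
    """Label each pixel of a size x size block 0 or 1 according to the side of
    the line (x - c)cos(t) + (y - c)sin(t) = rho it lies on, c = block centre.
    """
    c = (size - 1) / 2.0
    t = math.radians(angle_deg)
    return [[0 if (x - c) * math.cos(t) + (y - c) * math.sin(t) < rho else 1
             for x in range(size)]
            for y in range(size)]
```

An angle of 0 or 90 degrees reproduces the vertical or horizontal splits of tree-structured partitioning, and other angles give oblique splits. Each (angle, offset) pair is a separate candidate partition the encoder must evaluate, which is where the complexity increase noted in the abstract comes from.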
Circuit partitioning, generally formulated as a graph partitioning problem, is an important step in the physical design of circuits. Evolutionary techniques are increasingly used to solve NP-complete problems, for example in logic minimization and simulation heuristics. This paper explores the genetic-algorithm approach and proposes a hybrid technique that combines the strengths of existing techniques, resulting in better partitioning and placement of circuits. The technique can be further extended to the hardware/software boundary of algorithms and applied to real-world physical design problems.
The problem of partitioning task graphs is known to be NP-complete in its general form, and it is extremely difficult even to devise simple but effective and fast heuristics. This paper considers tree task graphs, which arise from many important programming paradigms such as divide and conquer, branch and bound, etc. The target architecture is a shared-memory architecture, as it typifies a wide range of research prototypes and commercial multiprocessor products. Optimal sequential and parallel algorithms are developed for partitioning tree task graphs so that the bottleneck is minimized with the minimum number of processors. The bandwidth minimization problem is NP-complete even for trees. Three effective, simple two-pass heuristics and their parallel versions are given. The effectiveness and efficiency of these heuristics are validated through extensive simulations.
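A related, simpler tree-partitioning problem can be solved with a classic bottom-up greedy. The problem is: cut a rooted tree of weighted tasks into the fewest connected parts, each with total weight at most a bound `B`. The greedy walks the tree in post-order. Whenever a subtree's remaining weight exceeds `B`, it cuts off its heaviest child subtrees first. This greedy is known to give the minimum number of parts for this variant. It is shown only as an illustration and is not claimed to be the paper's algorithm. The function name and the tree representation are hypothetical.

```python
def min_parts(children, weight, root, bound):
    """Fewest connected parts of a rooted tree such that each part weighs <= bound.

    children: {node: [child, ...]}; weight: {node: task weight}.
    """
    if any(w > bound for w in weight.values()):
        raise ValueError("a single task is heavier than the bound")
    parts = 0

    def visit(v):
        nonlocal parts
        residual = sorted((visit(c) for c in children.get(v, [])), reverse=True)
        total = weight[v] + sum(residual)
        for r in residual:  # heaviest child subtree first
            if total <= bound:
                break
            total -= r  # cut the edge to this child: its subtree becomes a part
            parts += 1
        return total  # weight still attached to v's part

    visit(root)
    return parts + 1  # plus the part containing the root
```

To relate this to minimizing the bottleneck for a given number of processors, one could search over candidate bounds (for example, by binary search over part weights) for the smallest `B` whose part count fits. That extension is an assumption, not taken from the abstract.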