This paper studies and compares the domain partitioning algorithms presented by Farhat, Al-Nasra and Nguyen, Malone, and Simon/Hsieh et al. , for load balancing in parallel finite element analysis. Both the strengths ...
详细信息
Data- parallel applications running on heterogeneous high-performance computing platforms require a nonuniform distribution of the workload between available processes. Data partitioning algorithms are formulated as a...
详细信息
Data- parallel applications running on heterogeneous high-performance computing platforms require a nonuniform distribution of the workload between available processes. Data partitioning algorithms are formulated as an optimization problem. Departing from the computational performance models of the processes, the goal is to find the partition that minimizes the communication cost. Traditionally, communication volume is the metric used to guide the partitioning. This metric, however, is unable to capture the complexity of current heterogeneous systems, which show uneven communication channels and execute applications with different communication patterns. In this paper, we discuss the role of analytical communication performance models as a metric in partitioning algorithms. First, we describe a method to programmatically predict the communication cost of a data-parallel kernel based on the tau -Lop analytical model. We show that this figure better captures the communication features of applications and platforms. We present results showing that this approach builds partitions that equal or improve the performance of data parallel applications on heterogeneous platforms with respect to previous volume-based strategies.
Hierarchical virtual partitioning facilitates sharing of a resource, such as a link, by multiple customers, each with multiple service classes. Calls of each service class have distinctive bandwidth requirements, arri...
详细信息
Hierarchical virtual partitioning facilitates sharing of a resource, such as a link, by multiple customers, each with multiple service classes. Calls of each service class have distinctive bandwidth requirements, arrival rates and mean holding times. The quality of service is in terms of call blocking probabilities. The algorithms allow the desired mix of fairness, robustness and multiplexing efficiency at each of two levels, namely, customers and the various services of each customer. The scheme relies on nominal allocations of capacities to each customer and service, and priorities in the call admission process, which are implemented by dynamic trunk reservations. An approximate method of analysis based on fixed point equations is given. A reward/penalty paradigm is devised to reflect the issues in virtual private networking and the performance of our algorithm compared to the optimal solution. The results show that the scheme is fair, efficient and robust.
A graph partitioning greedy algorithm is presented. This algorithm avoids the hard-constraints of others similar approaches such as the impossibility for some regions to grow after certain step of the algorithm and th...
详细信息
ISBN:
(纸本)0769507506
A graph partitioning greedy algorithm is presented. This algorithm avoids the hard-constraints of others similar approaches such as the impossibility for some regions to grow after certain step of the algorithm and the uniqueness of the solution. Nevertheless, it allows attaining global results by local approximations using a generalised concept of not over-segmentation, which includes an energy function, and eliminating the not sub-segmentation criterion using a probabilistic criterion similar to that of annealing. The high-variability region problems such as borders are also eliminated identifying them and distributing their pixels among the other neighbour regions. Thus, it is possible to keep the time complexity of usual graph partitioning greedy algorithm and avoiding its high variability region problems, obtaining better results.
partitioning of a class of algorithms with global data dependencies, called multistage algorithms, is investigated. partitioning requires intermediate results of computations of a specific block of the partition to be...
详细信息
partitioning of a class of algorithms with global data dependencies, called multistage algorithms, is investigated. partitioning requires intermediate results of computations of a specific block of the partition to be stored in an intermediate memory. Furthermore a decomposition of the global interconnection structure of the algorithm is necessary. The authors outline a design methodology for the intermediate memories which perform the data rearrangements according to the interconnection relation and that consist of locally connected synchronous modules. Additionally procedures for deriving control signals for the intermediate memory are presented, which can serve as a basis for control minimization.< >
This paper addresses the problem of estimating the number of sources, the directions of narrowband signals and the signals emitted by a number of sources. The proposed solution is based on the reformulation of the pro...
详细信息
This paper addresses the problem of estimating the number of sources, the directions of narrowband signals and the signals emitted by a number of sources. The proposed solution is based on the reformulation of the problem in a way so that the measurement equation can be expressed as a non-linear function of a vector consisting of the locations as well as the signals emitted by the sources. Simulations illustrate that the proposed method selects the correct dimension of this augmented location vector (i.e. it correctly selects the number of sources) while at the same time successfully identifying the source locations and signals, even for a sufficiently small number of snapshots, Furthermore, the method is adaptive in the sense that it has the ability to track changes in the model structure in real time, successfully. Finally, an interesting feature of the proposed algorithm is its natural decoupled form, which makes it amenable for parallel implementations.
The problem of partitioning a large computer network into clusters in order to reduce the amount of network resources consumed by the routing algorithm is addressed. The clustering problem is formulated as a general g...
详细信息
The problem of partitioning a large computer network into clusters in order to reduce the amount of network resources consumed by the routing algorithm is addressed. The clustering problem is formulated as a general graph partitioning problem. It is shown that the problem of partitioning a graph into a minimum number of clusters with unit weight vertices and a given weight bound on the cluster size is NP-complete if each cluster is required to be internally connected. It is also shown that if a diameter bound is imposed on the cluster instead of the weight bound, then the problem is NP-complete, even when cluster connectivity is not required. An optimum partitioning algorithm is presented for the latter problem when the graph is a tree. An optimum partitioning algorithm is presented for another problem in which each cluster is required to contain exactly one of a set of specified vertices called cluster heads.< >
This paper presents the performance analysis of several well-known partitioning scheduling algorithms in real-time and fault-tolerant multiprocessor systems. Both static and dynamic scheduling algorithms are analyzed....
详细信息
This paper presents the performance analysis of several well-known partitioning scheduling algorithms in real-time and fault-tolerant multiprocessor systems. Both static and dynamic scheduling algorithms are analyzed. partitioning scheduling algorithms, which are studied, are heuristic algorithms that are formed by combining any of the bin-packing algorithms with any of the schedulability conditions for the rate-monotonic (RM) and earliest-deadline-first (EDF) policies. A tool is developed which enables to experimentally evaluate the performance of the algorithms from the graph of tasks. The results show that among several partitioning algorithms evaluated, the RM-small-task (RMST) algorithm is the best static algorithm and the EDF-best-fit (EDF-BF) is the best dynamic algorithm, for non fault-tolerant systems. For fault-tolerant systems which require about 49% more processors, the results show that the RM-first-fit decreasing utilization (RM-FFDU) is the best static algorithm and the EDF-BF is the best dynamic algorithm. To decrease the number of processors in fault-tolerant systems, the RMST is modified. The results show that the modified RMST decreases the number of required processors between 7% and 78% in comparison with the original RMST, the RM-FFDU and other well-known static partitioning scheduling algorithms
The Internet of Things (IoT) links the physical world to computing systems, and blockchain presents an opportunity to address the issues of weak interoperability and security flaws within IoT. However, blockchain face...
详细信息
The Internet of Things (IoT) links the physical world to computing systems, and blockchain presents an opportunity to address the issues of weak interoperability and security flaws within IoT. However, blockchain faces the challenge of low throughput and scalability. Sharding is a promising solution, but it divides the blockchain into multiple committees, making the attack cost of malicious nodes lower. Sharding also leads to a large number of cross-committee transactions, which degrades the system's performance. In this article, we propose the NSshard sharding framework that provides secure and low-cross-committee scaling. NSshard consists of network sharding and state sharding. We first propose a reputation score-based network sharding, which assigns each node a reputation score to reward its honest verification of transactions and penalizes its malicious behavior. This network sharding uses a random but balanced distribution of reputation scores, thereby decreasing the risk of collusion. We also propose a graph-based account partitioning scheme for state partitioning. To reduce the amount of cross-committee transactions, the scheme uses an undirected weighted graph to depict accounts and transactions. We design two algorithms based on edge splitting and overlapping community discovery, respectively. We also propose a dynamic sharding method to handle new transactions. We conduct extensive experiments to evaluate the efficiency of the proposed framework based on Ethereum transaction data. The experimental results show that our proposed framework can reduce the number of cross-committee transactions by 34.8% at 128 committees compared to the Metis algorithm.
暂无评论