In tree-based adaptive mesh refinement, elements are partitioned between processes using a space-filling curve. The curve establishes an ordering between all elements that derive from the same root element, the tree. When representing complex geometries by connecting several trees, the roots of these trees form an unstructured coarse mesh. We present an algorithm to partition the elements of the coarse mesh such that (a) the fine mesh can be load-balanced to equal element counts per process regardless of the element-to-tree map, and (b) each process that holds fine mesh elements has access to the meta data of all relevant trees. As an additional feature, the algorithm partitions the meta data of relevant ghost (halo) trees as well. We develop in detail how each process computes the communication pattern for the partition routine without handshaking and with minimal data movement. We demonstrate the scalability of this approach on up to 917e3 MPI ranks and 371e9 coarse mesh elements, measuring run times of one second or less.
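As a rough illustration of requirements (a) and (b), the sketch below splits a globally ordered element list into equal contiguous chunks and then looks up which trees each process touches. It is not the paper's algorithm: it ignores ghost trees and all communication, and the names `element_offsets`, `tree_ranges`, and `elements_per_tree` are hypothetical. It assumes the space-filling curve orders elements tree by tree.

```python
from bisect import bisect_right
from itertools import accumulate


def element_offsets(num_elements, num_procs):
    """Process p owns the global element indices [offsets[p], offsets[p + 1])."""
    return [(p * num_elements) // num_procs for p in range(num_procs + 1)]


def tree_ranges(elements_per_tree, num_procs):
    """Return, per process, the (first, last) tree index it needs, or None.

    Assumption: the curve visits all elements of tree 0, then tree 1, and so on.
    """
    # tree_start[k] is the global index of the first element of tree k.
    tree_start = [0] + list(accumulate(elements_per_tree))
    offsets = element_offsets(tree_start[-1], num_procs)
    ranges = []
    for p in range(num_procs):
        lo, hi = offsets[p], offsets[p + 1]
        if lo == hi:
            # This process holds no fine elements, so it needs no trees.
            ranges.append(None)
            continue
        first = bisect_right(tree_start, lo) - 1
        last = bisect_right(tree_start, hi - 1) - 1
        ranges.append((first, last))
    return ranges
```

In this sketch a tree whose elements straddle a chunk boundary appears in the range of both neighboring processes, which is why the coarse mesh partition cannot simply assign each tree to exactly one owner.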
In this paper, an image sharpening method using an integral image representation and the Laplacian operator is presented. First, a parallel algorithm is proposed to compute the integral image of the original image. Then, the integral image is used to compute the Laplacian image by subtracting the center pixel from the average of its surroundings in a rectangular window. This method requires a constant number of operations per rectangle. Next, the sharpened image is obtained by adding the Laplacian image to the original image. Finally, a numerical example demonstrates the effectiveness of the proposed image sharpening approach.
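The pipeline described in this abstract can be sketched in plain, sequential Python as follows. This is a minimal illustration rather than the authors' parallel method. It makes two assumptions: the window average includes the center pixel, and the detail term is taken as pixel minus window average, so that adding it increases contrast. The function names and the `radius` and `amount` parameters are hypothetical.

```python
def integral_image(img):
    """Return s with s[i][j] equal to the sum of img[0:i][0:j] (zero-padded)."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0.0
        for j in range(w):
            row_sum += img[i][j]
            s[i + 1][j + 1] = s[i][j + 1] + row_sum
    return s


def window_mean(s, i, j, radius, h, w):
    """Mean over the window around (i, j), clipped at the border.

    Uses four lookups in the integral image, whatever the window size.
    """
    r0, r1 = max(i - radius, 0), min(i + radius + 1, h)
    c0, c1 = max(j - radius, 0), min(j + radius + 1, w)
    total = s[r1][c1] - s[r0][c1] - s[r1][c0] + s[r0][c0]
    return total / ((r1 - r0) * (c1 - c0))


def sharpen(img, radius=1, amount=1.0):
    """Add the detail term (pixel minus window mean) back onto each pixel."""
    h, w = len(img), len(img[0])
    s = integral_image(img)
    return [
        [
            img[i][j] + amount * (img[i][j] - window_mean(s, i, j, radius, h, w))
            for j in range(w)
        ]
        for i in range(h)
    ]
```

Because every window sum costs four table lookups, the per-pixel cost stays the same when `radius` grows, which is the constant-operations property the abstract refers to.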
This paper is concerned with the synchronization of infrastructure-impoverished sensor networks under harsh conditions. It suggests three novel asynchronous, decentralized, energy-efficient time synchronization protocols. ...
We consider tensors in the Hierarchical Tucker format and suppose the tensor data to be distributed among several compute nodes. We assume the compute nodes to be in a one-to-one correspondence with the nodes of the Hierarchical Tucker format such that connected nodes can communicate with each other. An appropriate tree structure in the Hierarchical Tucker format then allows for the parallelization of basic arithmetic operations between tensors with a parallel runtime that grows like log(d), where d is the tensor dimension. We introduce parallel algorithms for several tensor operations, some of which can be applied to solve linear equations AX=B directly in the Hierarchical Tucker format using iterative methods such as conjugate gradients or multigrid. We present weak scaling studies, which provide evidence that the runtime of our algorithms indeed grows like log(d). Furthermore, we present numerical experiments in which we apply our algorithms to solve a parameter-dependent diffusion equation in the Hierarchical Tucker format by means of a multigrid algorithm.
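The log(d) growth comes from combining results along a balanced tree, where each level can be processed in parallel. The generic sketch below is not a Hierarchical Tucker operation. It only counts how many sequential rounds a pairwise tree reduction over d values needs, and the name `tree_reduce` is hypothetical. It assumes a non-empty input.

```python
def tree_reduce(values, combine):
    """Combine values pairwise, level by level.

    Returns (result, rounds). All combines within one round are independent,
    so they could run in parallel; rounds is the sequential depth.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired last value moves up to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds
```

For d inputs the number of rounds is the ceiling of log2(d), for example 3 rounds for d = 8.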
Today's data center networks depend on optical switching to overcome the scalability limitations of traditional architectures. All-optical networks most often use slotted Time Division Multiple Access (TDMA) operation; their buffers are located at the optical network edges, and their organization relies on effective scheduling of the TDMA frames to achieve efficient sharing of network resources and collision-free network operation. Scheduling decisions have to be taken in real time, a process that becomes computationally demanding as the network size increases. Accelerators provide a solution, and the present paper proposes a scheduler accelerator for a data center network that is divided into points of delivery (pods) of racks and uses hybrid electro-optical top-of-rack (ToR) switches to access an all-optical inter-rack network. The scheduler accelerator is a parallel, scalable architecture with application-specific processing engines. Case studies of 2-, 4-, 8-, and 16-processor configurations are presented for processing all the TDMA time slot transfer requests in networks of 512 and 1024 ToR nodes. The architecture is realized on a Xilinx VC707 board to validate the results.
We describe algorithms for computing maximal determinants of binary circulant matrices of small orders. Here "binary matrix" means a matrix whose elements are drawn from {0, 1} or {−1, 1}. We describe effici...
Periodic gossip algorithms have generated a lot of interest due to their ability to compute global statistics using local pairwise communications among nodes. Simple execution, robustness to topology changes,...
ISBN: (Print) 9781538675199
The Louvain algorithm is a well-known and efficient method for detecting communities or clusters in social and information networks (graphs). The emergence of large network data necessitates parallelization of this algorithm for high-performance computing platforms. Several shared-memory parallel algorithms exist for the Louvain method. However, those algorithms do not scale to a large number of cores or to large networks. Distributed-memory systems, which offer a large number of processing nodes, are widely available nowadays. However, the only existing MPI (message passing interface) based distributed-memory parallel implementation of the Louvain algorithm has shown scalability to only 16 processors. In this paper, we implement both shared- and distributed-memory parallel algorithms and identify issues that hinder scalability. In our shared-memory algorithm using OpenMP, we get a 4-fold speedup for several real-world networks. However, this speedup is limited only by the number of physical cores available on our system. We then design a distributed-memory parallel algorithm using the message passing interface. Our results demonstrate scalability to a moderate number of processors. We also provide an empirical analysis that shows how communication overhead poses the most crucial threat to designing a scalable parallel Louvain algorithm in a distributed-memory setting.
This paper presents an acceleration framework for packing linear programming problems where the amount of data available is limited, i.e., where the number of constraints m is small compared to the variable dimension n. The framework can be used as a black box to speed up linear programming solvers dramatically, by two orders of magnitude in our experiments. We present worst-case guarantees on the quality of the solution and the speedup provided by the algorithm, showing that the framework provides an approximately optimal solution while running the original solver on a much smaller problem. The framework can be used to accelerate exact solvers, approximate solvers, and parallel/distributed solvers. Further, it can be used for both linear programs and integer linear programs.
During the last decade, the information technology industry has adopted a data-driven culture, relying on online metrics to measure and monitor business performance. Under the setting of big data, the majority of such...