ISBN: (print) 0769521355
This paper addresses the problem of transcoding proxy placement for coordinated en-route web caching in tree networks. We propose a model for this problem that considers all nodes in the network in a coordinated way, and we formulate the problem as an optimization problem. We implement our dynamic-programming-based algorithm and evaluate our model on different performance metrics through extensive simulation experiments. The results show that our model outperforms the placement model for linear topologies.
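The dynamic-programming idea can be illustrated on a deliberately simplified cost model. This is an assumption, not the paper's model: transcoding and object versions are ignored, node 0 is the origin server, each request costs `rate[v]` times the link-cost distance to the first proxy on its path toward the server, and at most `k` proxies may be placed. The names `place_proxies`, `parent`, `edge_cost` and `rate` are hypothetical.

```python
from functools import lru_cache


def place_proxies(parent, edge_cost, rate, k):
    """Place at most k proxies on a tree whose root (node 0) is the origin server.

    parent[v]    -- parent of node v (parent[0] is None)
    edge_cost[v] -- cost of the link from v up to parent[v]
    rate[v]      -- request rate issued at node v
    A request travels toward the root and is served by the first proxy it
    meets; the root serves whatever no proxy intercepts.
    Returns (total_cost, frozenset_of_proxy_nodes).
    """
    children = [[] for _ in parent]
    for v in range(1, len(parent)):
        children[parent[v]].append(v)

    def dist_to(v, anc):
        # Sum of link costs on the path from v up to its ancestor anc.
        d = 0
        while v != anc:
            d += edge_cost[v]
            v = parent[v]
        return d

    @lru_cache(maxsize=None)
    def combine(kids, j, anc):
        # Best way to share a budget of at most j proxies among the subtrees
        # rooted at kids, all of which see anc as their nearest proxy above.
        if not kids:
            return (0, frozenset())
        best = None
        for i in range(j + 1):
            c1, p1 = subtree(kids[0], i, anc)
            c2, p2 = combine(kids[1:], j - i, anc)
            if best is None or c1 + c2 < best[0]:
                best = (c1 + c2, p1 | p2)
        return best

    @lru_cache(maxsize=None)
    def subtree(v, j, anc):
        kids = tuple(children[v])
        # Option 1: no proxy at v, so v's own requests travel up to anc.
        c, p = combine(kids, j, anc)
        best = (c + rate[v] * dist_to(v, anc), p)
        # Option 2: a proxy at v serves v's requests and shields its subtree.
        if j >= 1:
            c, p = combine(kids, j - 1, v)
            if c < best[0]:
                best = (c, p | {v})
        return best

    return combine(tuple(children[0]), k, 0)


parent = [None, 0, 1, 1, 0, 4]  # server 0; nodes 2 and 3 sit below 1; 5 below 4
edge_cost = [0, 2, 1, 3, 1, 4]
rate = [0, 1, 5, 2, 3, 4]
print(place_proxies(parent, edge_cost, rate, 2))  # (14, frozenset({1, 5}))
```

The state carries the nearest proxy above each subtree, which is what lets the tree version coordinate placements across branches instead of treating each root-to-leaf path as an independent linear topology.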
Threads provides a mechanism for simulating the execution of parallel algorithms on a simplified model of a shared-memory multiprocessor. The algorithms can be expressed in a high-level block-structured language, which supports multiple threads of execution within a common body of program code. Results show an ability to achieve good speedup for small problems using algorithms derived by simple modifications of sequential algorithms. In addition, a sibling thread synchronisation feature provides the basis for the synchronous execution of threads. k-parallel algorithms, tailored to the machine size and implemented as synchronously executing iterations, can provide near-linear speedup as the problem size is increased. The techniques described in this paper seem to promise an effective synchronous execution mode for shared-memory MIMD architectures.
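The idea of k threads executing an iteration in lockstep can be sketched with a barrier standing in for sibling-thread synchronisation. This is a minimal sketch assuming Python's `threading.Barrier` as the synchronisation primitive and Jacobi smoothing as the iterative algorithm; `parallel_jacobi` is a hypothetical name, and because of the interpreter lock it illustrates the execution structure, not speedup.

```python
import threading


def parallel_jacobi(values, iterations, k):
    """Jacobi smoothing of a 1-D array by k threads that iterate in lockstep.

    Endpoints are fixed boundary values. Each thread owns a contiguous slice of
    interior points; a barrier after every sweep ensures no thread starts sweep
    t + 1 before all threads have finished sweep t. Two buffers alternate as
    source and destination, so one barrier per sweep is enough.
    """
    n = len(values)
    bufs = [list(values), list(values)]
    barrier = threading.Barrier(k)

    def worker(tid):
        lo = 1 + tid * (n - 2) // k
        hi = 1 + (tid + 1) * (n - 2) // k
        for t in range(iterations):
            src, dst = bufs[t % 2], bufs[(t + 1) % 2]
            for i in range(lo, hi):
                dst[i] = 0.5 * (src[i - 1] + src[i + 1])
            barrier.wait()  # end of synchronous iteration t

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(k)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return bufs[iterations % 2]
```

Calling `parallel_jacobi(values, 25, 3)` gives exactly the same result as a sequential loop over 25 sweeps; only the work per sweep is divided among the k threads.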
ISBN: (print) 9781450301190
This paper evaluates features of graph coloring algorithms implemented on graphics processing units (GPUs), comparing coloring heuristics and thread decompositions. Compared with prior work on graph coloring for other parallel architectures, we find that the large number of cores and relatively high global memory bandwidth of a GPU lead to different strategies for the parallel implementation. Specifically, we find that a simple uniform block partitioning is very effective on GPUs and that our parallel coloring heuristics lead to the same or fewer colors than prior approaches for distributed-memory cluster architectures. Our algorithm resolves many coloring conflicts across partitioned blocks on the GPU by iterating through the coloring process, before returning to the CPU to resolve the remaining conflicts. With this approach we obtain as few colors as (if not fewer than) the best sequential graph coloring algorithm, with performance close to that of the fastest sequential graph coloring algorithms, which have poor color quality.
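The iterate-then-resolve strategy can be simulated sequentially to show its logic. This is a sketch under stated assumptions: the first-fit heuristic, the contiguous block partition, and the rule that the higher-numbered endpoint of a conflicting edge is recolored are choices made here for illustration, not necessarily the paper's heuristics; `block_parallel_coloring` and `first_fit` are hypothetical names.

```python
def first_fit(v, adj, color):
    """Smallest color not used by an already-colored neighbour of v."""
    used = {color[u] for u in adj[v] if color[u] is not None}
    c = 0
    while c in used:
        c += 1
    return c


def block_parallel_coloring(adj, num_blocks, max_rounds=10):
    """Speculative block-partitioned coloring with iterative conflict repair.

    adj is a list of neighbour lists. Vertices are split into num_blocks
    contiguous blocks. In each round every block colors its pending vertices,
    seeing fresh colors inside its own block but only the start-of-round
    snapshot for other blocks (as if the blocks ran concurrently). Edges whose
    endpoints then share a color are conflicts; the higher-numbered endpoint is
    uncolored and retried next round. Whatever is left after max_rounds is
    colored sequentially (the "return to the CPU" step).
    """
    n = len(adj)
    block = [v * num_blocks // n for v in range(n)]
    color = [None] * n
    pending = list(range(n))
    for _ in range(max_rounds):
        if not pending:
            break
        snapshot = list(color)
        for b in range(num_blocks):  # on a GPU these blocks run in parallel
            view = list(snapshot)
            for v in pending:
                if block[v] == b:
                    view[v] = first_fit(v, adj, view)
                    color[v] = view[v]
        pending = sorted({max(v, u) for v in pending for u in adj[v]
                          if color[u] == color[v]})
        for v in pending:
            color[v] = None
    for v in pending:  # sequential clean-up of any remaining conflicts
        color[v] = first_fit(v, adj, color)
    return color
```

Conflicts can only arise between vertices in different blocks colored in the same round, and the lowest-numbered pending vertex always keeps its color, so the pending set shrinks every round.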
ISBN: (print) 9781538694039
With the rapid development of the Internet and the continuous growth in the number of network users, network traffic in various regions is increasing rapidly. In high-speed, high-throughput network environments, traditional packet capture methods and processing capabilities cannot keep pace, which results in severe packet loss. This paper focuses on a high-performance packet acquisition and distribution method that breaks through the performance bottleneck of general-purpose servers and network cards. It studies a packet capture method based on the DPDK platform and uses the hash value computed by RSS (Receive Side Scaling) to improve the efficiency of packet distribution, covering the path from high-performance acquisition to efficient multi-core parallel processing. This method can effectively reduce packet loss and improve the packet processing rate. It can also reduce resource waste and network overhead for traffic capture and distribution. Preliminary experiments show that DPDK-based traffic processing has clear advantages over PF_RING and Netmap in data processing speed.
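The way an RSS hash spreads packets over cores can be sketched by computing, in software, the Toeplitz hash that RSS-capable NICs evaluate in hardware. This is a sketch, not DPDK's API: the 40-byte key (the 16-bit pattern 0x6d5a repeated, which makes the hash symmetric in source and destination) and the 128-entry redirection table are assumptions, and `toeplitz_hash` and `rss_queue` are hypothetical names.

```python
import socket
import struct

# Hypothetical key: 0x6d5a repeated. Because the key repeats every 16 bits,
# swapping source and destination (32-bit addresses, 16-bit ports) leaves the
# hash unchanged, so both directions of a flow land on the same core.
SYMMETRIC_KEY = bytes([0x6D, 0x5A] * 20)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set bit i of the input (most significant
    bit first), XOR in the 32-bit window of the key that starts at bit i."""
    if len(key) * 8 < len(data) * 8 + 32:
        raise ValueError("key too short for this input")
    key_bits = len(key) * 8
    k = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (k >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_queue(src_ip, dst_ip, src_port, dst_port, num_queues, key=SYMMETRIC_KEY):
    """Map an IPv4/TCP 4-tuple to a receive queue (one queue per worker core)."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    h = toeplitz_hash(key, data)
    # Hypothetical 128-entry redirection table filled round-robin with queues.
    reta = [i % num_queues for i in range(128)]
    return reta[h % len(reta)]  # the low 7 bits of the hash select the entry
```

Since the hash is linear over GF(2), it is cheap in hardware, and every packet of a flow maps to the same queue, which keeps per-flow state on one core.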
ISBN: (print) 9781538634417
The longest common subsequence (LCS) problem is a classic problem with applications in various research areas. It is known to be NP-hard for an arbitrary number of input sequences. In this paper, we present a parallel LCS algorithm using the GPU-based OpenACC model; it builds on the existing dynamic programming approach and applies a parallel anti-diagonal scheme to eliminate the data dependencies. The proposed algorithm has been benchmarked using four different computing models: OpenMPI, OpenMP, hybrid OpenMPI & OpenMP, and OpenACC. The parallel LCS algorithm has been run on Swiss-Prot databases over these computing models, and their execution times, speed-ups and speed ratios have been measured and compared extensively. Our experimental results reveal that our algorithm on OpenACC (on GPU) is around 16 times faster than execution on a single CPU, and around 2 times faster than on octa-core processor systems. The OpenACC model stands out among the four tested models in solving the LCS problem.
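The anti-diagonal scheme for two sequences can be shown with a sequential sketch of the wavefront order. This is an illustration, not the paper's OpenACC code: cells with the same `i + j` depend only on the two previous anti-diagonals, so the inner loop is the part a GPU version would run in parallel. `lcs_length_antidiagonal` is a hypothetical name.

```python
def lcs_length_antidiagonal(a, b):
    """LCS length of two sequences, filling the DP table one anti-diagonal at a time."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(2, m + n + 1):  # d = i + j identifies an anti-diagonal
        # All cells on diagonal d read only diagonals d - 1 and d - 2, so the
        # iterations of this inner loop are independent of each other.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]


print(lcs_length_antidiagonal("ABCBDAB", "BDCABA"))  # 4
```

For two sequences the table has (m + 1)(n + 1) cells, so this case runs in O(mn) time; the wavefront only changes the order of evaluation, not the result.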
Author:
Grosz, L, Massey Univ
Institute of Information and Mathematical Sciences, North Shore Mail Centre, Auckland, New Zealand
We consider the algebraic multilevel iteration (AMLI) for the solution of systems of linear equations arising from a finite-difference discretization on a rectangular grid. The key operation is the matrix-vector product, which can be executed efficiently on vector and parallel-vector computer architectures if the nonzero entries of the matrix are concentrated in a few diagonals. To maintain this structure for the matrices on all levels, coarsening is performed in alternating directions. In some cases it is necessary to introduce additional dummy grid hyperplanes. The data movements in the restriction and prolongation are crucial, as they produce massive memory conflicts on vector architectures. Using a simple performance model, the best of the possible vectorization strategies is selected automatically at runtime. Examples show that on a Fujitsu VPP300 the presented implementation of AMLI reaches about 85% of the useful performance, and that scalability with respect to computing time can be achieved.
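Why a few concentrated diagonals suit vector hardware can be seen from a matrix-vector product over diagonal (DIA) storage: each diagonal becomes one long stride-1 loop. This is a sketch assuming a row-indexed convention (`diags[k][i]` holds `A[i][i + offsets[k]]`) and a 5-point Laplacian as the example matrix; `dia_matvec` and `laplacian_5pt` are hypothetical names, and this is not the paper's vectorization strategy.

```python
def dia_matvec(offsets, diags, x):
    """y = A x for a square matrix stored by diagonals.

    offsets[k] is the offset of the k-th stored diagonal (0 main, +1 super,
    -1 sub, ...) and diags[k][i] holds A[i][i + offsets[k]]; entries whose
    column would fall outside the matrix are ignored.
    """
    n = len(x)
    y = [0.0] * n
    for off, d in zip(offsets, diags):
        lo, hi = max(0, -off), min(n, n - off)
        # One long, stride-1 loop per diagonal: this is what vectorizes well.
        for i in range(lo, hi):
            y[i] += d[i] * x[i + off]
    return y


def laplacian_5pt(nx, ny):
    """5-point finite-difference Laplacian on an nx-by-ny grid, row-major order."""
    n = nx * ny
    main = [4.0] * n
    east = [-1.0 if i % nx != nx - 1 else 0.0 for i in range(n)]  # A[i][i+1]
    west = [-1.0 if i % nx != 0 else 0.0 for i in range(n)]       # A[i][i-1]
    north = [-1.0] * n  # A[i][i+nx]; out-of-range entries are skipped
    south = [-1.0] * n  # A[i][i-nx]
    return [0, 1, -1, nx, -nx], [main, east, west, north, south]
```

Only five vectors of length n are stored, and couplings across grid-row boundaries are simply zeros inside the ±1 diagonals, which keeps every loop free of branches.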
As computing systems have permeated every aspect of our daily lives, the efficiency of executing parallel programs on distributed systems has become a critical issue in high-performance computing research. In recent years, more and more researchers have recognized that parallel algorithms, scheduling and architectures play an important role in improving the efficiency of computing systems, and they continue to present valuable research results in this field.
In this special issue, we selected some excellent papers from the third international symposium on parallel architectures, algorithms and programming (PAAP 2010), which was held in Dalian, China, December 18-20, 2010. In addition, we invited and selected some representative research papers in the broad area of parallel algorithms, scheduling and architectures.
The paper titled "A Novel Differential Evolution with Uniform Design for Continuous Global Optimization" presents a uniform-differential evolution algorithm (UDE), which incorporates a uniform-design initialization method into differential evolution to accelerate convergence and improve stability.
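Seeding differential evolution with a uniform design can be sketched as follows. This is an assumption-laden illustration, not the UDE algorithm itself: the good-lattice-point construction (with the first generators coprime to the population size), the DE/rand/1/bin variant, clipping to the box, and the parameter values are choices made here; `uniform_design` and `de_uniform` are hypothetical names.

```python
import math
import random


def uniform_design(n, dim):
    """n points in (0, 1)^dim from a good-lattice-point set.

    Column j uses a generator h_j coprime with n, so i * h_j mod n (with 0
    read as n) visits every level 1..n exactly once.
    """
    gens = [h for h in range(1, n) if math.gcd(h, n) == 1][:dim]
    if len(gens) < dim:
        raise ValueError("n too small for this many dimensions")
    return [[(((i * h) % n or n) - 0.5) / n for h in gens] for i in range(1, n + 1)]


def de_uniform(f, bounds, pop_size=31, F=0.5, CR=0.9, generations=200, seed=0):
    """DE/rand/1/bin minimization whose initial population is a uniform design."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[lo + u * (hi - lo) for u, (lo, hi) in zip(row, bounds)]
           for row in uniform_design(pop_size, dim)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:
                    lo, hi = bounds[j]
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip to the box
                else:
                    trial.append(pop[i][j])
            ft = f(trial)
            if ft <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    k = min(range(pop_size), key=fit.__getitem__)
    return pop[k], fit[k]
```

Compared with uniform random sampling, the lattice guarantees that every coordinate's range is covered evenly by the initial population, which is the intuition behind using uniform design to speed up early convergence.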
The paper titled "Leveraging 1-hop Neighborhood Knowledge for Connected Dominating Set in Wireless Sensor Networks" proposes an algorithm that uses 1-hop neighborhood knowledge to construct a small connected dominating set while minimizing energy and time consumption.
The paper titled "An Internet Traffic Identification Approach Based on GA and PSO-SVM" proposes an Internet traffic identification approach that selects the best feature subset using a Genetic Algorithm (GA) and then calculates the weight of each selected feature using Particle Swarm Optimization (PSO). In addition, the traditional SVM algorithm is optimized by the PSO algorithm.
The paper titled "Efficient and Scalable Thread-level parallel algorithms for Sorting Multise
With the evolution of High Performance Computing, multi-core and many-core systems have become a common feature of new hardware architectures. The programming effort required by these architectures is challenging because of the increasing number of cores. Parallel programming models based on the data flow model and the task programming paradigm aim to address this issue. Iterative linear solvers are a key part of petroleum reservoir simulation, as they can account for up to 80% of the total computing time. In these algorithms, standard preconditioning methods for large, sparse and unstructured matrices, such as Incomplete LU Factorization (ILU) or Algebraic Multigrid (AMG), fail to scale on shared-memory architectures with a large number of cores. Recently introduced multi-level domain decomposition (DDML) preconditioners appear to be both numerically robust and scalable on emerging architectures because of their parallel nature. This paper proposes a parallel implementation of these preconditioners using the task programming paradigm with a data flow model. The approach is validated on linear systems extracted from realistic petroleum reservoir simulations. The results show that, given an appropriate coarse operator in such preconditioners, the method has good convergence rates, while our implementation achieves promising scalability on multi-core architectures. (C) 2019 Elsevier B.V. All rights reserved.
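The "parallel nature" of domain decomposition preconditioners can be sketched with the simplest member of the family: a one-level block-Jacobi (non-overlapping additive Schwarz) preconditioner whose subdomain solves are submitted as independent tasks. This is a simplified sketch, not DDML: it omits the coarse operator the abstract identifies as essential, uses a thread pool instead of a data-flow runtime, and wraps the preconditioner in a plain Richardson iteration; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting for a small dense block."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def make_block_jacobi(A, blocks, pool):
    """Return z = M^{-1} r, where M keeps only the diagonal blocks of A."""
    local = [[[A[i][j] for j in idx] for i in idx] for idx in blocks]

    def apply(r):
        # One independent task per subdomain; the gather is the only dependency.
        futures = [pool.submit(solve_dense, Ab, [r[i] for i in idx])
                   for Ab, idx in zip(local, blocks)]
        z = [0.0] * len(r)
        for idx, fut in zip(blocks, futures):
            for i, zi in zip(idx, fut.result()):
                z[i] = zi
        return z

    return apply


def richardson(A, b, apply_prec, tol=1e-10, max_iter=500):
    """Preconditioned Richardson iteration x <- x + M^{-1}(b - A x)."""
    n = len(b)
    x = [0.0] * n
    for it in range(max_iter):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) < tol:
            return x, it
        z = apply_prec(r)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter
```

Without a coarse operator, information crosses only one subdomain boundary per iteration, so the iteration count of this one-level sketch grows with the number of subdomains; a coarse correction is what removes that dependence.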
In practice, various techniques are used to speed up reasoning in logic programming on parallel machines. Three major approaches have generally been adopted to solve this problem. The most common approach exploits AND- and OR-parallelism, as in Parlog[Clark84], Concurrent-Prolog[Shapiro83] and IDIOM[Guptas&Hermenegildo]. These schemes exploit the three main forms of implicit parallelism: independent AND-parallelism, dependent AND-parallelism and OR-parallelism. The second approach is to build parallel architectures that execute the different levels of parallelism inherent in inference, such as DADO[Stolfo84, Miranker90], NON-VON[Hillyer86] and PSM[Gupta87]. The third approach is to develop faster match and search algorithms, as in Rete[Forgy82] and Treat[Miranker87]. The bottleneck in inference systems is the match phase, which consumes around 90% of execution time[Gupta87]. In this paper, we present algorithms that realize the connection method on systolic arrays. The algorithms partition the paths in connection matrices for parallel inference. First, parallelism in reasoning is discussed; then parallel inference on systolic arrays and algorithms for partitioning paths are introduced. Finally, the correctness and completeness of the algorithms are shown. The paper consists of five sections. After the introduction, the second section presents the connection method and designs parallel inference algorithms on systolic arrays. The third section describes an example of partitioning the paths in the connection method and shows the example executing on normal systolic and tree systolic models. The fourth section analyzes the algorithms. The final section presents conclusions and related work.
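Partitioning the paths of a connection matrix can be sketched as follows, with threads standing in for processors (systolic arrays are not modeled). This sketch assumes the matrix is a set of clauses in conjunctive normal form, with literals as nonzero integers and `-p` the complement of `p`. In that reading the set is unsatisfiable exactly when every path (one literal chosen from each clause) contains a connection, that is, a complementary pair. Paths are partitioned by the literal chosen from the first clause, giving independent subproblems. The names are hypothetical, and this is not the paper's algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def all_paths_connected(clauses, path):
    """True if every extension of the partial path through the remaining
    clauses contains a complementary pair (a connection)."""
    depth = len(path)
    if depth == len(clauses):
        return False  # a full path with no connection: the check fails
    for lit in clauses[depth]:
        if -lit in path:
            continue  # every path through this extension is already connected
        if not all_paths_connected(clauses, path + (lit,)):
            return False
    return True


def is_unsatisfiable_parallel(clauses, workers=4):
    """Check all paths, one independent task per literal of the first clause."""
    if not clauses:
        return False
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda lit: all_paths_connected(clauses, (lit,)),
                              clauses[0]))
    return all(parts)


print(is_unsatisfiable_parallel([[1], [-1, 2], [-2]]))  # True
```

Pruning a partial path as soon as it contains a connection is what keeps the search far smaller than the full product of clause sizes; partitioning on deeper clauses would yield more, smaller tasks.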
ISBN: (digital) 9781665488020
ISBN: (print) 9781665488020
Programming parallel architectures from a hierarchical point of view is becoming today's standard, as machines are structured with multiple layers of memory. To handle such architectures, we focus on the MULTI-BSP bridging model. This model extends BSP and proposes a structured way of programming multi-level architectures. In parallel programming, we now need to manage new concerns such as memory coherency, deadlocks and safe data communications. To do so, we propose a type system for MULTI-ML, an ML-like programming language based on the MULTI-BSP model. This type system introduces data locality using type annotations and effects to detect wrong uses of multi-level architectures. We thus ensure that "Well-typed programs cannot go wrong" on hierarchical architectures.
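The level-by-level structure that a MULTI-BSP program follows can be sketched as a hierarchical reduction over a machine tree. This does not show MULTI-ML syntax or its type system. It is a plain Python sketch under assumed names (`Component`, `hierarchical_reduce`) in which each component waits for its children and receives only their partial results, which is the kind of locality discipline the type system is meant to enforce statically.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, List, Optional


@dataclass
class Component:
    """One component of a Multi-BSP machine tree (hypothetical representation).

    A leaf is a processing unit holding local data; an inner component stands
    for a level of shared memory (e.g. a node's RAM) above its children.
    """
    children: List["Component"] = field(default_factory=list)
    data: List[float] = field(default_factory=list)


def hierarchical_reduce(comp: Component, op: Callable, level: int = 0,
                        log: Optional[list] = None):
    """Reduce all leaf data with an associative op, one level at a time.

    Each inner component waits for all of its children (the barrier of its
    level) and receives only their partial results, never raw data from
    another subtree. If log is given, one (level, fan_in) entry is recorded
    per communication step.
    """
    if not comp.children:
        return reduce(op, comp.data)  # local computation on the leaf's memory
    partials = [hierarchical_reduce(c, op, level + 1, log) for c in comp.children]
    if log is not None:
        log.append((level, len(partials)))
    return reduce(op, partials)


# A hypothetical two-level machine: 2 nodes, each with 4 cores holding 2 values.
machine = Component(children=[
    Component(children=[Component(data=[float(8 * n + 2 * c), float(8 * n + 2 * c + 1)])
                         for c in range(4)])
    for n in range(2)
])
```

Here `hierarchical_reduce(machine, max)` returns the largest of the 16 stored values; each inner level performs exactly one communication step per component.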