ISBN: (Print) 9783319271361
The proceedings contain 59 papers. The special focus in this conference is on Applications of Parallel and Distributed Computing. The topics include: on exploring a virtual agent negotiation inspired approach for route guidance in urban traffic networks; optimization of binomial option pricing on Intel MIC heterogeneous system; stencil computations on HPC-oriented ARMv8 64-bit multi-core processor; a particle swarm optimization algorithm for controller placement problem in software defined network; a streaming execution method for multi-services in mobile cloud computing; economy-oriented deadline scheduling policy for render system using IaaS cloud; towards detailed tissue-scale 3D simulations of electrical activity and calcium handling in the human cardiac ventricle; task parallel implementation of matrix multiplication on multi-socket multi-core architectures; refactoring for separation of concurrent concerns; exploiting scalable parallelism for remote sensing analysis models by data transformation graph; resource-efficient vibration data collection in cyber-physical systems; a new approach for vehicle recognition and tracking in multi-camera traffic system; a scalable distributed fingerprint identification system; energy saving and load balancing for SDN based on multi-objective particle swarm optimization; pre-stack Kirchhoff time migration on Hadoop and Spark; a cyber physical system with GPU for CNC applications; a solution of the controller placement problem in software defined networks; parallel column subset selection of kernel matrix for scaling up support vector machines; real-time deconvolution with GPU and Spark for big imaging data analysis; and parallel Kirchhoff pre-stack depth migration on large high performance clusters.
In recent years, with the development of machine learning, large amounts of personal data have been used to train models, which incurs severe privacy leakage in the field. Current regulations mandate...
With the increasing concern for environmental protection and resource optimization, efficient waste sorting has become a serious challenge today. In this paper, we propose a new offloading control problem that aims to...
ISBN: (Print) 9783540695004
The proceedings contain 33 papers. The topics discussed include: smart content delivery on the Internet; parallel query processing in databases on multicore architectures; evaluation of a novel load-balancing algorithm with variable granularity; a static multiprocessor scheduling algorithm for arbitrary directed task graphs in uncertain environments; architecture aware partitioning algorithms; a simple and efficient fault-tolerant adaptive routing algorithm for meshes; fault tolerance in the biswapped network; a general approach to predict the performance order of TSP family problems; examining the feasibility of reconfigurable models for molecular dynamics simulation; parallel simulated annealing for materialized view selection in data warehousing environments; an optimization service for DSP multicomputers; and a non-blocking multithreaded architecture with support for speculative threads.
ISBN: (Digital) 9783319111940
ISBN: (Print) 9783319111940; 9783319111933
The Kirchhoff pre-stack depth migration (KPSDM) algorithm, one of the most widely used migration algorithms, plays an important part in obtaining the real image of the earth. However, this program takes considerable time due to its high computational cost; hence the working efficiency of the oil industry is affected. The general-purpose Graphics Processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) developed by NVIDIA have provided a new solution to this problem. In this study, we propose a parallel algorithm for Kirchhoff pre-stack depth migration and an optimization strategy based on CUDA technology. Our experiments indicate that for large data computations, the accelerated algorithm achieves a speedup of 8 to 15 times compared with NVIDIA GPU.
ISBN: (Print) 9783030050511; 9783030050504
Parallel Breadth First Search (BFS) is a representative algorithm in Graph 500, the well-known benchmark for evaluating supercomputers for data-intensive applications. However, the specific storage model of Graph 500 poses a severe challenge to efficient communication when computing parallel BFS on large-scale graphs. In this paper, we propose an effective method, PruX, for optimizing the communication of parallel BFS in two respects. First, we adopt a scalable structure to record the access information of the vertices on each machine. Second, we prune unnecessary inter-machine communication for previously accessed vertices by checking the records. Evaluation results show that the performance of our method is at least six times higher than that of the original implementation of parallel BFS.
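The two steps described in this abstract can be illustrated with a minimal single-process sketch. This is not the PruX implementation: the modulo vertex partition, the per-machine `seen` sets, the message counter, and the function name `partitioned_bfs` are all assumptions made for illustration. It simulates a level-synchronous BFS over several "machines" and skips a remote visit request when the sending machine's record shows it has already requested that vertex.

```python
from collections import defaultdict


def partitioned_bfs(adj, num_machines, root, prune):
    """Simulate level-synchronous BFS over a 1D vertex partition.

    Returns (level, messages): the BFS level of each reached vertex and
    the number of simulated inter-machine visit requests.
    All names here are hypothetical; this is not the PruX code.
    """
    def owner(v):
        # Assumed partition: vertex v lives on machine v mod num_machines.
        return v % num_machines

    level = {root: 0}
    # seen[m] records remote vertices machine m has already requested.
    seen = [set() for _ in range(num_machines)]
    frontier = [root]
    messages = 0
    depth = 0
    while frontier:
        depth += 1
        inbox = defaultdict(list)  # destination machine -> vertices to visit
        for u in frontier:
            src = owner(u)
            for v in adj[u]:
                dst = owner(v)
                if dst != src:
                    if prune and v in seen[src]:
                        continue  # already requested earlier: skip the send
                    seen[src].add(v)
                    messages += 1  # one simulated network message
                inbox[dst].append(v)
        next_frontier = []
        for vertices in inbox.values():
            for v in vertices:
                if v not in level:  # owner marks first visit only
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level, messages
```

Calling the function twice on the same graph, once with `prune=False` and once with `prune=True`, should give identical levels while the pruned run sends no more messages than the unpruned one, because a skipped request always targets a vertex that was already requested in the same or an earlier level.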
ISBN: (Print) 9783030602451; 9783030602444
Lattice sieving is currently the leading class of algorithms for solving the shortest vector problem over lattices. The computational difficulty of this problem is the basis for constructing secure post-quantum public-key cryptosystems based on lattices. In this paper, we present a novel massively parallel approach for solving the shortest vector problem using lattice sieving and hardware acceleration. We combine previously reported algorithms with a proper caching strategy and develop a hardware architecture. The main advantage of the proposed approach is eliminating the overhead of the data transfer between a CPU and a hardware accelerator. The authors believe that this is the first such architecture reported in the literature to date and predict that it will achieve up to 8 times higher throughput than a multi-core high-performance CPU. The presented methods can be adapted for other sieving algorithms that are hard to implement in FPGAs due to the communication and memory bottleneck.
ISBN: (Print) 9783319654829; 9783319654812
In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the pairwise and accurate comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation is above 80% while retaining scalability as the number of parallel cores increases. Moreover, we also show that sequential optimizations yield up to 8x speedup for scenarios with larger data.
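For readers unfamiliar with the metric, parallel efficiency is conventionally defined as speedup divided by the number of cores, so an efficiency above 80% on p cores corresponds to a speedup above 0.8p. The sketch below only illustrates that standard definition; the function names and the timings in the usage note are hypothetical and are not measurements from the IMSAME paper.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency: speedup divided by the number of cores used."""
    return speedup(t_serial, t_parallel) / cores
```

As a made-up example, a job taking 100 s serially and 15 s on 8 cores would give `efficiency(100.0, 15.0, 8)`, which is a speedup of about 6.67 and an efficiency of about 0.83.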
ISBN: (Digital) 9783540695011
ISBN: (Print) 9783540695004
Welcome to the proceedings of the 8th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2008). ICA3PP 2008 consists of two keynote addresses, seven technical sessions, and one tutorial. Included in these proceedings are papers whose authors are from Australia, Brazil, Canada, China, Cyprus, France, India, Iran, Israel, Italy, Japan, Korea, Germany, Greece, Mexico, Poland, Portugal, Romania, Spain, Switzerland, Taiwan, Tunisia, UAE, UK, and USA. Each paper was rigorously reviewed by at least three Program Committee members and/or external reviewers, and the acceptance ratio is 35%. These papers were presented over seven technical sessions. Based on the paper review results, three papers were selected as the best papers. We would like to thank the many people who helped make this conference a successful event. We thank all authors who submitted their work to ICA3PP 2008, and all Program Committee members and additional reviewers for their diligent work in the paper review process, ensuring a collection of high-quality papers. We are grateful to Hong Shen (University of Adelaide, Australia) and Kleanthis Psarris (University of Texas at San Antonio, United States) for their willingness to be the keynote speakers. Our thanks go to Hai Jin and George Papapodoulos, the conference General Co-chairs, and Andrzej Goscinski, Wanlei Zhou and Yi Pan, the conference Steering Committee Co-chairs, for help in many aspects of organizing this conference. Finally, we thank all the conference participants for traveling to Cyprus.
ISBN: (Print) 9783642246494
Finding optimal phase durations for a controlled intersection is a computationally intensive task requiring O(N^3) operations. In this paper we introduce cost-optimal parallelization of a dynamic programming algorithm that reduces the complexity to O(N^2). Three implementations that span a wide range of parallel hardware are developed. The first is based on shared-memory architecture, using the OpenMP programming model. The second implementation is based on message passing, targeting massively parallel machines including high performance clusters and supercomputers. The third implementation is based on the data parallel programming model mapped onto Graphics Processing Units (GPUs). Key optimizations include loop reversal, communication pruning, load balancing, and efficient thread-to-processor assignment. Experiments have been conducted on an 8-core server, an IBM BlueGene/L supercomputer (2 node boards with 128 processors), and an Nvidia GeForce GTX470 GPU with 448 cores. Results indicate practical scalability on all platforms, with maximum speedup reaching 76x for the GTX470.