ISBN: (print) 9783030050511; 9783030050504
Parallel breadth-first search (BFS) is a representative algorithm in Graph 500, the well-known benchmark for evaluating supercomputers on data-intensive applications. However, the specific storage model of Graph 500 poses a severe challenge to efficient communication when computing parallel BFS on large-scale graphs. In this paper, we propose an effective method, PruX, for optimizing the communication of parallel BFS in two respects. First, we adopt a scalable structure to record the access information of the vertices on each machine. Second, we prune unnecessary inter-machine communication for previously accessed vertices by checking the records. Evaluation results show that the performance of our method is at least six times higher than that of the original implementation of parallel BFS.
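The pruning idea in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the PruX implementation: the function name `simulated_bfs`, the modulo vertex-to-machine mapping, and the per-machine Python `set` used as the access record are all hypothetical stand-ins for the paper's scalable structure.

```python
from collections import defaultdict


def simulated_bfs(adj, num_machines, source, prune):
    """Level-synchronous BFS over a vertex-partitioned graph, simulated in one process.

    adj: dict mapping each integer vertex to a list of neighbours.
    Returns (level, messages), where messages counts simulated inter-machine sends.
    """
    def owner(v):
        # Assumed partitioning: vertex v lives on machine v mod num_machines.
        return v % num_machines

    level = {source: 0}
    # seen[m]: remote vertices for which machine m has already sent a visit request.
    seen = [set() for _ in range(num_machines)]
    frontier = [source]
    messages = 0
    depth = 0
    while frontier:
        depth += 1
        inbox = defaultdict(list)
        for u in frontier:
            src = owner(u)
            for v in adj[u]:
                dst = owner(v)
                if dst == src:
                    inbox[dst].append(v)  # local neighbour, no network traffic
                    continue
                if prune and v in seen[src]:
                    continue  # v was requested before, so its owner already visited it
                seen[src].add(v)
                messages += 1
                inbox[dst].append(v)
        next_frontier = []
        for received in inbox.values():
            for v in received:
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level, messages
```

On a complete graph with six vertices split over two simulated machines, both variants produce the same BFS levels, while the pruned run sends 6 messages instead of 18. The saving in a real system depends on the graph and the record structure, which this toy does not model.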
ISBN: (print) 9783030050511; 9783030050504
To improve the intelligent image recognition abilities of edge devices, this paper introduces a parallel-optimization-based framework called POWER. With an FPGA (Field-Programmable Gate Array) as its hardware module, POWER provides good extensibility and flexible customization for developing intelligent firmware suited to different types of edge devices in various scenarios. Through an actual case study, we design and implement a firmware prototype following the POWER specification and explore its performance improvement using parallel optimization. Our experimental results show that the firmware prototype exhibits good performance and is applicable to substation inspection robots, which also indirectly validates the effectiveness of the POWER framework in designing edge intelligent firmware modules.
ISBN: (print) 9783030050573; 9783030050566
Parallel memory modules can be used to increase memory bandwidth and feed a processor with the required access patterns of data. A parallel storage mechanism organized and managed by multiple storage modules suits image and video applications. Previously investigated data storage schemes achieve continuous conflict-free access by rows, columns, or blocks. However, they do not satisfy some sliding-window applications in video and image processing algorithms (including convolutional neural networks, sub-pixel difference, 2D filtering, etc.), which need conflict-free access with strides during computation and which place different demands on horizontal and vertical strides in computing sub-processes. This paper presents a storage scheme that supports collision-free row access without alignment, as well as non-aligned block-with-stride access modes beginning at any address. Theoretical proofs and experiments verify the correctness of the module address (the module number to which an address is mapped). In the hardware design, no path violation was found in the typical case, and the area overhead was small. The scheme is suitable for CNN applications, where it improves the performance of convolution computation.
ISBN: (print) 9783319271217
The proceedings contain 52 papers. The special focus in this conference is on Big Data and Its Applications. The topics include: preference-aware HDFS for hybrid storage; urban traffic congestion prediction using floating car trajectory data; a metadata cooperative caching architecture based on SSD and DRAM for file systems; parallel training GBRT based on kmeans histogram approximation for big data; an intelligent clustering algorithm based on mutual reinforcement; an effective method for gender classification with convolutional neural networks; a highly efficient indexing and retrieving method for astronomical big data of time series images; specialized FPGA-based accelerator architecture for data-intensive k-means algorithms; effectively identifying hot data in large-scale I/O streams with enhanced temporal locality; a search-efficient hybrid storage system for massive text data; enhancing parallel data loading for large scale scientific database; tradeoff between the price of distributing a database and its collusion resistance based on concatenated codes; a MapReduce reinforced distributed sequential pattern mining algorithm; a fast documents classification method based on simhash; identification of natural images and computer generated graphics using multi-fractal differences of PRNU; enriching document representation with the deviations of word co-occurrence frequencies; big data analytics and visualization with spatio-temporal correlations for traffic accidents; a novel app recommendation method based on SVD and social influence; a segmentation hybrid index structure for temporal data; and a refactored content-aware host-side SSD cache.
ISBN: (print) 9783030050511; 9783030050504
Measuring the SimRank similarity of all vertex pairs in a graph is an important research topic with a wide range of applications in many fields. However, computing SimRank is costly in both time and space, so traditional computing methods fail to handle graph data of ever-growing size. This paper proposes a parallel multi-level solution for all-pair SimRank similarity computation on large graphs. We first partition the target graph using the idea of modularity maximization and obtain a collapsed graph based on the blocks. Then we compute the similarities between vertices inside a block as well as the similarities between the blocks. Finally, we integrate these two types of similarities and calculate the approximate SimRank similarities between all vertex pairs. The method is implemented on the Spark platform, and it improves time efficiency while maintaining effectiveness compared to SimRank.
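To make the cost mentioned in the abstract above concrete, the following is a minimal sketch of the standard iterative all-pair SimRank baseline, not the paper's multi-level method. The function name `simrank`, the decay factor `c=0.8`, and the iteration count are assumptions chosen for illustration. Each iteration touches every vertex pair and every pair of their in-neighbours, which is why the baseline scales poorly.

```python
def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive iterative all-pair SimRank.

    in_neighbors: dict mapping every vertex to the list of its in-neighbours.
    Returns a nested dict sim[a][b] of similarity scores.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a vertex is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    new[a][b] = 0.0  # no in-neighbours means no evidence of similarity
                    continue
                total = sum(sim[x][y] for x in ia for y in ib)
                new[a][b] = c * total / (len(ia) * len(ib))
        sim = new
    return sim
```

For two vertices that share a single common in-neighbour, the score converges to the decay factor itself, which gives a quick sanity check on the implementation.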
ISBN: (print) 9783319271361
The proceedings contain 59 papers. The special focus in this conference is on Applications of Parallel and Distributed Computing. The topics include: on exploring a virtual agent negotiation inspired approach for route guidance in urban traffic networks; optimization of binomial option pricing on Intel MIC heterogeneous system; stencil computations on HPC-oriented ARMv8 64-bit multi-core processor; a particle swarm optimization algorithm for controller placement problem in software defined network; a streaming execution method for multi-services in mobile cloud computing; economy-oriented deadline scheduling policy for render system using IaaS cloud; towards detailed tissue-scale 3D simulations of electrical activity and calcium handling in the human cardiac ventricle; task parallel implementation of matrix multiplication on multi-socket multi-core architectures; refactoring for separation of concurrent concerns; exploiting scalable parallelism for remote sensing analysis models by data transformation graph; resource-efficient vibration data collection in cyber-physical systems; a new approach for vehicle recognition and tracking in multi-camera traffic system; a scalable distributed fingerprint identification system; energy saving and load balancing for SDN based on multi-objective particle swarm optimization; pre-stack Kirchhoff time migration on Hadoop and Spark; a cyber physical system with GPU for CNC applications; a solution of the controller placement problem in software defined networks; parallel column subset selection of kernel matrix for scaling up support vector machines; real-time deconvolution with GPU and Spark for big imaging data analysis; and parallel Kirchhoff pre-stack depth migration on large high performance clusters.
ISBN: (print) 9783030050634; 9783030050627
The one-sided communication mechanism of the Message Passing Interface (MPI) has been extended by remote memory access (RMA) in several respects, including interface, language, and compiler. Coarray Fortran (CAF), an emerging syntactic extension of Fortran for one-sided communication, is freely supported by the open-source and widely used GNU Fortran compiler, which relies on MPI-3 as the transport layer. In this paper, we present the potential of RMA to benefit the communication patterns in the Cannon algorithm. EVENTS, a safer implementation of atomics for synchronizing different processes in CAF, are also introduced via the classic Fast Fourier Transform (FFT). In addition, we study the performance of one-sided communication with different compilers. In our tests, one-sided communication outperforms two-sided communication only when the data size is large enough (in particular, for inter-node transfers). CAF is slightly faster than the simple one-sided routines in MPI-3 without compiler optimization. EVENTS can improve the performance of parallel applications by avoiding idle time.
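The communication pattern of the Cannon algorithm referred to above can be sketched in a single process. This is an illustrative simulation only, with no MPI, RMA, or CAF involved: the function name `cannon_matmul` is hypothetical, and each simulated process on the q x q grid is assumed to hold one scalar rather than a matrix block. The circular shifts are the neighbour-to-neighbour transfers that a real implementation would perform with one-sided or two-sided communication.

```python
def cannon_matmul(A, B):
    """Multiply two q x q matrices following Cannon's shift pattern.

    Position (i, j) stands for one process on a q x q grid holding a single
    element of A and of B (a scalar stand-in for a matrix block).
    """
    q = len(A)
    # Initial skew: row i of A shifts left by i, column j of B shifts up by j.
    a = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Local multiply-accumulate on every simulated process.
        for i in range(q):
            for j in range(q):
                C[i][j] += a[i][j] * b[i][j]
        # Circular shift: A moves one step left, B moves one step up.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return C
```

For example, `cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, matching the ordinary matrix product.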
ISBN: (print) 9783030050573; 9783030050566
Association-rule-based recommendation is widespread in big data applications that need quick responses to improve user experience. Spark is a widely used distributed computing platform that accelerates the processing of large-scale distributed data. Developing appropriate distributed algorithms for Spark is essential to decrease the processing time of distributed recommendation. The existing FP-Growth in Spark is a popular parallel recommendation method, but it achieves its best performance only when the memory of the machines can accommodate all intermediate Resilient Distributed Datasets (RDDs). However, the memory of many practical data centers is still not large enough for large data sets. Therefore, this paper proposes a caching-based parallel FP-Growth, which consists of an integer-based sorting and an RDD-caching strategy to improve efficiency. Experimental results show that the proposal decreases the execution time by 32.37% on average compared with the existing parallel FP-Growth in Spark. Furthermore, the impact of some important parameters on the performance of the proposal is analyzed through numerous realistic experiments in Spark.
ISBN: (print) 9783319271187
The proceedings contain 58 papers. The special focus in this conference is on Parallel and Distributed Architectures. The topics include: parallelizing block cryptography algorithms on speculative multicores; performance characterization and optimization for Intel Xeon Phi coprocessor; an extended MDS code to improve single write performance of disk arrays for correcting triple disk failures; a distributed location-based service discovery protocol for vehicular ad-hoc networks; unified virtual memory support for deep CNN accelerator on SoC FPGA; dynamic time slice scheduler for virtual machine monitor; memory-aware NoC application mapping based on adaptive genetic algorithm; a study on non-volatile 3D stacked memory for big data applications; parallel implementation of dense optical flow computation on many-core processor; a power-conserving online scheduling scheme for video streaming services; prevent deadlock and remove blocking for self-timed systems; improving the memory efficiency of in-memory MapReduce based HPC systems; dual best-first search mapping algorithm for shared-cache multicore processors; energy efficient network-on-chip router with heterogeneous virtual channels; availability and network-aware MapReduce task scheduling over the internet; an optimized algorithm based on CRS codes in big data storage systems; quantum computer simulation on multi-GPU incorporating data locality; query execution optimization based on incremental update in database distributed middleware; coding-based cooperative caching in data broadcast environments; usage history-directed power management for smartphones; a mobile application distribution method; and a clustering algorithm based on rough sets for the recommendation domain in trust-based access control.
ISBN: (print) 9783319271392
The proceedings contain 59 papers. The special focus in this conference is on Software Systems, Programming Models, Performance Modeling and Evaluation. The topics include: a scalable fault-tolerance programming model on MIC cluster; multi-chunk redundant array of independent SSDs with improved performance; an energy efficient storage system for astronomical observation data on Dome A; parallel aware hybrid solid-state storage; automatic optimization of software transactional memory through linear regression and decision tree; a data-centric tool to improve the performance of multithreaded program on NUMA; a light-weight hot data identification scheme via grouping-based LRU lists; towards interactive programming with parallel linear algebra in R; enhancing I/O scheduler performance by exploiting internal parallelism of SSDs; a performance and scalability analysis of the MPI based tools utilized in a large ice sheet model executing in a multicore environment; an efficient algorithm for a generalized LCS problem; on exploring a quantum particle swarm optimization method for urban traffic light scheduling; identifying repeated interleavings to improve the efficiency of concurrency bug detection; global reliability evaluation for cloud storage systems with proactive fault tolerance; performance evaluation and optimization of Wi-Fi display on Android; a novel scheduling algorithm for file fetch in transparent computing; joint power and reduced spectral leakage-based resource allocation for D2D communications in 5G; and an optimization strategy of energy consumption for data transmission based on optimal stopping theory in mobile networks.