ISBN (print): 9783030294007; 9783030293994
Approximation via sampling is a widespread technique whenever exact solutions are too expensive. In this paper, we present techniques for an efficient parallelization of adaptive (a.k.a. progressive) sampling algorithms on multi-threaded shared-memory machines. Our basic algorithmic technique requires no synchronization except for atomic load-acquire and store-release operations. It does, however, require O(n) memory per thread, where n is the size of the sampling state. We present variants of the algorithm that either reduce this memory consumption to O(1) or ensure that deterministic results are obtained. Using the KADABRA algorithm for betweenness centrality (a popular measure in network analysis) approximation as a case study, we demonstrate the empirical performance of our techniques. In particular, on a 32-core machine, our best algorithm is 2.9x faster than what we could achieve using a straightforward OpenMP-based parallelization and 65.3x faster than the existing implementation of KADABRA.
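To make the term "adaptive sampling" concrete, here is a minimal sequential sketch in Python. It is not the paper's parallel algorithm and does not model KADABRA: the names `bernoulli_sampler` and `adaptive_mean` are hypothetical, and the normal-approximation stopping rule is an assumption chosen for illustration only. What it shows is the defining feature of the approach: the number of samples is not fixed in advance but decided by a stopping condition evaluated on the accumulated sampling state.

```python
import math
import random


def bernoulli_sampler(p, seed=0):
    """Hypothetical sample source: returns 1.0 with probability p, else 0.0."""
    rng = random.Random(seed)
    return lambda: 1.0 if rng.random() < p else 0.0


def adaptive_mean(sample, epsilon, z=1.96, batch=100, max_samples=1_000_000):
    """Draw samples in batches until the estimated error drops below epsilon.

    Returns (estimate, number_of_samples). The stopping rule uses a
    normal-approximation confidence interval, a simplification chosen
    for illustration only.
    """
    total = 0.0      # running sum of samples (part of the sampling state)
    total_sq = 0.0   # running sum of squared samples
    n = 0
    while n < max_samples:
        for _ in range(batch):
            x = sample()
            total += x
            total_sq += x * x
        n += batch
        # Unbiased sample variance from the running sums, clamped at zero.
        variance = max(0.0, (total_sq - total * total / n) / (n - 1))
        half_width = z * math.sqrt(variance / n)
        # Data-dependent stopping condition: this check is what makes the
        # sampling adaptive rather than fixed-size.
        if half_width <= epsilon:
            break
    return total / n, n
```

A call such as `adaptive_mean(bernoulli_sampler(0.3), epsilon=0.02)` returns an estimate together with the number of samples actually drawn. In a multi-threaded version, several threads would contribute samples while the stopping condition is checked, which is the setting the abstract above addresses.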
ISBN (print): 0818607807
An algorithm called SHORTP has been developed to enhance the performance of a single-stage interconnection network for parallel processing systems. Some other techniques for enhancing the performance of interconnection networks are also presented. Simulation results are reported for all the techniques described.
Dynamic load balancing is an important technique when developing applications with unpredictable load distribution on distributed memory multicomputers. A tool, Dynamo, that can be used to utilize dynamic load balanci...
ISBN (print): 9781643681955; 9781643681948
The massive use of ontologies generates a large amount of semantic data. To facilitate their management, persistent solutions for storing and querying these semantic data loads have been proposed. This gave rise to a new type of database, called ontology-based databases (OBDB). In recent years, the need for real-time data and services has increased significantly in a large number of applications. However, OBDBs do not implement any mechanism to address real-time applications, which are characterized not only by handling large amounts of data but also by temporal constraints to which data and processing may be subject. In addition, geographically extended applications, which require real-time databases that manage data and distributed processing, are increasingly *** applications are managed by a distributed Real-Time DataBase Management System (DRT-DBMS). Like any system, a DRT-DBMS often goes through overload phases due to the unpredictable arrival of transactions submitted by users. In order to better manage Quality of Service (QoS) in these systems during periods of instability, approaches based on distributed Feedback Control Scheduling (DFCS) were proposed. These approaches do not address the use of ontological data. In this paper, we propose an approach that aims to enhance QoS in DRT-DBMS based on data replication. It consists of extending the DFCS architecture to manipulate ontological data and to handle the execution of the transactions that access them. Within the proposed extension, we study the applicability of different data replication policies. The proposed architecture is called Replication-Based-distributed Feedback Control Scheduling Architecture for Real-Time Ontology (Replication-Based-DFCS-RTO). We also show the contribution of our approach through simulation results.
ISBN (print): 3540658211
The proceedings contain 155 papers. The special focus in this conference is on End-User applications of HPCN. The topics include: High performance integer optimization for crew scheduling; simulating synthetic polymer chains in parallel; real-time signal processing in a collision avoidance radar system using parallel computing; the impact of workload on simulation results for distributed transaction processing; computer simulation of ageing with an extended Penna model; the scenario management tool SMARTFED for real-time interactive high performance networked simulations; an HPCN architecture for distributed component-based real-time simulations; airport simulation using CORBA and DIS; intelligent routing for global broadband satellite internet; adaptive scheduling strategy optimizer for parallel rolling bearing simulation; MPI-based parallel implementation of a lithography pattern simulation algorithm; parallelizing an high resolution operational ocean model; weather and climate forecasts and analyses at MHPCC; high performance parallel FEM for solid earth; elastic matching of very large digital images on high performance clusters; data intensive distributed computing; utilizing HPC technology in 3D cardiac modeling; a parallel algorithm for 3D reconstruction of angiographic images; a diffraction tomography method for medical imaging implemented on high performance computing environment; heterogeneous distribution of computations while solving linear algebra problems on networks of heterogeneous computers; modeling and improving locality for irregular problems; and parallelization of sparse Cholesky factorization on an SMP cluster.
ISBN (print): 9783319780245; 9783319780238
The algorithm detailed below extends previous work on inversion of block tridiagonal matrices from the Hermitian/symmetric case to the general case and allows for varying sub-block sizes. The blocks of the matrix are evenly distributed across p processes. Local sub-blocks are combined to form a matrix on each process. These matrices are inverted locally and the inverses are combined in a pairwise manner. At each combination step, the updates to the global inverse are represented by updating "matrix maps" on each process. The matrix maps are finally applied to the original local inverse to retrieve the block tridiagonal elements of the global inverse. This algorithm has been implemented in Fortran with MPI. Calculated inverses are compared with inverses obtained using the well-known libraries ScaLAPACK and MUMPS. Results are given for matrices arising from DFT applications.
Practice shows that increasing the amount of instruction level parallelism (ILP) offered by an architecture (for example, by adding instruction slots to VLIW instructions) does not necessarily lead to significant performance gains. Instead, high hardware costs and inefficient use of this hardware may occur. Mapping embedded applications onto multiprocessor systems forms a very interesting extension to ILP. In this paper we describe our approach to the mapping of embedded programs written in ANSI C onto a pipeline of application-specific processors. An efficient algorithm for functional pipelining of loops is presented. To validate its applicability, a frequency tracking system is used as a case study. This typical embedded application is mapped onto a two-processor system, delivering a speedup of 1.88 in comparison with a highly optimized single-core solution.
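The reported speedup can be put in context with a simple timing model of a loop split into pipeline stages. The Python sketch below is an idealized illustration, not the paper's algorithm: it assumes one processor per stage, fixed per-iteration stage times, and no communication cost, and the function names are hypothetical. Under these assumptions the speedup is bounded by the ratio of total work to the slowest stage, so a two-stage pipeline approaches 2 only when the stages are balanced.

```python
def sequential_time(stage_times, n_iters):
    """Time to run every stage of every iteration on a single processor."""
    return n_iters * sum(stage_times)


def pipelined_time(stage_times, n_iters):
    """Time with one processor per stage (idealized, no communication cost).

    The first iteration passes through all stages to fill the pipeline;
    each later iteration completes one slowest-stage time after the
    previous one.
    """
    return sum(stage_times) + (n_iters - 1) * max(stage_times)


def pipeline_speedup(stage_times, n_iters):
    """Ratio of single-processor time to pipelined time."""
    return sequential_time(stage_times, n_iters) / pipelined_time(stage_times, n_iters)
```

As a made-up example, a loop whose body splits 47%/53% between two stages gives `pipeline_speedup([0.47, 0.53], 10**6)` of roughly 1.89, while a single iteration gains nothing because the pipeline never overlaps work.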
The mathematical basis of mathematical morphology is set theory, which is widely used in the field of image processing, and distributed computing methods will require significant computing resources case that is broke...
ISBN (print): 9781450359863
The proceedings contain 36 papers. The topics discussed include: massively parallel skyline computation for processing-in-memory architectures; data motifs: a lens towards fully understanding big data and AI workloads; performance extraction and suitability analysis of multi- and many-core architectures for next generation sequencing secondary analysis; synergistic cache layout for reuse and compression; an efficient graph accelerator with parallel data conflict management; revealing parallel scans and reductions in recurrences through function reconstruction; compiler assisted coalescing; stencil codes on a vector length agnostic architecture; maximizing system utilization via parallelism management for co-located parallel applications; a portable, automatic data quantizer for deep neural networks; mage: online interference-aware scheduling in multi-scale heterogeneous systems; and towards concurrency race debugging: an integrated approach of constraint solving and dynamic slicing.
ISBN (print): 9780819492791
Thanks to recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications, the processing of such images needs high-performance computing techniques in order to deliver timely responses, e.g., for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphics Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging task of processing large amounts of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to then compute the geometric transformation that aligns the data. Feature extraction and matching are among the most computationally demanding operations in the processing chain; thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES), which exploit parallel architectures and GPUs, are thus presented. The innovative aspects of the implementation are (i) its effectiveness on a large variety of unorganized and complex datasets, (ii) its capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and discussed.