ISBN (print): 081860865X
The proceedings contain 66 papers. The following topics are dealt with: object-oriented distributed systems; interconnection architectures; remote invocation mechanisms; tree- and cube-connected architectures; load distribution; analysis of bus structures; operating system facilities for multiprocessors; distributed system services; distributed system algorithms; concurrent programming languages; performance studies; monitoring and debugging; distributed software systems; protocols for reliable real-time systems; distributed system applications; reliable multicast communication and replication; improving data availability; load balancing; distributed query processing; and multiprocessor system modeling.
ISBN (print): 0769515126
This paper presents investigations into the parallel computation of non-ideal three-dimensional detonation wave propagation on a high-performance computer based on the CC-NUMA architecture. Analysis and testing of the original serial program identified the computation of curvature and of the first- and second-order differences as the main targets for parallelization. Several techniques were applied to convert the serial program into a parallel one, including a "divide and conquer" strategy and load balancing. Numerical simulation with the parallel program greatly increases the computing speed for non-ideal three-dimensional detonation wave propagation.
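The "divide and conquer" decomposition of the difference computations can be illustrated with a minimal 1-D sketch. It assumes uniform grid spacing `h`, central-difference stencils, and hypothetical helper names `diffs_serial` and `diffs_parallel`; the paper's 3-D stencils and curvature terms are not reproduced. The grid is split into contiguous blocks, each block is differenced independently with one halo cell on each side, and the results are concatenated:

```python
from concurrent.futures import ThreadPoolExecutor


def diffs_serial(u, h):
    """Central first- and second-order differences at interior points 1..n-2."""
    d1 = [(u[i + 1] - u[i - 1]) / (2 * h) for i in range(1, len(u) - 1)]
    d2 = [(u[i + 1] - 2 * u[i] + u[i - 1]) / (h * h) for i in range(1, len(u) - 1)]
    return d1, d2


def diffs_parallel(u, h, workers=4):
    """Divide the interior points into contiguous blocks, difference each block
    independently (with one halo cell on each side), then concatenate."""
    n_interior = len(u) - 2
    if n_interior <= 0:
        return [], []
    step = -(-n_interior // workers)  # ceiling division
    blocks = [(lo, min(lo + step, n_interior)) for lo in range(0, n_interior, step)]

    def work(block):
        lo, hi = block
        # Interior outputs lo..hi-1 sit at u[lo+1..hi]; they need u[lo..hi+1].
        return diffs_serial(u[lo:hi + 2], h)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, blocks))  # map preserves block order
    d1 = [x for part in parts for x in part[0]]
    d2 = [x for part in parts for x in part[1]]
    return d1, d2
```

In CPython the global interpreter lock keeps these threads from running the pure-Python arithmetic in parallel, so the sketch shows the decomposition rather than a speedup. On a shared-memory CC-NUMA machine the same blocking could map to native threads, ideally with each block placed in memory local to its thread.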
We present an analysis of approaches to author gender identification for Russian-language texts with gender deception, using different data-driven models based on conventional machine learning (Support Vector Classifier, Decision Tree, Gradient Boosting) and neural network algorithms (convolutional layers, long short-term memory layers, etc.). The training and testing data are collections of texts from the Gender Imitation corpus, expanded by crowdsourcing and supplemented with files from the RusProfiling and RusPersonality corpora. The accuracy reached so far on this task is presented and discussed. (C) 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/3.0/). Peer-review under responsibility of the scientific committee of the 8th Annual International Conference on Biologically Inspired Cognitive Architectures.
ISBN (print): 0818656026
Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory-access demands. This paper presents an analysis of a texture segmentation application containing a 96x96 convolution. Sequential execution required several hours on single-processor systems, with over 99% of the time spent performing the large convolution; 70% to 75% of the execution time is attributable to cache misses within the convolution. We implemented the same application on the CM-5, iPSC/860, and PVM distributed-memory multicomputers, tailoring the parallel algorithms to each machine's architecture. Parallelization significantly reduced execution time, to 49 seconds on a 512-node CM-5 and 6.5 minutes on a 32-node iPSC/860.
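Distributed-memory implementations of a large convolution generally partition the image among nodes. Below is a minimal sketch of one such decomposition, row bands, assuming a "valid" (no padding) 2-D convolution and hypothetical function names; it does not reproduce the machine-specific tailoring described for the CM-5 or iPSC/860. Each band carries `kh - 1` extra halo rows, which on a real multicomputer would be the rows exchanged with neighboring nodes:

```python
from concurrent.futures import ThreadPoolExecutor


def conv2d_valid(img, ker):
    """Direct 'valid' 2-D convolution (kernel flipped, no padding)."""
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Flipped kernel index: true convolution, not correlation.
                    s += img[r + i][c + j] * ker[kh - 1 - i][kw - 1 - j]
            out[r][c] = s
    return out


def conv2d_row_bands(img, ker, workers=4):
    """Split output rows into bands; each band is convolved independently."""
    kh = len(ker)
    oh = len(img) - kh + 1
    if oh <= 0:
        return []
    step = -(-oh // workers)  # ceiling division
    bands = [(lo, min(lo + step, oh)) for lo in range(0, oh, step)]

    def work(band):
        lo, hi = band
        # Output rows lo..hi-1 need input rows lo..hi+kh-2 (kh-1 halo rows).
        return conv2d_valid(img[lo:hi + kh - 1], ker)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, bands))  # results come back in band order
    return [row for part in parts for row in part]
```

With a 96x96 kernel each band would need 95 halo rows, so bands should be tall relative to the kernel to keep the duplicated input small.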
ISBN (print): 9781728136134
Work-efficient task-parallel algorithms enforce ordering between tasks using queuing primitives. Such algorithms offer limited parallelism because queuing constraints create data-movement and synchronization bottlenecks. Speculatively relaxing the order of tasks across cores using the Galois framework shows promise, as it mitigates the false dependencies generated by strict queuing constraints and unlocks task parallelism. However, relaxed ordering results in redundant work, for which Galois relies on static measures to improve work efficiency. This paper proposes a dynamic multi-level parent-child task dependency checking mechanism in Galois that prunes redundant work by exploiting monotonic properties of shared data values. Evaluation on a 40-core Intel Xeon multicore shows an average 2x performance improvement over state-of-the-art ordered and relaxed-ordered graph algorithms.
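The role of monotonicity can be illustrated with a much simpler, single-level check than the multi-level parent-child mechanism described above. In the hypothetical sketch below (a sequential simulation, not the Galois API), a random choice of pending task stands in for relaxed ordering across cores. Each single-source shortest-path task records the distance it was created with; because distances only decrease, a task whose recorded distance exceeds the current one is redundant and can be skipped:

```python
import random


def relaxed_sssp(adj, src, seed=0, prune=True):
    """Label-correcting SSSP whose tasks run in a random (relaxed) order.

    adj maps every vertex to a list of (neighbor, weight) pairs, weights >= 0.
    Each task (v, d) records the distance v had when the task was created.
    Returns (dist, executed_tasks, pruned_tasks).
    """
    rng = random.Random(seed)
    dist = {v: float("inf") for v in adj}
    dist[src] = 0
    bag = [(src, 0)]
    executed = pruned = 0
    while bag:
        i = rng.randrange(len(bag))  # relaxed order: pick any pending task
        bag[i], bag[-1] = bag[-1], bag[i]
        v, d = bag.pop()
        # Distances only ever decrease (monotonic), so a task whose recorded
        # distance is larger than the current one would redo covered work.
        if prune and d > dist[v]:
            pruned += 1
            continue
        executed += 1
        for w, wt in adj[v]:
            nd = dist[v] + wt
            if nd < dist[w]:
                dist[w] = nd
                bag.append((w, nd))
    return dist, executed, pruned
```

Skipping is safe because whichever task lowered `dist[v]` also enqueued a fresh task for `v` carrying the new value. Comparing `executed` with and without `prune` on a larger graph gives a rough sense of how much redundant work relaxed ordering produces.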
ISBN (print): 9783319654829; 9783319654812
Processing large scale-free graphs on parallel architectures offers high parallelization opportunities but also incurs substantial overheads. Because of the skewed degree distribution, threads receive different amounts of computational work. In this paper we present a method that addresses this challenge by modifying the CSR (compressed sparse row) data structure and redistributing work across threads. The method was implemented in breadth-first search and single-source shortest path algorithms for the GPU architecture.
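The abstract does not detail the modified CSR layout, so the following is a hypothetical sketch of one common way to redistribute such work, not the paper's method: balance each BFS level by out-edges rather than by vertices, so that a hub's adjacency list can be split across threads. The "threads" run sequentially here, and all function names are illustrative:

```python
from bisect import bisect_right
from itertools import accumulate


def build_csr(n, edges):
    """Build CSR arrays (row_ptr, col_idx) from directed edges (u, v)."""
    deg = [0] * n
    for u, _ in edges:
        deg[u] += 1
    row_ptr = [0] + list(accumulate(deg))
    col_idx = [0] * len(edges)
    fill = row_ptr[:-1]  # next free slot per row (slicing makes a copy)
    for u, v in edges:
        col_idx[fill[u]] = v
        fill[u] += 1
    return row_ptr, col_idx


def edge_balanced_split(frontier, row_ptr, threads):
    """Give each thread a near-equal slice of the frontier's out-edges.

    A share is a list of (u, k_begin, k_end) pieces, so the adjacency list of
    a high-degree hub can be divided among several threads.
    """
    offsets = [0] + list(accumulate(row_ptr[u + 1] - row_ptr[u] for u in frontier))
    total = offsets[-1]
    shares = []
    for t in range(threads):
        lo, hi = total * t // threads, total * (t + 1) // threads
        pieces = []
        i = bisect_right(offsets, lo) - 1  # frontier index owning global edge lo
        while lo < hi:
            end = min(hi, offsets[i + 1])
            if end > lo:
                start = row_ptr[frontier[i]] + (lo - offsets[i])
                pieces.append((frontier[i], start, start + (end - lo)))
            lo, i = end, i + 1
        shares.append(pieces)
    return shares


def bfs_levels(n, row_ptr, col_idx, src, threads=4):
    """Level-synchronous BFS; each share would be one thread's work on a GPU."""
    level = [-1] * n
    level[src] = 0
    frontier, depth = [src], 0
    while frontier:
        nxt = []
        for share in edge_balanced_split(frontier, row_ptr, threads):
            for u, kb, ke in share:
                for k in range(kb, ke):
                    v = col_idx[k]
                    if level[v] == -1:
                        level[v] = depth + 1
                        nxt.append(v)
        frontier, depth = nxt, depth + 1
    return level
```

On a real GPU the `level[v] == -1` check and update would need an atomic operation or a benign-race argument; the sequential simulation sidesteps that.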
ISBN (print): 0769525091
This is an overview of the material to be discussed in the invited keynote presentation by H. J. Siegel; it summarizes our research in [2, 16, and 17]. The resources in parallel computer systems (including heterogeneous clusters) should be allocated to computational applications in a way that maximizes some system performance measure. However, allocation decisions and the associated performance predictions are often based on estimated values of application and system parameters. The actual values of these parameters may differ from the estimates; for example, the estimates may represent only average values, the models used to generate them may have limited accuracy, and the environment may change. Thus, an important research problem is the development of resource management strategies that can guarantee a particular system performance despite such uncertainties. To address this problem, we have designed a model for deriving the degree of robustness of a resource allocation: the maximum amount of collective uncertainty in system parameters within which a user-specified level of system performance (QoS) can be guaranteed. The model will be presented, and we will demonstrate its ability to select the most robust resource allocation from among those that otherwise perform similarly (based on the primary performance criterion). The model's use in allocation heuristics will also be demonstrated. This model is applicable to different types of computing and communication environments, including parallel, distributed cluster, grid, Internet, embedded, and wireless.
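As an illustration only, here is one simple instance of such a robustness measure. It assumes a makespan-based QoS constraint (no machine may finish later than `tau` times the estimated makespan), identical machines so that each application has a single estimated execution time, and Euclidean distance over the vector of those estimates. The formulation in the cited work may differ in its parameters and norms, and the function name is hypothetical. Because each machine's finishing time is a sum of estimates, the radius for machine j is the distance from the estimate vector to a hyperplane, i.e. its slack divided by the square root of the number of applications on it:

```python
import math


def robustness(assignment, etc, tau):
    """Robustness of a makespan-constrained allocation (illustrative model).

    assignment[i] is the machine of application i; etc[i] is its estimated
    execution time there. Machine j's finishing time F_j is the sum of the
    estimates assigned to it. The QoS requirement is F_j <= tau * makespan.
    The radius for machine j is the Euclidean distance from the estimate
    vector to the hyperplane F_j = tau * makespan, i.e. slack / sqrt(n_j).
    Returns (min radius over machines, per-machine radii).
    """
    finish, count = {}, {}
    for app, m in enumerate(assignment):
        finish[m] = finish.get(m, 0.0) + etc[app]
        count[m] = count.get(m, 0) + 1
    bound = tau * max(finish.values())
    radii = {m: (bound - finish[m]) / math.sqrt(count[m]) for m in finish}
    return min(radii.values()), radii


etc = [2, 2, 1, 1, 1, 1]
rho_a, _ = robustness([0, 0, 1, 1, 1, 1], etc, 1.5)  # finishing times 4, 4
rho_b, _ = robustness([0, 1, 0, 0, 1, 1], etc, 1.5)  # finishing times 4, 4
```

Both allocations have the same makespan (4), but `rho_a` is 1.0 while `rho_b` is about 1.155, so the second tolerates more collective estimation error before the QoS bound is violated. This mirrors, in miniature, choosing the more robust of two otherwise equivalent allocations.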
ISBN (print): 0769511538
In this paper, we present an adaptive version of the parallel Distributive Join (DJ) algorithm that we proposed in [1]. The adaptive parallel DJ algorithm handles data skew in the operand relations efficiently. We implemented the original and adaptive parallel DJ algorithms on a network of Alpha workstations using the Parallel Virtual Machine (PVM). We analyzed the performance of the algorithms and compared it with that of the parallel Hybrid-Hash (HH) join algorithms. Our results show that the parallel DJ algorithms perform comparably with the parallel HH join algorithms over the entire range of processor counts used and for different join selectivities. A significant advantage of the parallel DJ algorithms is that they can easily support non-equijoin operations.
ISBN (print): 9789380544120
GPUs (Graphics Processing Units) are designed to solve large data-parallel problems encountered in image processing, scene rendering, video playback, and gaming. GPUs are therefore designed to handle a higher degree of parallelism than conventional CPUs. GPGPU (General-Purpose computing on Graphics Processing Units) enables users to do parallel computing on the graphics hardware commonly available in current personal computers. Today's systems are available with multi-core GPUs that provide the necessary hardware infrastructure, thereby enabling high-performance computing on personal computers. NVIDIA's CUDA (Compute Unified Device Architecture) and the industry-standard OpenCL (Open Computing Language) provide the software platform required to use the graphics hardware to solve computational problems with parallel algorithms that would otherwise be solvable mostly in supercomputing environments. This paper presents two parallel CREW (Concurrent Read Exclusive Write) PRAM algorithms for optimal coloring of general graphs on stream processing architectures such as the GPU. The algorithms are implemented using OpenCL. The first algorithm presents techniques for computing vertex independent sets on the GPU and then assigns colors to them. The second algorithm optimizes the vertex independent set computation for edge-transitive graphs by taking advantage of the structure of such graphs, and then assigns a color to each of the normalized independent sets.
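The pattern of the first algorithm (find a vertex independent set, give it a color, repeat) can be sketched sequentially with random priorities, in the spirit of Luby's randomized independent-set algorithm. This is a stand-in, not the paper's OpenCL kernels; it always produces a valid coloring but makes no optimality guarantee, and the function name is hypothetical:

```python
import random


def color_by_independent_sets(adj, seed=0):
    """Color an undirected graph one independent set at a time.

    adj maps each vertex to the set of its neighbors. In each round, every
    uncolored vertex whose random priority beats all uncolored neighbors joins
    the independent set; the whole set then receives the round's color.
    """
    rng = random.Random(seed)
    prio = {v: (rng.random(), v) for v in adj}  # vertex id breaks ties
    color, uncolored, c = {}, set(adj), 0
    while uncolored:
        # Each decision only reads neighbors' priorities (concurrent reads)
        # and writes the vertex's own membership (exclusive write).
        indep = {v for v in uncolored
                 if all(prio[v] > prio[w] for w in adj[v] if w in uncolored)}
        for v in indep:
            color[v] = c
        uncolored -= indep
        c += 1
    return color
```

Every round makes progress because the uncolored vertex with the highest priority always qualifies, and no two adjacent vertices can both be local maxima, so each set is independent.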
ISBN (print): 9781538610428
In parallel computing, a valid graph coloring enables lock-free processing of the colored tasks, data points, etc., without expensive synchronization mechanisms. However, coloring is not free, and its overhead can be significant. In particular, for the bipartite-graph partial coloring (BGPC) and distance-2 graph coloring (D2GC) problems, which have various use cases in scientific computing and numerical optimization, the coloring overhead can be on the order of minutes with a single thread for many real-life graphs. In this work, we propose parallel algorithms for bipartite-graph partial coloring on shared-memory architectures. Compared to existing shared-memory BGPC algorithms, the proposed ones employ greedier and more optimistic techniques that yield better parallel coloring performance. In particular, on 16 cores, the proposed algorithms are more than 4x faster than their counterparts in the ColPack library, which is, to the best of our knowledge, the only publicly available coloring library for multicore architectures. In addition to BGPC, the proposed techniques are employed to devise parallel distance-2 graph coloring algorithms, and similar performance improvements have been observed. Finally, we propose two costless balancing heuristics for BGPC that can reduce the skewness and imbalance in the cardinality of color sets (almost) for free. The heuristics can also be used for the D2GC problem and, in general, they will probably yield better color-based parallelization performance, especially on many-core architectures.
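In BGPC, given a bipartite graph with sides A and B, only the A-side vertices are colored, and any two A-vertices sharing a B-side neighbor (a "net") must receive different colors. The optimistic pattern that shared-memory colorers build on (color speculatively against a possibly stale view, detect conflicts, recolor the losers) can be simulated sequentially as below. This is a generic hypothetical sketch, not the paper's greedier variants, its balancing heuristics, or ColPack's API:

```python
def bgpc_speculative(n_a, nets):
    """Speculative bipartite-graph partial coloring.

    n_a: number of A-side vertices, numbered 0..n_a-1.
    nets: for each B-side vertex, the list of its A-side neighbors.
    Returns (colors, rounds) so that A-vertices sharing a net differ.
    """
    vnets = [[] for _ in range(n_a)]
    for b, members in enumerate(nets):
        for a in members:
            vnets[a].append(b)
    color = [-1] * n_a
    work = list(range(n_a))
    rounds = 0
    while work:
        rounds += 1
        for a in work:
            color[a] = -1
        snapshot = color[:]  # every simulated thread sees the same stale view
        # Phase 1 (optimistic): first-fit against the snapshot only, so two
        # vertices colored in the same round may pick the same color.
        for a in work:
            forbidden = {snapshot[x] for b in vnets[a] for x in nets[b] if x != a}
            c = 0
            while c in forbidden:
                c += 1
            color[a] = c
        # Phase 2: detect same-round conflicts; the lower id keeps its color.
        in_work = set(work)
        work = sorted({max(a, x) for a in in_work for b in vnets[a] for x in nets[b]
                       if x != a and x in in_work and color[x] == color[a]})
    return color, rounds


colors, rounds = bgpc_speculative(5, [[0, 1, 2], [2, 3], [3, 4]])
# colors == [0, 1, 2, 0, 1], reached after 3 rounds
```

The loop terminates because the smallest vertex id in each round's worklist never loses a conflict, so the worklist shrinks every round.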