ISBN: 9798350305760 (print)
Iterating over all the elements of a set is a very common problem in highly parallel systems. In hardware, this is typically realized by either storing the set-membership of each element in a fixed-length bit-vector or by storing just the indices of the members in a dynamically sized queue. However, the former solution is only efficient in terms of memory and runtime if each element is a member of the set with approximately 50% probability, whereas the latter is only efficient if the set is extremely sparse. We propose an alternative asynchronous, concurrent, distributed data structure based on a binary tree topology that is more efficient for sets with sparsity in between these two extremes. The proposed structure allows us to construct a set by adding individual elements one at a time in arbitrary order, and to iterate over all these elements exactly once, clearing the set in the process. We analyzed this data structure, simulated its behavior in CHP, synthesized it into an asynchronous digital circuit, optimized the circuits, and performed SPICE simulation to evaluate our design. The results confirm that our proposed structure offers a low-latency, low-power solution for moderately sparse data, and may thus prove useful for asynchronous and neuromorphic systems.
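The add-then-drain behavior described in this abstract can be illustrated in software. The following is a minimal sequential sketch, not the authors' asynchronous circuit: the class name `TreeSet`, the method names, and the flat heap layout are all assumptions made for illustration. Each internal node records whether any member lies below it, so iteration skips empty subtrees.

```python
class TreeSet:
    """Hypothetical software analogue of a binary-tree set.

    Members are indices 0..capacity-1, with capacity a power of two.
    The tree is stored in a flat list: node 1 is the root, node i has
    children 2*i and 2*i+1, and leaves sit at capacity..2*capacity-1.
    A node's flag is True when at least one leaf below it is a member.
    """

    def __init__(self, capacity):
        if capacity < 1 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.capacity = capacity
        self.flags = [False] * (2 * capacity)

    def add(self, index):
        if not 0 <= index < self.capacity:
            raise IndexError("index out of range")
        node = self.capacity + index
        # Walk towards the root; stop as soon as an ancestor is already marked.
        while node >= 1 and not self.flags[node]:
            self.flags[node] = True
            node //= 2

    def drain(self):
        """Yield every member exactly once, clearing flags as it goes."""
        stack = [1]
        while stack:
            node = stack.pop()
            if not self.flags[node]:
                continue  # empty subtree, nothing to visit
            self.flags[node] = False
            if node >= self.capacity:
                yield node - self.capacity
            else:
                # Push right first so the left child is visited first.
                stack.append(2 * node + 1)
                stack.append(2 * node)


s = TreeSet(8)
for i in (5, 2, 7, 2):      # arbitrary order, duplicates are harmless
    s.add(i)
print(list(s.drain()))      # [2, 5, 7]
print(list(s.drain()))      # []
```

In this sketch, storage is fixed at 2 × capacity flags, while the work done by `drain` depends on how many subtrees are occupied rather than on the full capacity.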
The BMI Eigenvalue Problem is an optimization problem that minimizes the greatest eigenvalue of a bilinear matrix function. This paper proposes a parallel algorithm to compute the ϵ-optimal solution of the BMI Eigenvalue Problem on parallel and distributed computing systems. The proposed algorithm performs a parallel branch and bound method to compute the ϵ-optimal solution using the Master-Worker paradigm. The performance evaluation results on PC clusters and a Grid computing system showed that the proposed algorithm reduced the computation time of the BMI Eigenvalue Problem to 1/91 of the sequential computation time on a PC cluster with 128 CPUs and to 1/7 on a Grid computing system. The results also showed that the computational granularity on a worker had to be tuned to achieve the best performance on a Grid computing system.
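As a rough illustration of master-worker branch and bound with an ϵ-optimality guarantee, the sketch below minimizes a one-dimensional function with a known Lipschitz constant. It is a toy stand-in, not the paper's BMI algorithm: the function `branch_and_bound`, the Lipschitz lower bound, and the `batch` parameter (a crude granularity knob) are all assumptions. Python threads are used only to show the structure; they do not give real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def branch_and_bound(f, lipschitz, lo, hi, eps=1e-3, workers=4, batch=8):
    """Return (x, fx) with fx within eps of the minimum of f on [lo, hi].

    Assumes |f(x) - f(y)| <= lipschitz * |x - y| on the interval.
    """

    def evaluate(box):
        # Worker task: sample the midpoint and bound f from below on the box.
        a, b = box
        mid = 0.5 * (a + b)
        fmid = f(mid)
        lower = fmid - lipschitz * 0.5 * (b - a)
        return a, b, mid, fmid, lower

    best_x, best_f = lo, f(lo)      # incumbent solution, kept by the master
    pending = [(lo, hi)]            # boxes waiting to be evaluated
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Master hands out up to `batch` boxes per round.
            chunk, pending = pending[:batch], pending[batch:]
            for a, b, mid, fmid, lower in pool.map(evaluate, chunk):
                if fmid < best_f:
                    best_x, best_f = mid, fmid
                # Branch only if the box could beat the incumbent by more than eps.
                if lower < best_f - eps:
                    pending.append((a, mid))
                    pending.append((mid, b))
    return best_x, best_f


# Example: (x - 1)^2 on [-2, 3]; its derivative is bounded by 6 there.
x, fx = branch_and_bound(lambda x: (x - 1.0) ** 2, 6.0, -2.0, 3.0)
print(x, fx)
```

A box is discarded only when its lower bound is no better than the incumbent minus `eps`, so the returned value is within `eps` of the true minimum. A larger `batch` means fewer, larger rounds of work per dispatch.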
With the ever-growing network traffic and the vast amount of abnormal traffic being created, anomaly detection methods have attracted close attention in the cybersecurity domain. Generative adversarial networks (GANs)...
ISBN: 9781538616796 (print)
The proceedings contain 33 papers. The topics discussed include: on availability for blockchain-based systems; a horizontally scalable and reliable architecture for location-based publish-subscribe; automated fine tuning of probabilistic self-stabilizing algorithms; reconfiguring parallel state machine replication; a statistical framework on software aging modeling with continuous-time hidden Markov model; hybrid-RC: flexible erasure codes with optimized recovery performance and low storage overhead; correlation-aware stripe organization for efficient writes in erasure-coded storage systems; optimal storage under unsynchronized mobile Byzantine faults; pulp: achieving privacy and utility trade-off in user mobility data; CausalSpartan: causal consistency for distributed data stores using hybrid logical clocks; DottedDB: anti-entropy without Merkle trees, deletes without tombstones; optimal cyber-defense strategies for advanced persistent threats: a game theoretical analysis; AutoFlowLeaker: circumventing web censorship through automation services; optimal network reconfiguration for software defined networks using shuffle-based online MTD; a greedy-based method for modified condition/decision coverage testing criterion; performance modeling of PBFT consensus process for permissioned blockchain network (Hyperledger Fabric); and detecting TCP-based DDoS attacks in Baidu cloud computing data centers.
ISBN: 0769523129 (print)
Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. It is proven that the list scheduling algorithm can achieve a reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. It is also observed that communication heterogeneity has a significant impact on schedule lengths.
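For readers unfamiliar with list scheduling, the classic greedy form takes jobs in list order and gives each one to the machine that becomes free first. The sketch below shows only that basic idea on identical machines; it ignores the communication heterogeneity the abstract discusses, and the function name `list_schedule` and its return format are assumptions made for illustration.

```python
import heapq


def list_schedule(job_times, machines):
    """Greedy list scheduling on identical machines.

    Returns (makespan, assignment), where assignment holds one
    (job, machine, start_time) tuple per job in list order.
    """
    if machines < 1:
        raise ValueError("need at least one machine")
    # Min-heap of (time the machine becomes free, machine id).
    heap = [(0, m) for m in range(machines)]
    heapq.heapify(heap)
    assignment = []
    for job, duration in enumerate(job_times):
        free_at, m = heapq.heappop(heap)        # earliest available machine
        assignment.append((job, m, free_at))
        heapq.heappush(heap, (free_at + duration, m))
    makespan = max(t for t, _ in heap)
    return makespan, assignment


makespan, plan = list_schedule([3, 2, 2, 1, 4], machines=2)
print(makespan)   # 8
print(plan)
```

In this example an optimal schedule finishes at time 6 (jobs of length 4 and 2 on one machine, 3, 2 and 1 on the other), so the greedy result of 8 shows why worst-case bounds for list scheduling are of interest.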
This is an overview of the robust resource allocation research efforts that have been and continue to be conducted by the CSU robustness in computer systems group. Parallel and distributed computing systems, consisting of a (usually heterogeneous) set of machines and networks, frequently operate in environments where delivered performance degrades due to unpredictable circumstances. Such unpredictability can be the result of sudden machine failures, increases in system load, or errors caused by inaccurate initial estimation. The research into developing models and heuristics for parallel and distributed computing systems that create robust resource allocations is presented.
Due to the rapid growth in multicore and GPU-based computing devices, teaching parallel computing in the CS/CE curriculum has become almost mandatory. A course on Parallel Computing Systems (PCS) has been designed to provide an understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, as well as to teach the parallel programming techniques necessary to effectively utilize these machines. An activity-based learning approach was adopted for teaching the course, and several parallel programming paradigms and technologies such as OpenMP, MPI, and CUDA have been covered. This course was offered as a required course to graduate students. This paper describes the implementation of the course at Thiagarajar College of Engineering. Evaluation of the implementation of the course reveals that for students who have not been exposed to parallel and distributed computing, i) activity-based learning results in better knowledge gain compared to the traditional approach, ii) learning OpenMP was much easier than MPI or CUDA, iii) some parallel and distributed computing (PDC) concepts such as false sharing were harder to grasp compared to basic concepts, and iv) it is essential to introduce parallel computing in the undergraduate curriculum.
ISBN: 159593717X (print)
A language for semi-structured documents, XML has emerged as the core of the web services architecture, and is playing crucial roles in messaging systems, databases, and document processing. However, the processing of XML documents has been regarded as the performance bottleneck in most systems and applications. On the other hand, the multicore processor, which emerged as a solution to the clock-speed limitation of modern CPUs, has become increasingly prevalent. Leveraging the parallelism provided by the multicore resource to speed up software execution is becoming the trend in software development. In this paper, we present a parallel processing model for the XML document. The model is not designed for one specific XML processing task; instead, it is a general model with which we are able to explore various kinds of parallel XML document processing. The kernel of the model is a stealing-based dynamic load-balancing mechanism, called ThreadCrew, by which multiple threads are able to process disjoint parts of the XML document in parallel with balanced load distribution. The model also provides a novel mechanism to trace the stealing actions, so that the equivalent sequential result can be obtained by gluing the multiple parallel-running results together. To show the feasibility and effectiveness of our approaches, we present our C# implementation of parallel XML serialization in this paper. Our empirical study shows that our parallel XML serialization algorithm can improve XML serializing performance significantly on a multicore machine. Copyright 2007 ACM.
The proliferation of high performance workstations and the emergence of high speed networks have attracted a lot of interest in parallel and distributed computing (PDC). The authors envision that PDC environments with supercomputing capabilities will be available in the near future. However, a number of hardware and software issues have to be resolved before the full potential of these PDC environments can be exploited. The presented research has the following objectives: (1) to characterize the message-passing primitives used in parallel and distributed computing; (2) to develop a communication protocol that supports PDC; and (3) to develop architectural support for PDC over gigabit networks.