ISBN: 0818667052 (print)
We address a significant problem in parallel processing research: how to port existing sequential programs to run efficiently on parallel machines (the 'dusty deck' problem). Conventional domain-independent techniques are inadequate for solving this problem because they miss significant opportunities for parallelism. We present experimental evidence to support our claim, analyze why current techniques are inadequate, and propose a knowledge-based reverse engineering approach for attacking this problem.
ISBN: 9781728165820 (print)
This work aims to distill a systematic methodology for modernizing existing sequential scientific codes with limited redesign effort, turning an old codebase into modern code, i.e., parallel and robust code. We propose an automatable methodology to parallelize scientific applications designed with a purely sequential programming mindset, and thus possibly using global variables, aliasing, random number generators, and stateful functions. We demonstrate the methodology on an astrophysical application in which we simultaneously model the kinematic profiles of 30 disk galaxies with a Monte Carlo Markov Chain (MCMC), which is sequential by definition. The parallel code exhibits a 12-fold speedup on a 48-core platform.
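As a rough illustration of one obstacle named in this abstract (hidden global state in random number generators), the following Python sketch gives every chain its own private generator so that chains can run in any order, or concurrently, with identical results. It is not the paper's methodology: the target density, the function names, and the use of 30 independent chains are assumptions made only for this example, and a thread pool is used purely to show order independence (CPU-bound Python code would normally use a process pool).

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_density(x, center):
    # Hypothetical target: unnormalized Gaussian log-density around `center`.
    return -0.5 * (x - center) ** 2


def run_chain(task):
    """Run one Metropolis chain; the RNG and the position are local state."""
    seed, center, steps = task
    rng = random.Random(seed)  # private generator instead of the global RNG
    x = center
    total = 0.0
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, 1.0)
        log_ratio = log_density(proposal, center) - log_density(x, center)
        # Metropolis rule: always accept uphill moves, otherwise accept
        # with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        total += x
    return total / steps  # sample mean of the chain


def run_all(tasks, workers=4):
    """Dispatch the chains to a pool; results come back in task order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_chain, tasks))


# One hypothetical task per galaxy: (seed, center, number of steps).
tasks = [(seed, float(seed), 2000) for seed in range(30)]
parallel_means = run_all(tasks)
sequential_means = [run_chain(t) for t in tasks]
```

Because no chain touches shared state, `parallel_means` and `sequential_means` are identical, which is the property a refactoring away from global variables and stateful functions is meant to establish.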
ISBN: 1892512416 (print)
Parallel systems are being used with increasing frequency to solve high-volume and/or high-complexity problems in both business and academia [1]. The cost-effectiveness of using many independent workstations over a high-speed network has become an attractive alternative to the high cost of purchasing and maintaining a traditional mainframe environment [2]. Simulation models, be they high-end commercial packages such as OPNET Modeler or specialized implementations such as that developed in conjunction with [3], should be capable of providing accurate and detailed information on these types of parallel systems. However, this accuracy is often at odds with the flexibility required to consider a wide array of differing systems, or even individual systems with special characteristics. Without the ability to extend a simulation package to incorporate the characteristics that make a parallel system worth investigating, the result is nothing more than a ballpark estimate hindered by the limitations of the existing model. In this paper, we discuss some of the key design decisions and techniques useful in the development of extensible, event-based simulation modeling tools for the investigation of parallel systems.
ISBN: 9781538637906 (print)
Parallel software systems are now common and practical. However, parallel software is difficult to test because its state space is very large. We therefore propose a parallel model simplification method based on CPN (Colored Petri Nets). Building on the original CPN, we propose a CPN model for the tested behavior (Tested Behavior of CPN, TBoCPN). The target of the test is described as the tested behavior, and behavior related to the tested behavior is described as the relevant behavior; the homogeneous concurrent branch group and the selection branch set are then divided. Finally, the branches of the concurrent branch group and the selection branch set that satisfy the condition of the algorithm are processed sequentially by means of inhibitor arcs. Experiments show that the reduction rate is at least 60% and that the full-coverage test paths generated for the tested behavior are the same before and after the reduction, indicating that the method is an effective test method.
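The effect of inhibitor arcs on the number of interleavings can be sketched with an ordinary (uncolored) Petri net in Python. This is a toy model under stated assumptions, not the TBoCPN algorithm: the place and transition names are hypothetical, and the rule for choosing which branches to serialize is simply hard-coded.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens
    and every place connected by an inhibitor arc is empty."""
    has_inputs = all(marking.get(place, 0) >= count
                     for place, count in transition["inputs"].items())
    not_inhibited = all(marking.get(place, 0) == 0
                        for place in transition.get("inhibitors", ()))
    return has_inputs and not_inhibited


def fire(marking, transition):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    result = dict(marking)
    for place, count in transition["inputs"].items():
        result[place] -= count
    for place, count in transition["outputs"].items():
        result[place] = result.get(place, 0) + count
    return result


def count_firing_sequences(marking, transitions):
    """Count the maximal firing sequences (interleavings) from a marking."""
    choices = [t for t in transitions if enabled(marking, t)]
    if not choices:
        return 1  # dead marking: exactly one (empty) continuation
    return sum(count_firing_sequences(fire(marking, t), transitions)
               for t in choices)


def step(source, target, inhibitors=()):
    # Helper that builds a transition moving one token from source to target.
    return {"inputs": {source: 1}, "outputs": {target: 1},
            "inhibitors": tuple(inhibitors)}


start = {"a0": 1, "b0": 1}

# Two concurrent branches: a0 -> a1 -> a2 and b0 -> b1 -> b2.
concurrent = [step("a0", "a1"), step("a1", "a2"),
              step("b0", "b1"), step("b1", "b2")]

# Same net, but branch B may start only once branch A has left a0 and a1.
serialized = [step("a0", "a1"), step("a1", "a2"),
              step("b0", "b1", inhibitors=("a0", "a1")), step("b1", "b2")]

interleavings_before = count_firing_sequences(start, concurrent)
interleavings_after = count_firing_sequences(start, serialized)
```

In this toy net the two free-running branches admit six interleavings, while the inhibited version admits one, yet each branch still fires the same transitions in the same internal order.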
ISBN: 1892512416 (print)
A codesign is the simultaneous design of hardware and software subsystems. In our codesign, we exploit the highly parallel nature of matrix multiplication, which cannot be exploited in our purely software implementation. The hardware part of our codesign system is responsible for performing the arithmetic operations. This includes the matrix multiplier, which performs the multiplication and addition operations of matrix multiplication concurrently. Our matrix multiplier is modeled in VHDL and runs on an ARC-PCI FPGA board. The purpose of the software part of our codesign system is to provide I/O to the hardware. This part is implemented on a PC with a C program and a device driver to communicate with the board. We present a performance comparison of our codesign and the purely software implementation, as well as a performance comparison of existing parallel implementations. Examples of applications that require large, fast matrix multiplication are bipartite graph determination (non-existence of odd cycles), economics (Leontief input-output model), power-invariant transformations (power systems), cryptography, and genetics modeling (Markov chains).
ISBN: 1932415602 (print)
We propose an improved Lustre-like distributed file system: the High-throughput and Scalable Parallel File System (HTSPFS). It has several features that Lustre lacks: an automatic adaptive file striping policy based on file size, access pattern, and storage server status; synchronous use of multiple network adapters to speed up network access; prefetching for read requests and write-behind for write requests to enable higher throughput; and mirroring of the striped data between two groups of server nodes to achieve load balancing and fault tolerance. We have implemented a prototype system and obtained encouraging experimental results.
ISBN: 1932415262 (print)
This paper presents an efficient multithreaded computation model for parallel approximate string matching with k-mismatches. The proposed approach combines the efficiency of a hardware algorithm with the flexibility of multithreading. Unlike other high-performance software algorithms, the approach does not use preprocessing or table lookups, and is therefore space and time efficient. The computation model is suitable for both message-passing and shared-memory implementations. In practice, the computation model is implemented and tested using Java threads, which offer benefits including popularity and portability. The time complexity of the proposed model is O(((n-m)/d) + m), where n and m represent the lengths of the reference and pattern strings respectively, and d is the degree of parallelism, which is explicitly controllable. In the case of n ≫ m, O(n/d) time is expected.
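The problem statement, and the role of the degree of parallelism d, can be sketched in Python as follows. This is a naive baseline, not the paper's computation model: each of the n-m+1 alignments is checked character by character, so the work per worker is O(m(n-m)/d) rather than the O(((n-m)/d) + m) reported above. The function names are hypothetical, and threads are used only to show the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor


def within_k_mismatches(text, pattern, start, k):
    """True if pattern matches text at `start` with at most k mismatches."""
    errors = 0
    for offset, ch in enumerate(pattern):
        if text[start + offset] != ch:
            errors += 1
            if errors > k:
                return False  # early exit once the budget is exceeded
    return True


def search_range(text, pattern, k, lo, hi):
    """Check the alignments lo..hi-1; no preprocessing, no lookup tables."""
    return [s for s in range(lo, hi)
            if within_k_mismatches(text, pattern, s, k)]


def k_mismatch_search(text, pattern, k, d=4):
    """Split the n-m+1 alignments into at most d contiguous ranges and
    search them concurrently; d is the degree of parallelism."""
    positions = len(text) - len(pattern) + 1
    if positions <= 0:
        return []
    chunk = -(-positions // d)  # ceiling division
    ranges = [(lo, min(lo + chunk, positions))
              for lo in range(0, positions, chunk)]
    with ThreadPoolExecutor(max_workers=d) as pool:
        parts = pool.map(lambda r: search_range(text, pattern, k, *r), ranges)
        # map preserves range order, so the result is already sorted.
        return [s for part in parts for s in part]


matches = k_mismatch_search("abcabdabc", "abc", k=1, d=3)
```

Here `matches` is `[0, 3, 6]`: the exact occurrences at 0 and 6, plus "abd" at position 3, which differs from the pattern in one character.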
ISBN: 1932415262 (print)
The strategy behind parallel computation is that a given problem is partitioned into multiple independent tasks of appropriate grain size, and these tasks are scheduled over multiprocessors for execution to achieve high performance. Task partitioning and scheduling can have a significant impact on the performance characteristics of a large parallel system. Based on our previous studies on 3D optical image reconstruction, a task partitioning and analysis framework has been proposed to perform both coarse-grained and fine-grained decompositions. The task dependency is represented as a directed acyclic graph (DAG), which simplifies both the mapping between tasks and multiprocessors and the task scheduling of parallel computing with a master-slave architecture. The framework may provide a guideline for developing a parallel optical image reconstruction system (POIRS).
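How a DAG of task dependencies simplifies scheduling can be sketched in Python with the standard-library `graphlib` module (Python 3.9 or later). The task names below are hypothetical and are not taken from POIRS; the sketch only groups tasks into successive waves that a master could hand out to workers, since all tasks within one wave are mutually independent.

```python
from graphlib import TopologicalSorter


def schedule_levels(dependencies):
    """Group tasks into waves; every task in a wave depends only on tasks
    from earlier waves, so a wave can be executed in parallel.

    `dependencies` maps each task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks runnable right now
        levels.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock successors
    return levels


# Hypothetical reconstruction pipeline: task -> prerequisites.
dependencies = {
    "filter": {"load"},
    "fft": {"load"},
    "reconstruct": {"filter", "fft"},
    "render": {"reconstruct"},
}
levels = schedule_levels(dependencies)
```

For this graph `levels` is `[["load"], ["fft", "filter"], ["reconstruct"], ["render"]]`, so only the second wave offers parallelism, which is the kind of information a grain-size analysis would start from.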
ISBN: 9781467387767 (print)
Simulation has become an indispensable tool for researchers to explore systems without having recourse to real experiments. In this context, multi-agent systems are often used to model and simulate complex systems. Depending on the characteristics of the modelled system, the methods used to represent it may vary. Whatever the modelling techniques used, increasing the size and the precision of a model increases the amount of computation needed, requiring the use of parallel systems when the model becomes too large. Usually, to run efficiently on parallel resources, the model must be adapted to be distributed. In this paper, we propose a new modelling approach, based on nested graphs, that allows the design of large, complex, and multi-scale multi-agent models that can be efficiently distributed on parallel resources. We introduce a PDMAS (Parallel and Distributed Multi Agent Platform) that supports this approach and efficiently runs parallel multi-agent models.
ISBN: 1932415262 (print)
Distributed Shared Memory (DSM) has many advantages in heterogeneous environments, such as geographically distant clusters or the Grid. These include locality utilization and replication transparency. DSM gains these advantages because processes communicate indirectly through memory rather than directly. This paper describes a DSM experiment targeted at a small grid environment. This environment consists of two geographically distant clusters located in Scandinavia. All inter-cluster communication goes through a dedicated gigabit line that is about 300 km (186 miles) long. The DSM system named Global Post-Set (GPS) [1] has been used in this experiment. GPS has been specially designed for grid environments, by means of locality utilization, replication, and the ability to migrate consistency control.