ISBN: 1892512416 (print)
This paper describes a performance evaluation study of data replication using SQL Server on the Windows 2000 Server operating system in a networked distributed system. The simulated environment consists of an isolated LAN with 100/10 Mb bandwidth, 16 Windows 2000 Servers deployed as database publishers, and one additional Windows 2000 Server acting as the central subscriber. Transactional replication of the SQL Server databases is set up to provide predefined channels for data flow from each publisher to the central subscriber. A multithreaded Java program drives concurrent data publishing to the central subscriber database. Up to two million data transactions are used to benchmark the performance of the SQL Server database in data replication. The paper also reports the maximum data flow rate and the issues encountered with data replication in the distributed network system.
ISBN: 1892512459 (print)
Many real-time control systems in industry are designed today for single-processor architectures. At the same time, more functionality needs to be integrated into the software system. To enable correct and timely execution of the control and protection applications, designers may need to optimize application code aggressively. Unwanted simplifications of algorithms or low sampling frequencies of the environment may be the result. Functionality in a system that already has a degree of concurrency may enable the system to scale onto a multiprocessor environment. This paper discusses and presents results from a study that separates a substation automation real-time I/O communication system from application-level threads in order to exploit existing concurrency. Within the system model described here, as well as in many other system models, it is possible to execute communication mechanisms and applications in parallel. The motivation for this work is to let parallel execution of the I/O system and the application enable higher performance for application functionality. The result is more flexibility for the application designers. By describing a model of the real-time substation automation I/O system and extending that model with a mechanism to enable execution in a multiprocessor architecture, we contribute to the understanding of both the composition and the performance issues concerning parallel execution in such industrial systems. Measurements and results originate from execution in an existing system and from the multiprocessor system created.
Given an n × n binary image of white and black pixels, we present an optimal parallel algorithm for computing the distance transform and the nearest feature transform using the Euclidean metric. The algorithm employs systolic computation to achieve O(n) running time on a linear array of n processors.
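To make the two quantities in this abstract concrete, here is a minimal sequential sketch of what a Euclidean distance transform and nearest feature transform compute. It is a brute-force reference definition, not the paper's systolic algorithm, and it does not achieve the O(n) parallel bound. The function name, the list-of-rows image layout, and the choice that pixels equal to 1 are the feature (black) pixels are all assumptions made for illustration.

```python
import math


def distance_and_feature_transform(image):
    """Brute-force Euclidean distance and nearest feature transforms.

    image: list of rows containing 0 or 1. By assumption, 1 marks a
    feature pixel. Returns (dist, nearest), where dist[r][c] is the
    Euclidean distance from (r, c) to its closest feature pixel and
    nearest[r][c] is the (row, col) of that feature pixel.
    """
    features = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value == 1
    ]
    if not features:
        raise ValueError("image contains no feature pixels")

    dist, nearest = [], []
    for r, row in enumerate(image):
        dist_row, nearest_row = [], []
        for c in range(len(row)):
            # Compare squared distances to avoid a square root per candidate.
            best = min(
                features,
                key=lambda f: (f[0] - r) ** 2 + (f[1] - c) ** 2,
            )
            nearest_row.append(best)
            dist_row.append(math.hypot(best[0] - r, best[1] - c))
        dist.append(dist_row)
        nearest.append(nearest_row)
    return dist, nearest


if __name__ == "__main__":
    img = [
        [0, 0, 0],
        [0, 1, 0],
        [0, 0, 0],
    ]
    d, nf = distance_and_feature_transform(img)
    for row in d:
        print(["%.3f" % x for x in row])
```

This version scans every feature pixel for every image pixel, so its cost grows with the product of the two counts. It is useful mainly as a correctness check against a faster implementation.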
ISBN: 9781728165820 (print)
The increasing complexity of modern and future computing systems makes it challenging to develop applications that aim for maximum performance. Hybrid parallel programming models offer new ways to exploit the capabilities of the underlying infrastructure. However, the performance gain is sometimes accompanied by increased programming complexity. We introduce an extension to PyCOMPSs, a high-level task-based parallel programming model for Python applications, to support tasks that use MPI natively as part of the task model. Without compromising the application's programmability, using Native MPI tasks in PyCOMPSs offers up to 3x improvement in total performance for compute-intensive applications and up to 1.9x improvement in total performance for I/O-intensive applications over a sequential implementation of the tasks.
ISBN: 9781728165820 (print)
HPC systems and parallel applications are growing in complexity. The ability to easily study and project at large scale the performance of scientific applications is therefore of paramount importance. In this paper we describe a performance analysis method and apply it to four complex HPC applications. We perform our study on a pre-production HPC system powered by the latest Arm-based CPUs for HPC, the Marvell ThunderX2. For each application we identify inefficiencies and factors that limit its scalability. The results show that in several cases the bottlenecks do not come from the hardware but from the way applications are programmed or the way the system software is configured.
ISBN: 1892512416 (print)
Distributed systems consisting of clusters of computing nodes are becoming increasingly popular for running long-running applications. Checkpoint and recovery is a common technique for providing fault tolerance to such applications. In non-cluster-based distributed systems, the entire system employs a single checkpoint and recovery protocol. However, in a cluster-based system, the constituent clusters may employ different checkpoint and recovery protocols for fault tolerance inside the cluster boundary, for reasons of administrative policy or resource constraints. In this paper we investigate the problem of co-existence of different checkpoint and recovery protocols in different clusters. The problem of employing coordinated and message logging protocols, two of the most popular checkpoint and recovery protocols, in different clusters is discussed. A protocol is presented that provides consistent checkpoint-and-recovery-based fault tolerance for the entire system when the individual clusters are running either coordinated or message logging protocols.
ISBN: 9781424437511 (print)
Much research has been done in the area of software transactional memory (STM) as a new programming paradigm to help ease the implementation of parallel applications. While most research has been invested in answering the question of how STM should be implemented, there is less work on how to use STM efficiently. This paper focuses on the challenge of how to use STM for efficient and scalable implementations of non-trivial applications. We present a fine-grained STM-based concurrent binary heap, an application of STM to a data structure that is notoriously difficult to parallelize. We describe extensions to the basic STM approach and also the benefits of our proposal. Our results show that the fine-grained STM-based binary heap provides very good scalability compared to the naive approach. Nevertheless, we reach a point where the complexity of some fine-grained techniques does not justify their use for the increase in performance that can be obtained.
ISBN: 1892512459 (print)
The real-time scheduling schemes proposed for RT CORBA are mostly priority-based, soft real-time scheduling schemes. The problem with these previous schemes is that priority assignment and request allocation are treated as two separate procedures. In the worst case, tasks with imminent deadlines can be allocated to the same server, and continuous deadline violations can occur. In a general real-time system, meeting deadlines is emphasized more than task throughput. Therefore, a modified scheduling algorithm is required that takes the priority distribution into account when allocating a request. Our scheduling scheme, Priority-based RR, tries to distribute task priorities evenly across local servers by controlling the Round-Robin scheduling order according to task urgency. Simulation results indicate that Priority-based RR distribution offers cost-effective performance when the system load is not too high.
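The abstract does not spell out how Priority-based RR orders its allocations, so the following is only one plausible reading, sketched under stated assumptions: pending requests are sorted by urgency (earliest deadline first) and then dealt to servers in round-robin order, so that the most urgent requests land on different servers. The function name, the `(name, deadline)` tuple layout, and the batch-style allocation are hypothetical and are not taken from the paper.

```python
def priority_round_robin(requests, num_servers):
    """Spread urgent requests across servers (illustrative sketch only).

    requests: iterable of (name, deadline) tuples; a smaller deadline is
    assumed to mean a more urgent request.
    num_servers: number of local servers to allocate to.
    Returns a list with one request list per server.
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")

    # Most urgent first, so consecutive urgent requests go to
    # different servers when dealt round-robin.
    ordered = sorted(requests, key=lambda req: req[1])

    assignment = [[] for _ in range(num_servers)]
    for index, req in enumerate(ordered):
        assignment[index % num_servers].append(req)
    return assignment


if __name__ == "__main__":
    pending = [("a", 5), ("b", 1), ("c", 3), ("d", 2)]
    for server_id, queue in enumerate(priority_round_robin(pending, 2)):
        print(server_id, queue)
```

A plain round-robin that ignores deadlines could place `b` and `d`, the two most urgent requests in this example, on the same server if they happened to arrive two positions apart. Sorting by urgency before dealing avoids that case.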
ISBN: 0769521975 (print)
This paper presents the StreamGen load generator, which is targeted at distributed information flow applications. These include the event streaming services used in wide-area publish/subscribe systems or in operational information systems, the data streaming services used in remote visualization or collaboration, and the continuous data streams occurring in download services. Running across heterogeneous distributed platforms, these services are implemented by computational components that capture, manipulate, and produce information streams and are linked via overlay topologies. StreamGen can be used to produce the distributed computational and communication loads imposed by these applications. Dynamic application behaviors can be created with mathematical specifications or with behavior traces collected from application-level traces. An interesting set of traces presented in this paper is derived from long-term observations of the FTP download patterns at the Linux mirror site run by the CERCS research center at the Georgia Institute of Technology. Two different flow-based applications are created and evaluated with StreamGen. The first emulates the data streaming behavior in a distributed scientific collaboration, where a scientific simulation (i.e., a molecular dynamics code) produces simulation data sent to and displayed for multiple interactive remote users. The second emulates portions of the event-streaming behavior of an operational information system used by a large U.S. corporation. Parametric studies with StreamGen's FTP traces applied to these applications are used to evaluate different load balancing strategies for the cluster machines manipulating these applications' data streams.
ISBN: 9781728116440 (print)
Software applications for biological network analysis rely on graphs to model structure and interactions. Many of them require searching for subgraphs in a target graph or in collections of graphs. Even though very efficient algorithms have been defined to solve this subgraph isomorphism problem, the complexity of current real biological networks makes their sequential execution time prohibitive. On the other hand, parallel architectures, from multi-core to many-core, have become pervasive for dealing with the problem of data size. Nevertheless, the sequential nature of graph searching algorithms makes their implementation for parallel architectures very challenging. This paper presents three different parallel solutions for the graph searching problem. The first two target exact search on multi-core CPUs and many-core GPUs, respectively. The third targets approximate search on GPUs and handles node, edge, and node label mismatches. The paper shows how different techniques have been developed in all the solutions to reduce the search space complexity, and reports the performance of the proposed solutions on representative biological networks containing antiviral chemical compounds and on protein interaction networks.
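As a point of reference for the exact search problem this abstract describes, a minimal sequential backtracking sketch is shown below. It is not any of the paper's three parallel solutions. It only illustrates why the search is inherently sequential (each partial mapping depends on the previous choices) and how a simple degree check prunes the search space. The function name, the adjacency-dictionary graph format, and the choice of non-induced matching on undirected, unlabeled graphs are assumptions made for illustration.

```python
def find_subgraph_matches(pattern, target):
    """Enumerate exact embeddings of pattern in target by backtracking.

    pattern, target: dict mapping each node to the set of its neighbours
    (undirected graphs, so adjacency must be symmetric).
    Returns a list of dicts mapping pattern nodes to target nodes such
    that every pattern edge maps to a target edge (non-induced matching).
    """
    pattern_nodes = list(pattern)
    matches = []

    def extend(mapping):
        if len(mapping) == len(pattern_nodes):
            matches.append(dict(mapping))
            return
        u = pattern_nodes[len(mapping)]  # next pattern node to place
        used = set(mapping.values())
        for v in target:
            if v in used:
                continue  # mappings must be injective
            if len(target[v]) < len(pattern[u]):
                continue  # degree pruning: v has too few neighbours
            # Every already-mapped neighbour of u must be adjacent to v.
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]

    extend({})
    return matches


if __name__ == "__main__":
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    for match in find_subgraph_matches(triangle, graph):
        print(match)
```

Because the pattern is unlabeled, each symmetric relabeling of the same target subgraph is reported as a separate match. A practical tool would typically deduplicate these or break symmetries during the search.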