ISBN: 0769507719 (print)
We address the problem of soft Quality of Service (QoS) requirements for multimedia applications (e.g., distance education, telemedicine, electronic commerce). These applications must be able to co-exist with more traditional transaction and data processing applications, and they have soft real-time requirements. Unlike most other work in QoS management, we provide a framework that does not require users or application developers to have detailed knowledge of the resources needed or of the resource scheduling and allocation techniques in use. These underlying details are effectively hidden. In this paper, we describe our strategy, an architecture of services that supports it, and a prototype.
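To make "hiding resource details" concrete, here is a hypothetical sketch that is not the paper's architecture. The user asks only for a qualitative level ("high", "medium", "low"). A manager translates that level into resource demands, admits the request, and softly degrades it to a lower level when capacity is short. The profile table, the resource names, and the class `QoSManager` are all invented for illustration. Resources are kept as integers so the admission arithmetic stays exact.

```python
# Hypothetical translation table: user-facing QoS level -> resource demand.
# Units are integers (Mbps, percent of CPU) so the bookkeeping is exact.
QOS_PROFILES = {
    "high":   {"bandwidth_mbps": 8, "cpu_percent": 30},
    "medium": {"bandwidth_mbps": 4, "cpu_percent": 15},
    "low":    {"bandwidth_mbps": 1, "cpu_percent": 5},
}
LEVELS = ["high", "medium", "low"]  # ordered from best to worst


class QoSManager:
    """Admits requests stated only as QoS levels; resource details stay internal."""

    def __init__(self, bandwidth_mbps, cpu_percent):
        self.free = {"bandwidth_mbps": bandwidth_mbps, "cpu_percent": cpu_percent}

    def admit(self, requested_level):
        # Soft QoS: try the requested level first, then degrade step by step.
        for level in LEVELS[LEVELS.index(requested_level):]:
            need = QOS_PROFILES[level]
            if all(self.free[r] >= amount for r, amount in need.items()):
                for r, amount in need.items():
                    self.free[r] -= amount  # reserve the resources
                return level
        return None  # even the lowest level cannot be served
```

With 10 Mbps and 50% CPU, a first "high" request is granted at "high". A second "high" request is degraded to "low", because neither "high" nor "medium" fits in what remains.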
ISBN: 1892512459 (print)
Over the past 20 years, researchers have demonstrated that the task of developing distributed applications can be separated into two orthogonal parts: computation and coordination. Component-based software development (CBSD) adds another facet to this orthogonality. The main idea of CBSD, in which (off-the-shelf) components are assembled with minimal effort, is fundamentally a coordination problem: coordination is what allows components to be arranged into large ensembles. Developing component-based systems therefore requires extended coordination mechanisms that no current coordination model provides. This paper describes how LOGOP, an extension to the LINDA coordination model, can provide a better mechanism for engineering component-based software.
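For readers unfamiliar with LINDA, the sketch below shows only the classic tuple-space primitives that LOGOP extends. It does not show LOGOP itself. `out` adds a tuple. `rd` blocks until a matching tuple exists and returns a copy. `in_` blocks and removes the matching tuple. The trailing underscore exists because `in` is a Python keyword. Using `None` as the wildcard ("formal") field is an assumption of this sketch.

```python
import threading


class TupleSpace:
    """Minimal in-process LINDA-style tuple space (not LOGOP)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        # Publish a tuple and wake any process blocked in rd/in_.
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    @staticmethod
    def _match(template, tup):
        # None in the template matches any value in that position.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def _find(self, template):
        for i, tup in enumerate(self._tuples):
            if self._match(template, tup):
                return i
        return None

    def rd(self, *template):
        # Non-destructive read; blocks until a match appears.
        with self._cond:
            while (i := self._find(template)) is None:
                self._cond.wait()
            return self._tuples[i]

    def in_(self, *template):
        # Destructive read; blocks until a match appears, then removes it.
        with self._cond:
            while (i := self._find(template)) is None:
                self._cond.wait()
            return self._tuples.pop(i)
```

Components coordinate only through the shared space and never reference each other directly. This decoupling is what makes the computation/coordination separation possible.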
ISBN: 1932415602 (print)
Distributed query processing requires transmitting data between computers in a distributed system. We therefore describe a method that minimizes both the response time and the transmission cost of distributed query processing. We show that the distributed query processing problem can be transformed into a query graph, and that finding an optimal-cost join program is equivalent to finding a set of cuts in that graph. The algorithm dynamically searches for the globally optimal solution on a minimum solution tree, which is a subset of the solution tree of the traditional branch-and-bound method. The optimality criteria are the equality of data transmission and minimum transmission cost.
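The two objectives named above can pull in different directions. The sketch below does not reproduce the paper's graph-cut or branch-and-bound search. It only evaluates a candidate plan under an assumed linear transmission model, in which shipping `size` units costs `c0 + c1 * size`. In the model, a plan is a sequence of stages, and the transfers within a stage run in parallel. Total transmission cost sums every transfer. Response time sums each stage's slowest transfer. The stage structure and the cost constants are assumptions.

```python
def transfer_cost(size, c0, c1):
    # Assumed linear model: fixed startup cost plus a per-unit cost.
    return c0 + c1 * size if size > 0 else 0


def plan_costs(stages, c0, c1):
    """Return (total transmission cost, response time) for a plan.

    `stages` is a list of stages; each stage is a list of transfer sizes
    that are sent concurrently between sites.
    """
    total, response = 0, 0
    for stage in stages:
        costs = [transfer_cost(s, c0, c1) for s in stage]
        total += sum(costs)               # every byte sent is paid for
        response += max(costs, default=0)  # parallel transfers overlap
    return total, response
```

For example, with `c0=10` and `c1=1`, the plan `[[100, 200], [50]]` costs 380 in total but has a response time of only 270. An optimizer that looks at just one of these numbers can pick a poor plan by the other.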
ISBN: 1932415262 (print)
In a broadcast disks environment, conventional concurrency control mechanisms are not applicable because of the asymmetric communication bandwidth between the server and mobile clients. Moreover, most applications in a broadcast disks environment require real-time processing. In this paper, we propose a new concurrency control algorithm that not only exploits the features of the broadcast disks environment but also guarantees the correctness of transactions while respecting their urgency. The algorithm allows a mobile client to validate and commit read-only transactions locally. Furthermore, it lets clients detect early the data conflicts of update transactions that would otherwise only be detected at the server. These features help client transactions meet their deadlines.
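The following is a simplified, hypothetical illustration of local validation on the client. It is not the paper's algorithm. It assumes the server broadcasts, each cycle, a report of items updated by transactions that committed at the server. A client transaction aborts as soon as an item it has already read appears in such a report. A read-only transaction that survives until it finishes can then commit locally, with no uplink message. An update transaction that survives is forwarded to the server. One that is doomed is aborted early, without waiting for server-side validation. The class and method names are invented for this sketch.

```python
class ClientTxn:
    """Client-side transaction with local validation against broadcast reports."""

    def __init__(self, txn_id, read_only=True):
        self.txn_id = txn_id
        self.read_only = read_only
        self.read_set = set()
        self.aborted = False

    def read(self, item, broadcast_value):
        # Values come off the broadcast channel; we only record what was read.
        if self.aborted:
            raise RuntimeError("transaction already aborted")
        self.read_set.add(item)
        return broadcast_value

    def on_invalidation_report(self, updated_items):
        # Conflict: something we already read was overwritten at the server.
        if self.read_set & set(updated_items):
            self.aborted = True
        return not self.aborted

    def commit(self):
        if self.aborted:
            return "abort"
        if self.read_only:
            return "committed-locally"  # no uplink message needed
        return "send-to-server"         # server still performs final validation
```

Detecting the doomed update transaction at the client saves an uplink round trip. That is the kind of saving the abstract credits with helping transactions meet their deadlines.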
ISBN: 1892512459 (print)
The implementation of large-scale Monte Carlo computation on the grid benefits from state-of-the-art approaches to accessing a computational grid and requires high-quality, scalable parallel random number generators. The Globus software toolkit facilitates the creation and use of a computational grid for large distributed computational jobs. The Scalable Parallel Random Number Generators (SPRNG) library is designed to generate a practically infinite number of random number streams with favorable statistical properties for parallel and distributed Monte Carlo applications. Taking advantage of the facilities of the Globus toolkit and the SPRNG library, we implemented a tool we refer to as the Grid-Computing Infrastructure for Monte Carlo applications (GCIMCA). GCIMCA implements services specific to grid-based Monte Carlo applications, including a Monte Carlo subtask scheduling service using the N-out-of-M strategy, application-level checkpointing facilities, a partial result validation service, and an intermediate value validation service. Based on these facilities, GCIMCA aims to provide a trustworthy grid-computing infrastructure for large-scale, high-performance distributed Monte Carlo computations.
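The N-out-of-M idea can be sketched sequentially. Launch M independent subtasks, tolerate the loss of some of them, and combine the first N results that come back. This toy example estimates pi. It does not use Globus or SPRNG. Each subtask instead gets a `random.Random` seeded with its stream id. That seeding is an assumption standing in for SPRNG's independent streams, and it carries none of SPRNG's statistical guarantees. Failed grid nodes are simulated with the hypothetical `failed` parameter.

```python
import math
import random


def mc_subtask(stream_id, samples):
    # Stand-in for one grid subtask with its own random stream.
    rng = random.Random(stream_id)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples  # pi estimate from this subtask


def n_out_of_m(n, m, samples, failed=frozenset()):
    """Schedule m subtasks and average the first n that complete."""
    results = []
    for stream_id in range(m):
        if stream_id in failed:
            continue  # simulated node crash or timeout: result never arrives
        results.append(mc_subtask(stream_id, samples))
        if len(results) == n:
            break  # enough results; remaining subtasks can be cancelled
    if len(results) < n:
        raise RuntimeError("fewer than n subtasks completed")
    return sum(results) / n
```

Because every subtask is a statistically equivalent sample, any N of the M results are interchangeable. That is why over-provisioning by M - N subtasks masks unreliable grid nodes without biasing the estimate.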
ISBN: 1932415262 (print)
Programmability and IEEE-standard floating-point arithmetic make the latest commodity graphics processors (GPUs) an attractive platform for general parallel computing. In this paper we describe the implementation of the Warshall-Floyd algorithm on a class of GPUs. The all-pairs shortest paths problem is relevant to many practical applications. An efficient GPU implementation of the Warshall-Floyd algorithm is challenging because of the algorithm's dynamic nature and the limited GPU instruction set. GPU-specific data organization, parallelization, and experimental results for several graphics accelerators are discussed. The GPU implementation uses interpolators, the vertex and fragment pipelines, and vector operations to maximize performance. Speedups of up to 3x over a CPU implementation were achieved.
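The property that makes Warshall-Floyd mappable to a GPU is easiest to see in a CPU sketch. This is plain Python, not the paper's shader code. For a fixed intermediate vertex `k`, every cell update `d[i][j] = min(d[i][j], d[i][k] + d[k][j])` reads only row `k` and column `k`. Those do not change during pass `k`, assuming no negative cycles, because `d[k][k]` stays 0. So all n*n updates of a pass are independent. Building a fresh matrix per pass mirrors the read-one-texture, write-another style that a fragment-shader implementation would typically use. That correspondence is our assumption about the GPU mapping.

```python
INF = float("inf")


def floyd_warshall(dist):
    """All-pairs shortest paths; dist[i][j] is an edge weight or INF."""
    n = len(dist)
    d = [row[:] for row in dist]  # do not modify the caller's matrix
    for k in range(n):
        # One data-parallel pass: each (i, j) cell is computed independently
        # from the previous matrix, like one full-screen shader pass.
        d = [[min(d[i][j], d[i][k] + d[k][j]) for j in range(n)]
             for i in range(n)]
    return d
```

The `n` passes are inherently sequential, but each pass exposes n*n independent operations. This matches the "dynamic nature" noted above: the data changes every pass, so the passes cannot be fused.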
ISBN: 1932415610 (print)
In this paper we propose a protocol for service discovery in decentralized environments. Our protocol guarantees that service descriptions can be found by any node in the network. It is intended for applications that cannot rely on central components, e.g., because the network is formed ad hoc. The protocol uses the structured overlay network Chord. Chord looks up keys at a cost logarithmic in the number of nodes and guarantees that any entry stored in the network can be retrieved. This would not be possible in an unstructured overlay network without flooding the whole network. Service descriptions are decomposed into portions that can be efficiently distributed and retrieved. Although this adds cost to publishing service descriptions, it makes lookup operations more efficient. We implemented a Java prototype of our protocol on top of TCP/IP as a proof of concept.
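The sketch below shows the two ideas the abstract combines: Chord-style key placement and decomposition of a description into separately indexed portions. Several parts are simplifying assumptions. Keys and nodes are hashed onto an identifier ring, and each key is stored at its successor node, which is Chord's placement rule. Splitting a description into one `attr=value` entry per attribute is a hypothetical decomposition, not necessarily the paper's. The successor is found with a sorted list instead of Chord's finger-table routing, so the O(log N) hop behavior is not modeled. The small 16-bit identifier space is also an assumption for readability.

```python
import bisect
import hashlib

M = 16  # identifier bits (Chord itself uses 160-bit SHA-1 identifiers)


def chord_id(text):
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big") % (2 ** M)


class Ring:
    def __init__(self, node_names):
        self.ids = sorted(chord_id(name) for name in node_names)
        self.store = {node_id: {} for node_id in self.ids}

    def successor(self, key_id):
        # First node clockwise whose id >= key_id, wrapping around the ring.
        idx = bisect.bisect_left(self.ids, key_id)
        return self.ids[idx % len(self.ids)]

    def publish(self, description, service_name):
        # Hypothetical decomposition: one index entry per attribute=value pair,
        # so publishing costs one insert per portion.
        for attr, value in description.items():
            key = f"{attr}={value}"
            node = self.successor(chord_id(key))
            self.store[node].setdefault(key, set()).add(service_name)

    def lookup(self, attr, value):
        # A lookup touches exactly one responsible node: no flooding.
        key = f"{attr}={value}"
        return self.store[self.successor(chord_id(key))].get(key, set())
```

This also shows the trade-off the abstract mentions. Publishing a description with k attributes costs k insertions, but each query goes straight to a single responsible node.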
ISBN: 1932415262 (print)
Sorting is a fundamental algorithm used extensively in computer science as an intermediate step in many applications. The performance of sorting algorithms is heavily influenced by the type of data being sorted and by the machine being used. To obtain portable performance for sorting algorithms, we propose an install-time system that automatically constructs sequential and parallel sorts highly tuned for the target architecture. Our system works in two steps: first, a hybrid sequential divide-and-conquer sort is constructed; then this algorithm is parallelized using a shared work-queue model. To evaluate our system, we compare the automatically generated sorting algorithms with sequential and parallel versions of the C++ STL sort. The generated sorts are competitive with the STL sort on sequential systems and outperform the parallel STL sort on a 4-processor Xeon server.
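As a minimal sketch of the first step, assume the tunable "hybrid" parameter is a single cutoff below which a merge sort switches to insertion sort. The install-time tuner then times a few candidate cutoffs on the target machine and keeps the fastest. The paper's system likely searches a richer space of algorithms and parameters. The single-cutoff design, the candidate list, and the function names here are assumptions, and the parallel work-queue step is not shown.

```python
import random
import timeit


def insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def hybrid_sort(a, threshold):
    """Merge sort that switches to insertion sort for runs <= threshold."""
    assert threshold >= 1, "threshold must be positive to end the recursion"

    def rec(lo, hi):
        if hi - lo <= threshold:
            insertion_sort(a, lo, hi)
            return
        mid = (lo + hi) // 2
        rec(lo, mid)
        rec(mid, hi)
        merged, i, j = [], lo, mid
        while i < mid and j < hi:
            if a[i] <= a[j]:
                merged.append(a[i]); i += 1
            else:
                merged.append(a[j]); j += 1
        merged.extend(a[i:mid])
        merged.extend(a[j:hi])
        a[lo:hi] = merged

    rec(0, len(a))
    return a


def tune_threshold(candidates, n=2000, repeats=3, seed=0):
    # "Install time": measure each candidate on this machine, keep the fastest.
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    best, best_time = None, float("inf")
    for t in candidates:
        elapsed = min(timeit.repeat(lambda: hybrid_sort(data[:], t),
                                    number=1, repeat=repeats))
        if elapsed < best_time:
            best, best_time = t, elapsed
    return best
```

The winning cutoff depends on the interpreter or compiler, the cache sizes, and the data. That machine dependence is the point of tuning at install time rather than hard-coding the value.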
ISBN: 9780769543284 (print)
We present ongoing work on accurately predicting the performance of distributed applications in heterogeneous systems. We are developing dPerf, a tool built on the Rose framework that performs static analysis and automatic instrumentation of the source code of programs written in C, C++ or Fortran. Accurate prediction of program computation time relies on hardware counters and on two block benchmarking techniques that we propose in this paper. The current work uses a network simulator to calculate the communication time. The computation and communication times are then summed to estimate the execution time of the distributed application. The approach is validated experimentally on the NAS Integer Sort benchmark, with communications simulated using SimGrid.
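The final combination step can be sketched generically. Estimated computation time is the sum over code blocks of the measured per-execution cost times the execution count reported by instrumentation. The simulated communication time is then added. The sketch does not reproduce dPerf's two block benchmarking techniques or its use of hardware counters. `benchmark_block` simply times a Python callable with `time.perf_counter`, and all names and inputs are hypothetical.

```python
import time


def benchmark_block(block, repeats=1000):
    """Average wall-clock time of one execution of `block` (a zero-arg callable)."""
    start = time.perf_counter()
    for _ in range(repeats):
        block()
    return (time.perf_counter() - start) / repeats


def predict_execution_time(block_costs, block_counts, comm_time):
    """block_costs: seconds per execution of each block (from benchmarking).
    block_counts: how often each block executes (from instrumentation).
    comm_time: communication time supplied by a network simulator."""
    compute = sum(cost * block_counts.get(name, 0)
                  for name, cost in block_costs.items())
    return compute + comm_time
```

For instance, blocks costing 0.5 s and 2.0 s, executed 10 and 3 times, with 4.0 s of simulated communication, give a prediction of 15.0 s.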
ISBN: 9781467387767 (print)
Improving the memory access behavior of parallel applications is one of the most important challenges in high-performance computing. Non-Uniform Memory Access (NUMA) architectures pose particular challenges in this context: they contain multiple memory controllers, and the choice of controller serving a page request influences the overall locality and balance of memory accesses, which in turn affect performance. In this paper, we analyze and improve the memory access pattern and overall memory usage of large-scale irregular applications on NUMA machines. We selected HashSieve, a very important algorithm in lattice-based cryptography, as a representative example because of (1) its extremely irregular memory access pattern, (2) its large memory requirements, and (3) its unsuitability for other computer architectures, such as GPUs. We optimize HashSieve with a variety of techniques, targeting both the algorithm itself and the mapping of memory pages to NUMA nodes, and achieve a speedup of over 2x.
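To illustrate why the page-to-node mapping affects both locality and balance, here is a toy model that is not the paper's HashSieve optimization. It compares two common placement policies on a recorded access trace. First-touch puts a page on the node of the thread that touches it first. Interleave spreads pages round-robin across nodes. Locality is the fraction of accesses served by the accessing thread's own node. Imbalance is the busiest node's load divided by the average load. The trace format and the metrics are assumptions of this sketch.

```python
from collections import Counter


def first_touch(accesses, thread_node):
    # A page lives on the node of the first thread that touches it.
    placement = {}
    for thread, page in accesses:
        placement.setdefault(page, thread_node[thread])
    return placement


def interleave(accesses, num_nodes):
    # Pages are spread round-robin over the nodes.
    pages = sorted({page for _, page in accesses})
    return {page: i % num_nodes for i, page in enumerate(pages)}


def evaluate(accesses, thread_node, placement, num_nodes):
    """Return (locality, imbalance) for a page placement."""
    local = sum(1 for t, p in accesses if placement[p] == thread_node[t])
    per_node = Counter(placement[p] for _, p in accesses)
    loads = [per_node.get(n, 0) for n in range(num_nodes)]
    imbalance = max(loads) / (sum(loads) / num_nodes)
    return local / len(accesses), imbalance
```

Consider a trace where thread 0 initializes all pages serially. First-touch then puts every page on node 0, which gives decent locality for thread 0 but sends all traffic to one memory controller. Interleaving balances the controllers at the cost of locality. Irregular applications often have to balance these two effects.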