Parallel, multithreaded Java applications such as Web servers, database servers, and scientific applications are becoming increasingly prevalent. Most of them have high object instantiation rates through the new bytecode, which is typically implemented in the garbage collection subsystem. For these applications, traditional garbage collectors are often the bottleneck that limits program performance and processor utilization on multiprocessor systems. They suffer from long garbage collection pauses (the stop-the-world mark-sweep algorithm) or from an inability to collect cyclic garbage (the reference counting approach). Generational garbage collection, however, is based only on the weak generational hypothesis that most objects die young. In this paper, a new multithreaded concurrent generational garbage collector (MCGC) based on mark-sweep with the assistance of reference counting is proposed. The MCGC can take advantage of multiple CPUs in an SMP system and of the merits of lightweight processes. Furthermore, the long garbage collection pause can be reduced and garbage collection efficiency can be enhanced.
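The limitation of pure reference counting mentioned above can be seen in a toy heap model. The sketch below is not the MCGC algorithm; the `Obj` class and the `link`, `unlink`, `mark`, and `sweep` helpers are hypothetical names introduced only to show why an unreachable cycle keeps nonzero counts while a trace from the roots still identifies it as garbage.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """A heap object in a toy model (hypothetical, not the MCGC design)."""
    name: str
    refs: list = field(default_factory=list)  # outgoing references
    rc: int = 0                                # incoming reference count


def link(src, dst):
    """Add a reference src -> dst and increment dst's count."""
    src.refs.append(dst)
    dst.rc += 1


def unlink(src, dst):
    """Drop a reference src -> dst and decrement dst's count."""
    src.refs.remove(dst)
    dst.rc -= 1


def mark(roots):
    """Return the set of objects reachable from the roots."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(obj.refs)
    return seen


def sweep(heap, live):
    """Keep only the objects that were marked as live."""
    return [obj for obj in heap if obj in live]


root, a, b = Obj("root"), Obj("a"), Obj("b")
heap = [root, a, b]
link(root, a)
link(a, b)
link(b, a)        # a and b now form a cycle
unlink(root, a)   # the cycle becomes unreachable from the root

# Objects that reference counting alone would free (count dropped to zero).
rc_garbage = [o.name for o in heap if o is not root and o.rc == 0]
# Objects that survive a mark-sweep pass starting from the root.
survivors = [o.name for o in sweep(heap, mark([root]))]
print(rc_garbage)
print(survivors)
```

In this model, `a` and `b` each still hold a count of one after the root drops its reference, so counting alone never frees them, while the trace keeps only the root.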
In this paper, we present an in-depth analysis of the memory system performance of DSS commercial workloads on two state-of-the-art multiprocessors: the SGI Origin 2000 and the HP V-Class. Our results show that a single query process takes almost the same number of cycles on both machines. However, when multiple query processes run simultaneously on the system, the execution time tends to increase more on the SGI Origin 2000 than on the HP V-Class, because communication overhead is more expensive on the SGI Origin 2000. We also show the rate at which the number of data cache misses, the number of context switches, and the overall execution time increase as more query processes run simultaneously.
In this paper we focus on the temporary return of data values that are incorrect for given transactional semantics and could have catastrophic effects similar to those in parallel and discrete event simulation. In many applications using on-line transaction processing (OLTP) environments, for instance, it is best to delay the response to a transaction's read request until it is either known or likely that a write message from an older update transaction will not make the response incorrect. Examples of such applications are those where aberrant behavior is too costly, and those in which precommitted data are visible to some reactive entity. Because this approach avoids risk, we propose a risk-free multiversion temporally correct (RFMVTC) concurrency control algorithm. We discuss the algorithm and its implementation, and report performance results from simulation models run on a cluster of workstations.
The dQUOB system's conceptualization of data streams as a database, together with its SQL interface to those streams, gives users an intuitive way to think about their data needs in a large-scale application containing hundreds if not thousands of data streams. Experience with dQUOB has shown the need for more aggressive memory management to achieve the scalability we desire. This paper addresses the problem with a two-part solution. The first part replaces the existing first-come first-served scheduling algorithm with an earliest job first algorithm, which we demonstrate to yield better average service time. The second part is an introspection algorithm that sets and adapts the sizes of join windows in response to knowledge acquired at runtime about event rates. In addition to the potential for significant improvements in memory utilization, the algorithm presented here also provides a means by which the user can reason about join window sizes. Wide area measurements demonstrate the adaptive capability required by the introspection technique.
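The abstract does not say how join window sizes are derived from event rates, so the following is only one plausible illustration, not the dQUOB introspection algorithm. It assumes the window should hold roughly a fixed time span of events and that the rate is tracked with an exponentially weighted moving average; the class `JoinWindowSizer` and its parameters `span_s`, `alpha`, and `min_size` are hypothetical.

```python
import math


class JoinWindowSizer:
    """Hypothetical sizer: keeps roughly `span_s` seconds of events in the window."""

    def __init__(self, span_s, alpha=0.5, min_size=1):
        self.span_s = span_s      # time span the window should cover (assumed goal)
        self.alpha = alpha        # smoothing weight given to each new rate sample
        self.min_size = min_size  # never shrink the window below this many events
        self.rate = None          # smoothed events per second, unknown at start

    def observe(self, events, interval_s):
        """Fold one runtime measurement into the smoothed rate; return the new size."""
        sample = events / interval_s
        if self.rate is None:
            self.rate = sample
        else:
            self.rate = self.alpha * sample + (1 - self.alpha) * self.rate
        return self.window_size()

    def window_size(self):
        """Window size in events for the current rate estimate."""
        if self.rate is None:
            return self.min_size
        return max(self.min_size, math.ceil(self.rate * self.span_s))


sizer = JoinWindowSizer(span_s=2.0)
# Four one-second measurements: the event rate drops from 100/s to 20/s.
sizes = [sizer.observe(events, 1.0) for events in (100, 100, 20, 20)]
print(sizes)
```

Under these assumptions the window shrinks gradually as the observed rate falls, which is the kind of behavior that would free memory on slow streams while letting a user relate window size to a time span.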
The paper describes Shaman, our distributed architectural simulator of shared memory multiprocessors (SMP). The simulator runs on a PC cluster that consists of multiple front-end nodes, which simulate the instruction-level behavior of a target multiprocessor in parallel, and a back-end node, which simulates the target memory system. The front-end also simulates the logical behavior of the shared memory using a software DSM (distributed shared memory) technique and generates memory references to drive the back-end. A notable feature of our simulator is reference filtering, which reduces the number of references transferred from the front-end to the back-end by using the DSM mechanism and coherent cache simulation on the front-end. This technique and our sophisticated DSM implementation give the Shaman simulator extraordinary performance. We achieved 335 million and 392 million simulation clocks per second for LU decomposition and FFT in the SPLASH-2 kernel benchmarks, respectively, when we used 16 front-end nodes to simulate a 16-way target SMP.
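The general idea behind reference filtering can be sketched with a much simpler model than Shaman's. The example below assumes a single front-end with a small direct-mapped cache model and ignores coherence and the DSM layer entirely; `FilteringFrontEnd`, `num_lines`, and `line_bytes` are hypothetical names. References that hit in the front-end model are absorbed locally, and only misses are forwarded to the back-end.

```python
class FilteringFrontEnd:
    """Toy front-end: a direct-mapped cache model that forwards only misses."""

    def __init__(self, num_lines=4, line_bytes=64):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines   # block number cached in each line
        self.forwarded = []              # references sent on to the back-end

    def access(self, addr):
        """Simulate one memory reference; return True on a hit."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        if self.tags[index] == block:
            return True                  # hit: filtered out at the front-end
        self.tags[index] = block         # miss: fill the line
        self.forwarded.append(addr)      # and forward the reference
        return False


front = FilteringFrontEnd()
# Addresses 0 and 256 map to the same line, so they evict each other.
trace = [0, 8, 16, 64, 0, 256, 0]
hits = sum(front.access(addr) for addr in trace)
print(hits, front.forwarded)
```

In this trace three of the seven references are absorbed at the front-end, so the back-end receives four; the real simulator's savings depend on its cache and DSM models, which the abstract does not detail.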
ISBN: (print) 9781581134438
To provide fast and timely answers to queries in the context of spatial databases and GIS, we present our solution for effective data migration and tuning strategies in shared-nothing parallel spatial databases. Our purpose is to improve the performance of the indexes. Our approach has the following features. First, our scheme is self-tuning, dynamic, and query-centric, and it can adapt to dynamically changing user access patterns. Second, a global distributed R-tree-based indexing method is employed to facilitate effective data migration. Third, unlike traditional partitioning strategies where each processing element (PE) contains data from a single region of space, we allow each PE to store data from multiple disjoint regions. This minimizes overlap in regions as well as coverage. We implemented the proposed scheme and conducted an extensive performance study on Fujitsu's AP3000 machine with 32 workstations using real datasets. Our experimental results show that our load-balancing strategy can distribute the load effectively across the PEs in the system, thereby reducing response times of incoming queries.
A new parallel semiconductor device simulation using the dynamic load balancing approach is presented. This semiconductor device simulation based on adaptive finite volume error estimation, and monotone iterative meth...