This paper presents T&D-Bench, an integrated suite of tools for modeling and simulating state-of-the-art processors. The suite is composed of two main parts. The first, SimPL, is an object-oriented methodology for modeling the behavior of an instruction set, with precise information on the timing of basic instruction steps. The methodology is general and allows easy modeling of various architecture types. The second part of the suite is CSPSim, an open set of visualization tools that communicate with any number of SimPL models through a client-server architecture. T&D-Bench combines the main advantages of teaching environments, which offer a rich user interface, and of design environments, which provide resources for modeling arbitrarily complex processor architectures.
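SimPL's actual class hierarchy is not described in the abstract. As a rough illustration of what an object-oriented instruction model with per-step timing might look like, the Python sketch below gives each instruction class a sequence of named steps with cycle latencies. The class names, step names, and latencies are all hypothetical and are not taken from SimPL.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical architectural state: registers, memory, and a cycle counter."""
    regs: dict = field(default_factory=lambda: {f"r{i}": 0 for i in range(8)})
    mem: dict = field(default_factory=dict)
    cycle: int = 0


class Instruction:
    """Base class: an instruction is a sequence of named steps, each with a latency in cycles."""
    steps = (("fetch", 1), ("decode", 1))

    def execute(self, m: Machine) -> None:
        raise NotImplementedError

    def run(self, m: Machine) -> list:
        # Advance the clock step by step and record when each step completes,
        # then apply the instruction's architectural effect.
        trace = []
        for name, latency in self.steps:
            m.cycle += latency
            trace.append((name, m.cycle))
        self.execute(m)
        return trace


class Add(Instruction):
    steps = Instruction.steps + (("alu", 1), ("writeback", 1))

    def __init__(self, rd, rs, rt):
        self.rd, self.rs, self.rt = rd, rs, rt

    def execute(self, m):
        m.regs[self.rd] = m.regs[self.rs] + m.regs[self.rt]


class Load(Instruction):
    # The memory step is assumed to take 3 cycles in this toy model.
    steps = Instruction.steps + (("address", 1), ("memory", 3), ("writeback", 1))

    def __init__(self, rd, addr):
        self.rd, self.addr = rd, addr

    def execute(self, m):
        m.regs[self.rd] = m.mem.get(self.addr, 0)


m = Machine(mem={100: 7})
program = [Load("r1", 100), Load("r2", 100), Add("r3", "r1", "r2")]
traces = [instr.run(m) for instr in program]
print(m.regs["r3"], m.cycle)  # 14 18
print(traces[0])  # [('fetch', 1), ('decode', 2), ('address', 3), ('memory', 6), ('writeback', 7)]
```

Keeping the step list as a class attribute lets a subclass extend its parent's timing without duplicating it, which is one way an object-oriented model can stay general across architecture types.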
The classical methods of fragmentation in a distributed database system help, to a great extent, to make information retrieval faster. This is particularly true for applications whose specifications are well known in advance, at the time the tables that compose the database are created, since these specifications influence the design and definition of the type of fragmentation and the distribution of fragments across processing sites. Nevertheless, these characteristics cannot be exploited in applications where the distributed management system cannot make inferences about which sites hold data with specific characteristics. Under these conditions, the time and amount of work spent by the participants in answering a query can increase considerably. In this paper we present an approach called the virtual fragmentation method, which offers an alternative way to reduce the response time of queries over horizontally fragmented tables.
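The abstract does not specify how virtual fragmentation works, so the sketch below only illustrates the baseline problem it targets, using classical horizontal fragmentation. When the query processor knows each fragment's predicate (here an assumed key range per site), it can contact only the relevant site; when it cannot infer where matching rows live, it must ask every site. The table, column names, and ranges are hypothetical.

```python
from typing import Optional


def fragment(rows, key, bounds):
    """Split rows horizontally: fragment i holds rows with bounds[i][0] <= row[key] < bounds[i][1]."""
    return [[r for r in rows if lo <= r[key] < hi] for lo, hi in bounds]


def query(sites, key, value, bounds: Optional[list] = None):
    """Return (matching rows, number of sites contacted).

    If the fragmentation predicates (bounds) are known, only sites whose range
    can contain `value` are contacted; otherwise every site must be asked.
    """
    if bounds is None:
        candidates = range(len(sites))
    else:
        candidates = [i for i, (lo, hi) in enumerate(bounds) if lo <= value < hi]
    hits = [r for i in candidates for r in sites[i] if r[key] == value]
    return hits, len(candidates)


rows = [{"cust_id": i, "total": 10 * i} for i in range(100)]
bounds = [(0, 25), (25, 50), (50, 75), (75, 100)]
sites = fragment(rows, "cust_id", bounds)

print(query(sites, "cust_id", 60, bounds))  # ([{'cust_id': 60, 'total': 600}], 1)
print(query(sites, "cust_id", 60))          # ([{'cust_id': 60, 'total': 600}], 4)
```

Both calls return the same row, but without usable fragmentation predicates the query touches all four sites instead of one, which is the extra work the abstract describes.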
This paper examines the issue of dynamically scheduling applications on a wide-area network computing system. We construct a simulation model for the wide-area task allocation problem and study the performance of the proposed algorithm under different conditions. The simulation results indicate that the wide-area scheduling algorithm is sensitive to several parameters, including machine failure rates, local queuing policies, and arrival rates.
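The paper's scheduling algorithm and simulation model are not described in the abstract. The following is a toy simulation under assumptions of our own (Poisson arrivals, exponential service times, FIFO local queues, greedy assignment to the machine that becomes free earliest, and re-execution after a failure) that shows how mean response time can be measured as the arrival rate and failure probability vary. The function and its parameters are hypothetical.

```python
import random


def simulate(n_tasks, n_machines, arrival_rate, mean_service, failure_prob, seed=0):
    """Toy wide-area scheduling simulation (hypothetical model, not the paper's).

    Tasks arrive as a Poisson process and are sent to the machine that will
    become free earliest. Each execution attempt fails with probability
    `failure_prob`, in which case the task is re-run on the same machine.
    Local queues are FIFO. Returns the mean response time per task.
    """
    if not 0 <= failure_prob < 1:
        raise ValueError("failure_prob must be in [0, 1)")
    rng = random.Random(seed)
    free_at = [0.0] * n_machines  # time at which each machine's FIFO queue drains
    t = 0.0
    total_response = 0.0
    for _ in range(n_tasks):
        t += rng.expovariate(arrival_rate)            # next arrival time
        m = min(range(n_machines), key=free_at.__getitem__)
        finish = max(t, free_at[m])                   # wait for the queue ahead
        while True:
            finish += rng.expovariate(1.0 / mean_service)
            if rng.random() >= failure_prob:          # attempt succeeded
                break
        free_at[m] = finish
        total_response += finish - t
    return total_response / n_tasks


for rate, fail in [(4.0, 0.0), (4.0, 0.3), (7.0, 0.0)]:
    print(rate, fail, round(simulate(20000, 8, rate, 1.0, fail), 3))
```

Sweeping one parameter at a time with a fixed seed, as in the loop above, is a simple way to probe the kind of parameter sensitivity the abstract reports.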
We show how novel active memory system research and system networking trends can be combined to realize hardware distributed shared memory on clusters of industry-standard workstations. Our active memory controller ex...
Projects like SETI@home have demonstrated the tremendous capabilities of Internet-connected commodity resources. The rapid improvement of commodity components makes the global computing platform increasingly viable for other large-scale data- and compute-intensive applications. In this paper, we study how global computing can accommodate new types of applications. We describe a global computing model that captures resource characteristics and instantiate this model with data from several surveys and studies. We propose performance metrics for global computing applications and evaluate two scheduling mechanisms in simulation. We then draw conclusions concerning the development and enhancement of global computing systems.
Query scheduling plays an important role when systems are faced with limited resources and high workloads. It becomes even more relevant for servers applying multiple query optimization techniques to batches of querie...
Communication optimizations play a crucial role in the performance of parallel applications that are compiled and executed on distributed memory machines. Multithreaded architectures can support multiple threads of execution on each processor, with low-cost thread initiation, low-overhead communication, and efficient data transfer and synchronization between threads on different processors. These mechanisms can be used to achieve an effective overlap between communication and computation and, therefore, good performance on communication-intensive parallel applications. We focus on generating correct and efficient multithreaded code for array-based programs that involve different classes of communication patterns. We consider producer-consumer, scalar reduction, and near-neighbor communication patterns. We describe multithreaded programming methodologies suitable for handling loops with each of these patterns. We further show how a compiler can generate threaded code for loops with such patterns. We present experimental results from two benchmark programs, CG and Tomcatv. Our results show that: 1) the compiler-generated multithreaded code achieves high performance, not previously seen from distributed memory compilers, and 2) the performance of compiler-generated code is comparable to that of hand-written multithreaded code.
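As a rough illustration of the scalar-reduction pattern (not the compiler's generated code), the Python sketch below computes a dot product, the kind of reduction that appears in CG, by letting each thread reduce its own block locally and combining the per-thread partial sums only after all threads have joined. Python threads share one address space and are serialized by the interpreter lock in CPython, so this shows the code structure only, not distributed-memory communication or speedup. The function name and block partitioning are assumptions.

```python
import threading


def threaded_dot(x, y, n_threads=4):
    """Dot product as a scalar reduction split across threads (illustrative sketch)."""
    n = len(x)
    partial = [0.0] * n_threads  # one slot per thread: no shared accumulator in the loop

    def worker(tid):
        lo = tid * n // n_threads
        hi = (tid + 1) * n // n_threads
        s = 0.0
        for i in range(lo, hi):   # purely local accumulation
            s += x[i] * y[i]
        partial[tid] = s          # single write of this thread's result

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # synchronization point before the final combine
    return sum(partial)


x = list(range(1, 1001))
print(threaded_dot(x, x))  # 333833500.0
```

Accumulating locally and synchronizing once per thread, rather than once per element, is what keeps the reduction's communication cost independent of the loop length.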
Skewed-associativity is a technique that reduces the miss ratios of CPU caches by applying a different indexing function to each way of an associative cache. Even though it showed impressive hit/miss statistics, the scheme has not been welcomed by industry, presumably because implementing the original version is complex and might involve access-time penalties, among other costs. This paper presents a simplified, easy-to-implement variant that we call "minimally-skewed-associativity" (MSkA). We show that MSkA caches, in many cases, should not incur penalties in either access time or power consumption when compared to set-associative caches of the same associativity. Hit/miss statistics were obtained by means of trace-driven simulations. Miss ratios are not as good as those for full skewing, but they are still advantageous. Minimal skewing is thus proposed as a way to improve the hit/miss performance of caches, often without the access-time delays or increases in power consumption that other techniques (for example, higher associativity) produce.
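To make the idea of per-way indexing functions concrete, the sketch below simulates a two-way cache in which way 0 uses the conventional set index and way 1 uses an XOR of the index bits with the next group of address bits. This XOR function and the least-recently-used replacement among the candidate slots are assumptions for illustration; they are not the MSkA indexing scheme from the paper. Setting `skewed=False` gives an ordinary two-way set-associative cache with LRU replacement for comparison.

```python
def index_fns(num_sets, skewed):
    """Per-way index functions over block addresses; num_sets must be a power of two."""
    bits = num_sets.bit_length() - 1
    mask = num_sets - 1

    def f0(block):
        return block & mask                           # conventional set index

    def f1(block):
        if not skewed:
            return block & mask                       # same set as way 0
        return (block ^ (block >> bits)) & mask       # assumed XOR skewing function

    return [f0, f1]


def count_misses(trace, num_sets, skewed):
    fns = index_fns(num_sets, skewed)
    ways = [[None] * num_sets for _ in fns]   # block address stored in each slot
    stamp = [[-1] * num_sets for _ in fns]    # last-use time; -1 marks an empty slot
    misses = 0
    for t, block in enumerate(trace):
        slots = [(w, f(block)) for w, f in enumerate(fns)]
        hit = next(((w, s) for w, s in slots if ways[w][s] == block), None)
        if hit is None:
            misses += 1
            # Victim: an empty candidate slot if any, else the least recently used one.
            w, s = min(slots, key=lambda ws: stamp[ws[0]][ws[1]])
            ways[w][s] = block
        else:
            w, s = hit
        stamp[w][s] = t
    return misses


trace = [0, 16, 32] * 10  # three blocks that share set 0 in a 16-set cache
print(count_misses(trace, 16, skewed=False))  # 30
print(count_misses(trace, 16, skewed=True))   # 3
```

With three blocks that all map to the same set, the set-associative cache thrashes and misses on every access, while the skewed cache keeps all three resident after the cold misses because they map to different slots in way 1.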
Information services are an integral part of the grid architecture. They are the foundation for how resources are defined and how their state is known. More importantly, it is through information services that the user of the Grid gets a perspective of what a grid looks like, how it performs, and what capabilities it has. The Accelerated Strategic Computing Initiative (ASCI) has designed and deployed a set of grid services within the context of the ASCI program. We deploy information services by augmenting the Globus toolkit to meet the unique aspects of the ASCI grid. We describe the decisions made and the processes developed to run a grid information service in the ASCI grid.
The goal of this research is to develop performance profiles of parallel and distributed applications in order to predict their execution time under different network conditions. This paper measures the resource requirements of the NAS benchmark programs and characterizes their performance in a shared heterogeneous environment. The programs in the benchmark suite were executed on a controlled testbed, and their usage of CPU, bandwidth, and memory was measured. The performance of the benchmark programs was also measured under controlled sharing of CPU and bandwidth. The results are used to characterize the behavior of the NAS benchmark programs under resource sharing. The paper demonstrates that the core system activity of a program can be accurately measured by passive probing, and that this measured system activity is the key to predicting program performance when resources must be shared. Our methods rely on system-level measurements alone; therefore, neither application knowledge nor access to the source code is required. Hence, the techniques apply across programming languages and models. This paper is an important step toward building an automated framework to infer execution characteristics and estimate performance on shared networks. Such a framework has an important role in resource selection in shared clusters and grid computing environments.
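The abstract does not give the prediction model itself. A minimal sketch of how a profile measured on a dedicated testbed could be combined with available CPU and bandwidth shares is shown below; the additive, non-overlapping model, the `Profile` fields, and the example numbers are all hypothetical and are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Resource usage of one program measured on a dedicated (unshared) testbed."""
    cpu_seconds: float  # CPU time consumed
    bytes_sent: float   # total communication volume in bytes


def predict_time(p: Profile, cpu_share: float, link_bandwidth: float, bw_share: float) -> float:
    """Predict execution time when CPU and bandwidth are shared with other load.

    Hypothetical additive model: computation is slowed by the fraction of CPU
    available, communication by the fraction of link bandwidth (bytes/s)
    available, and the two phases are assumed not to overlap.
    """
    if not (0 < cpu_share <= 1 and 0 < bw_share <= 1):
        raise ValueError("shares must be in (0, 1]")
    compute = p.cpu_seconds / cpu_share
    communicate = p.bytes_sent / (link_bandwidth * bw_share)
    return compute + communicate


prof = Profile(cpu_seconds=40.0, bytes_sent=2.5e9)
print(predict_time(prof, 1.0, 100e6, 1.0))  # 65.0  (dedicated resources)
print(predict_time(prof, 0.5, 100e6, 0.5))  # 130.0 (half of CPU and bandwidth)
```

A model that lets communication overlap computation would replace the sum with something closer to a maximum; which form fits a given program is exactly what measured profiles like those in the paper could help decide.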