ISBN (print): 9780889867048
In this paper, a parallel approach to the discrete exact state estimation algorithm is presented. The novel idea of a double-layer parallel decomposition of this computational algorithm is explained, and the results of its full implementation are shown. Numerical tests of the computational algorithm with the discrete exact state estimator are included, focusing on its scalability. All presented results were obtained using the MPI library on a Linux cluster.
This paper develops a parallel computing system based on open standards such as Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), and Common Language Runtime (CLR). To date, parallel systems based on clustered computers have been primarily restricted to the Unix and Linux family of operating systems. However, the popularity of the Microsoft Windows platform and the richness of reusable modules provided by the *** Framework make it an attractive platform. Motivated by this, a new parallel computing system based *** (***) is described. *** provides higher-level abstractions for message passing than the widely used Message Passing Interface (MPI).
ISBN (print): 9781467345651; 9780769549033
Modern cloud-based applications (e.g., Facebook, Dropbox) serve a wide range of edge clients (e.g., laptops, smartphones). The clients' characteristics vary significantly in terms of hardware, operating systems, network connections, and software versions, just to name a few. Unfortunately, due to misconfiguration, outdated software, faulty hardware, or other reasons, many edge systems operate with suboptimal performance. Identifying poor performance and its root causes is extremely challenging for the client of a cloud system. In this paper, we propose a novel troubleshooting service that leverages this heterogeneity to identify and debug performance problems on edge devices. First, by looking at many runs across many different clients, the service groups clients into clusters based on performance. Next, the service enables logging on remote clients to collect run-time traces and then identifies the root cause by analyzing the logs automatically. We leverage high-level features such as machine/OS type along with low-level kernel statistics such as I/O rate and system calls. To demonstrate our system, we first introduce a configuration bug, artificially injected into a recently built cluster by changing the TCP buffer size. Next, we present two real-life case studies identified using our tool: one involving I/O inefficiency on the Android platform, and the other a misconfiguration bug in VirtualBox.
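The first step of the service, grouping clients and spotting poor performers, can be illustrated with a deliberately simplified sketch. It assumes runs are grouped by the high-level machine/OS features mentioned above and that a client is suspect when its run time exceeds a hypothetical multiple of its group's median. The `Run` type, `flag_slow_clients`, and the default factor of 2.0 are illustrative assumptions, not the paper's clustering method.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Run:
    client_id: str
    machine: str      # hypothetical high-level feature: machine type
    os: str           # hypothetical high-level feature: operating system
    runtime_s: float  # observed run time of the same workload, in seconds


def flag_slow_clients(runs, factor=2.0):
    """Group runs by (machine, os) and flag clients whose run time exceeds
    `factor` times the median run time of their group (illustrative only)."""
    groups = defaultdict(list)
    for r in runs:
        groups[(r.machine, r.os)].append(r)

    flagged = {}
    for key, members in groups.items():
        med = median(r.runtime_s for r in members)
        slow = sorted(r.client_id for r in members if r.runtime_s > factor * med)
        if slow:
            flagged[key] = slow
    return flagged
```

For example, among three x86/Linux clients with run times of 10 s, 11 s, and 35 s, the group median is 11 s, so only the 35 s client is flagged. A median is used rather than a mean so that a single slow client does not pull the threshold upward; with a mean (about 18.7 s) the 35 s client would escape a 2× threshold.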
ISBN (print): 9780769539393
gLite is one of the largest distributed computing infrastructures in operation. It provides access to hundreds of different clusters, all installed and maintained in different ways. This paper analyses the difficulties that users typically experience when moving from their own workstation, via clusters or supercomputers, to the grid. Based on that analysis, this paper presents tools that help to close this gap and introduces an advanced command-line interface to the grid.
ISBN (print): 9780889867741
We study the problem of scheduling tasks in a distributed system where the data (and code) for a program may reside on a processor different from the one where it will be executed. Scheduling the tasks is complex because one must balance execution and communication times. We present an off-line polynomial-time approximation algorithm for the case when the processors can be split into storage (client) and processing (server) nodes. Our algorithm is the first constant-ratio approximation algorithm for this problem. We then discuss generalizations of our problem as well as its on-line version.
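The tension between execution and communication time can be made concrete with a simple greedy heuristic. This is not the paper's constant-ratio approximation algorithm, which the abstract does not describe; it is a generic list-scheduling sketch under hypothetical assumptions: every task's data lives on a storage node, shipping it to a server costs `data_size / bandwidth[server]`, and each server performs its transfers and executions one after another. The function name and data layout are illustrative.

```python
def greedy_schedule(tasks, bandwidth):
    """Greedy list scheduling (illustrative, not the paper's algorithm).

    tasks:     list of (name, exec_time, data_size) tuples
    bandwidth: dict mapping server name -> rate at which it receives data
               from the storage nodes (hypothetical single-link model)
    Returns (assignment dict mapping task name -> server, makespan).
    """
    load = {server: 0.0 for server in bandwidth}  # accumulated busy time per server
    assignment = {}

    # Consider tasks with the longest execution time first (LPT-style ordering).
    for name, exec_time, data_size in sorted(tasks, key=lambda t: t[1], reverse=True):
        # Finish time on server s: current load + data transfer + execution.
        def finish(s):
            return load[s] + data_size / bandwidth[s] + exec_time

        best = min(load, key=finish)
        load[best] = finish(best)
        assignment[name] = best

    return assignment, max(load.values())
```

Placing a task on the fastest link is not always best: because the transfer cost is added to each server's current load, a task can end up on a slower link when the faster server is already busy, which is exactly the execution-versus-communication trade-off the abstract describes.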
Although distributed networks and systems are widely deployed, an effective overall benchmark architecture is still lacking. Recently we defined and studied the fat-stack network architecture and found that it is both...
ISBN (print): 9781467345651; 9780769549033
Every mainstream processor vendor provides an optimized BLAS implementation for its CPU, as BLAS is a fundamental math library in scientific computing. The Loongson 3A CPU is a general-purpose 64-bit MIPS64 quad-core processor developed by the Institute of Computing Technology, Chinese Academy of Sciences. To date, there has been no sufficiently optimized BLAS for the Loongson 3A CPU. The purpose of this research is to optimize level 3 BLAS performance on the Loongson 3A CPU. We analyzed the Loongson 3A architecture and built a performance model that highlights the key bottleneck, L1 data cache misses, which differs from level 3 BLAS optimization on mainstream x86 CPUs. Therefore, we employed a variety of methods to avoid L1 cache misses in single-threaded optimization, including cache and register blocking, the Loongson 3A 128-bit memory access extension instructions, software prefetching, and single-precision floating-point SIMD instructions. Furthermore, we improved parallel performance by reducing bank conflicts among multiple threads in the shared L2 cache. We created an open-source BLAS project, OpenBLAS, to demonstrate the performance improvement on the Loongson 3A quad-core processor.
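Cache blocking, the first technique listed above, can be sketched in a few lines. This Python version only illustrates the loop structure of a blocked level 3 operation (matrix multiply): the loops walk over tiles of at most `block` rows and columns so that, in a compiled kernel, each tile stays cache-resident while it is reused. Python itself gains nothing from this, and the default `block` value is an arbitrary assumption. It is not the OpenBLAS Loongson 3A kernel, which additionally relies on register blocking, 128-bit memory access instructions, software prefetching, and SIMD.

```python
def blocked_matmul(A, B, block=2):
    """Return C = A * B for matrices stored as lists of lists, iterating over
    tiles of size up to block x block (illustrates the loop order only)."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    if len(B) != k:
        raise ValueError("inner dimensions do not match")

    C = [[0.0] * m for _ in range(n)]
    # Outer three loops select a tile of A (rows ii.., cols kk..) and of B (rows kk.., cols jj..).
    for ii in range(0, n, block):
        for kk in range(0, k, block):
            for jj in range(0, m, block):
                # Inner loops accumulate the tile product into the matching tile of C.
                for i in range(ii, min(ii + block, n)):
                    a_row, c_row = A[i], C[i]
                    for p in range(kk, min(kk + block, k)):
                        a = a_row[p]
                        b_row = B[p]
                        for j in range(jj, min(jj + block, m)):
                            c_row[j] += a * b_row[j]
    return C
```

The result does not depend on `block`; only the order in which elements are touched changes, which is what determines cache reuse in a compiled implementation.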
High-performance computing systems and cluster computers are becoming so cost-effective that even small research groups can afford them. Hence, efforts to take advantage of these widely distributed resources are becoming popular. Although recent projects provide resource management and job scheduling to support groups of computational resources across the country working together on massive problems, they have not yet fully addressed how distributed parallel programs will communicate. Therefore, we propose a new paradigm to support cluster-to-cluster (C2C) communications, which handles run-time communication between parallel programs running on distributed clusters.