ISBN (print): 0769510655
This paper presents a reflective decision control mechanism for dealing with synchronization and scheduling issues in distributed computing environments. We present a component-based reflective architecture for assisting distributed objects in decision making; it enables autonomous entities to support reflective computation, which requires a flexible means of managing the course of computation, resource allocation, and scheduling. Such capability is critical to the successful implementation of distributed software that supports real-time and reactive/adaptive applications (e.g., robotics, manufacturing, and military systems). In many instances, these systems must be designed to monitor and manage complex systems in dynamic settings.
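The abstract does not specify how the meta-level makes its decisions. As a minimal Python sketch of the general reflective idea, assume a meta-object that reifies each method call on a base-level object instead of running it, and lets a pluggable priority policy decide the execution order. `MetaObject`, `run_pending`, and the `Robot` example are hypothetical names, not part of the paper's architecture.

```python
import heapq
import itertools
from typing import Any, Callable


class MetaObject:
    """Hypothetical meta-level wrapper: intercepts calls on a base object and
    defers them to a priority-based scheduling decision."""

    def __init__(self, base: Any, priority: Callable[[str], int]):
        self._base = base
        self._priority = priority        # policy: method name -> priority (lower runs first)
        self._queue: list = []
        self._seq = itertools.count()    # tie-breaker keeps FIFO order within a priority

    def __getattr__(self, name: str):
        # Only called for attributes not found on the MetaObject itself.
        target = getattr(self._base, name)
        if not callable(target):
            return target

        def deferred(*args, **kwargs):
            # Reify the invocation instead of executing it immediately.
            heapq.heappush(self._queue,
                           (self._priority(name), next(self._seq), name, args, kwargs))
        return deferred

    def run_pending(self) -> list:
        """Execute the reified calls in the order chosen by the policy."""
        results = []
        while self._queue:
            _, _, name, args, kwargs = heapq.heappop(self._queue)
            results.append(getattr(self._base, name)(*args, **kwargs))
        return results


class Robot:  # hypothetical base-level object
    def move(self, d):
        return f"move {d}"

    def emergency_stop(self):
        return "stop"


m = MetaObject(Robot(), priority=lambda name: 0 if name == "emergency_stop" else 1)
m.move(1)
m.move(2)
m.emergency_stop()
results = m.run_pending()
print(results)  # the stop request overtakes the earlier move requests
```

Swapping the `priority` callable changes the scheduling behaviour without touching `Robot`, which is the separation between base level and meta level that a reflective design aims for.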
ISBN (print): 9780769533483
A communication model of distributed embedded systems is presented through a multiple-bus example. The paper analyzes communication delay under the interaction of computation and communication, allocates inter-processor communication links, and schedules communication. The complex system can be divided into several parts, and the algorithm is then applied to each part separately. Some numerical results are also presented for illustration.
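The abstract does not give the allocation algorithm, so the following is only a sketch of the kind of problem involved, under stated assumptions: identical buses, each message becoming ready when its producing task finishes computing (`release`), and delay defined as queueing plus transmission time. A simple greedy list-scheduling rule (send each message on the bus that frees up first) stands in for the paper's method; `Message`, `schedule_on_buses`, and `communication_delay` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Message:
    name: str
    release: float   # time the producing computation finishes (assumed known)
    length: float    # transmission time on any bus (identical buses assumed)


def schedule_on_buses(messages, n_buses):
    """Greedy list scheduling: each message, in release order, goes to the bus
    that becomes free earliest. Returns {name: (bus, start, finish)}."""
    buses = [(0.0, b) for b in range(n_buses)]   # (time bus becomes free, bus id)
    heapq.heapify(buses)
    plan = {}
    for msg in sorted(messages, key=lambda m: m.release):
        free_at, bus = heapq.heappop(buses)
        start = max(free_at, msg.release)        # wait for both the data and the bus
        finish = start + msg.length
        plan[msg.name] = (bus, start, finish)
        heapq.heappush(buses, (finish, bus))
    return plan


def communication_delay(plan, messages):
    """Delay = finish time minus release time (queueing + transmission)."""
    release = {m.name: m.release for m in messages}
    return {name: finish - release[name] for name, (_, _, finish) in plan.items()}


msgs = [Message("a", 0.0, 3.0), Message("b", 0.0, 2.0), Message("c", 1.0, 1.0)]
plan = schedule_on_buses(msgs, n_buses=2)
delays = communication_delay(plan, msgs)
print(plan, delays)
```

Here message `c` is ready at t = 1 but both buses are busy, so it waits until t = 2; that waiting is exactly the computation/communication interaction the delay analysis has to capture.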
ISBN (print): 0780390741
One of the challenges in high-performance computing is to provide users with reliable, remote data access in a distributed, heterogeneous environment. The increasing popularity of high-speed wide area networks and centralized data repositories opens the possibility of direct high-speed access to remote data sets from within a parallel application. In this paper we describe SEMPLAR, a library for remote, parallel I/O that combines the standard programming interface of MPI-IO with the remote storage functionality of the SDSC Storage Resource Broker (SRB). SEMPLAR relies on parallel TCP streams to maximize remote data throughput, in a design that preserves the parallelism of the access all the way from the storage to the application. We provide I/O performance results for a high-performance computing workload on three different clusters. On the NCSA TeraGrid cluster, the ROMIO perf benchmark attained an aggregate read bandwidth of 291 Mbps with 18 processors, and the NAS btio benchmark achieved an aggregate write bandwidth of 74 Mbps with 16 processors. The benchmark results are encouraging and show that SEMPLAR provides applications with scalable, high-bandwidth I/O across wide area networks.
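A minimal sketch of the parallel-stream idea, assuming each process splits its byte range into contiguous, near-equal chunks and issues one read per chunk concurrently, then reassembles them in order. This is not SEMPLAR's or SRB's API: `split_range`, `parallel_read`, and the `fetch(start, size)` callable (standing in for one TCP stream's remote read) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(offset: int, length: int, n_streams: int):
    """Split [offset, offset + length) into n_streams contiguous chunks whose
    sizes differ by at most one byte. Returns (start, size) pairs."""
    if n_streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(length, n_streams)
    chunks = []
    start = offset
    for i in range(n_streams):
        size = base + (1 if i < extra else 0)   # first `extra` chunks get one more byte
        chunks.append((start, size))
        start += size
    return chunks


def parallel_read(fetch, offset: int, length: int, n_streams: int) -> bytes:
    """Issue one fetch per chunk concurrently and reassemble the chunks in order.
    `fetch(start, size) -> bytes` is a hypothetical per-stream remote read."""
    chunks = split_range(offset, length, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        # pool.map yields results in input order, so the join preserves byte order.
        return b"".join(pool.map(lambda c: fetch(*c), chunks))
```

For scale, the reported aggregates work out to roughly 291 / 18 ≈ 16.2 Mbps per processor for the ROMIO read test and 74 / 16 ≈ 4.6 Mbps per processor for the btio write test.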
ISBN (print): 0818619155
Summary form only given. The ease of using shared data in a data-intensive parallel computing environment is discussed. An approach is investigated for transparently supporting data sharing in a loosely coupled parallel computing environment, where a moderate to large number of individual computing elements are connected via a high-bandwidth network without necessarily physically sharing memory. A system called VOYAGER is discussed, which serves as the underlying system facility that supervises the distributed shared virtual memory. VOYAGER allows shared-data parallel applications to take advantage of parallel and distributed processing with relative ease. The application program merely maps the shared data onto its virtual address space, replicates itself on distributed machines, and spawns appropriate execution threads; the threads are automatically given coordinated access to the shared data distributed across the network. Multiple computation threads migrate to and populate the processors of a number of computing elements, making use of the multiple processors to achieve a high degree of parallelism. The low-level resource management chores are implemented once and for all in the underlying VOYAGER facility, usable by many different data-intensive applications.
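The summary does not describe VOYAGER's coherence protocol. As an illustration of what "coordinated access to shared data" means in a page-based distributed shared virtual memory, here is a single-process simulation of a common design, a write-invalidate protocol with a central page directory, assumed here rather than taken from the paper. `Directory`, `Node`, and `PAGE_SIZE` are hypothetical.

```python
PAGE_SIZE = 4  # bytes per page; tiny for illustration


class Directory:
    """Hypothetical page directory: authoritative page contents plus the set of
    nodes currently holding a copy of each page."""

    def __init__(self, n_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(n_pages)]
        self.copyset = [set() for _ in range(n_pages)]


class Node:
    def __init__(self, node_id, directory, nodes):
        self.id = node_id
        self.dir = directory
        self.nodes = nodes    # shared registry: node id -> Node
        self.cache = {}       # page number -> local bytearray copy

    def _fault_in(self, page):
        # Simulated page fault: fetch a copy from the directory, register as a holder.
        self.cache[page] = bytearray(self.dir.pages[page])
        self.dir.copyset[page].add(self.id)

    def read(self, addr):
        page, off = divmod(addr, PAGE_SIZE)
        if page not in self.cache:
            self._fault_in(page)
        return self.cache[page][off]

    def write(self, addr, value):
        page, off = divmod(addr, PAGE_SIZE)
        # Write-invalidate: drop every other node's copy before writing.
        for other in self.dir.copyset[page] - {self.id}:
            self.nodes[other].cache.pop(page, None)
        self.dir.copyset[page] = {self.id}
        if page not in self.cache:
            self.cache[page] = bytearray(self.dir.pages[page])
        self.cache[page][off] = value
        self.dir.pages[page][off] = value   # write-through keeps the directory current


directory = Directory(n_pages=2)
nodes = {}
for i in range(2):
    nodes[i] = Node(i, directory, nodes)
nodes[0].write(5, 42)
first = nodes[1].read(5)    # node 1 faults the page in and sees 42
nodes[1].write(5, 7)        # invalidates node 0's copy
second = nodes[0].read(5)   # node 0 faults again and sees the new value
```

The application code only calls `read` and `write` on addresses; faults, copies, and invalidations happen underneath, which is the transparency the summary attributes to the system layer.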
Parameter studies, genetic algorithms, and Monte Carlo type calculations are examples of pleasantly parallel computational tasks, which can be computed effectively on computer clusters or grids. In this work, we consider a weight minimization problem of a laminated composite structure in the post-buckling region. The design variables are the number of layers and the layer orientations, chosen from a discrete set of allowable angles. Optimization is carried out using a deterministic search process, in which lay-up configurations are generated iteratively in the design space from the design points selected from the population in the preceding cycle. Computation is performed on the NorduGrid grid computing platform. In this work, we briefly go through some general grid concepts and the use of the grid in the optimization of laminated composite structures. (C) 2005 Elsevier Ltd. All rights reserved.
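A minimal sketch of why this search is pleasantly parallel, under loud assumptions: lay-ups are tuples of ply angles drawn from an assumed set {0, 45, -45, 90}; new candidates are generated from the best lay-ups of the previous cycle by changing, adding, or removing one ply; and `evaluate` is a hypothetical placeholder cost, not a post-buckling analysis. Every candidate is scored independently, so that step can be farmed out to workers (a thread pool here; on a grid, separate jobs).

```python
from concurrent.futures import ThreadPoolExecutor

ANGLES = (0, 45, -45, 90)  # assumed discrete set of allowable ply orientations


def neighbours(layup):
    """Lay-ups one move away: change one ply's angle, drop a ply, or add a ply."""
    out = set()
    for i, angle in enumerate(layup):
        for a in ANGLES:
            if a != angle:
                out.add(layup[:i] + (a,) + layup[i + 1:])
        if len(layup) > 1:
            out.add(layup[:i] + layup[i + 1:])   # remove ply i
    for a in ANGLES:
        out.add(layup + (a,))                     # add a ply on top
    return out


def evaluate(layup):
    """Hypothetical placeholder cost: weight ~ ply count, plus penalties that
    pretend at least 4 plies and one 90-degree ply are required."""
    shortfall = max(0, 4 - len(layup))
    missing_90 = 0 if 90 in layup else 1
    return len(layup) + 10 * shortfall + 10 * missing_90


def search(start, cycles=10, keep=3, workers=4):
    """Deterministic search: expand the best lay-ups of the previous cycle and
    score every candidate independently (the pleasantly parallel step)."""
    population = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(cycles):
            candidates = set(population)
            for layup in population:
                candidates |= neighbours(layup)
            ordered = sorted(candidates)              # fixed order -> reproducible result
            scores = list(pool.map(evaluate, ordered))
            ranked = sorted(zip(scores, ordered))
            population = [layup for _, layup in ranked[:keep]]
    return population[0], evaluate(population[0])


best, cost = search((0, 0))
print(best, cost)
```

Threads are used only to keep the sketch self-contained; for CPU-bound structural analyses, each `evaluate` call would instead run as a separate process or grid job.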
ISBN (print): 1424403073
The aIOLi project aims to optimize I/O accesses within a cluster by providing a simple POSIX API, thus avoiding the constraint of using a dedicated parallel I/O library. This paper introduces an extension of aIOLi that addresses disjoint accesses generated by different concurrent applications in a cluster. In such a context, performance, fairness, and response time are the criteria between which good trade-offs have to be found. A test composed of two concurrent IOR benchmarks showed improvements in read accesses by a factor ranging from 3.5 to 35 with POSIX calls and from 3.3 to 5 with ROMIO.
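One ingredient an I/O scheduler can use for disjoint, interleaved accesses is aggregation: pending requests on the same file are ordered by offset, and ranges that touch or overlap are merged into one larger sequential request. The sketch below shows only that step, assuming requests carry a file name, offset, size, and the set of applications they serve; aIOLi's actual scheduling, including its fairness handling, is not reproduced, and `Request` and `aggregate` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    file: str
    offset: int
    size: int
    apps: set = field(default_factory=set)   # applications served by this request

    @property
    def end(self) -> int:
        return self.offset + self.size


def aggregate(pending):
    """Merge requests on the same file whose byte ranges touch or overlap,
    so the storage sees fewer, larger, sequential accesses."""
    merged = []
    for req in sorted(pending, key=lambda r: (r.file, r.offset)):
        last = merged[-1] if merged else None
        if last is not None and last.file == req.file and req.offset <= last.end:
            last.size = max(last.end, req.end) - last.offset
            last.apps |= req.apps
        else:
            # Copy so the caller's request objects are never mutated.
            merged.append(Request(req.file, req.offset, req.size, set(req.apps)))
    return merged


# Two applications reading interleaved, disjoint 4 KiB blocks of the same file.
reqs = [
    Request("f", 0, 4096, {"A"}), Request("f", 8192, 4096, {"A"}),
    Request("f", 4096, 4096, {"B"}), Request("f", 12288, 4096, {"B"}),
    Request("g", 0, 10, {"A"}),
]
merged = aggregate(reqs)
print([(r.file, r.offset, r.size, sorted(r.apps)) for r in merged])
```

Neither application's requests are contiguous on their own, but together they cover one 16 KiB range, which is why coordinating across concurrent applications can pay off.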
ISBN (print): 9783540747666
Stream data are often transmitted over a distributed network but in many cases are too voluminous to be collected in a central location. Instead, we must perform distributed computations, guaranteeing high-quality results in real time even as new data arrive. In this paper, we first formalize the problem of continuous outlier detection over distributed evolving data streams. Then, two novel outlier measures and algorithms are proposed that can identify outliers in a single pass. Furthermore, our experiments with synthetic and real data show that the proposed methods are both efficient and effective compared with existing outlier detection algorithms.
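The abstract does not define its two outlier measures, so this sketch swaps in a classic distance-based rule instead: a value is flagged if fewer than `k` of the previous `window` values lie within distance `r`. Each site runs the test in a single pass over its own stream and ships only flagged points to a coordinator, not the raw data. `WindowOutlierDetector`, `monitor`, and all parameter values are hypothetical.

```python
from collections import deque


class WindowOutlierDetector:
    """Distance-based outlier test over a count-based sliding window:
    x is flagged if fewer than k of the previous `window` values lie within r."""

    def __init__(self, window: int, r: float, k: int):
        self.buf = deque(maxlen=window)   # oldest value is evicted automatically
        self.r = r
        self.k = k

    def observe(self, x: float) -> bool:
        warm = len(self.buf) >= self.k    # enough history to judge x at all
        neighbours = sum(1 for y in self.buf if abs(x - y) <= self.r)
        self.buf.append(x)
        return warm and neighbours < self.k


def monitor(streams, window=50, r=1.0, k=3):
    """Each site scans its stream once and reports only (site, index, value)
    for flagged points, so the raw data never leaves the site."""
    reports = []
    for site, stream in streams.items():
        det = WindowOutlierDetector(window, r, k)
        for i, x in enumerate(stream):
            if det.observe(x):
                reports.append((site, i, x))
    return reports


streams = {
    "s1": [10.0, 10.2, 9.9, 10.1, 30.0, 10.0],
    "s2": [5.0, 5.1, 4.9, 5.0, 5.2],
}
reports = monitor(streams)
print(reports)
```

Because the window slides, "normal" is defined by recent data only, which is what lets a detector of this kind follow evolving streams.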
ISBN (print): 1424403073
This short paper describes the cooperative caching architecture of pCFS [5], a shared-disk cluster file system (CFS) that aims to achieve high performance across a broad spectrum of I/O-intensive applications, ranging from computational access to large data sets to video streaming and databases, and that includes an extended API for parallel I/O access. pCFS is targeted at small to medium-sized clusters where data is stored on Fibre Channel shared devices in a Storage Area Network (SAN), and it exploits two interconnect fabrics: a SAN to access on-disk data, and a LAN used both for exchanging control information (related to locking and cache management) and for the cooperative caching data flow.
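The abstract names the two fabrics but not the lookup protocol. A minimal sketch of a cooperative-caching read path, assuming a global map of which nodes hold each block: try the local cache, then a peer's cache (a LAN transfer in pCFS's design), and only then the shared disk (a SAN access). `SharedDisk`, `Cluster`, and the `holders` map are hypothetical; eviction, writes, and locking are omitted.

```python
from collections import Counter


class SharedDisk:
    """Stands in for a shared Fibre Channel device reachable by every node."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


class Cluster:
    def __init__(self, n_nodes, disk):
        self.disk = disk
        self.caches = [dict() for _ in range(n_nodes)]   # per-node block cache
        self.holders = {}                                 # block id -> nodes caching it
        self.stats = Counter()                            # where each read was served

    def read(self, node, block_id):
        cache = self.caches[node]
        if block_id in cache:
            self.stats["local"] += 1
            return cache[block_id]
        peers = self.holders.get(block_id, set()) - {node}
        if peers:
            # Cooperative hit: copy the block from a peer's memory instead of the disk.
            data = self.caches[min(peers)][block_id]
            self.stats["peer"] += 1
        else:
            data = self.disk.read(block_id)   # not cached anywhere: read the shared disk
            self.stats["disk"] += 1
        cache[block_id] = data
        self.holders.setdefault(block_id, set()).add(node)
        return data


disk = SharedDisk({0: b"a", 1: b"b"})
c = Cluster(3, disk)
c.read(0, 0)   # disk
c.read(1, 0)   # peer (node 0 has it)
c.read(1, 0)   # local
c.read(2, 1)   # disk
print(dict(c.stats), disk.reads)
```

The point of the design is visible in the counters: the second node's first access to block 0 is served from a peer's memory, so the shared disk sees one read per block rather than one per node.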
ISBN (print): 9781450393133
The proceedings contain 3 papers. The topics discussed include: expanding the scope of artifact evaluation at HPC conferences: experience of SC21; reproducible experiments for internet systems; and managing randomness to enable reproducible machine learning.
ISBN (print): 9781509036820
There is an ongoing effort to develop tools that apply distributed computational resources to tackle large problems or reduce the time needed to solve them. In this context, the Alternating Direction Method of Multipliers (ADMM) arises as a method that, like the dual ascent method, can exploit distributed resources, while having the robustness and improved convergence of the augmented Lagrangian method. Traditional approaches to accelerating the ADMM using multiple cores are problem-specific and often require multi-core programming. By contrast, we propose a problem-independent scheme for accelerating the ADMM that does not require the user to write any parallel code. We show that this scheme, an interpretation of the ADMM as a message-passing algorithm on a factor graph, can automatically exploit fine-grained parallelism both on GPUs and on shared-memory multi-core computers, and achieves significant speedup in application domains as diverse as combinatorial optimization, machine learning, and optimal control. Specifically, we obtain a 10-18x speedup using a GPU and 5-9x using multiple CPU cores over a serial, optimized C version of the ADMM, which is similar to the typical speedup reported for existing GPU-accelerated libraries, including cuFFT (19x), cuBLAS (17x), and cuRAND (8x).
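To show where the parallelism comes from, here is a small sketch of standard scaled-form consensus ADMM, not the authors' factor-graph solver. The toy problem, minimizing the sum of ½(z − aᵢ)² over z, is an assumption chosen because its local updates have a closed form. Each term (factor) keeps a local copy xᵢ constrained to equal z; the xᵢ- and uᵢ-updates touch only factor i and could run concurrently, and only the z-update gathers information from all factors. `consensus_admm` is a hypothetical name.

```python
def consensus_admm(a, rho=1.0, iters=100):
    """Scaled-form consensus ADMM for  min_z  sum_i 0.5 * (z - a_i)^2.

    Each term f_i gets a local copy x_i with the constraint x_i = z.
    The x- and u-updates are independent per factor (parallelizable);
    the z-update is the only step that combines all factors.
    """
    n = len(a)
    x = [0.0] * n
    u = [0.0] * n   # scaled dual variables
    z = 0.0
    for _ in range(iters):
        # x-update: argmin_x 0.5*(x - a_i)^2 + (rho/2)*(x - z + u_i)^2, in closed form
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: average of the local estimates shifted by their duals
        z = sum(x[i] + u[i] for i in range(n)) / n
        # u-update: accumulate each factor's disagreement with the consensus value
        u = [u[i] + x[i] - z for i in range(n)]
    return z


z = consensus_admm([1.0, 2.0, 3.0, 6.0])
print(z)  # approaches the mean of a, the minimizer of this toy objective
```

The per-factor list comprehensions are the parts a GPU or multi-core back end could execute concurrently; the factor-graph view described above generalizes this split to arbitrary factors without the user writing parallel code.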