Response time is a key factor in any e-Commerce application, and a set of solutions has been proposed to provide low response times despite network congestion or failures. Being them mostly based on caching of Web ob...
The simulation of many-body, many-particle systems has a wide range of applications in areas such as biophysics, chemistry, and astrophysics. It is known that the force calculation accounts for ninety percent of the simulation time. This is mainly because the total number of interactions in the force calculation is O(N^2), where N is the number of particles in the system. The fast multipole algorithm, proposed by Greengard and Rokhlin, reduces the time complexity to O(N). In this paper, we design an efficient parallel fast multipole algorithm in three dimensions. For portability, our parallel program is implemented using the Message Passing Interface. Is it possible to obtain high performance for a compute-intensive application out of a LAN of workstations? In this paper, we also attempt to answer this question, which is commonly asked by researchers who have no access to parallel computers or supercomputers.
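To make the O(N^2) cost concrete, the following is a minimal sketch of direct pairwise force summation, the baseline that the fast multipole algorithm improves on. It is not the paper's parallel algorithm. The function name `direct_forces`, the Coulomb-like force law, and the unit constants are assumptions made for illustration.

```python
import math


def direct_forces(positions, charges):
    """Direct pairwise summation of Coulomb-like forces (unit constants).

    Hypothetical baseline for illustration: visits every pair once,
    so the work grows as N*(N-1)/2, i.e. O(N^2).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_count = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Vector from particle j to particle i and its length
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = charges[i] * charges[j] / r**3
            for k in range(3):
                forces[i][k] += scale * d[k]  # like charges repel
                forces[j][k] -= scale * d[k]  # equal and opposite force
            pair_count += 1
    return forces, pair_count


forces, pairs = direct_forces(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
    [1.0, 1.0, 1.0],
)
```

Doubling the number of particles roughly quadruples `pair_count`, which is why the force calculation dominates the simulation time for large N.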
ISBN: 076952124X (print)
This paper presents a framework for generic modeling of distributed embedded applications. An application is decomposed into services and mapped onto a set of distributed nodes, where each node hosts one or more services. Each service is described by four interfaces: a real-time input/output, a configuration and planning (CP), and a diagnostic and management (DM) interface. The overall application is described by a cluster configuration description that specifies the interaction of services within and across nodes. The application requirements, the service properties of a node, the interaction of the services, and the application mapping are described formally with XML descriptions. The XML format allows a language-neutral and extensible semantic description of interfaces, supporting the implementation of context-aware tools for modeling, scheduling, monitoring, simulation, and validation. A central concept of the model is the interface file system (IFS), which acts as a distributed shared memory and transparently implements the interfaces to services from other nodes. In principle, the communication system that updates the data in the IFS is not bound to a specific implementation as long as it fulfills the given timing requirements. The presented concepts are applied in a case study that uses the time-triggered fieldbus protocol TTP/A for the implementation of a small sensor fusion application.
ISBN: 0769507840 (print)
Increased platform heterogeneity and varying resource availability in distributed systems motivate the design of resource-aware applications, which ensure a desired performance level by continuously adapting their behavior to changing resource characteristics. In this paper we describe an application-independent adaptation framework that simplifies the design of resource-aware applications. This framework eliminates the need for adaptation decisions to be explicitly programmed into the application by relying on two novel components: (1) a tunability interface, which exposes adaptation choices in the form of alternate application configurations while encapsulating core application functionality; and (2) a virtual execution environment, which emulates application execution under diverse resource availability, enabling off-line collection of information about the resulting behavior. Together these components permit automatic run-time decisions on when to adapt, by continuously monitoring resource conditions and application progress, and how to adapt, by dynamically choosing the application configuration most appropriate for the prescribed user preference. We evaluate the framework using an interactive distributed image visualization application. The framework permits automatic adaptation to changes in CPU load and network bandwidth by choosing a different compression algorithm or controlling the image transmission sequence so as to satisfy user preferences for visualization quality and timeliness.
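The "how to adapt" decision described above can be sketched as a selection among alternate configurations under the currently observed resources. This is only an illustration under stated assumptions, not the framework's actual interface: the `Config` fields, the linear timing model in `predicted_time`, and the rule "highest quality that meets the deadline" are all hypothetical stand-ins for the profile data and user preferences the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical application configuration (e.g. a compression choice)."""
    name: str
    cpu_seconds: float  # compression cost per image at full CPU availability
    size_mbit: float    # data transmitted per image
    quality: float      # assumed user-perceived quality score


def predicted_time(cfg, cpu_share, bandwidth_mbps):
    # Assumed model: compute time scales with available CPU share,
    # transfer time with available bandwidth.
    return cfg.cpu_seconds / cpu_share + cfg.size_mbit / bandwidth_mbps


def choose_config(configs, cpu_share, bandwidth_mbps, deadline):
    """Pick the best-quality configuration that meets the deadline;
    fall back to the fastest one if none does."""
    feasible = [
        c for c in configs
        if predicted_time(c, cpu_share, bandwidth_mbps) <= deadline
    ]
    if feasible:
        return max(feasible, key=lambda c: c.quality)
    return min(configs, key=lambda c: predicted_time(c, cpu_share, bandwidth_mbps))


configs = [
    Config("raw", cpu_seconds=0.0, size_mbit=80.0, quality=1.0),
    Config("light", cpu_seconds=0.2, size_mbit=40.0, quality=0.9),
    Config("heavy", cpu_seconds=1.0, size_mbit=8.0, quality=0.6),
]
choice = choose_config(configs, cpu_share=1.0, bandwidth_mbps=30.0, deadline=2.0)
```

Calling `choose_config` again whenever monitored CPU load or bandwidth changes gives a simple form of run-time adaptation: as bandwidth drops, the selection shifts toward heavier compression.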
We emphasize that future secure communicating systems, secure mass storage, and access policies will require efficient and scalable security algorithms and protocols. Moreover, parallelism will be used at quite low ...
ISBN: 9781509044573 (print)
Energy efficiency is a major concern in today's data centers that house large-scale distributed processing systems such as data-parallel MapReduce clusters. Modern power-aware systems utilize the dynamic voltage and frequency scaling mechanism available in processors to manage energy consumption. In this paper, we first characterize the energy efficiency of MapReduce jobs with respect to built-in power governors. Our analysis indicates that while a built-in power governor provides the best energy efficiency for a job that is both CPU and IO intensive, a common CPU frequency across the cluster provides the best energy efficiency for other types of jobs. In order to identify this optimal frequency setting, we derive energy and performance models for MapReduce jobs on an HPC cluster and validate these models experimentally on different platforms. We demonstrate how these models can be used to improve the energy efficiency of machine learning MapReduce applications running on the Yarn platform. Executing jobs at their optimal frequencies improves energy efficiency by 25% on average over the default governor setting. In the case of mixed workloads, energy efficiency improves by up to 10% when we use an optimal CPU frequency across the cluster.
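The idea of picking one cluster-wide frequency from an energy model can be illustrated with a small sketch. The model below is an assumption chosen for illustration, not the model derived in the paper: CPU time scales inversely with frequency, IO time is frequency-independent, and power is a static term plus a term cubic in frequency. All parameter names and values are hypothetical.

```python
def energy(f, cpu_work, io_time, p_static, k):
    """Energy of one job at frequency f under an assumed toy model.

    time  = cpu_work / f + io_time   (CPU part speeds up with f)
    power = p_static + k * f**3      (dynamic power grows with f cubed)
    """
    t = cpu_work / f + io_time
    p = p_static + k * f**3
    return p * t


def best_frequency(freqs, **model):
    """Return the candidate frequency with the lowest modeled energy."""
    return min(freqs, key=lambda f: energy(f, **model))


# Hypothetical job parameters and candidate frequencies (GHz)
model = dict(cpu_work=100.0, io_time=20.0, p_static=50.0, k=5.0)
chosen = best_frequency([1.0, 1.5, 2.0, 2.5], **model)
```

In this toy model neither the lowest nor the highest frequency wins: running slowly wastes static power over a longer run, while running fast pays the cubic dynamic-power cost.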
ISBN: 0769521320 (print)
In high-level synthesis for real-time digital signal processing (DSP) architectures using heterogeneous functional units (FUs), an important problem is how to assign a proper FU type to each operation of a DSP application and generate a schedule in such a way that all requirements are met and the total cost is minimized. In this paper, we propose a two-phase approach to solve this problem. In the first phase, we solve the heterogeneous assignment problem, i.e., how to assign a proper FU type to each operation of a DSP application such that the total cost is minimized while the timing constraint is satisfied. In the second phase, based on the assignments obtained from the first phase, we propose a minimum resource scheduling algorithm to generate a schedule and a feasible configuration that uses as few resources as possible. We prove that the heterogeneous assignment problem is NP-complete and propose several algorithms to solve it. The experiments show that Algorithm DFG_Assign_Repeat performs best, giving a reduction of 25.7% in total cost compared with previous work.
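The assignment problem in the first phase can be illustrated on the simplest special case: operations that form a single chain, with integer execution times. The sketch below is a pseudo-polynomial dynamic program for that case only. It is not one of the paper's algorithms (such as DFG_Assign_Repeat), and the function name `assign_chain` and the example numbers are assumptions.

```python
def assign_chain(options, deadline):
    """Minimum-cost FU-type assignment for a chain of operations.

    options[i] is a list of (time, cost) pairs, one per FU type that can
    execute operation i. Times are assumed to be non-negative integers.
    Returns (min_cost, chosen_type_indices), or None if no assignment
    finishes within the deadline.
    """
    # best[t] = (cost, choices) for the cheapest assignment of the
    # operations processed so far whose total time is exactly t
    best = {0: (0, [])}
    for op in options:
        nxt = {}
        for t, (cost, choices) in best.items():
            for idx, (dt, dc) in enumerate(op):
                nt = t + dt
                if nt > deadline:
                    continue  # violates the timing constraint
                cand = (cost + dc, choices + [idx])
                if nt not in nxt or cand[0] < nxt[nt][0]:
                    nxt[nt] = cand
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda v: v[0])


# Each operation can run on a fast, expensive FU or a slow, cheap one
options = [
    [(1, 10), (3, 2)],
    [(2, 8), (4, 3)],
    [(1, 6), (2, 1)],
]
result = assign_chain(options, deadline=7)
```

With a deadline of 7 the cheapest feasible choice mixes FU types: the slow unit for the first and third operations and the fast unit for the second.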
This paper examines the possibility of implementing the Hough transform for line and circle detection on arrays with reconfigurable optical buses (AROBs). It is shown that the Hough transform for line and circle detection in an N × N image can be implemented in a constant number of steps. The costs of the two algorithms are O(N^2 p) and O(N^2 p^2), respectively, where p is the magnitude of one dimension in the parameter space. These values are optimal with respect to the time complexity of the best known sequential algorithms.
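For reference, the sequential cost that the parallel line-detection algorithm is compared against can be seen in a minimal sketch of the standard sequential Hough transform for lines. This is not the AROB algorithm from the paper. The function name `hough_lines`, the binary-image input, and the choice of p equally spaced angles are assumptions for illustration.

```python
import math


def hough_lines(image, p):
    """Sequential Hough transform for lines on an N x N binary image.

    Each of up to N^2 foreground pixels casts one vote per angle,
    so the work is O(N^2 p) for p discretized angles.
    """
    n = len(image)
    max_rho = int(math.ceil(math.sqrt(2) * n))
    # rho lies in [-max_rho, max_rho]; shift by max_rho to index the list
    acc = [[0] * (2 * max_rho + 1) for _ in range(p)]
    votes = 0
    for y in range(n):
        for x in range(n):
            if image[y][x]:
                for t in range(p):
                    theta = math.pi * t / p
                    rho = round(x * math.cos(theta) + y * math.sin(theta))
                    acc[t][rho + max_rho] += 1
                    votes += 1
    return acc, votes, max_rho


# 4 x 4 image containing the vertical line x = 2
image = [[1 if x == 2 else 0 for x in range(4)] for y in range(4)]
acc, votes, max_rho = hough_lines(image, p=4)
```

The accumulator cell for angle index 0 and rho = 2 collects one vote from every pixel on the line, which is how the peak identifies the line x = 2.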
Remote method invocation in Java RMI allows the flow of control to pass across local Java threads and thereby span multiple virtual machines. However, the resulting distributed threads do not strictly follow the parad...
A distributed asynchronous algorithm that minimizes a functional whose minimum drifts with time is discussed. The communication delays among the processors are assumed to be stochastic with Markovian character. The au...