ISBN:
(print) 9781479902842
This paper presents the design, implementation, and performance evaluation of a parallel particle filter (PF) and a particle flow filter (PFF) using a Graphics Processing Unit (GPU) as a parallel computing environment to speed up the computation. Simulation results from a high-dimensional nonlinear filtering problem show that, for the considered example, the parallel PFF implementation is significantly superior to the parallel PF implementation in both estimation accuracy and computational performance. It is demonstrated that using a GPU can markedly accelerate both particle filters and particle flow filters through parallelization.
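As a sketch of why particle filters parallelize well on a GPU, the following toy bootstrap filter step uses a standard 1-D nonlinear benchmark model (not the paper's system; the model and all names are illustrative). The predict and weight computations are independent per particle, which is what maps naturally to one GPU thread per particle; the resampling step is the part that resists parallelization.

```python
import math
import random

# Minimal bootstrap particle filter step for a toy 1-D nonlinear model.
# Predict and weight are embarrassingly parallel across particles;
# multinomial resampling is the sequential bottleneck a GPU version
# must restructure. Illustrative sketch only, not the paper's filter.

def pf_step(particles, z, q=1.0, r=1.0):
    # Predict: propagate each particle through a nonlinear transition.
    pred = [0.5 * x + 25 * x / (1 + x * x) + random.gauss(0, math.sqrt(q))
            for x in particles]
    # Weight: Gaussian likelihood of measurement z = x^2/20 + noise.
    # The tiny additive constant guards against all-zero weights.
    w = [math.exp(-0.5 * (z - x * x / 20) ** 2 / r) + 1e-300 for x in pred]
    s = sum(w)
    w = [wi / s for wi in w]
    # Resample (multinomial) back to an equally weighted particle set.
    return random.choices(pred, weights=w, k=len(particles))

random.seed(0)
particles = [random.gauss(0, 1) for _ in range(1000)]
particles = pf_step(particles, z=5.0)
print(len(particles))  # 1000
```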
Scientific research is increasingly assisted by computer-based experiments. Such experiments are often composed of a vast number of loosely-coupled computational tasks that are specified and automated as scientific workflows. This large scale is also characteristic of the data that flows within such "many-task" computations (MTC). Provenance information can record the behavior of such computational experiments via the lineage of process and data artifacts. However, work to date has focused on lineage data models, leaving unsolved issues of recording and querying other aspects, such as domain-specific information about the experiments, MTC behavior given by resource consumption and failure information, or the impact of the environment on performance and accuracy. In this work we contribute MTCProv, a provenance query framework for many-task scientific computing that captures the runtime execution details of MTC workflow tasks on parallel and distributed systems, in addition to standard prospective and data derivation provenance. To help users query provenance data we provide a high-level interface that hides relational query complexities. We evaluate MTCProv using an application in protein science, and describe how important query patterns, such as correlations between provenance, runtime data, and scientific parameters, are simplified and expressed.
In this paper we introduce a framework for parallel and distributed execution of simulations (Sim-PETEK), a middleware for minimizing the total run time of batch runs and Monte Carlo trials. Sim-PETEK proposes a generic solution for applications in the simulation domain, which improves on our previous work done to parallelize simulation runs in a single-node, multiple central processing unit (CPU) setting. Our new framework aims at managing a heterogeneous computational resource pool consisting of multiple CPU nodes distributed on a potentially geographically dispersed network, through a service-oriented middleware layer that is compliant with the Web Services Resource Framework standard, thereby providing a scalable and flexible architecture for simulation software developers. What differentiates Sim-PETEK from a general-purpose, Grid-based job-distribution middleware is a number of simulation-specific aspects regarding the specification, distribution, monitoring, result collection, and aggregation of simulation runs. These aspects are prevalent in the structure of the messages and in the protocol of interaction, both among the constituent services of the framework and within the interfaces exposed to external clients.
ISBN:
(print) 9781622767595
The Deep Stacking Network (DSN) is a special type of deep architecture developed to enable and benefit from parallel learning of its model parameters on large CPU clusters. As a prospective key component of future speech recognizers, the architectural design of the DSN and its parallel training endow the DSN with scalability over a vast amount of training data. In this paper, we present our first parallel implementation of the DSN training algorithm. In particular, we show the tradeoff between the time/memory savings from training parallelism and the associated cost arising from inter-CPU communication. Further, in phone classification experiments, we demonstrate a significantly lowered error rate using parallel full-batch training distributed over a CPU cluster, compared with sequential mini-batch training implemented on a single-CPU machine under otherwise identical experimental conditions, as exploited prior to the work reported in this paper.
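The map-reduce structure behind parallel full-batch training, and the communication cost it trades against, can be sketched with a toy data-parallel gradient computation. This is a minimal illustration for a single linear unit under least squares, not the DSN training algorithm; the shard split, worker count, and learning rate are all invented for the example. Workers compute partial gradients on data shards, and the final summation stands in for the inter-CPU reduction whose cost the paper weighs.

```python
from concurrent.futures import ThreadPoolExecutor

# Data-parallel full-batch gradient for a single linear unit (least squares).
# Each worker computes a partial gradient over its shard of the batch; the
# sum over partial gradients stands in for the inter-worker reduction that
# dominates communication cost. Toy sketch, not the DSN algorithm.

def partial_grad(w, shard):
    # Gradient of 0.5 * (w*x - y)^2 summed over one data shard.
    return sum((w * x - y) * x for x, y in shard)

def full_batch_grad(w, data, n_workers=4):
    shards = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(n_workers) as ex:
        parts = ex.map(lambda s: partial_grad(w, s), shards)
    return sum(parts)  # the "reduce" (communication) step

data = [(x, 3.0 * x) for x in range(100)]  # true weight is 3.0
w = 0.0
for _ in range(20):
    w -= 3e-6 * full_batch_grad(w, data)
print(round(w, 2))  # converges to 3.0
```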
ISBN:
(print) 9783939897354
We consider the classical rumor spreading problem, where a piece of information must be disseminated from a single node to all n nodes of a given network. We devise two simple push-based protocols, in which nodes choose the neighbor they send the information to in each round using pairwise independent hash functions, or a pseudo-random generator, respectively. For several well-studied topologies our algorithms use exponentially fewer random bits than previous protocols. For example, in complete graphs, expanders, and random graphs only a polylogarithmic number of random bits are needed in total to spread the rumor in O(log n) rounds with high probability. Previous explicit algorithms, e.g., [10, 17, 6, 15], require Omega(n) random bits to achieve the same round complexity. For complete graphs, the amount of randomness used by our hashing-based algorithm is within an O(log n)-factor of the theoretical minimum determined by Giakkoupis and Woelfel [15].
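A simplified simulation can illustrate the randomness savings: on the complete graph, each round draws one fresh pairwise-independent hash h(v) = ((a*v + b) mod p) mod n, so the whole round costs only O(log n) random bits, and every informed node pushes to h(v). This is an illustration of the idea, not the paper's protocol; the prime p, the seed, and the round cap are arbitrary choices for the sketch.

```python
import random

# Push protocol on the complete graph K_n using one pairwise-independent
# hash per round: h(v) = ((a*v + b) mod p) mod n. Drawing (a, b) costs
# O(log n) bits per round, versus one fresh random choice per informed
# node in the classical protocol. Simplified sketch, not the paper's scheme.

def push_rounds(n, p=2_000_003, max_rounds=200, seed=0):
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the rumor
    for rounds in range(1, max_rounds + 1):
        a, b = rng.randrange(1, p), rng.randrange(p)  # O(log n) bits
        informed |= {((a * v + b) % p) % n for v in informed}
        if len(informed) == n:
            return rounds
    return None  # did not finish within the cap

print(push_rounds(1000))  # O(log n) rounds with high probability
```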
The last decade has witnessed rapid growth in both the complexity and the scale of problem domains. A variety of techniques have been developed for fostering large and even hybrid simulations over the Internet. Along this direction, existing work is normally suited only to coarse-grained models. In this paper, we present a Grid infrastructure that supports hybrid simulations comprising models of different granularities and/or models of various types. A gateway approach has been proposed to present the simulation models of an individual administrative domain for fine-grained problems; it also bridges simulation models operating in multiple administrative domains to form hybrid simulations for studying large and complicated problems. A prototype infrastructure has been realized with the support of federated simulation technology. Potential applications, such as the simulation of huge crowds, are also discussed.
The density-matrix renormalization group (DMRG) method is widely used by computational physicists as a high-accuracy tool to explore the ground state of large quantum lattice models, e.g., the Heisenberg and Hubbard models, which are well-known standard models describing interacting spins and electrons, respectively, in solids. After the DMRG method was originally developed for 1-D lattice/chain models, several specific extensions toward 2-D lattice (n-leg ladder) models were proposed. However, the high accuracy obtained for 1-D models is not always guaranteed in these extended versions because the original exquisite advantage of the algorithm is partly lost. We therefore choose an alternative way: a direct 2-D extension of the DMRG method, which instead demands an enormously large memory space; the memory explosion is resolved by parallelizing the DMRG code with performance tuning. The parallelized direct-extension DMRG shows accuracy comparable to 1-D models and excellent parallel efficiency as the number of states kept increases. This success promises accurate analysis of large 2-D (n-leg ladder) quantum lattice models in the near future, when petaflops parallel supercomputers become available.
We propose a scheduling strategy in this paper to address the load-imbalance problem of the recently published distributed parallel Apriori (DPA) algorithm. We use fine-grained tasks derived by dividing the tasks defined by DPA into smaller subtasks. The subtasks are scheduled by a dynamic self-scheduling scheme for better load balance. Furthermore, we propose two different methods for data transmission from the master to the workers. The first broadcasts all the frequent k-itemsets to all worker nodes, while the second transmits only the required data to each individual worker node. Experimental results demonstrate that both proposed approaches outperform DPA. The first is more suitable for small datasets, while the second provides a steadier performance improvement regardless of which self-scheduling scheme is adopted.
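The load-balancing effect of dynamic self-scheduling can be sketched as a shared work queue: fine-grained subtasks sit in the queue and each idle worker pulls the next one, so skewed subtask costs are absorbed automatically. The subtask payload below is a placeholder; the paper's subtasks are pieces of DPA's candidate-counting work, and the worker count here is arbitrary.

```python
import queue
import threading

# Dynamic self-scheduling sketch: workers pull fine-grained subtasks from
# a shared queue as they become idle, balancing load even when subtask
# costs are skewed. The squaring payload is a stand-in for processing one
# subtask; names and parameters are illustrative, not DPA's.

def self_schedule(subtasks, n_workers=4):
    q = queue.Queue()
    for t in subtasks:
        q.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()  # pull the next subtask, if any
            except queue.Empty:
                return              # queue drained: worker exits
            r = t * t               # stand-in for one subtask's work
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

out = self_schedule(list(range(10)))
print(sorted(out))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```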
ISBN:
(print) 9783642231773
The peculiarities of the LuNA run-time subsystem implementation are considered. LuNA is a language and system for fragmented programming. The peculiarities are conditioned by the properties of the numerical algorithms whose implementation and execution LuNA is mainly oriented toward.
ISBN:
(print) 9783642184659
The paper treats the Multi-Objective Programming problem with a large composite set of (linear and nonlinear) objective functions, the domain of feasible solutions being defined by a set of linear equalities/inequalities representing a large-scale problem. A preferred solution, i.e., a non-dominated solution, is constructed by extending the decision-making framework. A feasible approach for this class of problems is to use a solver for Linear Programming problems and a solver for Multiple Attribute Decision Making problems, in combination with parallel and distributed computing techniques based on a Grid configuration.
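The non-dominance criterion at the heart of such a preferred solution can be stated compactly: with objectives to be minimized, a point is non-dominated if no other candidate is at least as good in every objective and strictly better in one. The snippet below only illustrates that dominance test on invented objective vectors; selecting among the non-dominated points, which the paper delegates to a Multiple Attribute Decision Making solver, is not shown.

```python
# Pareto non-dominance filter for minimization. Given candidate objective
# vectors, keep those not dominated by any other candidate. The five
# sample points are invented for illustration only.

def dominates(a, b):
    # a dominates b: no worse in every objective, strictly better in one.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

pts = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(non_dominated(pts))  # [(1, 5), (2, 2), (4, 1)]
```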