ISBN: 9781424416936 (print)
In this paper we present the design and implementation of a distributed sensor network application for embedded, isolated-word, real-time speech recognition. In our system design, we adopt a parameterized-data-flow-based modeling approach to model the functionalities associated with sensing and processing of acoustic data, and we implement the associated embedded software on an off-the-shelf sensor node platform that is equipped with an acoustic sensor. The topology of the sensor network deployed in this work involves a clustered network hierarchy. A customized time division multiple access protocol is developed to manage the wireless channel. We analyze the distribution of the overall computation workload across the network to improve energy efficiency. In our experiments, we demonstrate the recognition accuracy of our speech recognition system to verify its functionality and utility. We also evaluate improvements in network lifetime to demonstrate the effectiveness of our energy-aware optimization techniques.
ISBN: 9781424416936 (print)
We present a junction tree decomposition based algorithm for parallel exact inference. This is a novel parallel exact inference method for evidence propagation in an arbitrary junction tree. If multiple cliques contain evidence, the performance of any state-of-the-art parallel inference algorithm achieving logarithmic time performance is adversely affected. In this paper, we propose a new approach to overcome this problem. We decompose a junction tree into a set of chains. Cliques in each chain are partially updated after the evidence propagation. These partially updated cliques are then merged in parallel to obtain fully updated cliques. We derive the formula for merging partially updated cliques and estimate the computation workload of each step. Experiments conducted using MPI on state-of-the-art clusters showed that the proposed algorithm exhibits linear scalability and superior performance compared with other parallel inference methods.
ISBN: 9781424416936 (print)
With the continued progress in VLSI technologies, we can integrate numerous cores in a single billion-transistor chip to build a multi-core system-on-a-chip (SoC). This also brings great challenges to traditional parallel programming: how can we increase the performance of applications as the number of cores increases? In this paper, we meet these challenges using a novel approach. Specifically, we propose a reconfigurable heterogeneous multi-core system. Under our proposed system, in addition to conventional processor cores, we introduce dynamically reconfigurable accelerator cores to boost the performance of applications. We have built a prototype of the system using FPGAs. Experimental evaluation demonstrates significant system efficiency of the proposed heterogeneous multi-core system in terms of computation and power consumption.
ISBN: 9781424416936 (print)
It has been proved in previous work on algorithm-based fault tolerance that, for matrix-matrix multiplication, the checksum relationship in the input checksum matrices is preserved at the end of the computation no matter which algorithm is used. However, whether this checksum relationship can be maintained in the middle of the computation remains open. In this paper, we first demonstrate that this checksum relationship is not maintained in the middle of the computation for most matrix-matrix multiplication algorithms. We then prove that, for the outer product version of the matrix-matrix multiplication algorithm, this checksum relationship can be maintained in the middle of the computation. Based on this checksum relationship maintained in the middle of the computation, we demonstrate that fail-stop process failures (which are often tolerated by checkpointing or message logging) in the outer product version of matrix-matrix multiplication can be tolerated without checkpointing or message logging.
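The checksum property described in this abstract can be illustrated with a small sequential sketch. It is not the authors' parallel implementation: the function names are hypothetical, the matrices are plain Python lists of integers, and the "inner product" variant is simply one row-by-row ordering chosen here as a contrast. The sketch appends a column-checksum row to A and a row-checksum column to B, then checks whether the partial result is a valid full-checksum matrix after each step.

```python
def with_column_checksum(a):
    """Return a copy of a with an extra row holding each column's sum."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Return a copy of b with an extra column holding each row's sum."""
    return [row + [sum(row)] for row in b]


def is_full_checksum(c):
    """True if the last column holds row sums and the last row holds column sums."""
    body = c[:-1]
    rows_ok = all(row[-1] == sum(row[:-1]) for row in c)
    cols_ok = all(c[-1][j] == sum(r[j] for r in body) for j in range(len(c[0])))
    return rows_ok and cols_ok


def outer_product_steps(ac, br):
    """Yield the partial product after each rank-1 update C += ac[:, s] * br[s, :]."""
    n_rows, n_cols, depth = len(ac), len(br[0]), len(br)
    c = [[0] * n_cols for _ in range(n_rows)]
    for s in range(depth):
        for i in range(n_rows):
            for j in range(n_cols):
                c[i][j] += ac[i][s] * br[s][j]
        yield [row[:] for row in c]


def inner_product_steps(ac, br):
    """Yield the partial product after each fully computed row (contrast case)."""
    n_rows, n_cols, depth = len(ac), len(br[0]), len(br)
    c = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            c[i][j] = sum(ac[i][s] * br[s][j] for s in range(depth))
        yield [row[:] for row in c]


if __name__ == "__main__":
    ac = with_column_checksum([[1, 2], [3, 4]])
    br = with_row_checksum([[5, 6], [7, 8]])
    print([is_full_checksum(c) for c in outer_product_steps(ac, br)])
    print([is_full_checksum(c) for c in inner_product_steps(ac, br)])
```

Each rank-1 update adds a matrix whose last row and last column are themselves checksums, which is why every intermediate state in the outer product ordering passes the check, while the row-by-row ordering only passes once the checksum row has been computed.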
ISBN: 9781424416936 (print)
In this paper, we evaluate the effects of a partitioned global address space (PGAS) versus a flat, randomized distributed global address space (DGAS) in the context of a lightweight multithreaded parallel architecture. We also execute the benchmarks on the Cray MTA-2, a multithreaded architecture with a DGAS mapping. Key results demonstrate that distributing data under the PGAS mapping increases locality, effectively reducing the memory latency and the number of threads needed to achieve a given level of performance. In contrast, the DGAS mapping provides a simpler programming model by eliminating the need to distribute data and, assuming sufficient application parallelism, can achieve similar performance by leveraging large numbers of threads to hide the longer latencies.
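The difference between the two address mappings can be sketched as follows. This is only an illustration under assumptions not taken from the abstract: the block-partitioned PGAS layout, the multiplicative hash used to stand in for a "flat, randomized" DGAS layout, and all function names are hypothetical. The sketch measures what fraction of a contiguous address range lands on a thread's home node under each mapping.

```python
KNUTH_MULTIPLIER = 2654435761  # odd 32-bit constant, used here only to scramble addresses


def pgas_node(addr, num_nodes, block_size):
    """Block-partitioned mapping: consecutive addresses share a node (assumed layout)."""
    return (addr // block_size) % num_nodes


def dgas_node(addr, num_nodes):
    """Flat scrambled mapping: a hash spreads neighbouring addresses across nodes."""
    scrambled = (addr * KNUTH_MULTIPLIER) & 0xFFFFFFFF
    return (scrambled * num_nodes) >> 32  # scale the 32-bit hash to [0, num_nodes)


def local_fraction(node_of, home, addrs):
    """Fraction of the given addresses that map to the home node."""
    addrs = list(addrs)
    return sum(1 for a in addrs if node_of(a) == home) / len(addrs)


if __name__ == "__main__":
    nodes, block = 4, 64
    own_block = range(block)  # addresses a thread on node 0 would own under PGAS
    print(local_fraction(lambda a: pgas_node(a, nodes, block), 0, own_block))
    print(local_fraction(lambda a: dgas_node(a, nodes), 0, own_block))
```

Under the block mapping every access in the thread's own block is local, whereas the scrambled mapping sends most of them to other nodes, which is the latency that the abstract says must be hidden with additional threads.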
ISBN: 9781424416936 (print)
The Global Cellular Automata model (GCA) is a massively parallel computation model which extends the classical Cellular Automata model (CA) with dynamic global neighbors. For that model we present a data-parallel architecture which is scalable in the number of parallel pipelines and which uses application-specific operators (adapted operators). The instruction set consists of control and RULE instructions. A RULE computes the next cell contents for each cell in the destination object. The machine consists of P pipelines. Each pipeline has an associated primary memory bank and has access to the global memory (real or emulated multi-port memory). The diffusion of particles was used as an example to demonstrate the adaptive operators, the machine programming and its performance. Particles which point to each other within a defined neighborhood search space are interchanged. The pointers are modified in each generation by a pseudo-random function. The machine with up to 32 pipelines was synthesized for an Altera FPGA for that application.
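The diffusion rule described in this abstract can be sketched in software as one synchronous generation over a one-dimensional cell array. This is an assumed reading of the rule, not the hardware design: the ring topology, the relative-offset pointers, the linear congruential generator used as the pseudo-random function, and all names are hypothetical.

```python
def lcg(state):
    """One step of a 32-bit linear congruential generator (illustrative choice)."""
    return (1664525 * state + 1013904223) & 0xFFFFFFFF


def generation(values, offsets, seeds, radius):
    """Compute the next generation of cell contents, pointers and generator states.

    values  -- particle stored in each cell
    offsets -- each cell's pointer, as a relative offset on a ring
    seeds   -- per-cell pseudo-random generator state
    radius  -- pointers are redrawn from [-radius, radius]
    """
    n = len(values)
    targets = [(i + offsets[i]) % n for i in range(n)]
    # Two cells exchange contents only if they point at each other.
    new_values = [
        values[targets[i]] if targets[targets[i]] == i else values[i]
        for i in range(n)
    ]
    # Every pointer is redrawn for the next generation.
    new_seeds = [lcg(s) for s in seeds]
    new_offsets = [(s >> 16) % (2 * radius + 1) - radius for s in new_seeds]
    return new_values, new_offsets, new_seeds


if __name__ == "__main__":
    values = list(range(16))
    offsets = [0] * 16
    seeds = list(range(1, 17))
    for _ in range(8):
        values, offsets, seeds = generation(values, offsets, seeds, radius=2)
    print(values)
```

Because an exchange happens only between mutually pointing cells, each generation permutes the particles without creating or destroying any, which makes conservation of the particle set a convenient sanity check.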
ISBN: 9781424442379 (print)
This paper shows how lightpath-based networks can allow challenging, fine-grained parallel supercomputing applications to be run on a grid, using parallel retrograde analysis on DAS-3 as a case study. Detailed performance analysis shows that several problems arise that are not present on tightly-coupled systems like clusters. In particular, flow control, asynchronous communication, and host-level communication overheads become new obstacles. By optimizing these aspects, however, a 10G grid can obtain high performance for this type of communication-intensive application. The class of large-scale distributed applications suitable for running on a grid is therefore larger than previously thought realistic.
ISBN: 9781424416936 (print)
This paper describes the RTGrid distributed simulation framework for conformal radiotherapy. We introduce novel approaches through which several distributed computing technologies are made accessible to Monte Carlo simulations for radiotherapy dose calculations. Currently, radiotherapy treatment planning is typically performed on PCs and workstations which lack the computational power to run Monte Carlo simulations quickly enough to be useful to clinicians. Therefore, although Monte Carlo simulation techniques offer highly accurate doses, they are seldom chosen for clinical deployment. The RTGrid project is investigating both the capability and capacity modes of exploiting grid computing for radiotherapy treatment planning using Monte Carlo simulations.
ISBN: 9781424416936 (print)
We overview a library generation framework called Spiral. For the domain of linear transforms, Spiral automatically generates implementations for parallel platforms including SIMD vector extensions, multicore processors, field-programmable gate arrays (FPGAs) and FPGA-accelerated processors. The performance of the generated code is competitive with the best available hand-written libraries.
ISBN: 9781424416936 (print)
As high performance and distributed computing become more important tools for enabling scientists and engineers to solve large computational problems, the need for methods to fairly and efficiently schedule tasks across multiple, possibly geographically distributed, computing resources becomes more crucial. Given the nature of distributed systems and the immense numbers of resources to be managed in distributed and large-scale cluster environments, traditional centralized schedulers will not be very effective at providing timely scheduling information. In order to manage large numbers of resources quickly, less computationally intensive methods for scheduling tasks must be explored. This paper proposes a novel resource management system based on the immune system metaphor, making use of the concepts in Immune Network Theory and Danger Theory. By emulating various elements in the immune system, the proposed manager could efficiently execute tasks on very large systems of heterogeneous resources across geographic and/or administrative domains. The distributed nature of the immune system is also exploited in order to allow efficient scheduling of tasks, even in extremely large environments, without the use of a centralized or hierarchical scheduler.