ISBN (digital): 9783642103315
ISBN (print): 9783642103308
In this work we describe a parallel implementation of the Poisson Surface Reconstruction algorithm based on multigrid domain decomposition. We compare implementations using different models of data-sharing between processors and show that a parallel implementation with distributed memory provides the best scalability. Using our method, we are able to parallelize the reconstruction of models from one billion data points on twelve processors across three machines, providing a ninefold speedup in running time without sacrificing reconstruction accuracy.
ISBN (print): 9788055304014
The paper focuses on the analysis of the contribution of distributed energy sources to the increase in the quality of electricity supply in a given part of the distribution network. For an efficient assessment of the continuity of electricity supply, common indicators need to be defined. The most common ones are the indices SAIFI, SAIDI, CAIDI, and MAIFI according to the IEEE standard. The paper presents some significant results of the analysis of the impact of new distributed energy sources connected into the given distribution network. The analysis is based on the steady-state calculation. In accordance with the national energy legislation, the impact of this kind of energy source on the power system is analyzed even before its connection to the grid. The paper therefore also describes the calculation methods used for this kind of evaluation. Given its real importance for the power industry, the supplier-consumer relation will also be emphasized in the paper.
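The continuity indices named in this abstract can be illustrated with a small calculation. The sketch below uses the usual definitions (interruptions and interruption minutes per customer served); the `Interruption` record, the function name, and the sample feeder data are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers_affected: int   # customers who lost supply in this event
    duration_minutes: float   # how long the supply was interrupted
    momentary: bool = False   # True for short events counted by MAIFI only


def reliability_indices(events, customers_served):
    """Return SAIFI, SAIDI, CAIDI and MAIFI for one reporting period."""
    sustained = [e for e in events if not e.momentary]
    momentary = [e for e in events if e.momentary]

    # SAIFI: sustained interruptions per customer served.
    saifi = sum(e.customers_affected for e in sustained) / customers_served
    # SAIDI: customer-minutes of interruption per customer served.
    saidi = sum(e.customers_affected * e.duration_minutes
                for e in sustained) / customers_served
    # CAIDI: average duration of an interruption for an affected customer.
    caidi = saidi / saifi if saifi > 0 else 0.0
    # MAIFI: momentary interruptions per customer served.
    maifi = sum(e.customers_affected for e in momentary) / customers_served
    return {"SAIFI": saifi, "SAIDI": saidi, "CAIDI": caidi, "MAIFI": maifi}


# Hypothetical feeder with 1000 customers and three events in the period.
events = [
    Interruption(customers_affected=200, duration_minutes=90),
    Interruption(customers_affected=50, duration_minutes=30),
    Interruption(customers_affected=400, duration_minutes=1, momentary=True),
]
indices = reliability_indices(events, customers_served=1000)
```

Comparing these indices for the same network with and without a distributed source connected is one simple way to express a change in supply continuity; the threshold that separates momentary from sustained events is left to the caller here.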
Early P2P-TV systems have already attracted millions of users, and many new commercial solutions are entering this market. Little information is however available about how these systems work. In this paper we present...
ISBN (print): 9783642020841
To meet the demands for high simulation fidelity and speed, parallel and distributed simulation techniques are widely used in building wireless sensor network simulators. However, accurate simulations of dynamic interactions of sensor network applications incur large synchronization overheads and severely limit the performance of existing distributed simulators. In this paper, we present LazySync, a novel conservative synchronization scheme that can significantly reduce such overheads by minimizing the number of clock synchronizations during simulations. We implement and evaluate this scheme in a cycle-accurate distributed simulation framework that we developed based on Avrora, a popular parallel sensor network simulator. In our experiments, the scheme achieves a speedup of 4% to 53% in simulating single-hop sensor networks with 8 to 256 nodes and 4% to 118% in simulating multi-hop sensor networks with 16 to 256 nodes. The experiments also demonstrate that the speedups can be significantly larger, as the scheme scales with both the number of packet transmissions and sensor network size.
ISBN (print): 9780769535739
There is increasing convergence between the fields of parallel and embedded computing. The demand for more functionality in embedded devices means that complex multicore architectures will be used. In order to promote scalability and obtain predictability, on-chip processor-local private memory subsystems will be used. Whilst at the hardware level this is technically feasible, the more pressing problem is how such memory is presented to the programmer and how its local access is policed. In this paper we illustrate how Java augmented by the Real-time Specification for Java can be used to present the abstraction of a thread-local scoped memory area. We show how to restrict access to the memory area to a single real-time thread. We implement the model on the JOP multiprocessor system and report on our experiences.
ISBN (print): 9781424453337
In this paper, closed-loop quasi-orthogonal space time block coding (QO-STBC) is exploited within a four relay node transmission scheme to achieve full rate and increase the available diversity gain provided by earlier two-relay approaches. The problem of imperfect synchronization between relay nodes is overcome by applying a parallel interference cancellation (PIC) detection scheme at the destination node. Bit error rate simulations confirm the advantages of the proposed methodology for a range of levels of imperfect synchronization and show that only a small number of iterations is necessary within the PIC detection.
ISBN (print): 9780769537160
We present a massively parallel FPGA-based coprocessor for Support Vector Machines (SVMs), a machine learning algorithm whose applications include recognition tasks such as learning scenes, situations and concepts, and reasoning tasks such as analyzing the recognized scenes and semantics. The coprocessor architecture, targeted at both SVM training and classification, is based on clusters of vector processing elements (VPEs) operating in single-instruction multiple-data (SIMD) mode to take advantage of large amounts of data parallelism in the application. We use the FPGA's DSP elements as parallel multiply-accumulators (MACs), a core computation in SVMs. A key feature of the architecture is that it is customized to low-precision arithmetic, which permits one DSP unit to perform two or more MACs in parallel. Low precision also reduces the required number of parallel off-chip memory accesses by packing multiple data words on the FPGA-memory bus. We have built a prototype using an off-the-shelf PCI-based FPGA card with a Xilinx Virtex 5 FPGA and 1 GB of DDR2 memory. For SVM training, we observe application-level end-to-end computation speeds of over 9 billion multiply-accumulates per second (GMACs). For SVM classification, using data packing, the application speed increases to 14 GMACs. The FPGA-based system is about 20x faster than a dual Opteron 2.2 GHz processor CPU, and dissipates around 10 W of power.
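The claim that low precision lets one wide multiplier perform two MACs at once can be illustrated in software. The sketch below shows one common packing trick, assuming unsigned 8-bit operands and a shared multiplicand; it is not confirmed to be the coprocessor's actual datapath, and the function name and the `shift` parameter are hypothetical.

```python
def packed_dot_products(a, b, c, shift=24):
    """Compute dot(a, c) and dot(b, c) using one multiply per element.

    Assumes unsigned 8-bit inputs. Two operands are packed into one wide
    word, so a single multiplication yields both partial products in
    separate bit lanes of the accumulator.
    """
    assert len(a) == len(b) == len(c)
    assert all(0 <= v < 256 for v in (*a, *b, *c))
    # Each product is below 2**16, so the low lane can absorb at most
    # 2**(shift - 16) accumulated terms before spilling into the high lane.
    assert len(c) <= 1 << (shift - 16)

    acc = 0
    for ai, bi, ci in zip(a, b, c):
        packed = (ai << shift) | bi   # ai in the high lane, bi in the low lane
        acc += packed * ci            # one multiply, two products accumulated

    mask = (1 << shift) - 1
    return acc >> shift, acc & mask   # (dot(a, c), dot(b, c))


# Example: both dot products come out of a single accumulator.
result = packed_dot_products([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

Signed operands would need a correction term for borrows between the lanes, which this sketch leaves out.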
ISBN (digital): 9783642103315
ISBN (print): 9783642103308
The available rendering performance on current computers increases constantly, primarily by employing parallel algorithms using the newest many-core hardware, such as multi-core CPUs or GPUs. This development enables faster rasterization, as well as conspicuously faster software-based real-time ray tracing. Despite the tremendous progress in rendering power, there are and always will be applications in classical computer graphics and Virtual Reality that require distributed configurations employing multiple machines for both rendering and display. In this paper we address this problem and use NMM, a distributed multimedia middleware, to build a powerful and flexible rendering framework. Our framework is highly modular, and can be easily reconfigured - even at runtime - to meet the changing demands of applications built on top of it. We show that the flexibility of our approach comes at a negligible cost in comparison to a specialized and highly optimized implementation of distributed rendering.
ISBN (print): 3642046320
The proceedings contain 15 papers. The topics discussed include: dynamic resource-critical workflow scheduling in heterogeneous environments; decentralized grid scheduling with evolutionary fuzzy systems; analyzing the EGEE production grid workload: application to jobs submission optimization; the resource usage aware backfilling; the gain of overbooking; modeling parallel system workloads with temporal locality; scheduling restartable jobs with short test runs; effects of topology-aware allocation policies on scheduling performance; contention-aware scheduling with task duplication; job admission and resource allocation in distributed streaming systems; scalability analysis of job scheduling using virtual nodes; competitive two-level adaptive scheduling using resource augmentation; and job scheduling with lookahead group matchmaking for time/space sharing on multi-core parallel machines.
ISBN (print): 1424411769
This paper proposes a new current-mode incremental signaling parallel link interface with per-pin skew compensation. Per-pin skew compensation is carried out in a calibration phase where clock-like training data are sent to all channels along with a reference clock of the same frequency. Training data are deskewed with respect to the common reference clock using DLLs such that all channels are skew-compensated simultaneously. New encoding and decoding schemes are proposed to reduce the signal critical path at the transmitter. Transimpedance amplifiers with replica biasing are used to perform current-to-voltage conversion at the receiving end with minimum sensitivity to supply voltage fluctuation. To evaluate the performance of the proposed skew-compensating technique, a 1 Gbyte/s parallel link interface consisting of two data channels and one reference clock channel has been implemented in UMC 0.13 μm 1.2 V CMOS technology and analyzed using SpectreRF from Cadence Design Systems with BSIM3v3 device models. The channels are modeled as 50 Ω microstrip lines on an FR-4 substrate. Simulation results of the parallel link at all process corners demonstrate that the proposed parallel link interface provides a minimum deskew range of 1.2 ns (±0.6 ns).