ISBN (Print): 9781509012022
Before implementation in hardware, signal processing algorithms are tested in simulation. LabVIEW provides a highly convenient environment for simulation development, as well as tools for generating a simulation environment that can include the simulation itself and the collection of simulation data. Although these tools use LabVIEW for code generation, it is not easy to understand the principles of code generation and to develop simulation generators effectively. This paper presents a toolbox for improved LabVIEW code generation. The developed toolbox is based on standard LabVIEW code-generation functions, maximally simplifying their application and minimizing the number of tools needed for code generation. The paper consists of a theoretical part on LabVIEW code-generation methods, a practical part on the principles of LabVIEW code generation using scripting, and a graphical presentation of the advantages of the improved approach. The presented graphical results show that the improved LabVIEW code generation is simpler and more understandable for practical realization, and that the resulting code generator is clearer and more comprehensible than the original one.
ISBN (Print): 9781467376853
Summary form only given. Parallel programming with low-level interfaces has long been the most viable choice in scientific computing. In such models, different forms of parallelism require different programming interfaces, e.g., message passing for parallelism across nodes, threading for intra-node parallelism, and vector processing for SIMD units and GPUs. Applications are often confronted with all of these interfaces at once in order to fully exploit current and future large-scale machines. We present our work toward higher-level programming models that allow a single program to run on different parallel platforms without much human intervention, while at the same time achieving close to hand-tuned performance.
With the continuous increase in the data volume of GNSS observation networks, the computational burden of data processing keeps growing. The undifferenced precise point positioning (PPP) model is one of the main strategies for GNSS network data processing. As the number of stations grows, the processing time of the PPP approach increases linearly, so the traditional serial processing pattern consumes a large amount of computing time. Because the PPP solutions of individual stations are mutually independent, the model is well suited to station-level parallel processing. This paper establishes a distributed parallel processing strategy based on the PPP model, which not only improves the efficiency of data processing but also makes better use of the hardware. However, the high concurrency of data access and processing makes parallel programming challenging, and errors can have unpredictable consequences. By analyzing the workflow of the PPP method, a parallel GNSS data-processing model at the multi-core and multi-node level was set up, and a lightweight parallel programming model was adopted to realize it. Extensive data tests and experiments demonstrated highly efficient parallel processing of GNSS data based on the PPP model: in an environment of four multi-core nodes, parallel processing is at least six times faster than traditional serial processing.
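Although the abstract gives no code, the station-level independence it exploits maps naturally onto a process pool. The following Python sketch illustrates the pattern under that assumption; process_station and the station list are hypothetical stand-ins for a full PPP solver and a real network.

# Minimal sketch of station-level PPP parallelism.
# process_station is a hypothetical placeholder for a full PPP solver;
# the paper's system distributes such jobs over multi-core, multi-node hardware.
from concurrent.futures import ProcessPoolExecutor

def process_station(station_id):
    # An (assumed) independent PPP solution for one station:
    # read observations, form undifferenced observables, estimate the solution.
    return station_id, "solution placeholder"

def process_network(station_ids, workers=4):
    # Stations are independent under the undifferenced PPP model,
    # so they can be farmed out to a process pool with no shared state.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_station, station_ids))

if __name__ == "__main__":
    results = process_network(["STN%02d" % i for i in range(16)])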
The Indonesia Colorectal Cancer Consortium (IC3), the first cancer biobank repository in Indonesia, faces computational challenges in analyzing large quantities of genetic and phenotypic data. To overcome this challenge, we explore and compare the performance of two parallel computing platforms that use central and graphics processing units. We present the design and implementation of a genome-wide association analysis using the MapReduce and Compute Unified Device Architecture (CUDA) frameworks, and evaluate performance (speedup) using simulated case/control status on 1000 Genomes Phase 3 chromosome 22 data (1,103,547 single-nucleotide polymorphisms). We demonstrate speedup over sequential processing on a server with an Intel Xeon E5-2620 (6 cores) and an NVIDIA Tesla K20.
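The per-SNP independence that both MapReduce and CUDA exploit can be shown with a small vectorized sketch. The allelic 2x2 chi-square below is an assumption for illustration; the abstract does not state which association statistic was used.

# Hypothetical sketch of the per-SNP data parallelism behind a GWAS:
# every SNP's 2x2 case/control allele table is tested independently,
# so the chi-square statistics vectorize (or map) cleanly.
import numpy as np

def allelic_chi2(a, b, c, d):
    # a, b = case alt/ref allele counts; c, d = control alt/ref counts;
    # each argument is an array with one entry per SNP.
    a, b, c, d = (x.astype(float) for x in (a, b, c, d))
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

rng = np.random.default_rng(0)
counts = rng.integers(1, 500, size=(4, 1_000_000))  # simulated allele counts
stats = allelic_chi2(*counts)                       # one pass over 10^6 SNPs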
Current workstations offer truly impressive raw computational power, on the order of TFlops on a single machine equipped with multiple CPUs and accelerators, which works out to less than half a dollar per GFlop. Such a result can only be achieved through the massive parallelism of the computational devices, but unfortunately not every application is able to fully exploit it. In this paper we analyze the performance of some widely used, computationally intensive applications, such as FFT, convolution, and n-body simulation, comparing a multi-core cluster node with and without the contribution of GPUs. We aim to provide a clear measure of the benefit of a heterogeneous architecture, in terms of time and cost, with an emphasis on the technology adopted at the different levels of the software stack for application parallelization. (C) 2014 Elsevier B.V. All rights reserved.
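As an illustration of the kind of computationally intensive kernel being benchmarked, here is a small sketch of FFT-based convolution in Python; the sizes and timing harness are illustrative and not the paper's setup.

# Sketch of one benchmarked kernel: linear convolution via the FFT,
# using the convolution theorem y = ifft(fft(x) * fft(h)).
import time
import numpy as np

def fft_convolve(x, h):
    n = len(x) + len(h) - 1              # length of the full linear convolution
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

x = np.random.rand(1 << 20)              # ~1M input samples
h = np.random.rand(4096)                 # filter kernel
t0 = time.perf_counter()
y = fft_convolve(x, h)
print("FFT convolution took %.3f s" % (time.perf_counter() - t0))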
Multicore processors can provide sufficient computing power and flexibility for complex streaming applications such as high-definition video processing. To reduce hardware complexity and power consumption, a distributed scratchpad memory architecture is considered instead of a cache-based memory architecture. However, the distributed design poses new challenges to programming: it is difficult to exploit all available capabilities and achieve maximal throughput, due to the combined complexity of inter-processor communication, synchronization, and workload balancing. In this study, we developed an efficient design flow for parallelizing multimedia applications on a distributed scratchpad memory multicore architecture. An application is first partitioned into streaming components and then mapped onto the cores. Various hardware-dependent factors and application-specific characteristics are taken into account to generate efficient task partitions and allocate resources appropriately. To test and verify the proposed design flow, three popular multimedia applications were implemented: a full-HD motion-JPEG decoder, an object detector, and a full-HD H.264/AVC decoder, with the Sony PlayStation 3 as the target platform. Simulation results show that, on the PS3, the full-HD motion-JPEG decoder with the proposed design flow decodes about 108.9 frames per second (fps) in the 1080p format, the object detector runs in real time at 2.84 fps, 11.75 fps, and 62.52 fps at three different resolutions, and the full-HD H.264/AVC decoder achieves nearly 50 fps.
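Very loosely, the partitioning into streaming components can be sketched as a pipeline of worker processes connected by bounded queues. The stage names here are hypothetical, and on the actual PS3 each stage would instead be mapped to a core with explicit scratchpad (local store) transfers rather than OS queues.

# Illustrative analogue of the design flow: a decoder split into streaming
# stages, one worker per stage, bounded queues providing back-pressure.
from multiprocessing import Process, Queue

def stage(fn, inq, outq):
    for item in iter(inq.get, None):     # None marks end of stream
        outq.put(fn(item))
    outq.put(None)

def parse(frame):  return ("parsed", frame)
def decode(frame): return ("decoded", frame)

if __name__ == "__main__":
    q0, q1, q2 = Queue(8), Queue(8), Queue(8)
    workers = [Process(target=stage, args=(parse, q0, q1)),
               Process(target=stage, args=(decode, q1, q2))]
    for w in workers: w.start()
    for i in range(100):                 # feed 100 dummy frames
        q0.put(i)
    q0.put(None)
    results = list(iter(q2.get, None))
    for w in workers: w.join()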
GooFit is a thread-parallel, GPU-friendly function evaluation library, nominally designed for use with the maximum-likelihood fitting program MINUIT. In this use case, it provides highly parallel calculations of normalization integrals and log(likelihood) sums. A key feature of the design is its use of the Thrust library to manage all parallel kernel launches, which allows GooFit to execute on any architecture for which Thrust has a backend, currently including CUDA for NVIDIA GPUs and OpenMP for single- and multi-core CPUs. Running on an NVIDIA C2050, GooFit executes 300 times faster on a complex high-energy-physics problem than the prior (algorithmically equivalent) code running on a single CPU core. The design and implementation choices, discussed in detail, can help guide developers of other highly parallel, compute-intensive libraries.
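The core pattern GooFit offloads, a transform-reduce over events, can be sketched on the CPU as follows. The Gaussian PDF is purely illustrative; GooFit targets far more complex PDFs whose normalization integrals are themselves computed in parallel.

# CPU sketch of the GooFit pattern: the negative log-likelihood is a
# transform (per-event log pdf) followed by a reduce (sum), which is what
# Thrust parallelizes across GPU threads or OpenMP threads.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    norm = 1.0 / (sigma * np.sqrt(2.0 * np.pi))   # analytic normalization
    return norm * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def nll(params, events):
    mu, sigma = params
    return -np.sum(np.log(gaussian_pdf(events, mu, sigma)))

events = np.random.normal(0.2, 1.1, size=1_000_000)
print(nll((0.2, 1.1), events))           # the quantity MINUIT would minimize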
General-purpose computing on graphics processing units (GPGPU), with programming models such as NVIDIA's Compute Unified Device Architecture (CUDA), offers the capability to accelerate the solution process of computational electromagnetics analysis. However, due to the communication-intensive nature of the finite-element algorithm, neither the assembly phase nor the solution phase can be implemented on fine-grained many-core GPU processors in a straightforward manner. In this paper, we identify the bottlenecks in the GPU parallelization of the finite-element method for electromagnetic analysis and propose potential solutions to alleviate them. We first discuss efficient parallelization strategies for finite-element matrix assembly on a single GPU and on multiple GPUs. We then explore parallelization strategies for the finite-element matrix solution, in conjunction with parallelizable preconditioners, to reduce the total solution time. We show that with proper parallelization and implementation, GPUs are able to achieve significant speedups over OpenMP-enabled multi-core CPUs.
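The assembly bottleneck arises because elements sharing a node scatter-add into the same global matrix entries; on a GPU those conflicts must be resolved with atomic operations or element coloring. The NumPy sketch below uses the unbuffered np.add.at as a serialized CPU analogue of that scatter; the mesh and element matrices are made up.

# Made-up mesh: 4 elements, 3 nodes each, neighbors share nodes
# (the shared nodes are exactly where GPU threads would collide).
import numpy as np

n_nodes, n_elems = 6, 4
conn = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]])
Ke = np.random.rand(n_elems, 3, 3)        # per-element stiffness matrices

K = np.zeros((n_nodes, n_nodes))
rows = conn[:, :, None].repeat(3, axis=2) # global row index of each Ke entry
cols = conn[:, None, :].repeat(3, axis=1) # global column index
np.add.at(K, (rows, cols), Ke)            # unbuffered scatter-add assembly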
Powerful algebraic techniques have been developed for classical sequential computation, many of them based on regular expressions and the associated regular algebra. For parallel and interactive computation, extensions that handle two-dimensional patterns are often required. Finite interactive systems, a two-dimensional version of finite automata, may be used to recognize two-dimensional languages. In this paper we present a blueprint for obtaining a formal representation of parallel, interactive programs and of their semantics. It is based on a recently introduced approach for deriving regular expressions for two-dimensional patterns, in particular using words of arbitrary shape and powerful control mechanisms on composition. We extend the previously defined class of expressions n2RE with new control features, progressively increasing the expressive power of the formalism up to a level where a procedure for generating the words accepted by finite interactive systems can be obtained. The targeted applications come from the modelling, specification, analysis, and verification of structured interactive programs via the associated scenario semantics.
Depth-map extraction from stereo video is a key technology for stereoscopic 3D video, as well as for view synthesis and 2D-to-3D video conversion. Sum of Absolute Differences (SAD) is a representative method to reconstruc...
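Since the abstract is truncated, only a generic SAD block-matching sketch for stereo disparity is given below, with an assumed window size and disparity range; it is not necessarily the paper's method.

# Naive SAD block matching: for each left-image window, find the horizontal
# shift d minimizing the sum of absolute differences against the right image.
import numpy as np

def sad_disparity(left, right, max_disp=32, win=5):
    h, w = left.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.int32)
            costs = [np.abs(patch - right[y-half:y+half+1,
                                          x-d-half:x-d+half+1].astype(np.int32)).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp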