检索结果-内蒙古大学图书馆

IEEE Workshop on Advances in Information, Electronic and Electrical Engineering

作者： E. Vavilina G. Gaigals Researcher Ventspils University College

ISBN: (纸本)9781509012022

Before implementation into hardware signal processing algorithms are tested in simulation mode. Lab VIEW provides highly convenient environment for simulation development and also tools for generation of simulation environment that can include simulation itself and collection of simulation data. Despite the fact these tools use Lab VIEW for code generation, it is not easy to understand the principles of code generation and effectively develop simulation generators. This paper presents toolbox for improved LabVIEW code generation. The developed toolbox is based on standard LabVIEW code generation functions maximally simplifying the application and minimizing the necessary amount of tools for code generation. This paper consists of theoretical part about LabVIEW code generation methods, practical part about principles of LabVIEW code generation using scripting and a graphical presentation of improved LabVIEW code generation advantages. The presented graphical results show that the improved LabVIEW code generation is simpler (thus better) and more understandable for practical realization and the code generator is clearer and more comprehensible than the original one.

关键词： automatic programming parallel programming object oriented modeling signal processing algorithms virtual prototyping

来源：评论

学校读者我要写书评

暂无评论

PDSEC 2015 Keynote - Domain-specific approaches in scientific computing

PDSEC 2015 Keynote - Domain-specific approaches in scientifi...

引用

IEEE International Symposium on parallel and Distributed Processing Workshops and Phd Forum (IPDPSW)

作者： Naoya Maruyama Team Leader of the HPC Programming Framework Research Team at RIKEN

ISBN: (纸本)9781467376853

Summary form only given. parallel programming with low-level interfaces has been the most viable choice in scientific computing for a long time. In such models, different parallelisms require different parallel programming interfaces, e.g., message passing for parallelism across nodes, threading for intranode parallelism, and vector processing for SIMD and GPUs. Often applications are confronted with these multiple interfaces to fully exploit the current and future large-scale machines. We present our work toward higher-level programming models, allowing for a single program to run on different parallel platforms without much human intervention, and at the same time to achieve close to hand-tuned performance.

关键词： parallel processing Scientific computing parallel programming Handheld computers Computational modeling Message passing

来源：评论

学校读者我要写书评

暂无评论

A parallel processing strategy of large GNSS data based on precise point positioning model

A parallel processing strategy of large GNSS data based on p...

引用

第六届中国卫星导航学术年会

作者： Yang Cui Zhiping Lu Hao Lu Jian Li Yupu Wang

W ith continuous increasing of the data scale of GNSS observations network,the computing pressure of data processing is growingThe undifferenced precise point positioning(PPP) model is one of the main strategies of GNSS network data processing. With the increasing of stations' scale,the processing time of PPP pattern also increases linearly,the traditional serial processing pattern need to consume a large amount of computing time. As the PPP model is not related,this model has good characteristics of parallel processing between stations. This paper established a distributed parallel processing strategy based on the PPP model,whichcan not only improve the efficiency of data processing,but also enhance the efficiency of hardware performance. However,due to the high concurrency of data access and processing,the parallel programming is faced with greatchallenges which can cause immeasurable results. In this paper,by analyzing the flow characteristics of the PPP method,a parallel GNSS data process model at multi-core and multi node level was set up,and a lightweight parallel programming model was adopted to realize the parallel model. Through a large number of data tests and experiments,high efficiency of parallel processing of GNSS data based on the PPP model was achievedThe experiment shows that,under the environment of four multi-core nodes,the parallel processing is at least six times faster than the traditional serial processing.

关键词： Large GNSS network Undifferenced model PPP parallel programming Task parallel library WCF

来源：评论

学校读者我要写书评

暂无评论

parallel Genome-Wide Analysis With Central And Graphic Processing Units

Parallel Genome-Wide Analysis With Central And Graphic Proce...

引用

2015 IEEE International Conference on Computer and Communications(ICCC 2015)

作者： Muhamad Fitra Kacamarga James W.Baurley Bens Pardamean Bioinformatics & Data Science Research Center Bina Nusantara University

The Indonesia Colorectal Cancer Consortium(IC3), the first cancer biobank repository in Indonesia, is faced with computational challenges in analyzing large quantities of genetic and phenotypic data. To overcome this challenge, we explore and compare performance of two parallel computing platforms that use central and graphic processing units. We present the design and implementation of a genome-wide association analysis using the Map Reduce and Compute Unified Device Architecture(CUDA) frameworks and evaluate performance(speedup) using simulated case/control status on 1000 Genomes, Phase 3, chromosome 22 data(1,103,547 Single Nucleotide Polymorphisms). We demonstrated speedup on a server with Intel Xeon E5-2620(6 cores) and NVIDIA Tesla K20 over sequential processing.

关键词： Genome-wide Analysis parallel programming MapReduce CUDA

来源：评论

学校读者我要写书评

暂无评论

Heterogeneous architectures for computational intensive applications: A cost-effectiveness analysis

引用

JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS 2014年 270卷 63-77页

作者： Danovaro, E. Clematis, A. Galizia, A. Ripepi, G. Quarati, A. D'Agostino, D. Natl Res Council Italy Inst Appl Math & Informat Technol Genoa Italy

Current workstations can offer really amazing raw computational power, in the order of TFlops on a single machine equipped with multiple CPUs and accelerators, which means less than half a dollar for a GFlop. Such result can only be achieved with a massive parallelism of the computational devices, but unfortunately not every application is able to fully exploit them. In this paper we analyze the performances of some widely used, computational intensive, applications, like FFT, convolution and n-body simulation, comparing the performance of a multi-core cluster node, with or without the contribution of GPUs. We aim to provide clear measure of the benefit of a heterogeneous architecture, in terms of time and cost, with a stress on the technology adopted at different levels of the software stack for the application parallelization. (C) 2014 Elsevier B.V. All rights reserved.

关键词： parallel programming HPC Heterogeneous architectures

来源：评论

学校读者我要写书评

暂无评论

parallelizing Complex Streaming Applications on Distributed Scratchpad Memory Multicore Architecture

引用

INTERNATIONAL JOURNAL OF parallel programming 2014年第6期42卷 875-899页

作者： Chen, Shin-Kai Hung, Cheng-Yu Chen, Ching-Chih Liu, Chih-Wei Natl Chiao Tung Univ Dept Elect Engn Hsinchu Taiwan

Multicore processors can provide sufficient computing power and flexibility for complex streaming applications, such as high-definition video processing. For less hardware complexity and power consumption, the distributed scratchpad memory architecture is considered, instead of the cache memory architecture. However, the distributed design poses new challenges to programming. It is difficult to exploit all available capabilities and achieve maximal throughput, due to the combined complexity of inter-processor communication, synchronization, and workload balancing. In this study, we developed an efficient design flow for parallelizing multimedia applications on a distributed scratchpad memory multicore architecture. An application is first partitioned into streaming components and then mapped onto multicore processors. Various hardware-dependent factors and application-specific characteristics are involved in generating efficient task partitions and allocating resources appropriately. To test and verify the proposed design flow, three popular multimedia applications were implemented: a full-HD motion JPEG decoder, an object detector, and a full-HD H.264/AVC decoder. For demonstration purposes, SONY PlayStation3 was selected as the target platform. Simulation results show that, on PS3, the full-HD motion JPEG decoder with the proposed design flow can decode about 108.9 frames per second (fps) in the 1080p format. The object detection application can perform real-time object detection at 2.84 fps at resolution, 11.75 fps at resolution, and 62.52 fps at resolution. The full-HD H.264/AVC decoder applications can achieve nearly 50 fps.

关键词： parallel programming Streaming application Multicore architecture Distributed scratchpad memory architecture

来源：评论

学校读者我要写书评

暂无评论

Implementation of a Thread-parallel, GPU-Friendly Function Evaluation Library

引用

IEEE ACCESS 2014年 2卷 160-176页

作者： Andreassen, Rolf E. De Silva, Weeraddana Manjula Meadows, Brian T. Sokoloff, Michael D. Tomko, Karen A. Univ Cincinnati Dept Phys Cincinnati OH 45221 USA Ohio Supercomp Ctr Columbus OH 43212 USA

GooFit is a thread-parallel, GPU-friendly function evaluation library, nominally designed for use with the maximum likelihood fitting program MINUIT. In this use case, it provides highly parallel calculations of normalization intergrals and log (likelihood) sums. A key feature of the design is its use of the Thrust library to manage all parallel kernel launches. This allows GooFit to execute on any architecture for which Thrust has a backend, currently, including CUDA for nVidia GPUs and OpenMP for single- and multicore CPUs. Running on an nVidia C2050, GooFit executes 300 times more quickly for a complex high energy physics problem than does the prior (algorithmically equivalent) code running on a single CPU core. The design and implementation choices, discussed in detail, can help to guide developers of other highly parallel, compute-intensive libraries.

关键词： parallel processing parallel programming parameter estimation parameter extraction

来源：评论

学校读者我要写书评

暂无评论

GPU Accelerated Finite-Element Computation for Electromagnetic Analysis

引用

IEEE ANTENNAS AND PROPAGATION MAGAZINE 2014年第2期56卷 39-62页

作者： Meng, Huan-Ting Nie, Bao-Lin Wong, Steven Macon, Charles Jin, Jian-Ming Univ Illinois Dept Elect & Comp Engn Ctr Computat Electromagnet Urbana IL 61801 USA Dynam Res Corp Wright Patterson AFB OH 45433 USA US Air Force Res Lab Wright Patterson AFB OH 45433 USA

General-purpose computing on graphics processing units (GPGPU), with programming models such as the Compute Unified Device Architecture (CUDA) by NVIDIA, offers the capability for accelerating the solution process of computational electromagnetics analysis. However, due to the communication-intensive nature of the finite-element algorithm, both the assembly and the solution phases cannot be implemented via fine-grained many-core GPU processors in a straightforward manner. In this paper, we identify the bottlenecks in the GPU parallelization of the Finite-Element Method for electromagnetic analysis, and propose potential solutions to alleviate the bottlenecks. We first discuss efficient parallelization strategies for the finite-element matrix assembly on a single GPU and on multiple GPUs. We then explore parallelization strategies for the finite-element matrix solution, in conjunction with parallelizable preconditioners to reduce the total solution time. We show that with a proper parallelization and implementation, GPUs are able to achieve significant speedups over OpenMP-enabled multi-core CPUs.

关键词： Computational electromagnetics finite element analysis frequency-domain analysis high performance computing graphics processing units parallel programming

来源：评论

学校读者我要写书评

暂无评论

Towards a Formal Representation of Interactive Systems

引用

FUNDAMENTA INFORMATICAE 2014年第3-4期131卷 313-336页

作者： Banu-Demergian, Iulia Teodora Stefanescu, Gheorghe Univ Bucharest Dept Comp Sci Bucharest 010014 Romania

Powerful algebraic techniques have been developed for classical sequential computation. Many of them are based on regular expressions and the associated regular algebra. For parallel and interactive computation, extensions to handle 2-dimensional patterns are often required. Finite interactive systems, a 2-dimensional version of finite automata, may be used to recognize 2-dimensional languages. In this paper we present a blueprint for getting a formal representation of parallel, interactive programs and of their semantics. It is based on a recently introduced approach for getting regular expressions for 2-dimensional patterns, particularly using words of arbitrary shapes and powerful control mechanisms on composition. We extend the previously defined class of expressions n2RE with new control features, progressively increasing the expressive power of the formalism up to a level where a procedure for generating the words accepted by finite interactive systems may be obtained. Targeted applications come from the area of modelling, specification, analysis and verification of structured interactive programs via the associated scenario semantics.

关键词： parallel programming interactive programming regular expressions regular algebra Kleene theorem network algebra 2-dimensional languages finite interactive systems relational semantics specification verification structured interactive programming

来源：评论

学校读者我要写书评

暂无评论

parallel SAD for fast dense disparity map using a shared memory programming

Parallel SAD for fast dense disparity map using a shared mem...

引用

5th FTRA International Conference on Information Technology Convergence and Services, ITCS 2013 and the 3rd International Conference on Intelligent Robotics, Automations, Telecommunication Facilities, and Applications, IRoA 2013

作者： Kim, Cheong Ghil

ISBN: (纸本)9789400769953

The depth map extraction from stereo video is a key technology of stereoscopic 3D video as well as view synthesis and 2D-3D video conversions. Sum of Absolute Differences (SAD) is a representative method to reconstruct disparity map. However, dense disparity map implementation requires heavy computations and extensive memory accesses. In this situation, the rapid advance of computer hardware and popularity of multimedia applications enable multi-core processors to become a dominant market trend in desk-top PCs as well as high end mobile devices;this movement allows many parallel programming technologies to be realized in users computing devices. Therefore, this paper proposes a parallel algorithm for SAD operation using a shared memory programming, OpenMP, which can provide the advantage to simplify managing and synchronization of program threads. The parallel implementation results show the 2.5 times of performance improvements on the processing speed compared with the serial implementation. © 2013 Springer Science+Business Media Dordrecht.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：