The Data Acquisition (DAQ) system of LHCb is a complex real-time system. It will be upgraded to provide LHCb with an all-software, trigger-free readout starting from 2020. Consequently, more CPU power in the form of servers will be needed, and the DAQ network will grow to a capacity of 40 Tbps. A PC-based readout system would receive data incoming from the detector, which would then be scattered across builder nodes and further distributed to a computing farm for data filtering. The design bandwidth of such a DAQ system requires rates as high as 400 Gbps single-duplex per node. These builder nodes will be connected with cost-effective, high-bandwidth data-centre switches in order to minimize the system cost. The behaviour of such an Event Building network can of course be studied in simulation, but experience tells us that it is crucial to test in practice, in particular to find limitations in the switches themselves and to determine to what extent various Event Building protocols can mitigate these limitations. We present a protocol-, topology- and transport-independent emulation software named DAQ Protocol-Independent Performance Evaluator (DAQPIPE). It allows us to test different communication architectures, such as push or pull, with regard to the initiator of the communication, as well as different topologies and transport protocols. We present throughput and stress tests on an InfiniBand FDR multi-rail LAN setup, with a focus on the network performance. Large-scale tests on the current LHCb DAQ system demonstrate the scalability of DAQPIPE itself and its capability to be deployed on any kind of large, tightly interconnected network to test its suitability for Event Building applications.
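To make the push/pull distinction concrete, the following minimal C++ sketch contrasts the two communication architectures on a purely logical, in-memory model: in push mode the data sources initiate the transfer of event fragments to builder nodes, while in pull mode the builders request the fragments they need. All names and the event-to-builder mapping are illustrative assumptions, not the DAQPIPE implementation.

```cpp
// Minimal logical sketch of push- vs pull-style event building,
// using a hypothetical in-memory model (not the DAQPIPE code itself).
#include <cstdio>
#include <string>
#include <vector>

struct Fragment { int event_id; int source_id; std::string payload; };

// Each source holds one fragment per event.
std::vector<std::vector<Fragment>> make_sources(int n_sources, int n_events) {
    std::vector<std::vector<Fragment>> sources(n_sources);
    for (int s = 0; s < n_sources; ++s)
        for (int e = 0; e < n_events; ++e)
            sources[s].push_back({e, s, "data"});
    return sources;
}

int main() {
    const int n_sources = 4, n_events = 8, n_builders = 2;
    auto sources = make_sources(n_sources, n_events);

    // PUSH: each source decides where a fragment goes (builder = event_id % n_builders)
    // and sends it immediately; builders passively collect.
    std::vector<std::vector<Fragment>> pushed(n_builders);
    for (auto& src : sources)
        for (auto& f : src)
            pushed[f.event_id % n_builders].push_back(f);

    // PULL: each builder initiates the transfers, requesting the fragments
    // of the events it owns from every source, one event at a time.
    std::vector<std::vector<Fragment>> pulled(n_builders);
    for (int b = 0; b < n_builders; ++b)
        for (int e = b; e < n_events; e += n_builders)   // events owned by builder b
            for (auto& src : sources)
                pulled[b].push_back(src[e]);             // "request" fragment of event e

    std::printf("push: builder 0 holds %zu fragments\n", pushed[0].size());
    std::printf("pull: builder 0 holds %zu fragments\n", pulled[0].size());
    return 0;
}
```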
ISBN (print): 9780769539676
A new parallel programming framework for DNA sequence alignment in homogeneous multi-core processor architectures is proposed. In contrast to traditional coarse-grained parallel approaches, which divide the considered database into several smaller subsets of complete sequences to be aligned with the query sequence, the presented methodology is based on slicing both the query and the database sequence under consideration into several tiles/chunks that are concurrently processed by the several cores available in the multi-core processor. The experimental results show that significant accelerations of traditional biological sequence alignment algorithms can be obtained, reaching a speedup that is linear in the number of available processing cores and very close to the theoretical maximum.
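The fine-grained tiling idea can be illustrated with a hedged sketch: the dynamic-programming score matrix is cut into tiles along both the query and the database sequence, and tiles on the same anti-diagonal, which are mutually independent, are processed concurrently. The tile size, scoring values and the use of OpenMP below are illustrative choices, not the paper's framework.

```cpp
// Sketch of wavefront-tiled Smith-Waterman scoring; tile size and scores are
// illustrative, and the pragma is simply ignored when built without OpenMP.
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    const std::string query = "ACACACTA", db = "AGCACACA";
    const int m = query.size(), n = db.size();
    const int match = 2, mismatch = -1, gap = -1, T = 4;     // T: tile edge length
    std::vector<std::vector<int>> H(m + 1, std::vector<int>(n + 1, 0));
    int best = 0;

    const int tiles_i = (m + T - 1) / T, tiles_j = (n + T - 1) / T;
    for (int d = 0; d < tiles_i + tiles_j - 1; ++d) {        // anti-diagonals of tiles
        // Tiles on the same anti-diagonal have no mutual dependencies.
        #pragma omp parallel for reduction(max:best)
        for (int ti = std::max(0, d - tiles_j + 1); ti <= std::min(d, tiles_i - 1); ++ti) {
            const int tj = d - ti;                           // tile coordinates
            for (int i = ti * T + 1; i <= std::min((ti + 1) * T, m); ++i)
                for (int j = tj * T + 1; j <= std::min((tj + 1) * T, n); ++j) {
                    const int s = (query[i - 1] == db[j - 1]) ? match : mismatch;
                    H[i][j] = std::max({0, H[i - 1][j - 1] + s,
                                        H[i - 1][j] + gap, H[i][j - 1] + gap});
                    best = std::max(best, H[i][j]);
                }
        }
    }
    std::printf("best local alignment score: %d\n", best);
    return 0;
}
```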
MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many research efforts have contributed MapReduce implementations for distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies in order to achieve high-performance computing. Such strategies, in turn, have led to very different MapReduce programming interfaces across these implementations. This paper presents some research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which has achieved a coding productivity increase ranging from 41.84% up to 94.71% without significant performance losses (below 3%) compared to those frameworks.
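As an illustration of what such a unified interface has to express, the sketch below shows the user-visible part of a word-count job, the map and reduce functions, in plain sequential C++; a DSL like the one described would generate the framework-specific glue (for Hadoop or Phoenix++) around this kind of kernel. The code is a hedged model, not the paper's DSL nor either framework's API.

```cpp
// Hedged sequential model of a MapReduce word count: only the map and reduce
// kernels plus a trivial in-memory shuffle, to show the user-level contract.
#include <cstdio>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// map: one input record -> list of (key, value) pairs
std::vector<std::pair<std::string, int>> map_fn(const std::string& line) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream in(line);
    for (std::string word; in >> word; ) out.push_back({word, 1});
    return out;
}

// reduce: key + all values for that key -> aggregated value
int reduce_fn(const std::string&, const std::vector<int>& values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

int main() {
    const std::vector<std::string> input = {"to be or not", "to be"};
    std::map<std::string, std::vector<int>> shuffled;            // shuffle phase
    for (const auto& line : input)
        for (const auto& kv : map_fn(line)) shuffled[kv.first].push_back(kv.second);
    for (const auto& kv : shuffled)                              // reduce phase
        std::printf("%s: %d\n", kv.first.c_str(), reduce_fn(kv.first, kv.second));
    return 0;
}
```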
ISBN (print): 9781509008070
The field of parallel computing has experienced an increase in the number of computing nodes, and parallel computing has widened its application to include computations with irregular features. Some parallel programming languages handle object data structures and offer marshaling/unmarshaling mechanisms to transport them. To manage data elements spread over computing nodes, research on distributed collections has been conducted. This study proposes a distributed collection library that can handle multiple collections of object elements and change their distributions while maintaining the associativity between their elements. The library is implemented on the object-oriented parallel programming language X10. Consider pairs of associative collections such as vehicles and streets in a traffic simulation: when many vehicles are concentrated on streets assigned to certain computing nodes, some of those streets should be moved to other nodes. Our library supports the programmer in easily distributing the associative collections over the computing nodes and re-allocating their elements while maintaining the data-sharing relationship among associative elements. The programmer can describe the associativity between objects using both declarative and procedural methods.
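Since the library itself is written in X10, the following hedged C++ sketch only models the core idea: two associated collections (streets and the vehicles currently on them) are partitioned over nodes, and re-distribution moves a street together with its vehicles so the association is never broken. All names and the balancing rule are hypothetical.

```cpp
// Hedged sketch (in C++, not X10): each street lives on one node and its
// vehicles always migrate with it, preserving the association on re-distribution.
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

struct Node {
    std::map<int, std::vector<int>> streets;   // street id -> vehicle ids on it
};

// Move one street, together with all vehicles associated to it, to another node.
void migrate_street(Node& from, Node& to, int street_id) {
    auto it = from.streets.find(street_id);
    if (it == from.streets.end()) return;
    to.streets[street_id] = std::move(it->second);
    from.streets.erase(it);
}

int main() {
    std::vector<Node> nodes(2);
    nodes[0].streets[10] = {1, 2, 3, 4, 5};    // node 0 is overloaded
    nodes[0].streets[11] = {6};
    nodes[1].streets[20] = {7};

    // Naive balancing rule (illustrative): shed the busiest street from node 0.
    migrate_street(nodes[0], nodes[1], 10);

    for (std::size_t n = 0; n < nodes.size(); ++n) {
        std::size_t vehicles = 0;
        for (const auto& s : nodes[n].streets) vehicles += s.second.size();
        std::printf("node %zu: %zu streets, %zu vehicles\n",
                    n, nodes[n].streets.size(), vehicles);
    }
    return 0;
}
```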
Over the last two decades, researchers have developed many software, hardware, and hybrid Transactional Memories (TMs) with various APIs and semantics. However, reduced performance under high-contention loads is still the major downside of all TMs. Although many strategies and methods have been proposed, contention management and transaction scheduling remain an open area of research. An important piece of the unsolved contention management puzzle is plausible estimation of transaction execution times. In this paper we propose two methods for estimating transaction execution times: one based on the log-normal distribution and one based on the gamma distribution. The experimental results presented in this paper indicate that the log-normal method has better estimation accuracy than the gamma method. Even more importantly, the log-normal method uses sliding windows that are 10 times shorter and has much lower complexity than the gamma method, so it is faster and requires less electrical power.
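A hedged numerical sketch of the log-normal approach: keep a short sliding window of observed transaction durations, estimate mu and sigma from their logarithms, and predict the expected execution time as exp(mu + sigma^2/2), the mean of a log-normal distribution. The window length and the plain moment estimator are assumptions for illustration, not the authors' exact procedure.

```cpp
// Hedged log-normal execution-time estimator over a small sliding window.
#include <cmath>
#include <cstdio>
#include <deque>

class LogNormalEstimator {
    std::deque<double> window_;            // last observed durations (e.g. in us)
    const std::size_t capacity_;
public:
    explicit LogNormalEstimator(std::size_t capacity) : capacity_(capacity) {}

    void observe(double duration) {
        window_.push_back(duration);
        if (window_.size() > capacity_) window_.pop_front();
    }

    double expected_duration() const {
        if (window_.empty()) return 0.0;
        double mu = 0.0;
        for (double d : window_) mu += std::log(d);
        mu /= window_.size();
        double var = 0.0;
        for (double d : window_) var += (std::log(d) - mu) * (std::log(d) - mu);
        var /= window_.size();
        return std::exp(mu + 0.5 * var);   // mean of a log-normal distribution
    }
};

int main() {
    LogNormalEstimator est(8);             // short window, illustrative length
    for (double d : {12.0, 15.0, 11.0, 40.0, 13.0, 14.0}) est.observe(d);
    std::printf("estimated next transaction time: %.2f us\n", est.expected_duration());
    return 0;
}
```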
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations with different characteristics and, thus, different required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions, enabling support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work at the source level, the parallel region formation retains the information on the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by later generic compiler passes for efficient parallelization. The proposed open-source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and still under research. The paper describes how the portability of the implementation is achieved. We test the two aspects of portability by using the kernel compiler and the OpenCL implementation to run OpenCL applications on various platforms with different styles of parallel resources. The results show that most of the benchmarked applications, when compiled using pocl, were faster or close ...
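The target-independent parallel region formation described above can be pictured as wrapping the per-work-item kernel body in explicit loops over the local work-item ids, producing a "work-group function" that later target-specific passes can vectorize, unroll or map to static multi-issue slots. The C++ sketch below only mimics that concept; it is not pocl's actual code generation.

```cpp
// Hedged illustration of the "work-item loop" idea: the body written for a
// single OpenCL work-item runs inside an explicit loop over the local id,
// forming one parallel region that a compiler can later vectorize or unroll.
#include <cstdio>
#include <vector>

// Per-work-item body of a vector-add kernel: c[gid] = a[gid] + b[gid].
inline void kernel_body(int gid, const float* a, const float* b, float* c) {
    c[gid] = a[gid] + b[gid];
}

// "Work-group function": the parallel region formed around the kernel body.
void run_work_group(int group_id, int local_size,
                    const float* a, const float* b, float* c) {
    for (int lid = 0; lid < local_size; ++lid) {      // explicit work-item loop
        int gid = group_id * local_size + lid;
        kernel_body(gid, a, b, c);                    // data-parallel, no cross-item deps
    }
}

int main() {
    const int local_size = 4, num_groups = 2, n = local_size * num_groups;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    for (int g = 0; g < num_groups; ++g)
        run_work_group(g, local_size, a.data(), b.data(), c.data());
    std::printf("c[0] = %.1f, c[%d] = %.1f\n", c[0], n - 1, c[n - 1]);
    return 0;
}
```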
Patterns provide a mechanism to express parallelism at a high level of abstraction and to ease the transformation of existing legacy applications to target parallel frameworks. This also opens a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming the source code to parallelism frameworks. We take advantage of C++11 attributes as a mechanism to introduce annotations and enrich semantic information on valid source code. We also present a methodology for performing source code transformations that can target multiple parallel programming models. Another contribution is a rule-based mechanism to transform annotated code to those specific programming models. The REPARA approach requires programmer intervention only to perform the initial code annotation while providing speedups comparable to those obtained by manual parallelization.
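The annotation mechanism can be illustrated with a hedged example: a C++11 attribute marks a loop as a parallel kernel while the code remains valid, sequential C++ for an ordinary compiler (unknown attributes are ignored, typically with a warning). The attribute name used here is a placeholder in the REPARA spirit, not necessarily the project's exact vocabulary.

```cpp
// Hedged sketch of attribute-based annotation for later source-to-source
// transformation; the rpr::kernel attribute is a placeholder name.
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> in(1000, 1.5), out(1000, 0.0);

    // The annotation marks the loop as a data-parallel kernel; a REPARA-like
    // tool would rewrite it for a target framework (e.g. OpenMP, TBB, a GPU),
    // while a plain compiler just ignores the unknown attribute.
    [[rpr::kernel]]
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = 2.0 * in[i] + 1.0;

    std::printf("out[0] = %.2f\n", out[0]);
    return 0;
}
```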
We present SciPAL (scientific parallel algorithms library), a C++-based, hardware-independent open-source library. Its core is a domain-specific embedded language for numerical linear algebra. The main fields of appli...
ISBN (print): 9781424482641
Parallel programming is an important tool used in flash memories to achieve high write speed. In parallel programming, a common program voltage is applied to many cells for simultaneous charge injection. This property significantly simplifies the memory hardware, but it is also a constraint that limits the storage capacity of flash memories. Another important property is that cells differ in their hardness for charge injection, which makes the injected charge differ across cells even when the same program voltage is applied to them. In this paper, we study the parallel programming of flash memory cells, focusing on these two properties. We present algorithms for parallel programming when there is information on the cells' hardness for charge injection but no feedback information on cell levels during programming. We then proceed to the programming model with feedback information on cell levels, and study how well the information on the cells' hardness for charge injection can be obtained. The results are useful for understanding the storage capacity of flash memories with parallel programming.
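A hedged toy model of the no-feedback setting described above: in every round a single program voltage is shared by all selected cells, each cell gains charge in proportion to its (known) hardness, and the voltage is chosen greedily so that no cell overshoots its target level. The linear charge model and the greedy rule are illustrative assumptions, not the paper's algorithms.

```cpp
// Toy model: shared program voltage per round, charge gain = hardness * V,
// V chosen so no cell exceeds its target (deterministic, no feedback needed
// because the hardness values are assumed known).
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> hardness = {1.0, 0.6, 0.8};   // ease of charge injection per cell
    std::vector<double> target   = {3.0, 2.0, 2.5};   // desired charge levels
    std::vector<double> level(hardness.size(), 0.0);  // current levels (start erased)

    for (int round = 0; round < 10; ++round) {
        // Largest common voltage that keeps every unfinished cell at or below target.
        double v = 1e9; bool any = false;
        for (std::size_t i = 0; i < level.size(); ++i) {
            double remaining = target[i] - level[i];
            if (remaining > 1e-9) { v = std::min(v, remaining / hardness[i]); any = true; }
        }
        if (!any) break;                               // all cells programmed
        for (std::size_t i = 0; i < level.size(); ++i) // apply the shared pulse
            if (target[i] - level[i] > 1e-9) level[i] += hardness[i] * v;
    }
    for (std::size_t i = 0; i < level.size(); ++i)
        std::printf("cell %zu: level %.2f (target %.2f)\n", i, level[i], target[i]);
    return 0;
}
```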
ISBN (print): 9781905088416
This contribution presents a computational framework for simulation and gradient-based structural optimization of geometrically nonlinear, large-scale structural finite element models. CAGD-free optimization methods have been developed to integrate shape optimization in an early stage of design and to reduce the related modelling effort. To overcome the problem of increasing numerical cost due to the large design space, the design sensitivities for objectives and constraints are evaluated via adjoint formulations. A new parallel computation strategy for sensitivity evaluation is presented which takes advantage of a completely parallelized simulation and optimization environment. Two application examples illustrate the method and demonstrate its high parallel efficiency.
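For reference, the generic discrete adjoint sensitivity relations behind such a formulation are shown below; with one adjoint solve per objective or constraint, the gradient cost becomes essentially independent of the number of design variables. This is the standard form, not the paper's specific derivation.

```latex
% Generic discrete adjoint sensitivity relations (assumed standard form).
% J: objective, u: state vector, s: design variable,
% R(u, s) = 0: discretized (nonlinear) equilibrium residual.
\[
  \left(\frac{\partial R}{\partial u}\right)^{\!\top}\!\lambda
    = -\left(\frac{\partial J}{\partial u}\right)^{\!\top},
  \qquad
  \frac{\mathrm{d}J}{\mathrm{d}s}
    = \frac{\partial J}{\partial s} + \lambda^{\top}\frac{\partial R}{\partial s}.
\]
```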