As bioinformatics is an emerging application of high performance computing, this paper first evaluates the memory performance of several representative bioinformatics applications so that some appropriate optimization...
ISBN: 9781450305525 (print)
Heterogeneous architecture is becoming an important way to build massively parallel computer systems, e.g. the CPU-GPU heterogeneous systems ranked in the Top500 list. However, it is a challenge to efficiently utilize the massive parallelism of both applications and architectures on such heterogeneous systems. In this paper we present a practice on how to exploit and orchestrate parallelism at the algorithm level to take advantage of the underlying parallelism at the architecture level. A potential petaflops application, cryo-EM 3D reconstruction, is selected as an example. We exploit all possible parallelism in cryo-EM 3D reconstruction, and leverage a self-adaptive dynamic scheduling algorithm to create a proper parallelism mapping between the application and the architecture. The parallelized programs are evaluated on a subsystem of the Dawning Nebulae supercomputer, each node of which is composed of two Intel six-core Xeon CPUs and one Nvidia Fermi GPU. The experiment confirms that hierarchical parallelism is an efficient pattern of parallel programming to utilize the capabilities of both CPU and GPU in a heterogeneous system. The CUDA kernels run more than 3 times faster than the OpenMP-parallelized ones using 12 cores (threads). Based on the CPU-only version, the hybrid CPU-GPU program further improves the whole application's performance by 30% on average.
ISBN: 0818638109 (print)
These conference proceedings contain 32 papers. The main topics are architectural characteristics of scientific applications, TLBs and memory management, input/output systems, fault-tolerant computer architecture, multiprocessor caches, high-performance computing from the application perspective, multithreading support, shared memory systems, cache designs, and multiprocessor memory systems and interconnections.
This research presents preliminary work to address the challenge of identifying at-risk students using supervised machine learning and three unique data categories: engagement, demographics, and performance data colle...
The goal of this work is to design cache coherence protocols with many cores that can be verified with state-of-the-art automated verification methodologies. In particular, we focus on flat (non-hierarchical) coherenc...
This paper explores the performance implications of using future byte-addressable non-volatile memory (NVM) like PCM in end client devices. We explore how to obtain dual benefits - increased capacity and faster persis...
Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible addres...
This special issue presents new trends in computer architecture and in parallel and distributed systems. It is based on the best papers of the 24th International Symposium on Computer Architecture and High Performance Computing, which was held at Columbia University in New York, NY, USA, on October 24-26, 2012. The authors were invited to provide extended versions of the papers presented at the conference, taking into account suggestions from the double-blind peer review process and comments gathered during the conference.
Dynamic power management (DPM) is critical to maximizing the performance of systems ranging from multicore processors to datacenters. However, one formidable challenge with DPM schemes is verifying that the DPM scheme...
ISBN: 9780769530147 (print)
Component-based programming has been applied to address the requirements of applications in high performance computing (HPC). The usual service connectors of commercial component models do not fit some requirements of HPC, mainly regarding the support of parallelism. This paper looks at extensions to the usual notion of service connector to meet such requirements, using the # component model as a substratum and evidencing its expressiveness.