On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes that integrate commodity off-the-shelf processors and accelerators sharing a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communication, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed on the system.
ISBN (print): 9798350305487
Data-centric applications are increasingly common, making the issues caused by the discrepancy between processor and memory technologies increasingly apparent. Near-Data Processing (NDP) is an approach to mitigating this issue: it proposes moving some of the computation close to the memory, thus reducing data movement and aiding data-intensive workloads. Analytical database queries are very commonly used in NDP research due to their intrinsic use of very large volumes of data. In this paper, we investigate the migration of the most time-consuming database operators to VIMA (Vector-In-Memory Architecture), a novel 3D-stacked-memory-based NDP architecture. We consider the selection, projection, and bloom join query operators, commonly used by data analytics applications, comparing VIMA to a high-performance x86 baseline. We pitch VIMA against both a single-thread baseline and a modern 16-thread x86 system to evaluate its performance. Against the single-thread baseline, our experiments show that VIMA speeds up execution by up to 5x for selection, 2.5x for projection, and 16x for join while consuming up to 99% less energy. Against the multi-thread baseline, VIMA matches execution-time performance even at the largest dataset sizes considered. Compared to existing state-of-the-art NDP platforms, our approach achieves superior performance for these operators.
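The three operators the abstract names can be sketched in plain Python over lists of tuples; this is only an illustrative scalar version (column positions, table data, and the single-word Bloom filter are toy choices), not the vectorized VIMA implementation the paper evaluates.

```python
# Minimal sketches of the selection, projection, and bloom join operators.

def selection(rows, predicate):
    """SELECT * FROM rows WHERE predicate(row)."""
    return [r for r in rows if predicate(r)]

def projection(rows, indices):
    """Keep only the columns at the given positions."""
    return [tuple(r[i] for i in indices) for r in rows]

def bloom_join(build, probe, build_key=0, probe_key=0, m=1024):
    """Hash join with a Bloom-filter pre-pass: probe rows whose key is
    definitely absent from the build side are discarded early, which is
    what makes the operator attractive for near-data processing."""
    bloom = 0
    for r in build:                      # build a two-hash Bloom filter
        h = hash(r[build_key])
        bloom |= 1 << (h % m) | 1 << ((h >> 16) % m)
    table = {}
    for r in build:                      # build the join hash table
        table.setdefault(r[build_key], []).append(r)
    out = []
    for r in probe:
        h = hash(r[probe_key])
        if not (bloom >> (h % m)) & 1 or not (bloom >> ((h >> 16) % m)) & 1:
            continue                     # definitely no match: skip early
        for b in table.get(r[probe_key], []):
            out.append(b + r)
    return out

users = [(1, "ana"), (3, "bob")]
orders = [(1, "pen"), (2, "book"), (3, "mug")]
joined = bloom_join(users, orders)       # pairs user and order rows by id
```

The Bloom pre-pass lets the join drop the unmatched `(2, "book")` row before the hash-table probe, the early-filtering effect the NDP migration exploits.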
ISBN (print): 9781538614655
This paper compares the performance and stability of two Big Data processing tools: Apache Spark and the High-Performance Analytics Toolkit (HPAT). The comparison was performed using two applications: a unidimensional vector sum and the K-means clustering algorithm. The experiments were performed in distributed and shared-memory environments with different numbers and configurations of virtual machines. By analyzing the results, we are able to conclude that HPAT outperforms Apache Spark in our case studies. We independently validated the results and potential presented by the HPAT developers. We also provide an analysis of both frameworks in the presence of failures.
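The first benchmark above, a unidimensional vector sum, is a plain reduction; a sequential Python stand-in makes the workload concrete (in Spark this reduction would be distributed over an RDD, and HPAT compiles the equivalent loop to parallel code; the data values here are illustrative).

```python
# Sequential sketch of the unidimensional vector-sum benchmark.

def vector_sum(xs):
    """Reduce a one-dimensional vector to the sum of its elements."""
    total = 0.0
    for x in xs:
        total += x
    return total

data = [float(i) for i in range(1_000_000)]
result = vector_sum(data)   # the reduction both frameworks distribute
```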
ISBN (print): 0769517722
ISAM(1) is a proposal directed at resource management in heterogeneous networks, supporting physical and logical mobility, dynamic adaptation, and the execution of component-based distributed applications. To achieve its goals, ISAM uses, as a strategy, an integrated environment that: (a) provides a programming paradigm and its execution environment; (b) handles the adaptation process through a multilevel collaborative model, in which both the system and the application contribute. In this paper we discuss the main mechanisms used to implement the ISAM features, and we also present a parallel application that explores some of these features.
ISBN (print): 0769522750
The last decade has seen several changes in the structure and emphasis of enterprise IT systems. Specific infrastructure trends have included the emergence of large consolidated data centers, the adoption of virtualization and modularization, and the increased commoditization of hardware. At the application level, both the workload mix and usage patterns have evolved toward an increased emphasis on service-centric computing and SLA-driven performance tuning. These often dramatic changes in the enterprise IT landscape motivate equivalent changes in the emphasis of architecture research. In this paper, we summarize recent trends in enterprise IT systems and discuss the implications for architecture research, suggesting some high-level challenges and open questions for the community to address.
Future high-performance computing will undoubtedly reach Petascale and beyond. Today's HPC is tomorrow's personal computing. What are the evolving processor architectures towards Multi-core and Many-core for t...
ISBN (print): 9781457706783
The rapid development of web service technology brings up a number of crucial requirements for designing a service computing runtime, such as supporting multiple message exchange patterns, switching among different transports, integrating various extended web service protocols, and achieving robust performance under high concurrency. Based on the staged event-driven architecture, we propose a novel architecture for an adaptive web-service-centric service computing runtime, named SEDA4SC. In SEDA4SC, the processing of basic and extended web service protocols is divided into four primary event-driven stages to enable system independence and module isolation. Moreover, this architecture allows messages to be handled in two independent pipelines: the input pipeline and the output pipeline. Arbitrary message exchange patterns can be supported through a combination of the two pipelines. With SEDA4SC, we design and implement a service computing runtime system. The performance evaluation results show that our system exhibits robust performance under high concurrency.
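The staged event-driven idea behind SEDA4SC, queues of messages flowing between independent handler stages, can be sketched as a toy pipeline; the two stage names and handlers below are illustrative stand-ins, not the four stages or the SOAP processing of the paper.

```python
# Toy staged event-driven pipeline: each stage owns a queue and a handler,
# and a message drained from one stage is enqueued into the next.

from collections import deque

class Stage:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.queue = deque()
        self.next = None            # downstream stage, if any

    def enqueue(self, msg):
        self.queue.append(msg)

    def drain(self):
        """Process every queued message and forward results downstream."""
        while self.queue:
            out = self.handler(self.queue.popleft())
            if self.next is not None:
                self.next.enqueue(out)

def pipeline(stages):
    """Chain the stages so each forwards to its successor."""
    for a, b in zip(stages, stages[1:]):
        a.next = b
    return stages

# Illustrative "input pipeline": strip transport framing, then parse.
results = []
inp = pipeline([
    Stage("transport", lambda m: m.strip()),
    Stage("parse", lambda m: results.append(m.upper())),
])
inp[0].enqueue("  hello  ")
for stage in inp:
    stage.drain()                   # results now holds the parsed message
```

An "output pipeline" would be a second such chain running in the opposite direction; combining the two independently is what lets arbitrary message exchange patterns be composed.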
ISBN (print): 9781467303057; 9781467303064
The solution of large and complex coupled electromechanical problems requires high-performance computing resources. In recent years, the use of Graphics Processing Units (GPUs) has gained increasing popularity in scientific computing because of their low cost and parallel architecture. In this paper the authors report the main results of a GPU approach to parallelizing a research code for electromagnetic launcher analysis. Programming a GPU-based environment poses a number of critical issues that have to be carefully addressed in order to fully exploit the potential of the system: data have to be properly organized to fit the Single Instruction Multiple Data scheme, and the data transfer between the host and the device, as well as the memory management of the GPU, deserve accurate programming. Two application examples of the parallelized code are reported to show the performance improvements that can be obtained in the numerical analysis of both rail and induction launchers.
ISBN (print): 9781538637906
In recent years, the Deep Neural Network (DNN) has been successfully used in image classification. Most existing DNNs need to learn a very large set of parameters, which requires a huge amount of computational resources and time to train via gradient descent and back-propagation. To address this issue, PCANet was developed for the highly efficient design and training of DNNs. Compared with traditional DNNs, PCANet has a simpler structure and better performance, which makes it attractive for hardware design. To overcome the limitations of PCANet and significantly improve its performance, we have proposed a novel model named the Constrained High Dispersal Network (CHDNet), a variant of PCANet. In this paper, we implement CHDNet on a Xilinx ZYNQ FPGA to ensure the responsiveness of the system at lower power than a personal computer requires, by taking advantage of the algorithmic parallelism and the ZYNQ architecture. Our experimental results over two major datasets, the MNIST dataset for handwritten digit recognition and the Extended Yale B dataset for face recognition, demonstrate that our FPGA implementation is more than 15x faster than a software implementation on a PC (Intel i7-4720HQ, 2.6 GHz).
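The PCANet first stage that CHDNet builds on learns its convolution filters without gradient descent, as the leading principal components of zero-mean image patches; a minimal NumPy sketch follows (patch size, filter count, and the random toy images are illustrative choices, not the paper's configuration, and CHDNet's additional constraints are omitted).

```python
# Sketch of gradient-free filter learning in the first PCANet stage.

import numpy as np

def pca_filters(images, k=7, n_filters=4):
    """Collect all k x k patches, remove each patch's mean, and return
    the top principal components reshaped into k x k filters."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())
    X = np.stack(patches)                  # (num_patches, k*k)
    cov = X.T @ X / len(X)                 # patch covariance
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_filters]     # leading components as columns
    return top.T.reshape(n_filters, k, k)

rng = np.random.default_rng(0)
imgs = rng.standard_normal((3, 16, 16))    # toy stand-ins for MNIST digits
filters = pca_filters(imgs)                # four 7x7 learned filters
```

Because the filters come from a single eigendecomposition rather than iterative back-propagation, this stage maps naturally onto fixed-function FPGA logic.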
ISBN (print): 9781538677698
The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary filesystems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present results measured with NVM emulation and different FS backends with DAX/FUSE on a local node to show the benefits of our proposal and of such coordination.
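The coordination idea, staging a job's input files from the parallel filesystem into a node-local burst buffer before the job starts, can be sketched as a small helper; the `stage_in` name, the directory layout, and the temporary directories standing in for the PFS and the NVM device are all illustrative, not the echofs API.

```python
# Toy stage-in step a scheduler could run before launching a batch job.

import shutil
import tempfile
from pathlib import Path

def stage_in(input_files, burst_buffer_dir):
    """Copy each input file into the node-local burst buffer and return
    a mapping from original path to staged path, so the job can be
    started with its inputs already local."""
    bb = Path(burst_buffer_dir)
    bb.mkdir(parents=True, exist_ok=True)
    staged = {}
    for f in map(Path, input_files):
        dest = bb / f.name
        shutil.copy2(f, dest)          # preserve timestamps and mode bits
        staged[str(f)] = str(dest)
    return staged

# Demonstration: temp dirs stand in for the PFS and the NVM burst buffer.
pfs = Path(tempfile.mkdtemp(prefix="pfs-"))
(pfs / "input.dat").write_text("job input")
mapping = stage_in([pfs / "input.dat"], tempfile.mkdtemp(prefix="bb-"))
```

A symmetric stage-out step after job completion would copy results back to the parallel filesystem before the temporary filesystem is torn down.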