On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes that integrate commodity off-the-shelf processors and accelerators sharing a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communication, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed on the system.
ISBN (print): 9798350305487
Data-centric applications are increasingly common, making the issues caused by the discrepancy between processor and memory technologies increasingly apparent. Near-Data Processing (NDP) is an approach to mitigating this issue: it proposes moving some of the computation close to the memory, thus reducing data movement and aiding data-intensive workloads. Analytical database queries are very commonly used in NDP research due to their intrinsic use of very large volumes of data. In this paper, we investigate the migration of the most time-consuming database operators to VIMA (Vector-In-Memory Architecture), a novel 3D-stacked-memory-based NDP architecture. We consider the selection, projection, and bloom join query operators, commonly used by data analytics applications, comparing VIMA to a high-performance x86 baseline. We pitch VIMA against both a single-thread baseline and a modern 16-thread x86 system to evaluate its performance. Against the single-thread baseline, our experiments show that VIMA speeds up execution by up to 5x for selection, 2.5x for projection, and 16x for join while consuming up to 99% less energy. Against the multi-thread baseline, VIMA matches execution-time performance even at the largest dataset sizes considered. Compared to existing state-of-the-art NDP platforms, our approach achieves superior performance for these operators.
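The three operators the abstract names can be sketched in plain Python over lists of tuples; this is only an illustrative scalar version (column positions, table data, and the single-word Bloom filter are toy choices), not the vectorized VIMA implementation the paper evaluates.

```python
# Minimal sketches of the selection, projection, and bloom join operators.

def selection(rows, predicate):
    """SELECT * FROM rows WHERE predicate(row)."""
    return [r for r in rows if predicate(r)]

def projection(rows, indices):
    """Keep only the columns at the given positions."""
    return [tuple(r[i] for i in indices) for r in rows]

def bloom_join(build, probe, build_key=0, probe_key=0, m=1024):
    """Hash join with a Bloom-filter pre-pass: probe rows whose key is
    definitely absent from the build side are discarded early, which is
    what makes the operator attractive for near-data processing."""
    bloom = 0
    for r in build:                      # build a two-hash Bloom filter
        h = hash(r[build_key])
        bloom |= 1 << (h % m) | 1 << ((h >> 16) % m)
    table = {}
    for r in build:                      # build the join hash table
        table.setdefault(r[build_key], []).append(r)
    out = []
    for r in probe:
        h = hash(r[probe_key])
        if not (bloom >> (h % m)) & 1 or not (bloom >> ((h >> 16) % m)) & 1:
            continue                     # definitely no match: skip early
        for b in table.get(r[probe_key], []):
            out.append(b + r)
    return out

users = [(1, "ana"), (3, "bob")]
orders = [(1, "pen"), (2, "book"), (3, "mug")]
joined = bloom_join(users, orders)       # pairs user and order rows by id
```

The Bloom pre-pass lets the join drop the unmatched `(2, "book")` row before the hash-table probe, the early-filtering effect the NDP migration exploits.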
ISBN (print): 9781538614655
This paper compares the performance and stability of two Big Data processing tools: Apache Spark and the High-Performance Analytics Toolkit (HPAT). The comparison was performed using two applications: a unidimensional vector sum and the K-means clustering algorithm. The experiments were performed in distributed and shared-memory environments with different numbers and configurations of virtual machines. By analyzing the results, we are able to conclude that HPAT outperforms Apache Spark in our case studies. We independently validated the results and potential presented by the HPAT developers. We also provide an analysis of both frameworks in the presence of failures.
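The first benchmark above, a unidimensional vector sum, is a plain reduction; a sequential Python stand-in makes the workload concrete (in Spark this reduction would be distributed over an RDD, and HPAT compiles the equivalent loop to parallel code; the data values here are illustrative).

```python
# Sequential sketch of the unidimensional vector-sum benchmark.

def vector_sum(xs):
    """Reduce a one-dimensional vector to the sum of its elements."""
    total = 0.0
    for x in xs:
        total += x
    return total

data = [float(i) for i in range(1_000_000)]
result = vector_sum(data)   # the reduction both frameworks distribute
```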
ISBN (print): 0769517722
ISAM(1) is a proposal directed at resource management in heterogeneous networks, supporting physical and logical mobility, dynamic adaptation, and the execution of component-based distributed applications. To achieve its goals, ISAM uses, as a strategy, an integrated environment that: (a) provides a programming paradigm and its execution environment; (b) handles the adaptation process through a multilevel collaborative model, in which both the system and the application contribute. In this paper we discuss the main mechanisms used to implement the ISAM features, and we also present a parallel application that explores some of these features.
ISBN (print): 0769522750
The last decade has seen several changes in the structure and emphasis of enterprise IT systems. Specific infrastructure trends have included the emergence of large consolidated data centers, the adoption of virtualization and modularization, and the increased commoditization of hardware. At the application level, both the workload mix and usage patterns have evolved toward an increased emphasis on service-centric computing and SLA-driven performance tuning. These often dramatic changes in the enterprise IT landscape motivate equivalent changes in the emphasis of architecture research. In this paper, we summarize recent trends in enterprise IT systems and discuss the implications for architecture research, suggesting some high-level challenges and open questions for the community to address.
Future high-performance computing will undoubtedly reach Petascale and beyond. Today's HPC is tomorrow's personal computing. What are the evolving processor architectures towards Multi-core and Many-core for t...
ISBN (print): 9781457706783
The rapid development of web service technology brings up a number of crucial requirements for designing a service computing runtime, such as supporting multiple message exchange patterns, switching among different transports, integrating various extended web service protocols, and achieving robust performance under high concurrency. Based on the staged event-driven architecture, we propose a novel architecture for an adaptive web-service-centric service computing runtime, named SEDA4SC. In SEDA4SC, the processing of basic and extended web service protocols is divided into four primary event-driven stages to enable system independence and module isolation. Moreover, this architecture allows messages to be handled in two independent pipelines: the input pipeline and the output pipeline. Arbitrary message exchange patterns can be supported through a combination of the two pipelines. With SEDA4SC, we design and implement a service computing runtime system. The performance evaluation results show that our system exhibits robust performance under high concurrency.
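The staged event-driven idea behind SEDA4SC, queues of messages flowing between independent handler stages, can be sketched as a toy pipeline; the two stage names and handlers below are illustrative stand-ins, not the four stages or the SOAP processing of the paper.

```python
# Toy staged event-driven pipeline: each stage owns a queue and a handler,
# and a message drained from one stage is enqueued into the next.

from collections import deque

class Stage:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.queue = deque()
        self.next = None            # downstream stage, if any

    def enqueue(self, msg):
        self.queue.append(msg)

    def drain(self):
        """Process every queued message and forward results downstream."""
        while self.queue:
            out = self.handler(self.queue.popleft())
            if self.next is not None:
                self.next.enqueue(out)

def pipeline(stages):
    """Chain the stages so each forwards to its successor."""
    for a, b in zip(stages, stages[1:]):
        a.next = b
    return stages

# Illustrative "input pipeline": strip transport framing, then parse.
results = []
inp = pipeline([
    Stage("transport", lambda m: m.strip()),
    Stage("parse", lambda m: results.append(m.upper())),
])
inp[0].enqueue("  hello  ")
for stage in inp:
    stage.drain()                   # results now holds the parsed message
```

An "output pipeline" would be a second such chain running in the opposite direction; combining the two independently is what lets arbitrary message exchange patterns be composed.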
ISBN (print): 9781467303057; 9781467303064
The solution of large and complex coupled electromechanical problems requires high-performance computing resources. In recent years, the use of Graphics Processing Units (GPUs) has gained increasing popularity in scientific computing because of their low cost and parallel architecture. In this paper the authors report the main results of a GPU approach to parallelizing a research code for electromagnetic launcher analysis. Programming a GPU-based environment poses a number of critical issues that have to be carefully addressed in order to fully exploit the potential of the system: data have to be properly organized to fit the Single Instruction Multiple Data scheme, and the data transfer between the host and the device, as well as the memory management of the GPU, deserve accurate programming. Two application examples of the parallelized code are reported to show the performance improvements that can be obtained in the numerical analysis of both rail and induction launchers.
ISBN (print): 9781538637906
In recent years, the Deep Neural Network (DNN) has been successfully used in image classification. Most existing DNNs need to learn a very large set of parameters, which requires a huge amount of computational resources and time to train via gradient descent and back-propagation. To address this issue, PCANet was developed for the highly efficient design and training of DNNs. Compared with traditional DNNs, PCANet has a simpler structure and better performance, which makes it attractive for hardware design. To overcome the limitations of PCANet and significantly improve its performance, we have proposed a novel model named the Constrained High Dispersal Network (CHDNet), a variant of PCANet. In this paper, we implement CHDNet on a Xilinx ZYNQ FPGA to ensure the responsiveness of the system at lower power than a personal computer requires, by taking advantage of the algorithmic parallelism and the ZYNQ architecture. Our experimental results over two major datasets, the MNIST dataset for handwritten digit recognition and the Extended Yale B dataset for face recognition, demonstrate that our FPGA implementation is more than 15x faster than a software implementation on a PC (Intel i7-4720HQ, 2.6 GHz).
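The PCANet first stage that CHDNet builds on learns its convolution filters without gradient descent, as the leading principal components of zero-mean image patches; a minimal NumPy sketch follows (patch size, filter count, and the random toy images are illustrative choices, not the paper's configuration, and CHDNet's additional constraints are omitted).

```python
# Sketch of gradient-free filter learning in the first PCANet stage.

import numpy as np

def pca_filters(images, k=7, n_filters=4):
    """Collect all k x k patches, remove each patch's mean, and return
    the top principal components reshaped into k x k filters."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())
    X = np.stack(patches)                  # (num_patches, k*k)
    cov = X.T @ X / len(X)                 # patch covariance
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_filters]     # leading components as columns
    return top.T.reshape(n_filters, k, k)

rng = np.random.default_rng(0)
imgs = rng.standard_normal((3, 16, 16))    # toy stand-ins for MNIST digits
filters = pca_filters(imgs)                # four 7x7 learned filters
```

Because the filters come from a single eigendecomposition rather than iterative back-propagation, this stage maps naturally onto fixed-function FPGA logic.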
ISBN (print): 9781538677698
The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary filesystems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present results measured with NVM emulation and different FS backends with DAX/FUSE on a local node to show the benefits of our proposal and of such coordination.
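The coordination idea, staging a job's input files from the parallel filesystem into a node-local burst buffer before the job starts, can be sketched as a small helper; the `stage_in` name, the directory layout, and the temporary directories standing in for the PFS and the NVM device are all illustrative, not the echofs API.

```python
# Toy stage-in step a scheduler could run before launching a batch job.

import shutil
import tempfile
from pathlib import Path

def stage_in(input_files, burst_buffer_dir):
    """Copy each input file into the node-local burst buffer and return
    a mapping from original path to staged path, so the job can be
    started with its inputs already local."""
    bb = Path(burst_buffer_dir)
    bb.mkdir(parents=True, exist_ok=True)
    staged = {}
    for f in map(Path, input_files):
        dest = bb / f.name
        shutil.copy2(f, dest)          # preserve timestamps and mode bits
        staged[str(f)] = str(dest)
    return staged

# Demonstration: temp dirs stand in for the PFS and the NVM burst buffer.
pfs = Path(tempfile.mkdtemp(prefix="pfs-"))
(pfs / "input.dat").write_text("job input")
mapping = stage_in([pfs / "input.dat"], tempfile.mkdtemp(prefix="bb-"))
```

A symmetric stage-out step after job completion would copy results back to the parallel filesystem before the temporary filesystem is torn down.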