Search Results - Inner Mongolia University Library

7th IEEE International Symposium on Networking Computing and Applications, NCA 2008

Authors: Sugawara, Yutaka; Tezuka, Hiroshi; Inaba, Mary; Hiraki, Kei; Yoshino, Takeshi (University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; Google Japan Inc.)

ISBN: (print) 9780769531922

With the rapid progress of high-performance cluster applications, data transfer between clusters in distant locations is becoming more important. However, it is difficult to transfer data using parallel TCP streams on a long-distance, high-bandwidth network. In this paper, we microscopically observe parallel TCP streams on a 10 Gbps network using our network analyzer, and we propose, implement, and evaluate "Stream Equalizer", which relaxes self-congestion and balances throughput among streams. We evaluate it using a real wide-area network over the Pacific Ocean. The network analyzer and the Stream Equalizer are implemented on TGNLE-1, an FPGA-based programmable high-speed network testbed. © 2008 IEEE.

Keywords: Equalizers

An efficient, model-based CPU-GPU heterogeneous FFT library

10th Workshop on Advances in Parallel and Distributed Computational Models / 22nd IEEE International Parallel and Distributed Processing Symposium

Authors: Ogata, Yasuhito; Endo, Toshio; Maruyama, Naoya; Matsuoka, Satoshi (Tokyo Inst Technol, Tokyo, Japan)

ISBN: (print) 9781424416936

General-Purpose computing on Graphics Processing Units (GPGPU) is becoming popular in HPC because of its high peak performance. However, in spite of the potential performance improvements as well as recent promising results in scientific computing applications, its real performance is not necessarily higher than that of current high-performance CPUs, especially with recent trends towards increasing the number of cores on a single die. This is because GPU performance can be severely limited by restrictions such as memory size and bandwidth, and by programming through graphics-specific APIs. To overcome this problem, we propose a model-based, adaptive library for 2D FFT that automatically achieves optimal performance using available heterogeneous CPU-GPU computing resources. To find optimal load distribution ratios between CPUs and GPUs, we construct a performance model that captures the respective contributions of CPU vs. GPU, and predicts the total execution time of 2D FFT for arbitrary problem sizes and load distributions. The performance model divides the FFT computation into several small sub-steps, and predicts the execution time of each step using profiling results. Preliminary evaluation with our prototype shows that the performance model can predict the execution time of problem sizes that are 16 times as large as the profile runs with less than 20% error, and that the predicted optimal load distribution ratios have less than 1% error. We show that the resulting performance improvement using both CPUs and GPUs can be as high as 50% compared to using either a CPU core or a GPU.
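The idea of choosing a load distribution ratio from a performance model can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes the CPU and GPU parts run fully in parallel, that each part's time scales linearly with its share of the work, and that transfer costs are ignored. The function names and the grid search are hypothetical.

```python
def predicted_time(cpu_share, cpu_time_full, gpu_time_full):
    """Predicted total time when the CPU handles `cpu_share` of the work.

    Assumption: both devices run concurrently, so the slower one
    determines the total time.
    """
    return max(cpu_share * cpu_time_full, (1.0 - cpu_share) * gpu_time_full)


def best_cpu_share(cpu_time_full, gpu_time_full, steps=1000):
    """Scan candidate ratios and return the one with the lowest predicted time."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda share: predicted_time(share, cpu_time_full, gpu_time_full),
    )


# Hypothetical profile: the CPU alone needs 3.0 s, the GPU alone 1.0 s.
share = best_cpu_share(3.0, 1.0)
print(share, predicted_time(share, 3.0, 1.0))
```

Under these toy assumptions the best split gives the CPU a quarter of the work, and both devices finish at the same time.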

Keywords: Graphics processing unit

Modeling and predicting application performance on parallel computers using HPC Challenge benchmarks

22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008)

Authors: Pfeiffer, Wayne; Wright, Nicholas J. (San Diego Supercomp Ctr, La Jolla, CA 92093, USA)

ISBN: (print) 9781424416936

A method is presented for modeling application performance on parallel computers in terms of the performance of microkernels from the HPC Challenge benchmarks. Specifically, the application run time is expressed as a linear combination of inverse speeds and latencies from microkernels or system characteristics. The model parameters are obtained by an automated series of least squares fits using backward elimination to ensure statistical significance. If necessary, outliers are deleted to ensure that the final fit is robust. Typically three or four terms appear in each model: at most one each for floating-point speed, memory bandwidth, interconnect bandwidth, and interconnect latency. Such models allow prediction of application performance on future computers from easier-to-make predictions of microkernel performance. The method was used to build models for four benchmark problems involving the PARATEC and MILC scientific applications. These models not only describe performance well on the ten computers used to build the models, but also do a good job of predicting performance on three additional computers with newer design features. For the four application benchmark problems with six predictions each, the relative root mean squared error in the predicted run times varies between 13 and 16%. The method was also used to build models for the HPL and G-FFTE benchmarks in HPCC, including functional dependences on problem size and core count from complexity analysis. The model for HPL predicts performance even better than the application models do, while the model for G-FFTE systematically underpredicts run times.
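The form of such a model, and the error metric quoted in the abstract, can be sketched as follows. This is only an illustration under stated assumptions: the term names, coefficients, and measurements are invented, and the fitting step (least squares with backward elimination) is not shown.

```python
import math


def predict_run_time(coefficients, metrics):
    """Run time as a linear combination of inverse speeds and latencies.

    `coefficients` and `metrics` are dicts keyed by hypothetical term names.
    """
    return sum(coefficients[name] * metrics[name] for name in coefficients)


def relative_rms_error(predicted, measured):
    """Relative root mean squared error between predictions and measurements."""
    relative = [(p - m) / m for p, m in zip(predicted, measured)]
    return math.sqrt(sum(e * e for e in relative) / len(relative))


# Hypothetical two-term model: floating-point inverse speed and latency.
model = {"inverse_flop_rate": 2.0, "interconnect_latency": 3.0}
machine = {"inverse_flop_rate": 0.5, "interconnect_latency": 0.1}
print(predict_run_time(model, machine))
print(relative_rms_error([110.0, 90.0], [100.0, 100.0]))
```

Once the coefficients are fixed, predicting a future machine only requires estimates of its microkernel metrics, which is the practical appeal the abstract describes.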

Keywords: Forecasting

Automated generation of explicit connectors for component based hardware/software interaction in embedded real-time systems

10th Workshop on Advances in Parallel and Distributed Computational Models / 22nd IEEE International Parallel and Distributed Processing Symposium

Authors: Forster, Wolfgang; Kutschera, Christof; Steininger, Andreas; Goeschka, Karl M. (Vienna Univ Technol, Karlspl 13, A-1040 Vienna, Austria; Univ Appl Sci Tech Vienna, Dept Embedded Syst, A-1200 Vienna, Austria)

ISBN: (print) 9781424416936

The complexity of today's embedded real-time systems is continuously growing, with high demands on dependability, resource efficiency, and reusability. Two solution approaches address these needs. First, in the component-based software engineering (CBSE) paradigm, software is decomposed into self-contained components with explicit interactions and context dependencies. Connectors represent the abstraction of interactions between these components. Second, components can be shifted from software to reconfigurable hardware, typically field programmable gate arrays (FPGAs), in order to meet real-time constraints. This paper proposes a component-based concept to support efficient hardware/software co-design: a hardware component together with the hardware/software connector can seamlessly replace a software component with the same functionality, while the particularities of the alternative interaction are encapsulated in the component connector. Our approach provides for tools that can generate all necessary interaction mechanisms between hardware and software components. A proof-of-concept application demonstrates the advantages of our concept: rapid change and comparison of different partitioning decisions due to automated and faultless generation of the hardware/software connectors.

Keywords: HW/SW interaction; CBSE; embedded real-time systems; automated design flow

On the Optimization of Resource Utilization in Distributed Multimedia Applications

8th IEEE International Symposium on Cluster Computing and the Grid

Authors: Yang, R.; van der Mei, R. D.; Roubos, D.; Seinstra, F. J.; Koole, G. M. (Vrije Univ Amsterdam, Fac Sci, NL-1081 HV Amsterdam, Netherlands)

ISBN: (print) 9781424442379

The application and research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of new knowledge from large multimedia data streams and archives. In recent years, there has been a tremendous growth in the MMCA application domain (for real-time and off-line execution scenarios alike), and this growth is likely to continue in the near future. Multimedia applications operated in a real-time environment pose very strict requirements on the obtained processing times, while off-line applications have to perform within 'tolerable' time frames. To meet these requirements, large-scale multimedia applications are typically executed on Grid systems consisting of large collections of compute clusters. For optimized use of resources, it is essential to determine the optimal number of compute nodes per cluster, properly dealing with the perceived computation versus communication ratio. This ratio generally depends on the characteristics of the application at hand, and on the software and hardware specifics of the computational environment. Motivated by these observations, in this paper we develop a simple and easy-to-implement method to determine the "optimal" number of parallel compute nodes. The method is based on the classical binary search method for non-linear optimization, and does not depend on the usually unknown specifics of the system. Extensive experimental validation on a real distributed system shows that our method is indeed highly effective.
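A binary search for the best node count can be sketched as below. This is a minimal illustration, not the authors' procedure: it assumes the run time first decreases and then increases as nodes are added (a unimodal curve), and `run_time` stands in for an actual timed run of the application. Both function names are hypothetical.

```python
def optimal_node_count(run_time, lowest, highest):
    """Binary search for the node count with the smallest run time.

    Assumption: run_time(n) is strictly decreasing up to the optimum and
    strictly increasing after it, so comparing neighbours tells us which
    half of the range still contains the optimum.
    """
    while lowest < highest:
        middle = (lowest + highest) // 2
        if run_time(middle) <= run_time(middle + 1):
            highest = middle      # already at or past the optimum
        else:
            lowest = middle + 1   # still improving, look further right
    return lowest


def toy_run_time(nodes):
    """Stand-in cost: computation shrinks with nodes, communication grows."""
    return 100.0 / nodes + 2.0 * nodes


print(optimal_node_count(toy_run_time, 1, 32))
```

Only measured run times are compared, so the search needs no knowledge of why the curve has its shape, which matches the abstract's point that the method does not depend on system specifics.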

Keywords: Resource management; Streaming media; Data mining; Multimedia systems; Grid computing; Application software; Hardware; Concurrent computing; Search methods; Optimization methods

Integrating Security Solutions to Support nanoCMOS Electronics Research

IEEE International Symposium on Parallel and Distributed Processing with Applications

Authors: Sinnott, R.; Bayliss, C.; Doherty, T.; Martin, D.; Millar, C.; Stewart, G.; Watt, J.; Asenov, A.; Roy, G.; Roy, S.; Davenhall, C.; Harbulot, B.; Jones, M. (Univ Glasgow, Natl E Sci Ctr, Glasgow G12 8QQ, Lanark, Scotland; Univ Glasgow, Dept Elect & Elect Engn, Glasgow G12 8QQ, Lanark, Scotland; Univ Edinburgh, Natl E Sci, Edinburg TX USA; Univ Manchester, North West E Sci, Manchester M13 9PL, Lancs, England)

ISBN: (print) 9780769534718

The UK Engineering and Physical Sciences Research Council (EPSRC) funded project "Meeting the Design Challenges of nanoCMOS Electronics" (nanoCMOS) is developing a research infrastructure for collaborative electronics research across multiple institutions in the UK with especially strong industrial and commercial involvement. Unlike other domains, the electronics industry is driven by the necessity of protecting the intellectual property of the data, designs and software associated with next-generation electronics devices, and therefore requires fine-grained security. Similarly, the project also demands seamless access to large-scale high-performance compute resources for atomic-scale device simulations, and the capability to manage the hundreds of thousands of files and the metadata associated with these simulations. Within this context, the project has explored a wide range of authentication and authorization infrastructures facilitating compute resource access and providing fine-grained security over numerous distributed file stores and files. We conclude that no single security solution meets the needs of the project. This paper describes the experiences of applying X.509-based certificates and public key infrastructures, VOMS, PERMIS, Kerberos and the Internet2 Shibboleth technologies for nanoCMOS security. We outline how we are integrating these solutions to provide a complete end-to-end security framework meeting the demands of the nanoCMOS electronics domain.

Keywords: Electronics data design Physical science Security Intellectual Property electronics industry Metadata

Flexible parameterization of XOR based codes for distributed storage

7th IEEE International Symposium on Networking Computing and Applications, NCA 2008

Authors: Sobe, Peter; Peter, Kathrin (University of Luebeck, Institute of Computer Engineering, Germany; Zuse Institute Berlin, Computer Science Research, Germany)

ISBN: (print) 9780769531922

Distributed storage systems apply erasure-tolerant codes to guarantee reliable access to data despite failures of storage resources. While many codes can be mapped to XOR operations and efficiently implemented on common microprocessors, only a certain number of codes are usually implemented in a given system (out of a wide variety of different codes). The ability to include new codes easily, to exchange codes, and finally to select codes for several types of data is desirable. To provide this flexibility, a parameterization is used which allows the definition of different XOR-based codes and, beyond that, different styles of encoding and decoding. The parameters include (i) the assignment of data and redundancy elements to the storage resources and (ii) a description of encoding and decoding algorithms with XOR-based equations. The parameters of a certain code can be changed, and in addition a wide variety of codes can be described and included in a storage system implementation. The proposed parameterization adopts the ability of codes like EVEN-ODD, Cauchy-R/S and HoVer codes to map to distributed resources. Furthermore, encoding and decoding algorithms can be described differently, either for minimal coding cost or for minimal coding time on parallel systems. © 2008 IEEE.
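Describing a code by XOR equations can be sketched in a few lines. The example below is an assumption-laden illustration, not the paper's parameter format: `equations` is a hypothetical dictionary that lists, for each redundancy element, the data elements XORed together to produce it, and only a single-parity code with one lost element is shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data, equations):
    """Compute each redundancy element from its XOR equation."""
    return {
        parity: xor_blocks([data[name] for name in sources])
        for parity, sources in equations.items()
    }


def recover(missing, parity, equations, data, redundancy):
    """Rebuild one lost data element from the survivors of one equation."""
    survivors = [data[name] for name in equations[parity] if name != missing]
    return xor_blocks(survivors + [redundancy[parity]])


# Hypothetical code description: one parity element over three data elements.
equations = {"p0": ["d0", "d1", "d2"]}
data = {"d0": b"\x01\x02", "d1": b"\x0f\x00", "d2": b"\xf0\xff"}
redundancy = encode(data, equations)

surviving = {name: block for name, block in data.items() if name != "d1"}
print(recover("d1", "p0", equations, surviving, redundancy))
```

Changing the code then means changing only the `equations` table, which is the kind of flexibility the abstract argues for.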

Keywords: Parameterization

Proceedings of the 13th International Workshop on High-Level Programming Models and Supportive Environments

International Symposium on Parallel and Distributed Processing (IPDPS)

Conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover art, table of contents, copyright statements, title page or half-title pages, blank pages, venue maps, or other general information relating to the conference that was part of the original conference proceedings.

Proceedings of the 13th International Workshop on High-Level Programming Models and Supportive Environments

IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM, 2008

Authors: Schulz, Martin; Midkiff, Sam (Lawrence Livermore National Laboratory; Purdue University)

No abstract available

ISBN: (print) 9781424416943

Autonomic share allocation and bounded prediction of response times in parallel job scheduling for grids

7th IEEE International Symposium on Networking Computing and Applications, NCA 2008

Authors: Sodan, Angela (University of Windsor, Computer Science)

ISBN: (print) 9780769531922

Grid schedulers, which need to decide on which sites jobs are best allocated, require controlled and predictable service. Fair-share scheduling has become widely used but lacks a formal model and depends on the current machine load. Existing approaches for response-time prediction still show significant prediction errors, mostly due to problems in dynamic arrival of jobs with potentially higher priority and hard-to-anticipate packing and backfilling effects. Thus, we propose a different job scheduler (Scojo-PECT) which provides a more suitable framework for predictability and service guarantees by employing preemption with coarse-grain time sharing. We formalize the approach via a queuing model to determine the resource shares necessary to meet target service levels. As a further extension, Scojo-PECT can adapt resource shares within certain limits to variations in machine load, while maintaining predictability and service guarantees. We demonstrate the feasibility of service control, the tightness of the 95% prediction intervals (0-30% from average), and the high predictability obtained. © 2008 IEEE.

Keywords: Scheduling
