Programmable network interfaces provide the potential to extend the functionality of network services but lead to instruction processing overheads when compared to application-specific network interfaces. This paper aims to offset those performance disadvantages by exploiting task-level concurrency in the workload to parallelize the network interface firmware for a programmable controller with two processors. By carefully partitioning the handler procedures that process various events related to the progress of a packet, the system can minimize sharing, achieve load balance, and efficiently utilize on-chip storage. Compared to the uniprocessor firmware released by the manufacturer, the parallelized network interface firmware increases throughput by 65% for bidirectional UDP traffic of maximum-sized packets, 157% for bidirectional UDP traffic of minimum-sized packets, and 32-107% for real network services. This parallelization yields performance within 10-20% of a modern ASIC-based network interface for real network services.
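The flavor of this handler partitioning can be sketched in a few lines. The C++ below is a minimal illustration, not the paper's firmware: the event names, the send-side/receive-side split, and the two-queue design are assumptions made for the example. Each "processor" is a thread that owns a disjoint set of event handlers, so handler state is never shared and only the queues need locks.

#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// Packet-progress events; names and the split below are illustrative.
enum class Event { SendDescriptor, TxDmaDone, RecvDmaDone, Shutdown };

class EventQueue {
    std::queue<Event> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(Event e) {
        { std::lock_guard<std::mutex> l(m_); q_.push(e); }
        cv_.notify_one();
    }
    Event pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !q_.empty(); });
        Event e = q_.front();
        q_.pop();
        return e;
    }
};

void worker(const char* name, EventQueue& q) {
    for (;;) {
        Event e = q.pop();
        if (e == Event::Shutdown) return;
        // Handler state is private to this processor, so the handlers
        // themselves run without locking.
        std::cout << name << " handled event " << static_cast<int>(e) << "\n";
    }
}

int main() {
    EventQueue cpu0, cpu1;
    std::thread t0(worker, "cpu0", std::ref(cpu0));
    std::thread t1(worker, "cpu1", std::ref(cpu1));

    // Static partition: send-side events go to cpu0, receive-side to cpu1,
    // so the two processors touch disjoint state and the load is balanced.
    auto dispatch = [&](Event e) {
        if (e == Event::SendDescriptor || e == Event::TxDmaDone) cpu0.push(e);
        else cpu1.push(e);
    };
    dispatch(Event::SendDescriptor);
    dispatch(Event::RecvDmaDone);
    dispatch(Event::TxDmaDone);

    cpu0.push(Event::Shutdown);
    cpu1.push(Event::Shutdown);
    t0.join();
    t1.join();
}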
ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message-passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user-transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP and MPI implementations, and in combinations thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC, and IBM RS6000 SP cluster.
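The aggregation mechanism the abstract mentions can be illustrated with a short sketch. The class and method names below (AggregatingChannel, async_rmi, flush) are hypothetical, not ARMI's actual interface; the point is only the pattern of buffering small asynchronous requests per destination and sending them as one coarser message.

#include <cstddef>
#include <functional>
#include <iostream>
#include <unordered_map>
#include <vector>

class AggregatingChannel {
    std::unordered_map<int, std::vector<std::function<void()>>> pending_;
    std::size_t batch_size_;
public:
    explicit AggregatingChannel(std::size_t batch) : batch_size_(batch) {}

    // Queue an async request; send only when a full batch has accumulated.
    void async_rmi(int dest, std::function<void()> request) {
        auto& buf = pending_[dest];
        buf.push_back(std::move(request));
        if (buf.size() >= batch_size_) flush(dest);
    }

    // In a real implementation this would marshal the batch into a single
    // message (MPI) or hand it to another thread (Pthreads/OpenMP).
    void flush(int dest) {
        auto& buf = pending_[dest];
        std::cout << "sending batch of " << buf.size()
                  << " requests to node " << dest << "\n";
        for (auto& r : buf) r();  // stand-in for remote execution
        buf.clear();
    }

    void flush_all() {
        for (auto& kv : pending_) flush(kv.first);
    }
};

int main() {
    AggregatingChannel ch(3);  // aggregate 3 small RMIs per message
    for (int i = 0; i < 7; ++i)
        ch.async_rmi(1, [i] { std::cout << "  remote work " << i << "\n"; });
    ch.flush_all();            // drain the partial batch at a sync point
}

The batch size is exactly the kind of per-platform tuning knob the abstract describes: a message-passing target benefits from larger batches, while a shared-memory target could flush almost immediately.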
A design pattern is a mechanism for encapsulating the knowledge of experienced designers into a re-usable artifact. Parallel design patterns reflect commonly occurring parallel communication and synchronization structures. Our tools, CO2P3S (Correct Object-Oriented Pattern-based Parallel Programming System) and MetaCO2P3S, use generative design patterns. A programmer selects the parallel design patterns that are appropriate for an application, and then adapts the patterns for that specific application by selecting from a small set of code-configuration options. CO2P3S then generates a custom framework for the application that includes all of the structural code necessary for the application to run in parallel. The programmer is only required to write simple code that launches the application and to fill in some application-specific sequential hook routines. We use generative design patterns to take an application specification (parallel design patterns + sequential user code) and generate parallel application code that achieves good performance in shared-memory and distributed-memory environments. Although our implementations are for Java, the approach we describe is tool- and language-independent. This paper describes generalizing CO2P3S to generate distributed-memory parallel solutions.
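The division of labor between generated structural code and user-written sequential hooks is easy to see in miniature. CO2P3S generates Java frameworks; the C++ analogue below, with hypothetical class names, shows the same shape: the "generated" MeshPattern class owns all thread creation and joining, and the application fills in a purely sequential update hook.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

// "Generated" framework code: owns all parallel structure.
class MeshPattern {
public:
    virtual ~MeshPattern() = default;
    void run(std::vector<double>& cells) {
        unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
        std::size_t chunk = (cells.size() + nthreads - 1) / nthreads;
        std::vector<std::thread> pool;
        for (unsigned t = 0; t < nthreads; ++t) {
            std::size_t lo = t * chunk;
            std::size_t hi = std::min(cells.size(), lo + chunk);
            pool.emplace_back([this, &cells, lo, hi] {
                for (std::size_t i = lo; i < hi; ++i)
                    cells[i] = update(cells[i]);  // call the user's hook
            });
        }
        for (auto& th : pool) th.join();
    }
protected:
    virtual double update(double cell) = 0;  // application-specific hook
};

// User code: purely sequential, no knowledge of threads.
class Smoother : public MeshPattern {
    double update(double cell) override { return 0.5 * cell; }
};

int main() {
    std::vector<double> cells(8, 2.0);
    Smoother app;
    app.run(cells);
    for (double c : cells) std::cout << c << ' ';
    std::cout << "\n";
}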
In an intelligent memory architecture, the main memory of a computer is enhanced with many simple processors. The result is a highly parallel, heterogeneous machine that is able to exploit computation in the main memory. While several instantiations of this architecture have been proposed, the question of how to effectively program them with little effort has remained a major challenge. In this paper, we show how to effectively hand-program an intelligent memory architecture at a high level and with very modest effort. We use FlexRAM as a prototype architecture. To program it, we propose a family of high-level compiler directives inspired by OpenMP, called CFlex. Such directives enable the processors in memory to execute the program in cooperation with the main processor. In addition, we propose libraries of highly optimized functions called Intelligent Memory Operations (IMOs). These functions program the processors in memory through CFlex, but make them completely transparent to the programmer. Simulation results show that, with CFlex and IMOs, a server with 64 simple processors in memory runs on average 10 times faster than a conventional server. Moreover, a set of conventional programs averaging 240 lines is transformed into CFlex parallel form with only 7 CFlex directives and 2 additional statements on average.
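CFlex's own directive syntax is not reproduced here. As a rough analogue, the standard OpenMP target construct below shows the same usage pattern the abstract describes: annotating a loop so that it runs on auxiliary processors (for FlexRAM, the processors in memory would play the device role) with explicitly declared data movement. Treat it as an OpenMP-flavored stand-in, not CFlex code.

#include <iostream>

int main() {
    enum { N = 1 << 20 };
    static float a[N], b[N];
    for (int i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Offload the loop to the "device"; data is mapped in and out
    // explicitly. Without offload support the pragma is simply ignored
    // and the loop runs on the host.
    #pragma omp target teams distribute parallel for map(tofrom: a) map(to: b)
    for (int i = 0; i < N; ++i)
        a[i] += b[i];

    std::cout << "a[0] = " << a[0] << "\n";
}

CFlex goes further than this analogue in letting the memory processors cooperate with the main processor on the same program, which a plain offload pragma does not express.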
Collecting a program's execution profile is important for many reasons: code optimization, memory layout, program debugging, and program comprehension. Path-based execution profiles are more detailed than count-based execution profiles, since they preserve the order of execution of the various blocks in a program: modules, procedures, basic blocks, etc. Recently, online string compression techniques have been employed for collecting compact representations of sequential program executions. In this paper, we show how a similar approach can be taken for shared-memory parallel programs. Our compaction scheme yields one to two orders of magnitude compression compared to the uncompressed parallel program trace on some of the SPLASH benchmarks. Our compressed execution traces contain detailed information about synchronization and control/data flow, which can be exploited for post-mortem analysis. In particular, the information in our compact execution traces is useful for accurate data race detection (detecting unsynchronized shared-variable accesses that occurred in the execution).
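The compaction builds on online string compression of traces; prior work on sequential programs used SEQUITUR-style grammar compression. As a simplified stand-in for that family of techniques, the LZ78-style compressor below consumes a stream of basic-block IDs online and emits (phrase, symbol) pairs, so a repetitive trace never has to be stored uncompressed. It illustrates the general idea only, not the paper's scheme, which additionally records synchronization and data-flow events.

#include <iostream>
#include <map>
#include <utility>
#include <vector>

int main() {
    // A per-thread trace of basic-block IDs with repetitive structure.
    std::vector<int> trace = {1,2,3,1,2,3,1,2,3,4,1,2,3,4};

    std::map<std::pair<int,int>, int> dict;  // (phrase id, symbol) -> phrase id
    std::vector<std::pair<int,int>> out;     // the compressed trace
    int phrase = 0, next_id = 1;             // phrase 0 is the empty phrase

    for (int sym : trace) {
        auto key = std::make_pair(phrase, sym);
        auto it = dict.find(key);
        if (it != dict.end()) {
            phrase = it->second;             // extend the current phrase
        } else {
            out.push_back(key);              // emit (phrase, symbol)
            dict[key] = next_id++;           // learn the longer phrase
            phrase = 0;
        }
    }
    if (phrase) out.push_back({phrase, -1}); // flush a trailing phrase

    std::cout << trace.size() << " events -> " << out.size() << " pairs\n";
    for (auto& p : out)
        std::cout << '(' << p.first << ',' << p.second << ") ";
    std::cout << "\n";
}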
InterWeave is a distributed middleware system that supports the sharing of strongly typed, pointer-rich data structures across a wide variety of hardware architectures, operating systems, and programming languages. As a complement to RPC/RMI, InterWeave facilitates the rapid development of maintainable code by allowing processes to access shared data using ordinary reads and writes. Internally, InterWeave employs a variety of aggressive optimizations to obtain significant performance improvements with minimal programmer effort. In this paper, we focus on application-specific optimizations that exploit dynamic high-level information about an application's spatial data access patterns and the stringency of its coherence requirements. Using applications drawn from computer vision, data mining, and web proxy caching, we illustrate the specification of coherence requirements based on the (temporal) concept of "recent enough" to use, and introduce two (spatial) notions of views, which allow a program to limit coherence management to the portion of a data structure actively in use. Experiments with these applications show that InterWeave can reduce their communication traffic by up to an order of magnitude with minimal effort on the part of the application programmer.
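The temporal half of these optimizations, "recent enough" coherence, can be sketched compactly. The names below (CachedSegment, fetch_from_server, the millisecond tolerance) are hypothetical rather than InterWeave's API; the sketch only shows the core decision of refreshing a cached copy when it is older than the bound the reader declared, and otherwise answering from the local copy with no communication.

#include <chrono>
#include <iostream>
#include <thread>

using Clock = std::chrono::steady_clock;

class CachedSegment {
    int value_ = 0;                // local copy of the shared data
    bool valid_ = false;
    Clock::time_point fetched_{};

    int fetch_from_server() {      // stand-in for the wire protocol
        std::cout << "  (refetching from server)\n";
        return 42;
    }
public:
    // Reader-specified temporal bound: a copy fetched within `tolerance`
    // is "recent enough" and is used without any communication.
    int read(std::chrono::milliseconds tolerance) {
        if (!valid_ || Clock::now() - fetched_ > tolerance) {
            value_ = fetch_from_server();
            fetched_ = Clock::now();
            valid_ = true;
        }
        return value_;
    }
};

int main() {
    CachedSegment seg;
    for (int i = 0; i < 5; ++i) {
        std::cout << "read -> " << seg.read(std::chrono::milliseconds(100)) << "\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(40));
    }
}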
Sensor networks are long-running computer systems with many sensing/compute nodes working to gather information about their environment, process and fuse that information, and, in some cases, actuate control mechanisms in response. Like traditional parallel systems, communication between nodes is of fundamental importance, but it is typically accomplished via wireless transceivers. One further key attribute of sensor networks is that they are almost always long-running systems, intended to operate in situ, with minimal direct human intervention, for months or years. This requirement for long-running autonomy mandates careful design of the runtime system that manages applications on each node, to ensure reliability and ease of upgrades over the life of the system. This paper describes Impala, a middleware architecture that enables application modularity, adaptivity, and repairability in wireless sensor networks. Impala allows software updates to be received via the node's wireless transceiver and to be applied to the running system dynamically. In addition, Impala provides an interface for on-the-fly application adaptation in order to improve the performance, energy efficiency, and reliability of the software system. Impala has been designed to be a part of the ZebraNet mobile sensor network, but we are also prototyping it on HP/Compaq iPAQ Pocket PC handhelds. We present performance data both from real system measurements of the Pocket PC version and from simulations of a full mobile sensor system deployment. Overall, Impala is a lightweight runtime system that can greatly improve system reliability, performance, and energy efficiency. The ideas introduced here for sensor networks also apply more broadly to other long-running autonomous parallel systems.
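The update and adaptation mechanism described above can be illustrated with a dispatch-table sketch. The event names and the Middleware class below are hypothetical, not Impala's interface; the point is that an event-driven runtime that calls applications through a mutable handler table can swap in a newly received module version without stopping the node.

#include <functional>
#include <iostream>
#include <map>
#include <string>

class Middleware {
    std::map<std::string, std::function<void()>> handlers_;
public:
    // Install or replace an application module without stopping the node.
    void update(const std::string& event, std::function<void()> h) {
        handlers_[event] = std::move(h);
    }
    void dispatch(const std::string& event) {
        auto it = handlers_.find(event);
        if (it != handlers_.end()) it->second();
    }
};

int main() {
    Middleware node;
    node.update("gps_fix", [] { std::cout << "v1: log position\n"; });
    node.dispatch("gps_fix");

    // A "wireless update" arrives: swap in an adapted, lower-power handler
    // while the runtime keeps dispatching events.
    node.update("gps_fix", [] { std::cout << "v2: log + duty-cycle radio\n"; });
    node.dispatch("gps_fix");
}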
Conservative garbage collectors can automatically reclaim unused memory in the absence of precise pointer location information. If a location can possibly contain a pointer, it is treated by the collector as though it contained a pointer. Although it is commonly assumed that this can lead to unbounded space use due to misidentified pointers, such extreme space use is rarely observed in practice, and then generally only if the number of misidentified pointers is itself unbounded. We show that if the program manipulates only data structures satisfying a simple GC-robustness criterion, then a bounded number of misidentified pointers can at most increase space usage by a constant factor. We argue that nearly all common data structures are already GC-robust, and it is typically easy to identify and replace those that are not. Thus it becomes feasible to prove space bounds on programs collected by mildly conservative garbage collectors, such as the one in [2]. The worst-case space overhead introduced by such mild conservatism is comparable to the worst-case fragmentation overhead inherent in any non-moving storage allocator. The same GC-robustness criterion also ensures the absence of temporary space leaks of the kind discussed in [13] for generational garbage collectors.
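A standard example of a structure that violates this criterion is a linked-list queue whose dequeued nodes keep their next pointers: one misidentified pointer into an old node then retains every node enqueued since. The C++ sketch below (with an explicit delete standing in for the collector's reclamation) shows the one-line fix that restores GC-robustness: clearing the link on removal, so a falsely retained node holds onto only itself.

#include <iostream>

struct Node {
    int value;
    Node* next = nullptr;
};

struct Queue {
    Node* head = nullptr;
    Node* tail = nullptr;

    void enqueue(int v) {
        Node* n = new Node{v};
        if (tail) tail->next = n; else head = n;
        tail = n;
    }

    int dequeue() {              // precondition: queue is non-empty
        Node* n = head;
        head = n->next;
        if (!head) tail = nullptr;
        int v = n->value;
        n->next = nullptr;       // GC-robust: a misidentified pointer to n
                                 // now retains one node, not the whole chain
        delete n;                // stands in for eventual collection
        return v;
    }
};

int main() {
    Queue q;
    for (int i = 0; i < 3; ++i) q.enqueue(i);
    while (q.head) std::cout << q.dequeue() << ' ';
    std::cout << "\n";
}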
The proceedings contain 14 papers from the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Topics discussed include: reference idempotency analysis: a framework for optimizing speculative execution; pointer and escape analysis for multithreaded programs; language support for Morton-order matrices; efficient load balancing for wide-area divide-and-conquer applications; scalable queue-based spin locks with timeout; contention elimination by replication of sequential sections in distributed shared memory programs; and accurate data redistribution cost estimation in software distributed shared memory systems.