Search results - Inner Mongolia University Library

Space-time scheduling of instruction-level parallelism on a raw machine

Proceedings of the 1998 8th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-8

Authors: Lee, Walter; Barua, Rajeev; Frank, Matthew; Srikrishna, Devabhaktuni; Babb, Jonathan; Sarkar, Vivek; Amarasinghe, Saman (M.I.T. Lab for Computer Science, United States)

Increasing demand for both greater parallelism and faster clocks dictates that future generation architectures will need to decentralize their resources and eliminate primitives that require single-cycle global communication. A Raw microprocessor distributes all of its resources, including instruction streams, register files, memory ports, and ALUs, over a pipelined two-dimensional mesh interconnect, and exposes them fully to the compiler. Because communication in Raw machines is distributed, compiling for instruction-level parallelism (ILP) requires both spatial instruction partitioning and traditional temporal instruction scheduling. In addition, the compiler must explicitly manage all communication through the interconnect, including the global synchronization required at branch points. This paper describes RAWCC, the compiler we have developed for compiling general-purpose sequential programs to the distributed Raw architecture. We present performance results that demonstrate that, although Raw machines provide no mechanisms for global communication, the Raw compiler can schedule to achieve speedups that scale with the number of available functional units.

Keywords: parallel processing systems

ScoreGraph: dynamically activated connectivity among parallel processes for interactive computer music performance

24th International Computer Music Conference, ICMC 1998

Authors: Choi, Insook; Betts, Alex; Bargar, Robin (Human-Computer Intelligent Interaction Laboratory, Beckman Institute, University of Illinois at Urbana-Champaign, 405 N Mathews, Urbana, IL 61801, United States; Beckman Institute, UIUC, United States; NCSA and Beckman Institute, UIUC, United States)

The structural specification and modeling of time-critical real-time systems has become a major area of recent research. This is particularly relevant for computer music when sound computation is realized with multiple synthesis algorithms, simulations, input devices, and display systems. Such sound computation requires parallel processing for real-time computation 1) to execute its own algorithm, 2) to receive a state change instruction, and 3) to display the changes of its state. In our system the synthesis algorithms reside as open systems in a connectivity configured to support multi-modal performance. Performers generate performance events by interacting with simulations through various input devices; in turn, the changes of state in the simulations are reflected in changes of state in the sound and graphic synthesis algorithms. We note the deliberate placement of indirection between performers and synthesis algorithms in order to enhance performability. ScoreGraph incorporates recent advances in graph-based architectures to enable us to manage multiple tasks in parallel continuity with computational efficiency. Dynamical activation of nodes and edges is achieved through a structural definition of connectivity. Efficiency is managed by local activation of graph-organized processes, where the depth of a locality is redefined interactively over time. In this paper we present details of the implementation and case studies of interactive computer music and Virtual Reality compositions realized in ScoreGraph. © 1998 ICMC. All Rights Reserved.

Keywords: Chemical activation

EDPEPPS: A toolset for the design and performance evaluation of parallel applications

4th International Euro-Par Conference on Parallel Processing

Authors: Delaitre, T; Zemerly, MJ; Vekariya, P; Justo, GR; Bourgeois, J; Schinkmann, F; Spies, F; Randoux, S; Winter, SC (Univ Westminster, Cavendish Sch Comp Sci, Ctr Parallel Comp, London W1M 8JS, England)

ISBN: (print) 3540649522

This paper describes a performance-oriented environment for the design of portable parallel software. The environment consists of a graphical design tool based on the PVM communication library for building parallel algorithms, a state-of-the-art simulation engine, a CPU characteriser, and a visualisation tool for animation of program execution and visualisation of platform and network performance measures and statistics. The toolset is used to model a virtual machine composed of a cluster of workstations interconnected by a local area network. The simulation model used is modular and its components are interchangeable, which allows easy reconfiguration of the platform. Both communication and CPU models are validated.

Performance counters and state sharing annotations: A unified approach to thread locality

Proceedings of the 1998 8th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-8

Authors: Weissman, Boris (Univ of California at Berkeley, Berkeley, United States)

This paper describes a combined approach for improving thread locality that uses the hardware performance monitors of modern processors and program-centric code annotations to guide thread scheduling on SMPs. The approach relies on a shared state cache model to compute expected thread footprints in the cache on-line. The accuracy of the model has been analyzed by simulations involving a set of parallel applications. We demonstrate how the cache model can be used to implement several practical locality-based thread scheduling policies with little overhead. Active Threads, a portable, high-performance thread system, has been built and used to investigate the performance impact of locality scheduling for several applications.

Keywords: parallel processing systems

Parallel sparse matrix computations using the PINEAPL library: A performance study

4th International Conference on Parallel Processing, Euro-Par 1998

Authors: Krommer, Arnold R. (Numerical Algorithms Group Ltd, Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom)

ISBN: (print) 3540649522

The Numerical Algorithms Group Ltd is currently participating in the European HPCN Fourth Framework project on Parallel Industrial NumErical Applications and Portable Libraries (PINEAPL). One of the main goals of the project is to increase the suitability of the existing NAG Parallel Library for dealing with computationally intensive industrial applications by appropriately extending the range of library routines. Additionally, several industrial applications are being ported onto parallel computers within the PINEAPL project by replacing sequential code sections with calls to appropriate parallel library routines. A substantial part of the library material being developed is concerned with the solution of PDE problems using parallel sparse linear algebra modules. This talk provides a number of performance results which demonstrate the efficiency and scalability of core computational routines - in particular, the iterative solver, the preconditioner and the matrix-vector multiplication routines. Most of the software described in this talk has been incorporated into the recently launched Release 1 of the PINEAPL Library.

Keywords: Iterative methods

Data speculation support for a chip multiprocessor

Proceedings of the 1998 8th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-8

Authors: Hammond, Lance; Willey, Mark; Olukotun, Kunle (Stanford Univ, Stanford, CA, United States)

ISBN: (print) 9781581131079

Thread-level speculation is a technique that enables parallel execution of sequential applications on a multiprocessor. This paper describes the complete implementation of the support for thread-level speculation on the Hydra chip multiprocessor (CMP). The support consists of a number of software speculation control handlers and modifications to the shared secondary cache memory system of the CMP. This support is evaluated using five representative integer applications. Our results show that the speculative support is only able to improve performance when there is a substantial amount of medium-grained loop-level parallelism in the application. When the granularity of parallelism is too small or there is little inherent parallelism in the application, the overhead of the software handlers overwhelms any potential performance benefits from speculative-thread parallelism. Overall, thread-level speculation still appears to be a promising approach for expanding the class of applications that can be automatically parallelized, but more hardware-intensive implementations for managing speculation control are required to achieve performance improvements on a wide class of integer applications.

Keywords: parallel processing systems

Accelerating multi-media processing by implementing memoing in multiplication and division units

Proceedings of the 1998 8th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-8

Authors: Citron, Daniel; Feitelson, Dror; Rudolph, Larry (Hebrew Univ of Jerusalem, Jerusalem, Israel)

ISBN: (print) 9781581131079

This paper proposes a technique that enables performing multi-cycle (multiplication, division, square-root ...) computations in a single cycle. The technique is based on the notion of memoing: saving the input and output of previous calculations and using the output if the input is encountered again. This technique is especially suitable for Multi-Media (MM) processing. In MM applications the local entropy of the data tends to be low, which results in repeated operations on the same datum. The inputs and outputs of assembly-level operations are stored in cache-like lookup tables and accessed in parallel with the conventional computation. A successful lookup gives the result of a multi-cycle computation in a single cycle, and a failed lookup does not incur a penalty in computation time. Results of simulations have shown that on average, for a modestly sized memo-table, about 40% of the floating point multiplications and 50% of the floating point divisions in Multi-Media applications can be avoided by using the values within the memo-table, leading to an average computational speedup of more than 20%.

Keywords: Multimedia systems
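
The memoing idea in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `MemoTable`, the direct-mapped organization, the table size of 32 entries, and the use of Python's built-in `hash` as the index function are all hypothetical choices made for illustration. The model counts hits and misses only; it does not model the parallel lookup that hides the miss penalty in hardware.

```python
import operator
import struct


class MemoTable:
    """Hypothetical direct-mapped memo table for two-operand float operations."""

    def __init__(self, size=32):
        self.size = size              # number of entries (assumed, not from the paper)
        self.slots = [None] * size    # each slot holds (operand_bits, result) or None
        self.hits = 0
        self.misses = 0

    def compute(self, op, a, b):
        # Key on the raw 64-bit patterns of both operands, as a hardware
        # table would, so that 0.0 and -0.0 are treated as different inputs.
        key = struct.pack("<dd", a, b)
        index = hash(key) % self.size  # stand-in for a hardware index function
        slot = self.slots[index]
        if slot is not None and slot[0] == key:
            self.hits += 1             # reuse the stored result
            return slot[1]
        self.misses += 1               # fall back to the conventional computation
        result = op(a, b)
        self.slots[index] = (key, result)
        return result


# Low-entropy input: neighbouring "pixels" often repeat, so lookups often hit.
table = MemoTable(size=32)
pixels = [0.5, 0.5, 0.5, 0.25, 0.25, 0.25]
scaled = [table.compute(operator.mul, p, 1.5) for p in pixels]
```

In this run the first occurrence of each distinct operand pair misses and every immediate repeat hits, which mirrors the abstract's observation that low local entropy leads to repeated operations on the same datum. A separate table per operation (one for multiplication, one for division) would be needed in practice, since the key here does not include the operation.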

Parallel calibrated emulation as a technique for evaluating parallel architectures

Computer Systems Science and Engineering, 1998, vol. 13, no. 1, pp. 17-25

Authors: Muller, HL; Raina, S; Stallard, PWA; Warren, DH (Univ Bristol, Dept Comp Sci, Bristol, Avon, England)

We describe the use of a calibrated emulator to simulate a parallel computer architecture. The emulator has a virtual clock, but unlike the virtual clock of a simulator, the emulator clock is bound to a fixed fraction of real time. Individual processors time actions independently, without the need for a globally synchronised clock value. Each component of the emulator is calibrated (by slowing it down artificially) so that the balance of the speeds of all components reflects the balance of the system under consideration. Unlike an ordinary simulator, a calibrated emulator is inherently parallel. The technique has been applied in the form of a parallel transputer-based emulator developed to evaluate the DDM, a scalable virtual shared memory architecture. The emulator provides performance results of a hardware implementation of the DDM using a calibrated virtual clock. A large transputer platform is used to run experiments. A couple of hours are sufficient to emulate the execution of a realistic application on a large DDM.

Keywords: emulation; parallel simulation; virtual time; architecture evaluation; virtual shared memory

Field programmable gate array design for an application specific signal processing algorithms

IEEE International Caracas Conference on Devices, Circuits and Systems

Authors: W.A. Moreno; K. Poladia (Center for Microelectronics Research, University of South Florida, Tampa, FL, USA)

Field Programmable Gate Array (FPGA) architectures have emerged as an alternative means of implementing complex logic circuits, providing rapid manufacturing turnaround time and low prototyping costs. This paper presents a new FPGA architecture suitable for application-specific signal processing algorithms and Wafer-Scale Integration (WSI) technology. The architecture must be designed for versatility, flexibility, high speed, improved logic density, and defect tolerance. The proposed FPGA architecture consists of a two-dimensional array of programmable logic elements based on look-up tables, interconnection resources, and input/output (I/O) blocks. The architectural style is similar to the one used in the XILINX FPGA architecture. A key variation from the commonly used FPGA is the dual switching scheme employed in the proposed architecture. The design methodology, the design tools, and the results obtained by using a Segmented Channel Routing algorithm to map a 16-bit parallel multiplier onto it are presented.

Keywords: Field programmable gate arrays; Programmable logic arrays; Signal processing algorithms; Design methodology; Logic circuits; Manufacturing; Prototypes; Costs; Wafer scale integration; Logic design

A parallel DSP architecture for object-based video signal processing

Conference on Multimedia Hardware Architectures 1997

Authors: Hilgenstock, J; Herrmann, K; Pirsch, P (University of Hannover, Laboratorium für Informationstechnologie, Schneiderberg 32, Hannover 30167, Germany)

ISBN: (print) 0819424323

The DSP architecture PRISMA for object-based video signal processing is presented in this paper. Considering the specific hardware requirements of object-based algorithms, a parallel architecture has been developed, which consists of 8 programmable data paths. To utilize the processing power provided by these data paths, a new controlling scheme is employed by the PRISMA processor. This Dynamic Associative Controlling distributes 3 independent instruction streams to the 8 data paths and combines the advantages of alternative controlling approaches, like SIMD and MIMD. It allows an efficient execution of data-dependent operations as well as a flexible partitioning of the processing resources at runtime, which is advantageous for parallel processing of concurrent objects with different performance requirements.

Keywords: object-based processing; video signal processing; MPEG-4; DSP; parallel processing; associative controlling
