Search Results - Inner Mongolia University Library

A tracing protocol for optimizing data parallel irregular computations

4th International Euro-Par Conference on Parallel Processing

Authors: Brandes, T.; Germain, C. (Inst Algorithms & Sci Comp, GMD SCAI, D-53754 St Augustin, Germany; Univ Paris 06, Rech Informat Lab, CNRS, F-91405 Orsay, France)

ISBN: (print) 3540649522

High Performance Fortran (HPF) is the de facto standard language for writing data parallel programs. For applications that use indirect addressing on distributed arrays, HPF compilers have limited capabilities for optimizing such codes on distributed memory architectures, especially for optimizing communication and reusing communication schedules across subroutine boundaries. This paper describes a dynamic approach for optimizing unstructured communication in codes with indirect addressing. The basic idea is that runtime data reflecting the communication patterns will be reused if possible. The user has only to specify which data in the program has to be traced for modifications. The experiments and results show the effectiveness of the chosen approach.

Keywords: Memory architecture

Achieving portability and efficiency through automatic optimisation: An investigation in parallel image processing

4th International Conference on Parallel Processing, Euro-Par 1998

Authors: Crookes, D.; Morrow, P.J.; Brown, T.J.; McAleese, S.G.; Roantree, D.; Spence, I.T.A. (Department of Computer Science, Queen's University of Belfast, Belfast BT7 1NN, United Kingdom; Department of Computing Science, University of Ulster at Coleraine, Coleraine BT52 7EQ, United Kingdom)

ISBN: (print) 3540649522

This paper discusses the main achievements of the EPIC project, whose aim was to design a high level programming environment with an associated implementation for portable parallel image processing. The project was funded as part of the EPSRC Portable Software Tools for Parallel Architectures (PSTPA) programme. The paper summarises new portable programming abstractions for image processing, and outlines the automatically optimising implementation which achieves portability of application code and efficiency of implementation on a closely coupled distributed memory parallel system. The paper includes timings for optimised and unoptimised versions of typical image processing algorithms; it draws the main conclusion that it is possible to achieve portability with efficiency, for a specific application, by adopting a high level algebraic programming model, together with a transformation-based optimiser which reclaims the loss of efficiency which an algebraic approach traditionally entails.

Keywords: parallel architectures

Efficient realization of the M-D nonrecursive filters: From sequential implementation to mapping on systolic array processors

5th IEEE International Conference on Electronics, Circuits and Systems, ICECS 1998

Authors: Burian, Adrian; Rusu, Corneliu; Kuosmanen, Pauli (Signal Processing Laboratory, Tampere University of Technology, P.O. Box 553, Tampere FIN-33101, Finland)

ISBN: (print) 0780350081

This paper presents algorithms and architectures for implementing digital nonrecursive filters, from 1-D to multidimensional (M-D). These architectures are very regular and support single chip implementation in VLSI, as well as multiple chip implementations. The proposed systolic arrays, used in implementation of these algorithms, are optimal with respect to time. A systolic implementation achieves the highest degree of parallel processing and thus the highest performance, but it also requires the highest number of gates and thus the greatest complexity. As a compromise, we propose and analyse a simpler systolic system for real-time nonrecursive filtering. © 1998 IEEE.

Keywords: Systolic arrays

An integrated partitioning and synthesis system for dynamically reconfigurable multi-FPGA architectures

10 IPPS/SPDP 98 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing

Authors: Ouaiss, I.; Govindarajan, S.; Srinivasan, V.; Kaul, M.; Vemuri, R. (Univ Cincinnati, DDEL, Dept ECECS, Cincinnati, OH 45221, USA)

ISBN: (print) 3540643591

This paper presents an integrated design system called SPARCS (Synthesis and Partitioning for Adaptive Reconfigurable Computing Systems) for automatically partitioning and synthesizing designs for reconfigurable boards with multiple field-programmable devices (FPGAs). The SPARCS system accepts design specifications at the behavior level, in the form of task graphs. The system contains a temporal partitioning tool to temporally divide and schedule the tasks on the reconfigurable architecture, a spatial partitioning tool to map the tasks to individual FPGAs, and a high-level synthesis tool to synthesize efficient register-transfer level designs for each set of tasks destined to be downloaded on each FPGA. Commercial logic and layout synthesis tools are used to complete logic synthesis, placement, and routing for each FPGA design segment. A distinguishing feature of the SPARCS system is the tight integration of the partitioning and synthesis tools to accurately predict and control design performance and resource utilization. This paper presents an overview of SPARCS and the various algorithms used in the system, along with a brief description of how a JPEG-like image compression algorithm is mapped to a multi-FPGA board using SPARCS.

Keywords: Field programmable gate arrays (FPGA)

10 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, IPPS/SPDP 1998

10 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, IPPS/SPDP 1998

ISBN: (print) 3540643591

The proceedings contain 118 papers. The special focus in this conference is on Parallel and Distributed Processing. The topics include: Dynamic reconfiguration of a PMMLA for high-throughput applications; a parallel algorithm for minimum cost path computation on polymorphic processor array; a performance modeling and analysis environment for reconfigurable computers; an integrated partitioning and synthesis system for dynamically reconfigurable multi-FPGA architectures; temporal partitioning for partially-reconfigurable-field-programmable gate; a Java development and runtime environment for reconfigurable computing; synthesizing reconfigurable sequential machines using tabular models; evaluation of a low-power reconfigurable DSP architecture; a reconfigurable hardware-monitor for communication analysis in distributed real-time systems; a mathematical benefit analysis of context switching reconfigurable computing; a configurable computing approach towards real-time target tracking; hardware reconfigurable neural networks; a simulator for the reconfigurable mesh architecture; processor architectures for circuit emulation; an empirical comparison of runtime systems for conservative parallel simulation; synchronizing operations on multiple objects; migration and rollback transparency for arbitrary distributed applications in workstation clusters; a topology based approach to coordinated multicast operations; a parallel evolutionary algorithm for the vehicle routing problem with heterogeneous fleet; artificial neural networks on reconfigurable meshes; a molecular quasi-random model of computations applied to evaluate collective intelligence; replicated shared object model for edge detection with spiral architecture and scheduling tasks of a parallel program in two-processor systems with use of cellular automata.

10th International Workshop on Languages and Compilers for Parallel Computing, LCPC 1997

10th Annual International Workshop on Languages and Compilers for Parallel Computing, LCPC 1997

ISBN: (print) 3540644725

The proceedings contain 32 papers. The special focus in this conference is on Data Locality and Program Analysis. The topics include: Quantifying the multi-level nature of tiling interactions; reuse-driven tiling for data locality; table-lookup approach for compiling two-level data-processor mappings in HPF; code generation for complex subscripts in data-parallel programs; automatic data decomposition for message-passing machines; program analysis of overlap area usage in self-similar parallel programs; analysis and optimization of explicitly parallel programs using the parallel program graph representation; concurrent static single assignment form and constant propagation for explicitly parallel programs; program optimization for concurrent multithreaded architectures; interactive compilation and performance analysis with Ursa Minor; a new technology for run-time speculative parallelization of loops; lowering HPF procedure interface to a canonical representation; data parallel language extensions for exploiting locality in irregular problems; simplifying control flow in compiler-generated parallel code; reducing synchronization overhead for compiler-parallelized codes on software DSMs; an array data flow analysis based communication optimizer; a compiler abstraction for machine independent parallel communication generation; the aggregate function API; exploiting parallelism through directives on the nano-threads programming model and Java as a language for scientific parallel programming.

ScoreGraph: dynamically activated connectivity among parallel processes for interactive computer music performance

24th International Computer Music Conference, ICMC 1998

Authors: Choi, Insook; Betts, Alex; Bargar, Robin (Human-Computer Intelligent Interaction Laboratory, Beckman Institute, University of Illinois at Urbana-Champaign, 405 N Mathews, Urbana IL 61801, United States; Beckman Institute, UIUC, United States; NCSA and Beckman Institute, UIUC, United States)

The structural specification and modeling of time-critical real-time systems has become a major area of recent research. This is particularly relevant for computer music when sound computation involves multiple synthesis algorithms, simulations, input devices, and display systems. Such sound computation requires parallel processing so that each process can, in real time, 1) execute its own algorithm, 2) receive a state change instruction, and 3) display the changes of its state. In our system the synthesis algorithms reside as open systems in a connectivity configured to support multi-modal performance. Performers generate performance events by interacting with simulations through various input devices; in turn, the changes of state in the simulations are reflected in changes of state in the sound and graphic synthesis algorithms. We note the deliberate placement of indirection between performers and synthesis algorithms in order to enhance performability. ScoreGraph incorporates recent advances in graph-based architectures to enable us to manage multiple tasks in parallel continuity with computational efficiency. Dynamic activation of nodes and edges is achieved through a structural definition of connectivity. Efficiency is managed by local activation of graph-organized processes, where the depth of a locality is redefined interactively over time. In this paper we present details of the implementation and case studies of interactive computer music and Virtual Reality compositions realized in ScoreGraph. © 1998 ICMC. All Rights Reserved.

Keywords: Chemical activation

Effect of austenitic stainless steel composition on low-energy, high-flux, nitrogen ion beam processing

Surface & Coatings Technology, 1998, Vol. 104, No. 1, pp. 178-184

Authors: Williamson, DL; Davis, JA; Wilbur, PJ (Colorado Sch Mines, Dept Phys, Golden, CO 80401, USA; Colorado State Univ, Dept Mech Engn, Ft Collins, CO 80523, USA)

A collection of 16 metals with the face-centered-cubic (fcc) crystal structure, including stainless steels, Fe-Ni alloys and pure Ni, have been subjected to the same nitrogen ion beam processing conditions to examine the role of alloy composition in the surface modification behavior. A low-energy (700 eV), high-flux (2 mA cm⁻²) beam of ions was used with each sample held at 400 °C during a 15 min treatment. The near surface regions have been characterized by conventional and grazing-incidence X-ray diffraction, Auger electron spectroscopy, conversion electron Mössbauer spectroscopy, and microhardness measurements. There is a clear distinction in the modifications depending on whether the alloys are Fe-rich or Ni-rich. Fe-rich samples all yield relatively thick (2.5-3.5 μm) layers with high N content in solid solution. The large lattice expansions lead to ferromagnetism in these surfaces. A novel double-layer structure has been induced in all the Fe-rich alloys, corresponding to two rather well-defined N contents: high (20-26 at%) in the surface layer, and medium (4-10 at%) in the subsurface layer. It is suggested that this substructure is caused by stress-assisted diffusion. The Ni-rich alloys have much thinner N-containing layers (≤ 1 μm) and a much lower amount of N retained in the (111) planes oriented parallel to the surface compared to those in the Fe-rich alloys. © 1998 Elsevier Science S.A.

Keywords: diffusion; nitrogen implantation; stainless steel; X-ray diffraction

10th International Conference on Computer Performance Evaluation, Tools 1998

10th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, Tools 1998

ISBN: (print) 3540649492

The proceedings contain 35 papers. The special focus in this conference is on Software Performance Tools and Network Performance. The topics include: A modular and scalable simulation tool for large wireless networks; designing process replication and threading policies; software reliability estimation and prediction tool; reusable software components for performability tools and their utilization for web-based configurable tools; a performance evaluation tool for communication networks with multicast data streams; response times in client-server systems; a queueing model with varying service rate for ABR; simulative performance evaluation of the temporary pseudonym method for protecting location information in GSM networks; a model driven monitoring approach to support the multi-view performance analysis of parallel responsive applications; instrumentation of synchronous reactive systems for performance analysis; a perturbation and reduction based algorithm; a comparison of numerical splitting-based methods for Markovian dependability and performability models; probability, parallelism and the state space exploration problem; an improved multiple variable inversion algorithm for reliability calculation; performance evaluation of web proxy cache replacement policies; performance analysis of a WDM bus network based on GSPN models; scheduling write backs for weakly-connected mobile clients; on choosing a task assignment policy for a distributed server system; structured characterization of the Markov chain of phase-type SPN; performance evaluation of distributed object architectures; an execution driven interconnection network simulator for DSM systems and integrated measurement and analysis tool for internet and its use in wireless in-house environment.

Field programmable gate array design for an application specific signal processing algorithms

IEEE International Caracas Conference on Devices, Circuits and Systems

Authors: W.A. Moreno; K. Poladia (Center for Microelectronics Research, University of South Florida, Tampa, FL, USA)

Field Programmable Gate Array (FPGA) architectures have emerged as an alternative means of implementing complex logic circuits, providing rapid manufacturing turnaround time and low prototyping costs. This paper presents a new FPGA architecture suitable for application-specific signal processing algorithms and Wafer-Scale Integration (WSI) technology. The architecture must be designed for versatility, flexibility, high speed, improved logic density, and defect tolerance. The proposed FPGA architecture consists of a two-dimensional array of programmable logic elements based on look-up tables, interconnection resources, and input/output (I/O) blocks. The architectural style is similar to the one used in the Xilinx FPGA architecture. A key variation from the commonly used FPGA is the dual switching scheme employed in the proposed architecture. The design methodology, the design tools, and the results obtained by using a Segmented Channel Routing algorithm to map a 16-bit parallel multiplier onto it are presented.

Keywords: Field programmable gate arrays; Programmable logic arrays; Signal processing algorithms; Design methodology; Logic circuits; Manufacturing; Prototypes; Costs; Wafer scale integration; Logic design
