ISBN: 3540649522 (print)
High Performance Fortran (HPF) is the de facto standard language for writing data parallel programs. For applications that use indirect addressing on distributed arrays, HPF compilers have limited capabilities for optimizing such codes on distributed memory architectures, especially for optimizing communication and reusing communication schedules across subroutine boundaries. This paper describes a dynamic approach for optimizing unstructured communication in codes with indirect addressing. The basic idea is that runtime data reflecting the communication patterns is reused whenever possible. The user only has to specify which data in the program must be traced for modifications. The experiments and results show the effectiveness of the chosen approach.
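The reuse idea in this abstract can be illustrated with a small sketch. It is not the paper's runtime system: the names `TracedArray`, `build_schedule`, and `ScheduleCache` are hypothetical, the block distribution is an assumption, and a simple version counter stands in for whatever modification tracing the authors actually use.

```python
from collections import defaultdict


class TracedArray:
    """Index array whose modifications are tracked via a version counter."""

    def __init__(self, values):
        self._values = list(values)
        self.version = 0

    def __getitem__(self, i):
        return self._values[i]

    def __setitem__(self, i, value):
        self._values[i] = value
        self.version += 1  # any write invalidates schedules built from this array

    def __len__(self):
        return len(self._values)


def build_schedule(index_array, block_size):
    """Group the requested global indices by owning process.

    Assumes a block distribution: process p owns indices
    [p * block_size, (p + 1) * block_size).
    """
    schedule = defaultdict(list)
    for i in range(len(index_array)):
        g = index_array[i]
        schedule[g // block_size].append(g)
    return dict(schedule)


class ScheduleCache:
    """Rebuilds a schedule only when the traced index array has changed."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._cached = {}  # id(array) -> (version, schedule)
        self.builds = 0    # number of times a schedule was actually computed

    def get(self, index_array):
        key = id(index_array)
        entry = self._cached.get(key)
        if entry is not None and entry[0] == index_array.version:
            return entry[1]  # array unchanged since last build: reuse
        schedule = build_schedule(index_array, self.block_size)
        self._cached[key] = (index_array.version, schedule)
        self.builds += 1
        return schedule
```

Calling `get` repeatedly on an unmodified array returns the cached schedule, so the cost of analysing the indirection array is paid once; a single write to the traced array forces one rebuild on the next call.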
ISBN: 3540649522 (print)
This paper discusses the main achievements of the EPIC project, whose aim was to design a high level programming environment with an associated implementation for portable parallel image processing. The project was funded as part of the EPSRC Portable Software Tools for Parallel Architectures (PSTPA) programme. The paper summarises new portable programming abstractions for image processing, and outlines the automatically optimising implementation which achieves portability of application code and efficiency of implementation on a closely coupled distributed memory parallel system. The paper includes timings for optimised and unoptimised versions of typical image processing algorithms; it draws the main conclusion that it is possible to achieve portability with efficiency, for a specific application, by adopting a high level algebraic programming model, together with a transformation-based optimiser which reclaims the loss of efficiency which an algebraic approach traditionally entails.
This paper presents algorithms and architectures for implementing digital nonrecursive filters ranging from one-dimensional (1-D) to multidimensional (M-D). These architectures are very regular and support single chip implementation in VLSI, as ...
ISBN: 3540643591 (print)
This paper presents an integrated design system called SPARCS (Synthesis and Partitioning for Adaptive Reconfigurable Computing Systems) for automatically partitioning and synthesizing designs for reconfigurable boards with multiple field-programmable devices (FPGAs). The SPARCS system accepts design specifications at the behavior level, in the form of task graphs. The system contains a temporal partitioning tool to temporally divide and schedule the tasks on the reconfigurable architecture, a spatial partitioning tool to map the tasks to individual FPGAs, and a high-level synthesis tool to synthesize efficient register-transfer level designs for each set of tasks destined to be downloaded on each FPGA. Commercial logic and layout synthesis tools are used to complete logic synthesis, placement, and routing for each FPGA design segment. A distinguishing feature of the SPARCS system is the tight integration of the partitioning and synthesis tools to accurately predict and control design performance and resource utilization. This paper presents an overview of SPARCS and the various algorithms used in the system, along with a brief description of how a JPEG-like image compression algorithm is mapped to a multi-FPGA board using SPARCS.
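To make the notion of temporal partitioning concrete, the following is a minimal greedy sketch, not the algorithm used in SPARCS. It assumes each task carries a single hypothetical area estimate and that one configuration of the board offers a fixed `capacity`; the function name and data layout are illustrative only.

```python
def temporal_partition(tasks, deps, capacity):
    """Split a task graph into ordered segments that each fit the device.

    tasks:    dict mapping task name -> area estimate (hypothetical units)
    deps:     dict mapping task name -> set of predecessor task names
    capacity: total area available in one configuration
    """
    for name, area in tasks.items():
        if area > capacity:
            raise ValueError(f"task {name!r} does not fit in one segment")

    placed = set()
    segments = []
    while len(placed) < len(tasks):
        segment, used = [], 0
        progress = True
        while progress:  # keep sweeping until nothing more fits this segment
            progress = False
            for name in sorted(tasks):
                if name in placed:
                    continue
                # A task is ready once all predecessors sit in this or an
                # earlier segment.
                ready = deps.get(name, set()) <= placed
                if ready and used + tasks[name] <= capacity:
                    segment.append(name)
                    placed.add(name)
                    used += tasks[name]
                    progress = True
        if not segment:
            raise ValueError("dependency cycle in task graph")
        segments.append(segment)
    return segments
```

Each returned segment corresponds to one configuration loaded onto the board in sequence; a real tool would also weigh reconfiguration overhead and the data that must be carried between segments, which this sketch ignores.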
ISBN: 3540643591 (print)
The proceedings contain 118 papers. The special focus in this conference is on Parallel and Distributed Processing. The topics include: Dynamic reconfiguration of a PMMLA for high-throughput applications; a parallel algorithm for minimum cost path computation on polymorphic processor array; a performance modeling and analysis environment for reconfigurable computers; an integrated partitioning and synthesis system for dynamically reconfigurable multi-FPGA architectures; temporal partitioning for partially-reconfigurable-field-programmable gate; a Java development and runtime environment for reconfigurable computing; synthesizing reconfigurable sequential machines using tabular models; evaluation of a low-power reconfigurable DSP architecture; a reconfigurable hardware-monitor for communication analysis in distributed real-time systems; a mathematical benefit analysis of context switching reconfigurable computing; a configurable computing approach towards real-time target tracking; hardware reconfigurable neural networks; a simulator for the reconfigurable mesh architecture; processor architectures for circuit emulation; an empirical comparison of runtime systems for conservative parallel simulation; synchronizing operations on multiple objects; migration and rollback transparency for arbitrary distributed applications in workstation clusters; a topology based approach to coordinated multicast operations; a parallel evolutionary algorithm for the vehicle routing problem with heterogeneous fleet; artificial neural networks on reconfigurable meshes; a molecular quasi-random model of computations applied to evaluate collective intelligence; replicated shared object model for edge detection with spiral architecture; and scheduling tasks of a parallel program in two-processor systems with use of cellular automata.
ISBN: 3540644725 (print)
The proceedings contain 32 papers. The special focus in this conference is on Data Locality and Program Analysis. The topics include: Quantifying the multi-level nature of tiling interactions; reuse-driven tiling for data locality; table-lookup approach for compiling two-level data-processor mappings in HPF; code generation for complex subscripts in data-parallel programs; automatic data decomposition for message-passing machines; program analysis of overlap area usage in self-similar parallel programs; analysis and optimization of explicitly parallel programs using the parallel program graph representation; concurrent static single assignment form and constant propagation for explicitly parallel programs; program optimization for concurrent multithreaded architectures; interactive compilation and performance analysis with Ursa Minor; a new technology for run-time speculative parallelization of loops; lowering HPF procedure interface to a canonical representation; data parallel language extensions for exploiting locality in irregular problems; simplifying control flow in compiler-generated parallel code; reducing synchronization overhead for compiler-parallelized codes on software DSMs; an array data flow analysis based communication optimizer; a compiler abstraction for machine independent parallel communication generation; the aggregate function API; exploiting parallelism through directives on the nano-threads programming model; and Java as a language for scientific parallel programming.
The structural specification and modeling of time critical real-time systems has become a major area of recent research. This is particularly relevant for computer music when sound computation is realized invo...
A collection of 16 metals with the face-centered-cubic (fcc) crystal structure, including stainless steels, Fe-Ni alloys, and pure Ni, have been subjected to the same nitrogen ion beam processing conditions to examine the role of alloy composition in the surface modification behavior. A low-energy (700 eV), high-flux (2 mA cm⁻²) beam of ions was used with each sample held at 400 °C during a 15 min treatment. The near surface regions have been characterized by conventional and grazing-incidence X-ray diffraction, Auger electron spectroscopy, conversion electron Mössbauer spectroscopy, and microhardness measurements. There is a clear distinction in the modifications depending on whether the alloys are Fe-rich or Ni-rich. Fe-rich samples all yield relatively thick (2.5-3.5 μm) layers with high N content in solid solution. The large lattice expansions lead to ferromagnetism in these surfaces. A novel double-layer structure has been induced in all the Fe-rich alloys, corresponding to two rather well-defined N contents: high (20-26 at%) in the surface layer, and medium (4-10 at%) in the subsurface layer. It is suggested that this substructure is caused by stress-assisted diffusion. The Ni-rich alloys have much thinner N-containing layers (≤1 μm) and a much lower amount of N retained in the (111) planes oriented parallel to the surface compared to those in the Fe-rich alloys. © 1998 Elsevier Science S.A.
ISBN: 3540649492 (print)
The proceedings contain 35 papers. The special focus in this conference is on Software Performance Tools and Network Performance. The topics include: A modular and scalable simulation tool for large wireless networks; designing process replication and threading policies; software reliability estimation and prediction tool; reusable software components for performability tools and their utilization for web-based configurable tools; a performance evaluation tool for communication networks with multicast data streams; response times in client-server systems; a queueing model with varying service rate for ABR; simulative performance evaluation of the temporary pseudonym method for protecting location information in GSM networks; a model driven monitoring approach to support the multi-view performance analysis of parallel responsive applications; instrumentation of synchronous reactive systems for performance analysis; a perturbation and reduction based algorithm; a comparison of numerical splitting-based methods for Markovian dependability and performability models; probability, parallelism and the state space exploration problem; an improved multiple variable inversion algorithm for reliability calculation; performance evaluation of web proxy cache replacement policies; performance analysis of a WDM bus network based on GSPN models; scheduling write backs for weakly-connected mobile clients; on choosing a task assignment policy for a distributed server system; structured characterization of the Markov chain of phase-type SPN; performance evaluation of distributed object architectures; an execution driven interconnection network simulator for DSM systems; and integrated measurement and analysis tool for internet and its use in wireless in-house environment.
Field Programmable Gate Array (FPGA) architectures have emerged as an alternative means of implementing complex logic circuits, providing rapid manufacturing turnaround time and low prototyping costs. This paper presents a new FPGA architecture suitable for application-specific signal processing algorithms and Wafer-Scale Integration (WSI) technology. The architecture must be designed for versatility, flexibility, high speed, improved logic density, and defect tolerance. The proposed FPGA architecture consists of a two-dimensional array of programmable logic elements based on look-up tables, interconnection resources, and input/output (I/O) blocks. The architectural style is similar to the one used in the XILINX FPGA architecture. A key variation from commonly used FPGAs is the dual switching scheme employed in the proposed architecture. The design methodology, the design tools, and the results obtained by using a Segmented Channel Routing algorithm to map a 16-bit parallel multiplier onto the architecture are presented.
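As background on look-up-table logic elements, here is a short generic sketch of how a k-input LUT implements an arbitrary Boolean function by storing its truth table. It does not model the proposed architecture or its dual switching scheme; the helper names `make_lut`, `lut_eval`, and `full_adder` are hypothetical.

```python
def make_lut(func, k):
    """Precompute the 2**k-entry truth table of a k-input Boolean function."""
    table = []
    for address in range(2 ** k):
        bits = [(address >> i) & 1 for i in range(k)]  # bits[0] is input 0
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, inputs):
    """Evaluate a LUT: the input bits form the address of the stored bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# A 1-bit full adder mapped onto two 3-input LUTs.
SUM_LUT = make_lut(lambda a, b, c: a ^ b ^ c, 3)
CARRY_LUT = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)


def full_adder(a, b, carry_in):
    """Return (sum, carry_out) by reading both LUTs at the same address."""
    inputs = (a, b, carry_in)
    return lut_eval(SUM_LUT, inputs), lut_eval(CARRY_LUT, inputs)
```

Chaining such full adders bit by bit gives a ripple-carry adder, the kind of building block from which a parallel multiplier like the one mentioned in the abstract could be assembled.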