ISBN: 0889864934 (print)
Active networking techniques embed computational capabilities into conventional networks, thereby massively increasing the complexity and customization of the computations that can be performed with a network. In-depth studies of these large and complex networks, which are still in their nascent stages, cannot be effectively performed using analytical methods. Hence, discrete event simulation techniques are the only viable means to study and analyze active networking architectures. Furthermore, customized and flexible tools are required for the analysis of active networks using simulation. This paper describes an integrated environment for the modeling and parallel simulation of active networks called Active Networks Simulation Environment (or ANSE). ANSE utilizes the Time Warp synchronized kernel of WARPED (a general-purpose discrete event simulation kernel) to enable parallel simulation of active network models. ANSE also includes complete support for the modeling and simulation of active networks based on PLAN (Packet Language for Active Networks). This paper presents the issues involved in the design and development of ANSE. The Application Programming Interface (API) of ANSE is presented along with the issues involved in utilizing it to develop support for PLAN-based active networks. The paper also presents some results obtained from several experiments conducted to evaluate the effectiveness of ANSE. Our studies indicate that ANSE provides an effective environment for modeling and simulation of large-scale active networks.
ISBN: 0819458325 (print)
Optimum visual and hearing qualities at high compression ratios, as well as reduced area and power dissipation, are key factors for current and future commercial mobile multimedia devices. In this sense, a real-time Smart Pixels Array designed to efficiently perform key video coding operations is presented in this paper. In particular, the array introduced is capable of performing the Discrete Wavelet Transform (DWT), Zerotree Entropy (ZTE) Coding and Frame Differencing (FD) over SQCIF images (128x96 pixels) by dividing them into wavelet blocks (84 pixels). In order to perform these tasks, the array has been designed as a bidimensional network of interconnected smart pixel processors working in a massively parallel fashion, allowing operation at very low clock frequencies and, hence, low power dissipation. Each of these smart pixels is composed of a photodetector, an analog-digital converter that obtains a digital representation of the light intensity received by the photodetector, and a Ferroelectric Liquid Crystal placed over the whole surface of the pixel to display the image. Additionally, each pixel has dedicated circuitry that performs all the specific computations related to the three video coding operations previously mentioned, exhibiting a power dissipation of 4.15 µW at 128 kHz and a square area of 110x110 µm² using a 0.25 µm CMOS technology. The array has been integrated into a mobile multimedia device prototype, fully designed at our research centre, capable of sending and receiving compressed audio and video information with a total power consumption of 1.36 W in an area of 351.5 mm².
This paper focuses on load balancing issues associated with hybrid programming models for the parallelization of fully permutable nested loops onto SMP clusters. Hybrid parallel programming models usually suffer from intrinsic load imbalance between threads, mainly because most existing message passing libraries provide limited multi-threading support, allowing only the master thread to perform internode message passing communication. In order to mitigate this effect, the authors proposed a generic method for the application of static load balancing on the coarse-grain hybrid model for the appropriate distribution of the computational load to the working threads. The efficiency of the proposed scheme was experimentally evaluated against a micro-kernel benchmark, and demonstrated the potential of such load balancing schemes for the extraction of maximum performance out of hybrid parallel programs.
ISBN: 3540231633 (print)
There is a large variety of Grid test-beds that can be used for experimental purposes by a small community. However, the number of production Grid systems that can be used as a service for a large community is very limited. The current tutorial provides an introduction to three of these very few production Grid systems. They represent different models and policies of using Grid resources, and hence understanding and comparing them is an extremely useful exercise for everyone interested in Grid technology. The Hungarian ClusterGrid infrastructure connects clusters during the nights and weekends. These clusters are used during the day for educational purposes at the Hungarian universities and polytechnics. Therefore, a unique feature of this Grid is the switching mechanism by which the day-time and night-time working modes are loaded onto the computers. In order to manage the system as a production one, the system is homogeneous: all the machines should install the same Grid software package. The second, even larger production Grid system is the LHC-Grid that was developed by CERN to support the Large Hadron Collider experiments. This Grid is also homogeneous, but it works as a 24-hour service. All the computers in the Grid are completely devoted to offering Grid services. The LHC-Grid is mainly used by physicists, but in the EGEE project other applications, such as bio-medical applications, will be ported and supported on this Grid. The third production Grid is the NorduGrid, which is completely heterogeneous, and the resources can join and leave the Grid at any time as they need. The NorduGrid was developed to serve the Nordic countries of Europe, but now more and more institutions from other countries join this Grid due to its large flexibility. Concerning the user view, an important question is how to handle this large variety of production Grids and other Grid test-beds. How to develop applications for such different Grid systems and how to port applications among them? A possible answer for
A working single system image distributed operating system is presented. Dubbed Kerrighed, it provides a unified approach and support to both the MPI and the shared memory programming models. The system is operational in a 16-processor cluster at the Institut de Recherche en Informatique et Systemes Aleatoires in Rennes, France. In this paper, the system is described with emphasis on its main contributing and distinguishing factors, namely its DSM based on memory containers, its flexible handling of scheduling and checkpointing strategies, and its efficient and unified communications layer. Because of the importance and popularity of data parallel applications in these systems, we present a brief discussion of the mapping of two well-known and established data parallel algorithms. It is shown that ShearSort is remarkably well suited for the architecture/system pair, as is the ever so popular and important two-dimensional fast Fourier transform (2D FFT).
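For readers unfamiliar with ShearSort, the following is a minimal sequential sketch of the textbook algorithm on a 2D mesh of values. It is not taken from the Kerrighed paper: the function names `shear_sort` and `snake_flatten` are hypothetical, and the parallel mapping discussed in the abstract (where each row or column would be handled by a different processor) is not modeled here.

```python
import math


def shear_sort(grid):
    """Sort a 2D mesh into snake-like order (textbook ShearSort, run sequentially).

    Even-indexed rows end up ascending, odd-indexed rows descending, so reading
    the mesh in a boustrophedon ("snake") path yields a sorted sequence.
    """
    rows, cols = len(grid), len(grid[0])
    g = [list(r) for r in grid]  # work on a copy
    phases = math.ceil(math.log2(rows)) + 1 if rows > 1 else 1
    for _ in range(phases):
        # Row phase: alternate the sort direction from row to row.
        for i in range(rows):
            g[i].sort(reverse=(i % 2 == 1))
        # Column phase: sort every column top to bottom.
        for j in range(cols):
            column = sorted(g[i][j] for i in range(rows))
            for i in range(rows):
                g[i][j] = column[i]
    # Final row phase cleans up the last partially sorted row.
    for i in range(rows):
        g[i].sort(reverse=(i % 2 == 1))
    return g


def snake_flatten(grid):
    """Read the mesh row by row, reversing every odd-indexed row."""
    out = []
    for i, row in enumerate(grid):
        out.extend(row if i % 2 == 0 else reversed(row))
    return out


if __name__ == "__main__":
    mesh = [[15, 3, 9, 1], [8, 14, 2, 12], [5, 11, 16, 7], [10, 4, 13, 6]]
    print(snake_flatten(shear_sort(mesh)))
```

In a parallel setting, all rows (or all columns) within one phase are independent and can be sorted concurrently, which is what makes the algorithm attractive for mesh-like or cluster mappings.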
ISBN: 0769513638 (print)
This paper considers program modules, e.g. procedures, functions, and methods, as the basic method to exploit speculative parallelism in existing codes. We analyze how much inherent and exploitable parallelism exists in a set of C and Java programs on a set of chip-multiprocessor architecture models, and identify what inherent program features, as well as architectural deficiencies, limit the speedup. Our data complement previous limit studies by indicating that the programming style - object-oriented versus imperative - does not seem to have any noticeable impact on the achievable speedup. Further, we show that as few as eight processors are enough to exploit all of the inherent parallelism. However, memory-level data dependence resolution and thread management mechanisms of recent CMP proposals may impose overheads that severely limit the speedup obtained.
Blue Gene is a massively parallel system being developed at the IBM T. J. Watson Research Center. With its 4 million-way parallelism and 1 Petaflop peak performance, Blue Gene is a unique environment for research in p...
ISBN: 9781475767193 (print)
The proceedings contain 21 papers. The special focus in this conference is on Architecture of Scientific Software. The topics include: network-based scientific computing; future generations of problem-solving environments; developing an architecture to support the implementation and development of scientific computing applications; lessons learned developing an interface between components; component technology for high-performance scientific simulation software; a new approach to software integration frameworks for multi-physics simulation codes; a collaborative code development environment for computational electro-magnetics; on the role of mathematical abstractions for scientific computing; object-oriented modeling of parallel PDE solvers; a software architecture for scientific computing; formal methods for high-performance linear algebra libraries; a comprehensive DFT API for scientific computing; using a Fortran interface to POSIX threads; data management systems for scientific applications; software components for application development; hierarchical representation and computation of approximate solutions in scientific simulations; software architecture for the investigation of controllable models with complex data sets; a mixed-language programming methodology for high performance Java computing; and the architecture of scientific software.
ISBN: 3540422323 (print)
The proceedings contain 140 papers. The special focus in this conference is on Invited Speakers and Architecture-Specific Automatic Performance Tuning. The topics include: exploiting OpenMP to provide scalable SMP BLAS and LAPACK routines; scientific discovery through advanced computing; quantification of uncertainty for numerical simulations with confidence intervals; large-scale simulation and visualization in medicine; can parallel programming be made easy for scientists?; software support for high performance problem-solving on computational grids; lattice rules and randomized quasi-Monte Carlo; a massively parallel system; dynamic grid computing; robust geometric computation based on topological consistency; metacomputing with the harness and IceT systems; IT challenges and opportunities; a data broker for distributed computing environments; towards an accurate model for collective communications; a family of high-performance matrix multiplication algorithms; performance evaluation of heuristics for scheduling pipelined multiprocessor tasks; automatic performance tuning in the UHFFT library; a modal model of memory; fast automatic generation of DSP algorithms; cache-efficient multigrid algorithms; statistical models for automatic performance tuning; optimizing sparse matrix computations for register reuse in SPARSITY; rescheduling for locality in sparse matrix computations; the computational highway and backroads; conceptualizing a collaborative problem-solving environment for regional climate modeling and assessment of climate impacts; computational design and performance of the fast ocean atmosphere model, version 1; the model coupling toolkit; parallelization of a subgrid orographic precipitation scheme in an MM5-based regional climate model; resolution dependence in modeling extreme weather events; visualizing high-resolution climate data; improving Java server performance with interruptlets; protocols and software for exploiting Myrinet clusters and cluster configuration aided by simulat
ISBN: 9539676932 (print)
Focuses on the design of a parallel processor targeted at the rapid execution of neural networks. The basic architecture of the toroidal neural processor (TNP) is based on a toroidal mesh. This architecture was inspired by the need for a low-cost massively parallel processing system that could emulate a large variety of neural models. The TNP consists of two basic elements: a control unit and some processing units. The control unit acts as a distributor of information and instructions for the processing units. The processing units perform exact operations on the data, based on the execution of instructions. The design of the TNP has a typical SIMD architecture. The processor has an enhanced interface with the host computer. This interface provides not only operations for programming and control of the TNP, but also allows any type of neural network and learning algorithm to be implemented. The TNP design implements 10 control unit instructions and 11 processing unit instructions. The architecture of the TNP is optimized for Xilinx Virtex devices, and the design uses many features of this family of FPGA devices. The VHDL constructs are mapped into hardware in the synthesis, optimization, place-and-route and implementation process. The optimization can significantly change the hardware that is generated. The TNP was tested, simulated and implemented in a Xilinx Foundation Technology Express version 3.3i environment with the Virtex XCV300 FPGA array and the HW-AFX-BG352-100 prototyping platform. The whole design can also be implemented in Virtex E and Spartan devices.
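To make the terms "toroidal mesh" and "SIMD" in the abstract above concrete, here is a small illustrative sketch. It is an assumption-laden model, not the TNP design: the names `torus_neighbors` and `broadcast` are hypothetical, a four-neighbor connection pattern is assumed, and the abstract does not specify the actual instruction set or interconnect details.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the four neighbors of a cell on a toroidal mesh.

    On a torus the edges wrap around, so every cell has exactly four
    neighbors (assumed here; the real TNP topology may differ).
    """
    return {
        "north": ((row - 1) % rows, col),
        "south": ((row + 1) % rows, col),
        "west": (row, (col - 1) % cols),
        "east": (row, (col + 1) % cols),
    }


def broadcast(grid, op):
    """SIMD-style step: one instruction, applied by every unit to its own data.

    `op` stands in for an instruction issued by the control unit; each
    processing unit holds one value of `grid`.
    """
    return [[op(value) for value in row] for row in grid]


if __name__ == "__main__":
    # The corner cell wraps around to the opposite edges.
    print(torus_neighbors(0, 0, 4, 4))
    # Every processing unit doubles its local value in lockstep.
    print(broadcast([[1, 2], [3, 4]], lambda v: 2 * v))
```

The wrap-around indexing is what distinguishes a torus from a plain 2D mesh: no cell sits on a boundary, so every processing unit can run identical neighbor-exchange logic.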