The scalability of parallel architectures, data sets, and algorithms is often hindered by poor parallel input/output performance. However, recent studies show that, for input/output of large arrays, storing arrays as subarray divisions (known as chunking) can improve input/output performance. A new method is presented that increases the performance advantages of chunking by combining it with data compression. The method is tested on the Intel iPSC/860.
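As a rough illustration of the idea of combining chunking with compression, the following minimal sketch stores a row-major byte array as independently compressed rectangular chunks and reads back a subarray by decompressing only the chunks that overlap it. It is not the method evaluated in the paper: the function names `write_chunked` and `read_subarray`, the in-memory dictionary standing in for a file, and the choice of `zlib` as the compressor are all assumptions made for this example.

```python
import zlib


def write_chunked(data, rows, cols, ch, cw):
    """Split a row-major rows x cols byte array into ch x cw chunks and
    compress each chunk separately (hypothetical in-memory 'file')."""
    store = {}
    for r0 in range(0, rows, ch):
        for c0 in range(0, cols, cw):
            h = min(ch, rows - r0)   # edge chunks may be smaller
            w = min(cw, cols - c0)
            raw = b"".join(
                data[(r0 + r) * cols + c0:(r0 + r) * cols + c0 + w]
                for r in range(h)
            )
            store[(r0 // ch, c0 // cw)] = zlib.compress(raw)
    return store


def read_subarray(store, rows, cols, ch, cw, r_lo, r_hi, c_lo, c_hi):
    """Return rows [r_lo, r_hi) x columns [c_lo, c_hi) as a list of bytes rows,
    together with the number of chunks that had to be decompressed."""
    out = [bytearray(c_hi - c_lo) for _ in range(r_hi - r_lo)]
    touched = 0
    for ci in range(r_lo // ch, (r_hi - 1) // ch + 1):
        for cj in range(c_lo // cw, (c_hi - 1) // cw + 1):
            raw = zlib.decompress(store[(ci, cj)])
            touched += 1
            r0, c0 = ci * ch, cj * cw
            h = min(ch, rows - r0)
            w = min(cw, cols - c0)
            lo, hi = max(c0, c_lo), min(c0 + w, c_hi)
            for r in range(max(r0, r_lo), min(r0 + h, r_hi)):
                start = (r - r0) * w + (lo - c0)
                out[r - r_lo][lo - c_lo:hi - c_lo] = raw[start:start + (hi - lo)]
    return [bytes(row) for row in out], touched
```

With this layout a small subarray request touches only a few chunks, so both the bytes transferred and the decompression work scale with the request rather than with the whole array; the chunk shape `ch` x `cw` is the tuning knob.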
This paper describes a data-parallel algorithm for Boolean function manipulation. The algorithm adopts Binary Decision Diagrams (BDDs), which are the state-of-the-art approach for representing and handling Boolean functions. The algorithm is well suited to SIMD architectures and is based on distributing BDD nodes to the available processing elements and traversing BDDs in a breadth-first manner. An improved version of the same algorithm, which does not use virtual processors, is also presented. A prototype package has been implemented and its behavior has been studied with two different applications. In both cases the results show that the approach exploits the parallel hardware well by distributing the load effectively; thanks to the limited CPU time required and the large amount of memory available, it can solve problems that cannot be handled by conventional architectures.
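For readers unfamiliar with BDDs, the sketch below shows a minimal reduced ordered BDD with a unique table and the usual `apply` operation. It is a conventional sequential, depth-first formulation, not the breadth-first SIMD algorithm the paper describes; the class name `BDD` and its methods are hypothetical names chosen for this example.

```python
import operator


class BDD:
    """Minimal reduced ordered BDD. Node ids 0 and 1 are the terminals;
    variables are ordered by their integer index."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        # nodes[i] = (var, low, high); terminals get var == num_vars so that
        # they sort after every real variable.
        self.nodes = [(num_vars, None, None), (num_vars, None, None)]
        self.unique = {}

    def mk(self, var, low, high):
        """Return the node for (var, low, high), sharing existing nodes."""
        if low == high:              # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def var(self, i):
        """BDD for the single variable x_i."""
        return self.mk(i, 0, 1)

    def apply(self, op, u, v, memo=None):
        """Combine BDDs u and v with a binary Boolean operator op."""
        if memo is None:
            memo = {}
        if u < 2 and v < 2:
            return int(op(bool(u), bool(v)))
        if (u, v) in memo:
            return memo[(u, v)]
        uvar, ulo, uhi = self.nodes[u]
        vvar, vlo, vhi = self.nodes[v]
        top = min(uvar, vvar)        # split on the earliest variable
        u0, u1 = (ulo, uhi) if uvar == top else (u, u)
        v0, v1 = (vlo, vhi) if vvar == top else (v, v)
        r = self.mk(top,
                    self.apply(op, u0, v0, memo),
                    self.apply(op, u1, v1, memo))
        memo[(u, v)] = r
        return r

    def evaluate(self, u, assignment):
        """Evaluate the function rooted at u for a list of Boolean values."""
        while u >= 2:
            var, lo, hi = self.nodes[u]
            u = hi if assignment[var] else lo
        return bool(u)


# Usage: build f = (x0 AND x1) OR x2.
bdd = BDD(3)
x0, x1, x2 = bdd.var(0), bdd.var(1), bdd.var(2)
f = bdd.apply(operator.or_, bdd.apply(operator.and_, x0, x1), x2)
```

Because equal subgraphs are shared through the unique table, two equivalent functions built in different ways end up with the same node id, which is what makes equivalence checking cheap.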
The paper focuses on the problem of multi-spectral image segmentation, which leads, through the data fusion of several mono-spectral images, to reliable and robust vision systems for military or industrial purposes. The proposed approach does not fit the classical taxonomy of image data fusion methods: data fusion is performed during the segmentation, in parallel, of the different images. The presented algorithm has been implemented on the Connection Machine CM5 with the data programming style.
This paper considers the matrix decomposition A = LDL^T as a vehicle to explore the improvement in performance obtainable through the execution of multiple streams of control on SIMD architectures. Several methods for partitioning the SIMD array are considered. Architectural support for, and the feasibility of, using control parallelism in SIMD algorithms are briefly considered. Techniques for converting the extracted control parallelism into increased performance are illustrated via their application to the example algorithm. Analytical expressions for execution times are given in terms of the execution times of the constituent operations. Experimental results for the various partitioning schemes, based on execution traces, are also presented. Timings based on MasPar MP-2 operations and extrapolated from experimental data are used to compare the various control-parallel versions of the algorithm with the traditional SIMD counterpart.
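For reference, a plain sequential version of the factorization A = LDL^T (L unit lower triangular, D diagonal) can be sketched as follows. This is the textbook column-by-column algorithm without pivoting, assuming A is symmetric and no zero pivot occurs; it is not one of the partitioned SIMD variants studied in the paper, and the function name `ldlt` is an assumption of this example.

```python
def ldlt(A):
    """Factor a symmetric matrix A (list of lists) as L * diag(D) * L^T.

    Returns (L, D) with L unit lower triangular and D a list holding the
    diagonal. No pivoting is performed, so a zero pivot raises
    ZeroDivisionError.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract contributions of earlier columns.
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        # Entries below the diagonal in column j. These updates are
        # independent of one another for different i.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D


# Usage on a small symmetric matrix.
A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L, D = ldlt(A)
```

The independence of the inner loop over `i` is the kind of data parallelism a SIMD array exploits directly; the shrinking amount of work per column as `j` grows is one reason partitioning the array can pay off.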
In this paper, several design methods for parallel genetic algorithms (PGAs) that optimize morphological filters are discussed, and an optimal scale- and machine-independent PGA for a loosely coupled, homogeneous or inhomogeneous multiprocessor computer is developed. The optimization is made in terms of computation speed, parallelization efficiency, and quality. Quality means, in this context, the fitness of the best chromosome, which is an objective measure of the performance of the corresponding morphological filter in a particular environment.
ProcSimity is a software tool that supports research in processor allocation and scheduling for highly parallel systems. ProcSimity's multicomputer simulator supports experimentation with selected allocation and scheduling algorithms on architectures with a range of network topologies and for several current routing and flow control mechanisms. Message passing can be simulated in detail at the flit level or at a higher level of modeling. The tool supports both stochastic job streams and communication patterns from actual parallel applications, including several of the NAS parallel benchmarks. ProcSimity's visualization and performance analysis tool allows the user to view a dynamic animation of the selected algorithms as well as a variety of system-level and job-level performance metrics. ProcSimity has been successfully used in experiments investigating the feasibility of non-contiguous processor allocation in meshes and k-ary n-cubes.
A limited-area numerical weather prediction model specifically targeted at parallel computers has been successfully implemented on an IBM SP2 distributed-memory parallel computer. The model employs an explicit finite-difference scheme and was parallelised using a simple domain decomposition technique. On a twelve-processor SP2, a 24-hour forecast using archived operational data and including a sophisticated representation of physical processes was run at a range of resolutions between 150 km and 19 km, and near-linear speedups were achieved. Major weather centres have indicated a requirement for regional prediction models to be run at resolutions of approximately 5 km by the end of the decade. Based on this work, it appears that this target can be achieved through the use of scalable parallel computers.
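The combination of an explicit finite-difference scheme with domain decomposition can be illustrated on a toy problem. The sketch below advances a 1-D diffusion equation, once on the whole grid and once on subdomains that each read one "halo" cell from their neighbours, and the two results agree. The equation, the function names `step` and `step_decomposed`, and the sequential loop standing in for separate processors are all assumptions of this example and have nothing to do with the actual model's equations or code.

```python
def step(u, alpha):
    """One explicit time step on the whole grid; end points are held fixed."""
    n = len(u)
    return [
        u[i] if i in (0, n - 1)
        else u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
        for i in range(n)
    ]


def step_decomposed(u, alpha, parts):
    """Same step, computed subdomain by subdomain.

    Each subdomain owns cells [lo, hi) and additionally reads one halo cell
    on each side, which on a real distributed-memory machine would arrive
    in a message from the neighbouring processor.
    """
    n = len(u)
    out = []
    for p in range(parts):
        lo, hi = p * n // parts, (p + 1) * n // parts
        h_lo, h_hi = max(lo - 1, 0), min(hi + 1, n)
        local = u[h_lo:h_hi]         # owned cells plus halo
        for i in range(lo, hi):
            j = i - h_lo             # index into the local array
            if i == 0 or i == n - 1:
                out.append(local[j])
            else:
                out.append(local[j] + alpha * (local[j - 1] - 2 * local[j] + local[j + 1]))
    return out


# Usage: a single spike diffusing on 16 cells, split over 4 subdomains.
u_ref = [0.0] * 16
u_ref[8] = 1.0
u_dec = list(u_ref)
for _ in range(5):
    u_ref = step(u_ref, 0.25)
    u_dec = step_decomposed(u_dec, 0.25, 4)
```

Because an explicit scheme only needs nearest-neighbour values from the previous time step, each subdomain exchanges a thin halo per step while doing work proportional to its interior, which is consistent with the near-linear speedups the abstract reports.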
The prototype system OOXSDAR VISIS was implemented in VisualWorks/Smalltalk and Distributed Smalltalk, respectively. To achieve distribution in a heterogeneous network, an architecture based on the Common Object Request Broker Architecture (CORBA) was chosen. The architecture consists of three layers: the knowledge client level, the knowledge domain agent server level, and the persistent knowledge storage level. The architecture is based on the semantic/presentation split of logical knowledge objects. This architecture combines the advantages of standardized communication protocols such as CORBA with the power and expressivity of OODBMS. First studies with the prototype system OOXSDAR VISIS were carried out for performance analysis. The results allowed significant improvement of the distributed inference process. Well-known principles such as increasing intra-module cohesion and minimizing inter-module dependencies were applied to restructure the distributed knowledge bases.
Accurate and rapid evaluation of radar signature for alternative aircraft/store configurations would be of substantial benefit in the evolution of integrated designs that meet radar cross section requirements across the threat spectrum. Finite-volume time-domain methods offer the possibility of modeling the whole aircraft, including penetrable regions and stores, at longer wavelengths on today's supercomputers and at typical airborne radar wavelengths on the massively parallel teraflop computers of tomorrow. To realize this potential, practical means are being developed for the rapid generation of grids on and around the aircraft, and numerical algorithms that maintain high-order accuracy on such grids are being constructed. A structured-grid and unstructured-grid-based finite-volume, time-domain Maxwell's equation solver has been developed, incorporating modeling techniques for general radar absorbing materials. Using this work as a base, the goal of the computational electromagnetics effort is to define, implement, and evaluate rapid prototype signature prediction, addressing many issues related to 1) physics of electromagnetics, 2) efficient and higher-order accurate algorithms, 3) boundary condition procedures, 4) geometry and gridding (structured and unstructured), 5) computer architecture, and 6) validation.
The proceedings contain 170 papers. Topics discussed include image processing, image coding, labelling and classification, medical applications, motion, stereo and three-dimensional imaging, image analysis, image interpretation, image coding and communications, shape description and recognition, image processing applications, computer architectures, image segmentation, neural networks, industrial inspection, filtering and morphology, texture and color, transport, security, and remote sensing.