We study scalable parallel computational geometry algorithms for the coarse grained multicomputer model: p processors solving a problem on n data items, where each processor has O(n/p) ≫ O(1) local memory and all processors are connected via some arbitrary interconnection network (e.g. mesh, hypercube, fat tree). We present O(T-sequential/p + T-s(n,p)) time scalable parallel algorithms for several computational geometry problems, where T-s(n,p) is the time of a global sort operation. Our results are independent of the multicomputer's interconnection network. Their time complexities become optimal when T-sequential/p dominates T-s(n,p) or when T-s(n,p) is optimal. This is the case for several standard architectures, including meshes and hypercubes, and a wide range of ratios n/p that include many of the currently available machine configurations. Our methods also have some important practical advantages: for interprocessor communication, they use only a small fixed number of calls to a single global routing operation, global sort, and all other programming is in the sequential domain. Furthermore, our algorithms use only a small number of very large messages, which greatly reduces the overhead for the communication protocol between processors. (Note, however, that our time complexities account for the lengths of messages.) Experiments show that our methods are easy to implement and give good timing results.
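As a rough illustration of the global sort primitive that the abstract relies on, the following is a single-machine simulation, not the authors' implementation. It assumes a sample-sort style routing step, and the name `global_sort` and the splitter selection rule are hypothetical choices made for this sketch. Each inner list stands for one processor's local memory of about n/p items.

```python
import bisect


def global_sort(blocks):
    """Simulate one global sort over p processors (one list per processor).

    Assumption: a sample-sort scheme; the abstract only requires that some
    global sort operation exists on the target network.
    """
    p = len(blocks)
    # Step 1: purely sequential work, each processor sorts its own block.
    local = [sorted(b) for b in blocks]
    # Step 2: every processor contributes roughly p evenly spaced samples.
    samples = sorted(s for b in local for s in b[::max(1, len(b) // p)])
    if p < 2 or not samples:
        return local
    # Step 3: pick p - 1 splitters that define one key range per processor.
    splitters = [samples[(i * len(samples)) // p] for i in range(1, p)]
    # Step 4: one routing round, each item travels to the processor owning its range.
    buckets = [[] for _ in range(p)]
    for b in local:
        for item in b:
            buckets[bisect.bisect_right(splitters, item)].append(item)
    # Step 5: sequential work again, sort what was received.
    return [sorted(b) for b in buckets]


result = global_sort([[9, 1, 5], [3, 8, 2], [7, 4, 6]])
```

In this sketch all communication happens in the single routing round of step 4, with one large message per processor pair, which mirrors the "small number of very large messages" property described above.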
The proceedings contain 109 papers. Topics discussed include asynchronous transfer mode architectures, network protocols, trellis coded modulation, resource planning, packet data networks, location management in personal communication services, radiofrequency in microcell systems, quality of service, wireless networks, phase shift keying, satellite communications, mobile telecommunication systems, the Internet, transmitters and receivers, channel coding and modulation, multipath propagation models, mobility management, digital voice transmission, and antenna array processing.
This paper presents the design of a dedicated parallel architecture for connected component analysis. Categorized in one-dimensional array processors, for an image of n × n pixels, the proposed architecture ha...
ISBN: 0818675829 (print)
Dedicated Cluster Parallel Computers (DCPCs) are emerging as low-cost high performance environments for many important applications in science and engineering. A significant class of applications that perform well on a DCPC are coarse-grain applications that involve large amounts of file I/O. Current research in parallel file systems for distributed systems is providing a mechanism for adapting these applications to the DCPC environment. We present the Parallel Virtual File System (PVFS), a system that provides disk striping across multiple nodes in a distributed parallel computer and file partitioning among tasks in a parallel program. PVFS is unique among similar systems in that it uses a streams-based approach that represents each file access with a single set of request parameters and decouples the number of network messages from details of the file's striping and partitioning. PVFS also provides support for efficient collective file accesses and allows overlapping file partitions. We present results of early performance experiments that show PVFS achieves excellent speedups in accessing moderately sized file segments.
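To make the idea of disk striping across nodes concrete, here is a minimal sketch of round-robin striping arithmetic. It is not the PVFS interface: the function names `locate` and `split_request` and the parameters `stripe_size` and `node_count` are hypothetical, and a simple round-robin layout starting at node 0 is assumed.

```python
def locate(offset, stripe_size, node_count):
    """Map a logical file offset to (node, offset within that node's local file).

    Assumption: stripes are assigned round-robin, starting at node 0.
    """
    stripe_index = offset // stripe_size
    node = stripe_index % node_count
    local_offset = (stripe_index // node_count) * stripe_size + offset % stripe_size
    return node, local_offset


def split_request(offset, length, stripe_size, node_count):
    """Group one contiguous logical range into a byte count per node."""
    per_node = {}
    end = offset + length
    while offset < end:
        node, _ = locate(offset, stripe_size, node_count)
        # Never read past the end of the current stripe or of the request.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        per_node[node] = per_node.get(node, 0) + chunk
        offset += chunk
    return per_node


request = split_request(offset=2, length=10, stripe_size=4, node_count=3)
```

Because `split_request` aggregates by node, the number of entries it returns is bounded by the number of nodes rather than by the number of stripes touched, which is one simple way a request description can be kept independent of the striping details.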
In this paper, two domain decomposition formulations are presented in conjunction with the preconditioned conjugate gradient method (PCG) for the solution of large-scale problems in solid and structural mechanics. In the first approach, the PCG method is applied to the global coefficient matrix, while in the second approach it is applied to the interface problem after eliminating the internal degrees of freedom. For both implementations, a subdomain-by-subdomain (SBS) polynomial preconditioner is employed, based on local information of each subdomain. The approximate inverse of the global coefficient matrix or the Schur complement matrix, which acts as the preconditioner, is expressed by a truncated Neumann series resulting in an additive type local preconditioner. Block type preconditioning, where full elimination is performed inside each block, is also studied and compared with the proposed polynomial preconditioning. Copyright (C) Civil-Comp Limited and Elsevier Science Limited.
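The truncated Neumann series idea can be sketched on a small dense system. This is an illustration under simplifying assumptions, not the paper's subdomain-by-subdomain scheme: it splits A = D - N using only the diagonal D of A, so that A^{-1} = sum over k of (D^{-1} N)^k D^{-1}, and keeps the first `terms` terms. The names `neumann_apply` and `pcg` are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def neumann_apply(A, r, terms):
    """Approximate A^{-1} r by a truncated Neumann series (diagonal splitting assumed)."""
    d = [A[i][i] for i in range(len(A))]
    term = [ri / di for ri, di in zip(r, d)]  # k = 0 term: D^{-1} r
    z = term[:]
    for _ in range(terms - 1):
        a_term = matvec(A, term)
        # D^{-1} N t = t - D^{-1} A t, because N = D - A
        term = [ti - ati / di for ti, ati, di in zip(term, a_term, d)]
        z = [zi + ti for zi, ti in zip(z, term)]
    return z


def pcg(A, b, terms=3, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = b[:]
    z = neumann_apply(A, r, terms)
    p = z[:]
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = neumann_apply(A, r, terms)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]  # equals A applied to [1, 2, 3]
solution, iterations = pcg(A, b)
```

Applying the preconditioner needs only matrix-vector products and diagonal scalings, which is what makes a polynomial preconditioner of this kind attractive when each subdomain holds only local information.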
ISBN: 0818675829 (print)
We describe a software environment for high performance distributed computing on a network of multiprocessor workstations. In designing this environment, we have used a problem-oriented approach as opposed to the traditional algorithm-oriented approach. This paradigm shift enables us to generate efficient programs automatically for a well-defined class of problems. Thus, our system frees the users from the esoteric tasks of algorithm design and implementation. An important feature of our system is its ability to handle the large variation in granularity (we call this dual level parallelism) in a hybrid processing environment. This feature is the key to the superior efficiency delivered by the system. We give preliminary results from a case study in which our system is used to generate programs automatically for a scientific application, with a network of multiprocessors as the target platform.
This paper addresses the problem of developing an efficient training set parallel algorithm (TSPA) for the training procedure of a neural network based fingerprint image comparison (FIC) system. The target architectur...
ISBN: 0818675829 (print)
In this paper, an efficient, run-time, statistical scheme for estimating the execution time of a task is presented, in order to facilitate run-time matching and scheduling in a distributed heterogeneous computing environment. This scheme is based upon a nonparametric regression technique, where the execution time estimate for a task is computed from past observations. Furthermore, this technique is able to compensate for different parameters upon which the execution time depends, and does not require any knowledge of the architecture of the target machine. It is also able to make accurate predictions when erroneous data is present in the set of observations, and has been experimentally shown to produce estimates with very low error, even with few past values from which to calculate a new estimate.
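The abstract does not say which nonparametric regression technique is used. As one possible illustration, the sketch below uses a Gaussian-kernel weighted average (a Nadaraya-Watson estimator), which is an assumption made here and not necessarily the paper's method. The function name `estimate_time`, the single input parameter, and the `bandwidth` value are all hypothetical.

```python
import math


def estimate_time(observations, x, bandwidth):
    """Estimate execution time at parameter value x from past (parameter, time) pairs.

    Assumption: a Gaussian-kernel weighted average; past runs whose parameter
    is close to x receive more weight than distant ones.
    """
    weights = [
        math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi, _ in observations
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no past observation is close enough to x")
    return sum(w * t for w, (_, t) in zip(weights, observations)) / total


# Hypothetical history: (problem size, measured seconds).
history = [(1.0, 10.0), (3.0, 30.0)]
estimate = estimate_time(history, x=2.0, bandwidth=1.0)
```

An estimator of this kind needs no model of the target machine, only the recorded observations, which matches the architecture-independence described in the abstract.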
ISBN: 0818675829 (print)
Performance of I/O intensive applications on a multiprocessor system depends mostly on the variety of disk access delays encountered in the I/O system. Over the years, disk performance has improved more slowly than processor speeds. It is therefore necessary to model I/O delays and evaluate the performance benefits of moving an application to a better multiprocessor system. In this work, we perform such an analysis by measuring I/O delays for a synthesized application that uses a Parallel Distributed File System. The aim of this study was to evaluate the performance benefits of better disks in a multiprocessor system that was designed a few years ago. We report how I/O performance would be affected if an application were run on a system with better disks and communication links. We show a substantial improvement in I/O system performance with better disks and communication links relative to the existing system.
ISBN: 354061057X (print)
The proceedings contain 45 papers. The special focus in this conference is on Data Mining, Active Databases, Design Tools, Advanced DBMS and Optimization. The topics include: generalizations and performance improvements; a fast scalable classifier for data mining; management of multiple models in an extensible database design tool; an assessment of non-standard DBMSs for CASE environments; the need for an object relational model and its use; data integration using self-maintainable views; a novel approach to high performance GIS processing; optimizing queries with aggregate views; translating OSQL queries into efficient set expressions; knowledge discovery from epidemiological databases; scalable update propagation in epidemic replicated databases; database support for efficiently maintaining derived data; optimal multi-block read schedules for partitioned signature files; amalgamating SGML documents and databases; indexing nucleotide databases for fast query evaluation; version management for scientific databases; first-order queries over temporal databases inexpressible in temporal logic; a formal temporal object-oriented data model; dynamic development and refinement of hypermedia documents; a hash partition strategy for distributed query processing; fine-granularity locking and client-based logging for distributed architectures; exploiting persistent intermediate code representations in open database environments; providing high availability in very large workflow management systems; a database benchmark for high-throughput workflow management; making relational and object-oriented database systems interoperable; object query services for telecommunication networks; and dynamic declustering of TSB tree nodes for parallel access to temporal data.