We study parallel algorithms for the minimum spanning tree problem, based on the sequential algorithm of Boruvka. the target architectures for our algorithm are asynchronous, distributed-memory machines. Analysis of o...
详细信息
ISBN:
(纸本)0818672552
We study parallel algorithms for the minimum spanning tree problem, based on the sequential algorithm of Boruvka. the target architectures for our algorithm are asynchronous, distributed-memory machines. Analysis of our parallel algorithm, on a simple model that is reminiscent of the LogP model, shows that in principle a speedup proportional to the number of processors can be achieved, but that communication costs can be significant. To reduce these costs, we develop a new randomized linear work pointer jumping scheme that performs better than previous linear work algorithms. We also consider empirically the effects of data imbalance on the running time. For the graphs used in our experiments, load balancing schemes result in little improvement in running times. Our implementations on sparse graphs with 64,000 vertices on thinking Machine's CM-5 achieve a speedup factor of about 4 on 16 processors. On this environment, packaging of messages turns out to be the most effective way to reduce communication costs.
Performance of I/O intensive applications on a multiprocessor system depends mostly on the variety of disk access delays encountered in the I/O system. Over the years, the improvement in disk performance has taken pla...
详细信息
ISBN:
(纸本)0818675829
Performance of I/O intensive applications on a multiprocessor system depends mostly on the variety of disk access delays encountered in the I/O system. Over the years, the improvement in disk performance has taken place slower than corresponding increase in processor speeds. It is therefore necessary to model I/O delays and evaluate performance benefits of moving an application to a better multiprocessor system. In this work, we perform such an analysis by measuring I/O delays for a synthesized application that uses paralleldistributed File System. the aim of this study was to evaluate the performance benefits of better disks in a multiprocessor system which was designed few years back. We report how the I/O performance would get affected if an application were to be run on a system which would have better disks and communication links. In this study, we show a substantial improvement in the performance of I/O system with better disks and communication links with respect to the existing system.
We present a new non-stationary signal classification algorithm based on a time-frequency distribution and multiple hypothesis testing. the time-frequency distribution is used to construct a time-dependent quadratic d...
详细信息
We present a new non-stationary signal classification algorithm based on a time-frequency distribution and multiple hypothesis testing. the time-frequency distribution is used to construct a time-dependent quadratic discriminant function. At selected points in time we evaluate the discriminant function and form a set of statistics which are used to test multiple hypotheses. We show that the statistics are a linear combination of chi square random variables with constant coefficients and hence are not normally distributed. the multiple hypotheses are treated simultaneously using the generalized sequentially rejective Bonferroni test to control the probability of incorrect classification of one class.
this paper presents a new HMM soft-output decoder for QAM signals transmitted through narrow-band flat-fading channels, in the case where the signal constellation has points which are clustered into groups. parallel H...
详细信息
this paper presents a new HMM soft-output decoder for QAM signals transmitted through narrow-band flat-fading channels, in the case where the signal constellation has points which are clustered into groups. parallel HMM filters are used for QAM detection, with one filter for each group of constellation points. this greatly reduces the number of computations required. A Kalman Filter (KF) is then applied for channel estimation. the resulting adaptive algorithms appear as coupled KF and HMM filters. As well as having reduced computations, this approach does not require so called `hard-decoding-decisions' to be made, which results in better robustness to noise.
the modest I/O configurations and file system limitations of many current high-performance systems preclude solution of problems with large I/O needs. I/O hardware and file system parallelism is the key to achieve hig...
详细信息
ISBN:
(纸本)0818675829
the modest I/O configurations and file system limitations of many current high-performance systems preclude solution of problems with large I/O needs. I/O hardware and file system parallelism is the key to achieve high performance. We analyze the I/O behavior of several versions of two scientific applications on the Intel Paragon XP/S. the versions involve incremental application code enhancements across multiple releases of the operating system. Studying the evolution of I/O access patterns underscores the interplay between application access patterns and file system features. Our results show that both small and large request sizes are common, that at present application developers must manually aggregate small requests to obtain high disk transfer rates, that concurrent file accesses are frequent, and that appropriate matching of the application access pattern and the file system access mode can significantly increase application I/O performance. Based on these results, we describe a set of file system design principles.
An object-oriented (OO) temporal database management system can better meet the data management requirements of many applications because it not only provides powerful facilities for modeling and processingthe struct...
详细信息
An object-oriented (OO) temporal database management system can better meet the data management requirements of many applications because it not only provides powerful facilities for modeling and processingthe structural and behavioral properties of complex objects but also supports the management of temporal data. However, due to the generality and high functionality of an OO system, the append-only nature of temporal databases and the irregular evolutions of object instances, processing efficiency is very difficult to achieve using serial algorithms and conventional computing systems. this paper presents two multi-wavefront parallel query processing strategies, five primitive time-alignment operations and their implementations on a parallel computer nCUBE2. the multi-wavefront algorithms allow asynchronous processing and propagation of temporal object instance identifiers and data in order to identify temporal object instances which satisfy a complex multi-class query. they offer both intraquery parallelism and interquery parallelism in query processing. the time-alignment operations allow data with different evolution time intervals to be aligned to determine the common intervals in which the combined factual information is valid. Some performance evaluation results are also reported.
this paper presents a new subspace based method for blind identification of a p-input/q-output system where q>p. this method exploits a minimum noise subspace (MNS) to retrieve the system impulse responses. the MNS...
详细信息
this paper presents a new subspace based method for blind identification of a p-input/q-output system where q>p. this method exploits a minimum noise subspace (MNS) to retrieve the system impulse responses. the MNS method requires only q-p noise vectors as opposed to all noise vectors needed by a standard subspace (SS) method. It is shown that the MNS can be obtained in a parallel structure from a set of tuples (combinations) of system outputs that form a properly connected sequence (PCS). the PCS exploits with minimum redundancy the diversity among system outputs. the MNS method is much more efficient in computation than the SS method although the former is less robust to noise than the latter.
this paper presents the first prototype of the PAPRICA massively parallel system which is integrated on the MOB-LAB experimental land vehicle for real-time vision-based road markings detection. Its main bottlenecks ar...
详细信息
this paper presents the first prototype of the PAPRICA massively parallel system which is integrated on the MOB-LAB experimental land vehicle for real-time vision-based road markings detection. Its main bottlenecks are highlighted and its evolution toward a linear array is discussed. this system has been enhanced with a simple but powerful interprocessor communication network for the exchange of information among processors not directly connected, which allows an extremely efficient implementation of the road markings detection algorithm as well as other morphological applications.
Technology trends make it attractive to use workstations connected by a local area network as a multicomputing platform for parallelapplications. Achieving acceptable application performance in such a workstation clu...
详细信息
Technology trends make it attractive to use workstations connected by a local area network as a multicomputing platform for parallelapplications. Achieving acceptable application performance in such a workstation cluster using commodity components requires good support from system software and network hardware. this paper describes our experience withparallel programming in a workstation cluster and the implications for operating systems and network adaptor design. Our cluster consists entirely of commercially available hardware: 24 Alpha workstations connected by a 155 Mbits/s AN2 ATM network. We present results from running several parallelapplications on this cluster. Our cluster demonstrates both excellent low-level communication performance and good overall application performance that compares favorably with dedicated multicomputers like the IBM SP2.
parallel input/output (I/O) workload characterization studies are necessary to better understand the factors that dominate performance. When translated into system design principles this knowledge can lead to higher p...
详细信息
parallel input/output (I/O) workload characterization studies are necessary to better understand the factors that dominate performance. When translated into system design principles this knowledge can lead to higher performance/cost systems. In this paper we present the experimental results of an I/O workload characterization study of NASA Earth and Space Sciences (ESS) applications. Measurements were collected using device driver instrumentation. Baseline measurements, with no workload, and measurements during regular application runs, were collected and then analyzed and correlated. It will be shown how the observed disk I/O can be identified as block transfers, page requests, and cache activity, and how the ESS applications are characterized by a high degree of spatial and temporal locality.
暂无评论