Classification is an important problem in data mining. Constructing good classifiers is computationally intensive and offers plenty of scope for parallelization. The divide-and-conquer paradigm can be used to construct decision tree classifiers efficiently. We discuss in detail various techniques for parallel divide-and-conquer and extend these techniques to handle disk-resident data efficiently. Furthermore, we suggest a generic technique for parallel out-of-core divide-and-conquer problems. We present pCLOUDS, the parallel version of the decision tree classifier algorithm CLOUDS, capable of handling large out-of-core data sets. pCLOUDS exhibits excellent speedup, sizeup and scaleup properties, which make it a competitive tool for data mining applications. We evaluate the performance of pCLOUDS for a range of synthetic data sets on the IBM-SP2.
PM-PVM is a portable implementation of PVM designed to work on SMP architectures supporting multithreading. PM-PVM achieves portability by implementing the PVM functionality on top of a reduced set of parallel programming primitives. Within PM-PVM, PVM tasks are mapped onto threads and the message passing functions are implemented using shared memory. Three approaches to implementing the PVM message passing functions have been adopted. In the first, a single message copy in memory is shared by all destination tasks. The second replicates the message for every destination task but requires less synchronization. Finally, the third approach combines features of the previous two. Experimental results comparing the performance of PM-PVM and PVM applications running on a 4-processor SPARCstation 20 under Solaris 2.5 show that PM-PVM can produce execution times up to 54% shorter than PVM.
The technical challenges for the limited area model Aladin are various: Aladin is the French operational mesoscale model, run on the Météo-France VPP-700E on 4 processors twice a day. Also, the model is used...
A new systolic algorithm which computes image differences in run-length encoded (RLE) format is described. The binary image difference operation is commonly used in many image processing applications, including automated inspection systems, character recognition, fingerprint analysis, and motion detection. The efficiency of these operations can be improved significantly with the availability of a fast systolic system that computes the image difference as described in this paper. It is shown that for images with a high similarity measure, the time complexity of the systolic algorithm is small and in some cases constant with respect to the image size. The time for the systolic algorithm is proportional to the difference between the number of runs in the two images, while the time for the sequential algorithm is proportional to the total number of runs in the two images together. A formal proof of correctness for the algorithm is also given.
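For orientation, the sequential baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the paper's systolic algorithm: it assumes each image row is stored as sorted, disjoint half-open runs `(start, end)` of 1-pixels, and the name `rle_xor` is hypothetical. The work done is proportional to the total number of runs in the two rows, which matches the sequential cost quoted above.

```python
from heapq import merge


def rle_xor(row_a, row_b):
    """Difference (XOR) of two binary rows in run-length encoded form.

    Assumed representation (not taken from the paper): each row is a
    list of sorted, disjoint half-open runs (start, end) of 1-pixels.
    """
    # Each run contributes two transition positions: where the row
    # switches 0 -> 1 and where it switches back 1 -> 0.
    edges_a = [pos for run in row_a for pos in run]
    edges_b = [pos for run in row_b for pos in run]

    result_edges = []
    # Merge the two sorted transition lists in a single linear pass.
    for pos in merge(edges_a, edges_b):
        if result_edges and result_edges[-1] == pos:
            # A transition at the same position in both rows cancels
            # out in the XOR, so drop it.
            result_edges.pop()
        else:
            result_edges.append(pos)

    # The surviving transitions pair up into the runs of the difference.
    return [(result_edges[i], result_edges[i + 1])
            for i in range(0, len(result_edges), 2)]


if __name__ == "__main__":
    print(rle_xor([(0, 3), (5, 8)], [(0, 3), (6, 8)]))
```

Two identical rows produce an empty result, which is the case in which, according to the abstract, the systolic version needs very little time.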
Clusters of workstations have emerged as a popular platform for parallel and distributed computing. The commodity high-speed networks used to connect workstation clusters provide high bandwidth, but also have high latency. SCRAMNet is an extremely low latency replicated non-coherent shared memory network, so far used only for real-time applications. This paper reports our early experiences with using SCRAMNet for cluster computing. We have implemented a user-level zero-copy message passing protocol for SCRAMNet called the BillBoard Protocol (BBP). The one-way latency for sending a 4-byte message between two nodes using the BBP is measured to be as low as 7.8 μs. Since SCRAMNet supports hardware-level replication of messages, it is possible to implement multicast with almost the same latency as point-to-point communication. Using the BBP, the latency for broadcasting short messages to 4 nodes is measured to be 10.1 μs and the latency for a 4-node barrier is measured to be 37 μs. We have also built an MPI library on top of the BBP which makes use of multicast support from the BBP. Our results demonstrate the potential of SCRAMNet as a high performance interconnect for building scalable workstation clusters supporting message passing.
In this paper we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called prefetch-based crash recovery (PCR), for software distributed shared memory (SDSM). Our lazy logging protocol minimizes failure-free overhead by logging only data indispensable for correct recovery, while our PCR protocol reduces the recovery time by prefetching data according to the future memory access patterns, thus eliminating memory miss penalty during the recovery process. We have performed experiments on workstation clusters, comparing our protocols against the earlier reduced-stable logging (RSL) protocol by implementing both protocols in TreadMarks, a state-of-the-art SDSM system. The experimental results show that our lazy logging protocol consistently outperforms the RSL protocol. Our protocol increases execution time by only 1% to 4% during failure-free execution, while the RSL protocol incurs an execution time overhead of 6% to 21% due to its larger log size and higher disk access frequency. Our PCR protocol also outperforms the widely used simple crash recovery protocol by 18% to 57% under all applications examined.
The main contribution of this work is to propose a number of broadcast efficient VLSI architectures for computing the sum and the prefix sums of a wk-bit, k ≥ 2, binary sequence using, as basic building blocks, linea...
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture which is the manner in which wi...
This paper explains how efficient support for semi-regular distributions can be incorporated in a uniform compilation framework for hybrid applications. The key focus of this work is to show how, unlike other existing schemes, our scheme is able to minimize preprocessing overheads and maintain sophisticated communication optimizations (such as reduction of inter-processor communication during schedule generation and sharing of communicated information between regular and irregular accesses) even in the presence of semi-regular distributions. It is only natural that the preprocessing overheads associated with semi-regular distributions be intermediate between those for regular and irregular distributions. This paper shows how various properties can be inferred for semi-regular distributions. These allow the use of the interval representation, which in turn reduces the preprocessing overhead and makes compatible code generation for hybrid references possible. Experimental results on a 16-processor IBM SP-2 for a number of sparse applications using semi-regular distributions show that our scheme is feasible.
We explore the creation of a metacomputer by the aggregation of independent sites. Joining a metacomputer is voluntary, and hence it has to be an endeavor that mutually benefits all parties involved. We identify proportional-share allocation as a key component of such a mutual benefit. Proportional-share allocation is the basis for enforcing the agreement reached among the sites on how to use the metacomputer's resources. We introduce a resource manager that provides proportional-share allocation over a cluster of workstations, assuming applications to be master-slave. This manager is novel because it performs non-preemptive proportional scheduling of multiple processors. A prototype has been implemented and we report on preliminary results. Finally, we discuss how tickets (first-class entities that encapsulate allocation endowments) can be used in practice to enforce the metacomputer agreement, and also how they can ease the site selection to be performed by the application.
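To make the idea of ticket-based proportional sharing concrete, here is a minimal sketch. It uses a stride-scheduling style rule as a stand-in and is not the resource manager described in the abstract; the function name `allocate`, the notion of a "task slot", and the tie-breaking by application name are all assumptions made for illustration. Each time a processor becomes free, it is handed (without preempting anything) to the application that has so far received the least service relative to its tickets.

```python
import heapq
from fractions import Fraction


def allocate(tickets, num_slots):
    """Hand out num_slots task slots in proportion to ticket holdings.

    tickets maps a hypothetical application name to a positive integer
    ticket count. Each slot stands for one free processor given to an
    application's next task, which then runs to completion.
    """
    # Every application carries a "pass" value that advances by
    # 1/tickets whenever it receives a slot; the smallest pass goes next.
    heap = [(Fraction(1, count), name)
            for name, count in sorted(tickets.items())]
    heapq.heapify(heap)

    granted = {name: 0 for name in tickets}
    for _ in range(num_slots):
        pass_value, name = heapq.heappop(heap)
        granted[name] += 1
        next_pass = pass_value + Fraction(1, tickets[name])
        heapq.heappush(heap, (next_pass, name))
    return granted


if __name__ == "__main__":
    print(allocate({"a": 3, "b": 1}, 8))
```

With a 3:1 ticket ratio and 8 slots, application `a` receives 6 slots and `b` receives 2. Exact fractions are used so that ties are resolved deterministically rather than by floating-point rounding.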