Global Computing (GC) platforms such as BOINC [1] are now considered the most powerful distributed computing systems worldwide. Based on volunteer computing and various forms of incentives, such architecture a...
Software development of high-performance graph algorithms is difficult on modern parallel computers. To simplify this task, we have designed and implemented a collection of C++ graph primitives, basic building blocks,...
ISBN: 0818675829 (print)
This paper describes the design and implementation of a solution to the constrained 2-D cutting stock problem on a cluster of workstations. The constrained 2-D cutting stock problem is an irregular problem with a dynamically modified global data set and irregular amounts and patterns of communication. A replicated data structure is used for the parallel solution since the ratio of reads to writes is known to be large. Mutual exclusion and consistency are maintained using a token-based lazy consistency mechanism, and a randomized protocol for dynamically balancing the distributed work queue is employed. Speedups are reported for three benchmark problems executed on a cluster of workstations interconnected by a 10 Mbps Ethernet.
Partitioning data parallel computations across a network of heterogeneous workstations is a difficult problem for the user. We have developed a runtime partitioning method for choosing the number and type of processors to apply to a data parallel computation, and a decomposition of the data domain in order to achieve reduced completion time. The partitioning method utilizes information about the problem in the form of callback functions and uses a set of topology-specific communication functions to estimate communication costs. We show that the method makes effective partitioning decisions and has runtime overhead that is easily tolerated. In particular, we show that for two implementations of a canonical stencil application, minimum elapsed times are obtained for a range of problem sizes on a network of heterogeneous workstations.
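As a rough illustration of the data-domain decomposition step only, the sketch below splits the rows of a grid among processors in proportion to their relative speeds. The function name `partition_rows` and the `speeds` list are hypothetical; the method described in the abstract also chooses how many processors to use and estimates communication costs through callback functions, none of which is modeled here.

```python
def partition_rows(n_rows, speeds):
    """Split n_rows among processors in proportion to their relative speeds.

    speeds is an assumed list of positive relative speed ratings, one per
    processor; it is not taken from the paper.
    """
    total = sum(speeds)
    # Ideal (fractional) share of rows for each processor
    shares = [n_rows * s / total for s in speeds]
    counts = [int(x) for x in shares]
    # Give leftover rows to the processors with the largest fractional remainders
    leftover = n_rows - sum(counts)
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Two slow workstations and one that is twice as fast
    print(partition_rows(100, [1, 1, 2]))
```

A faster workstation receives a proportionally larger block of rows, so all processors would finish a uniform per-row computation at roughly the same time if communication were free.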
Computer simulations of the propagation of ultrasonic pulses in multilayers require a specific physical model both for the material layers and for the interfaces. In the Local Interaction Simulation Approach (LISA), a 'spring model' can be conveniently adopted for this purpose. In the spring model, the propagation medium is replaced by an analogous set of tensorial springs. The springs within the layers are assumed to simulate the propagation inside the laminates, while the springs representing the interface ('internal springs') are assumed to predict the effect of the interface's physical condition on the wave propagation. This condition depends on a six-component 'contact quality tensor', Qij, for each discretization node along the interfaces. When all Q-components are equal to one, the bond at the corresponding node is considered 'perfect'. A smaller or zero value for any component of Qij indicates and characterizes possible interface flaws, which is useful for NDE applications. Due to the flexible nature of the model, many other physical features affecting the wave propagation, such as attenuation, nonlinearity, hysteretic behavior and plasticity, can be easily incorporated and the treatment extended to general 3-D heterogeneous and anisotropic media.
ISBN: 9781467311595 (print)
Vertex component analysis (VCA) has become a very popular and useful tool for linearly unmixing large hyperspectral datasets without the use of any a priori knowledge of the constituent spectra. Although VCA is a fast method, many hyperspectral imagery applications require a response in real time or near-real time. This paper proposes two different optimizations for accelerating the computational performance of VCA: the first focuses on a parallel implementation based on graphics processing units (GPUs) to alleviate the VCA computational burden; the second focuses on the development of a strategy to remove a large proportion of mixed pixels that have no effect on the functioning of VCA. Experiments are conducted using simulated and real hyperspectral datasets. The results reveal considerable acceleration factors, which satisfy the real-time constraints given by the data acquisition rate.
The cost of interprocessor communication has a substantial impact on execution time when implementing parallel algorithms on physical parallel computers. We study these implementation costs, examining the number of interprocessor messages, the cost of routing these messages on various architectures, and the number of communication phases. We provide an improved direct routing algorithm for realizing h-relations on crossbar networks. We also introduce a round-robin message-delivery algorithm which reduces the number of times a communication link is established between a pair of processors (by delivering all messages of that phase for the pair in order without interruption). We summarize criteria sufficient for a parallel algorithm to be implemented optimally on several common networks. We also describe a log n-phase optimal parallel list-ranking algorithm.
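To make the notion of a list-ranking algorithm with a logarithmic number of phases concrete, here is a minimal sketch of the classic pointer-jumping technique, simulated sequentially. This is not the algorithm from the paper: plain pointer jumping performs O(n log n) total work and is therefore not work-optimal, and the names `list_rank`, `succ`, and `phases` are assumptions made for illustration.

```python
def list_rank(succ):
    """Rank each node by its distance to the tail of a linked list.

    succ[i] is the successor of node i; the tail points to itself.
    Returns (rank, phases), where phases counts the pointer-jumping rounds.
    """
    n = len(succ)
    nxt = list(succ)
    # The tail is at distance 0 from itself; every other node starts at 1
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    phases = 0
    # Continue until every pointer has reached the tail
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Build new arrays from the old ones to mimic one synchronous
        # parallel step in which all nodes update simultaneously
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        phases += 1
    return rank, phases


if __name__ == "__main__":
    # List order: 3 -> 1 -> 0 -> 2 -> 4 (node 4 is the tail)
    print(list_rank([2, 0, 4, 1, 4]))
```

Each round doubles the distance spanned by every pointer, which is why the number of rounds grows logarithmically with the list length. On a real parallel machine, each round corresponds to one communication phase.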
Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second...
ISBN: 0818677589 (print)
Distributed computing involves systems that operate across networks transparently, using the resources of multiple machines. The Open Software Foundation's Distributed Computing Environment (DCE) has evolved to address the need for a vendor-neutral platform for which distributed applications can be developed, and upon which they can run. Central to the design philosophy of DCE is its reliance on the Remote Procedure Call (RPC) to facilitate communication among the entities in the distributed environment. Since it profoundly affects the performance of both the DCE environment and the applications running on top of it, the performance of RPCs is very much a concern of both application developers and system managers in a DCE installation. This short paper reports some results from an ongoing empirical investigation of the OS/2 DCE RPC facility. Our interest in this project is the effect on end-to-end RPC performance of protocol processing, flow control mechanisms within DCE, other load on the network, and interoperation with multiple DCE platforms.
Image coding is imperative for successful implementation of visual communications applications. With the emergence of enhanced multimedia technology, numerous video applications have emerged, such as multimedia conferencing, video on demand and DVDs. Coding is an essential component of all video applications, and improved coding techniques are necessary for faster applications. This paper discusses how parallel video coding on load-balanced multiprocessor systems can help in incorporating efficient coding techniques like vector quantization into practical applications. Two parallel processing platforms will be discussed, namely the heterogeneous network of workstations and the TI C40 DSP chips. The software platforms used for these are the Parallel Virtual Machine (PVM) programming model and Parallel C, respectively. An integration of the two programming models by using a PVM to Parallel C translation and the effect of load balancing for improved performance will also be discussed.
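For readers unfamiliar with vector quantization, the following is a minimal sketch of the encoding step, assuming a small fixed codebook and squared Euclidean distance. The function names, the codebook values, and the block size are hypothetical and are not taken from the paper, and no parallelism or load balancing is modeled.

```python
def vq_encode(blocks, codebook):
    """Map each block (a tuple of pixel values) to the index of its nearest codeword."""
    def dist2(a, b):
        # Squared Euclidean distance between two equal-length vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: dist2(block, codebook[k]))
        for block in blocks
    ]


def vq_decode(indices, codebook):
    """Reconstruct an approximation of the blocks from their codeword indices."""
    return [codebook[k] for k in indices]


if __name__ == "__main__":
    # Assumed toy codebook of 2-pixel blocks
    codebook = [(0, 0), (10, 10), (255, 255)]
    blocks = [(1, 2), (9, 12), (250, 240)]
    indices = vq_encode(blocks, codebook)
    print(indices, vq_decode(indices, codebook))
```

Because each block is encoded independently of the others, the nearest-codeword search can be divided among processors, which is what makes the technique a natural candidate for parallel coding.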