A system implemented in MATLAB is described, which may be deployed over a Campus Grid utilizing the Condor job management system. Our approach can re-distribute jobs as node availability changes. The architecture of the system, its components, and their deployment across the Cardiff University Campus Grid (consisting of 2500 machines) are presented. Challenges in image processing applications that can be deployed over such infrastructure are discussed, along with performance results that compare our system against a standard Condor deployment and demonstrate a significant increase in throughput using our approach. Copyright (C) 2008 John Wiley & Sons, Ltd.
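As an illustration of the re-distribution idea described above, the following minimal Python sketch shows how a master loop might re-queue work from nodes that disappear and hand queued jobs to newly available nodes. It is not the paper's MATLAB implementation; poll_available_nodes, submit and is_finished are hypothetical callbacks.

```python
# Hypothetical sketch (not the paper's MATLAB implementation) of job
# re-distribution as node availability changes on a Condor-style pool.
import collections
import time

def redistribute(jobs, poll_available_nodes, submit, is_finished, poll_interval=30):
    pending = collections.deque(jobs)   # jobs waiting for a node
    running = {}                        # node -> job currently assigned to it
    while pending or running:
        available = set(poll_available_nodes())
        # Reap jobs that completed since the last poll.
        for node, job in list(running.items()):
            if is_finished(node, job):
                del running[node]
        # Re-queue jobs whose node became unavailable (e.g. desktop reclaimed).
        for node in list(running):
            if node not in available:
                pending.appendleft(running.pop(node))
        # Hand queued jobs to idle nodes.
        for node in available - running.keys():
            if not pending:
                break
            job = pending.popleft()
            submit(node, job)
            running[node] = job
        time.sleep(poll_interval)
```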
ISBN (print): 9781595933683
We study a novel hierarchical wireless networking approach in which some of the nodes are more capable than others. In such networks, the more capable nodes can serve as Mobile Backbone Nodes and provide a backbone over which end-to-end communication can take place. Our approach consists of controlling the mobility of the Backbone Nodes in order to maintain connectivity. We formulate the problem of minimizing the number of backbone nodes and refer to it as the Connected Disk Cover problem. We show that it can be decomposed into the Geometric Disk Cover (GDC) problem and the Steiner Tree Problem with Minimum Number of Steiner Points (STP-MSP). We prove that if these sub-problems are solved separately by γ- and δ-approximation algorithms, the approximation ratio of the joint solution is γ + δ. Then, we focus on the two subproblems and present a number of distributed approximation algorithms that maintain a solution to the GDC problem under mobility. A new approach to the solution of the STP-MSP is also described. We show that this approach can be extended in order to obtain a joint approximate solution to the Connected Disk Cover problem. Finally, we evaluate the performance of the algorithms via simulation and show that the proposed GDC algorithms perform very well under mobility and that the new approach for the joint solution can significantly reduce the number of required Mobile Backbone Nodes.
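To make the GDC subproblem concrete, the sketch below covers a set of 2-D points with radius-r disks using a simple strip-and-sweep greedy heuristic. This is only an illustrative, centralized stand-in, not the paper's distributed, mobility-aware algorithms, and it makes no claim about the approximation ratio achieved.

```python
# Illustrative greedy heuristic for Geometric Disk Cover (GDC):
# partition the plane into horizontal strips of height r and, within each
# strip, sweep left to right, centring each new disk on the strip midline
# so that it reaches the leftmost still-uncovered point.
import math
from collections import defaultdict

def strip_greedy_disk_cover(points, r):
    """Return centres of radius-r disks covering all (x, y) points."""
    half_span = r * math.sqrt(3) / 2   # horizontal reach of a midline-centred disk
    strips = defaultdict(list)
    for x, y in points:
        strips[math.floor(y / r)].append((x, y))

    centers = []
    for strip_index, strip_points in strips.items():
        midline_y = (strip_index + 0.5) * r
        strip_points.sort()            # sweep by x-coordinate
        rightmost_covered = -math.inf
        for x, _ in strip_points:
            if x > rightmost_covered:
                cx = x + half_span     # disk covers strip x-range [x, x + 2*half_span]
                centers.append((cx, midline_y))
                rightmost_covered = cx + half_span
    return centers
```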
ISBN (print): 0818690143
A multiple instruction stream-multiple data stream (MIMD) computer is a parallel computer with a large number of identical processing elements. The essential feature that distinguishes each MIMD computer family is the interconnection network. In this paper, we are concerned with two representative types of interconnection networks, the hypercube and the chordal ring networks. A family of regular graphs is presented as a possible candidate for the implementation of a distributed system and for fault-tolerant architectures. The symmetry of these graphs makes it possible to determine message routing by using a simple distributed algorithm. Arbitrary data permutations are generally accomplished by sorting. For certain classes of permutations, however (for example, many frequently used permutations in parallel processing, such as bit reversal, bit shuffle, bit complement, matrix transpose, butterfly permutations in FFT algorithms, and segment shuffles), there are algorithms that are more efficient than the best sorting algorithm. One of these is the bit permute complement (BPC) class of permutations. We have developed algorithms for bidirectional networks. The algorithm developed for hypercube networks requires only one token memory register in each node, and it takes the same number of steps as the maximum Hamming distance; we therefore conclude that it is optimal. The algorithm developed for chordal ring networks requires two token storage registers. The number of routing steps required in the two kinds of networks is evaluated.
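The following Python sketch illustrates, under simplifying assumptions, what a BPC permutation on hypercube node addresses looks like and why dimension-ordered (e-cube) routing needs exactly Hamming-distance many hops. It is not the paper's token-routing algorithm; function names are illustrative.

```python
# Illustrative sketch (not the paper's algorithm): apply a bit-permute-
# complement (BPC) permutation to hypercube addresses and route each token
# with dimension-ordered routing, whose hop count equals the Hamming
# distance between source and destination.

def bpc_destination(src, bit_order, complement_mask, dims):
    """Destination bit i is source bit bit_order[i], optionally complemented."""
    dst = 0
    for i in range(dims):
        bit = (src >> bit_order[i]) & 1      # take source bit bit_order[i]
        dst |= bit << i                      # place it at position i
    return dst ^ complement_mask

def ecube_route(src, dst, dims):
    """Correct differing address bits from dimension 0 upward."""
    path, node = [src], src
    for d in range(dims):
        if (node ^ dst) >> d & 1:
            node ^= 1 << d                   # traverse the link in dimension d
            path.append(node)
    return path

if __name__ == "__main__":
    dims = 4
    bit_reversal = list(reversed(range(dims)))   # one BPC example: bit reversal
    for src in range(2 ** dims):
        dst = bpc_destination(src, bit_reversal, complement_mask=0, dims=dims)
        hops = len(ecube_route(src, dst, dims)) - 1
        assert hops == bin(src ^ dst).count("1")  # steps == Hamming distance
```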
We present a community-level study of the associations of human presence with the distribution of mammals in Northwest Yunnan. The study sites were in a biodiversity hotspot which has been designated as part of the Three Parallel Rivers World Natural Heritage Site. We conducted surveys across 72 camera trapping locations to document mammal presence. We used generalized linear mixed-effect models to document associations between ecological variables and the trapping rates of 8 mammal species, including takin (Budorcas taxicolor), serow (Capricornis milneedwardsii), goral (Naemorhedus griseus), blue sheep (Pseudois nayaur), tufted deer (Elaphodus cephalophus), musk deer (Moschus chrysogaster), leopard cat (Prionailurus bengalensis) and yellow-throated marten (Martes flavigula). We found that takin and serow occurrences were negatively associated with gathering, while we detected no significant correlation between gathering and grazing and the abundance of some medium-sized mammal species (including musk deer, tufted deer, leopard cat and marten). At site-specific scales, blue sheep abundances were associated with alpine screes, serow and takin abundances were affected by canopy cover and distance to water sources, and musk deer abundances were associated with oak shrubs, oak forests and open canopy cover, while tufted deer avoided oak shrubs. Some species' habitat associations were unspecialised and showed no significant associations with habitat variables (leopard cat, marten and goral). The high tolerance of medium-sized mammal species to gathering and grazing might be related to their nocturnal habits. Our results showed that large-sized mammals were more vulnerable to human disturbances and required higher cover conditions, which might be related to their unique life histories, such as being easily detected by predators (including poachers) and their high energy requirements. We recommend rigorously controlling gathering and grazing in protected areas where large-sized mammals are distributed, even for th
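For readers unfamiliar with the modelling step, the sketch below shows one way trapping rates could be related to disturbance and habitat covariates with a mixed-effects model in Python. It is not the authors' analysis code: the column names (trap_rate, gathering, grazing, canopy_cover, dist_to_water, site) are assumptions, and a linear mixed model via statsmodels stands in for the generalized linear mixed-effect models used in the study.

```python
# Hypothetical sketch, not the authors' code: relate per-location trapping
# rates to covariates, with a random intercept per survey site.
import pandas as pd
import statsmodels.formula.api as smf

def fit_species_model(csv_path, species):
    data = pd.read_csv(csv_path)
    data = data[data["species"] == species]
    # Fixed effects for disturbance and habitat; random intercept per site.
    model = smf.mixedlm(
        "trap_rate ~ gathering + grazing + canopy_cover + dist_to_water",
        data=data,
        groups=data["site"],
    )
    result = model.fit()
    print(result.summary())
    return result

# fit_species_model("camera_traps.csv", "takin")
```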
Computational systems are nowadays composed of basic computational components that share multiprocessors and coprocessors of different types, typically several graphics processing units (GPUs) or many integrated cores (MICs), and those components are combined in heterogeneous clusters of nodes with different characteristics, including coprocessors of different types and varying numbers of nodes at different speeds. Software previously developed and optimized for simpler systems needs to be redesigned and reoptimized for these new, more complex systems. The adaptation of autotuning techniques for basic linear algebra routines to hybrid multicore+multiGPU and multicore+multiMIC systems is analyzed. The matrix-matrix multiplication kernel, which is optimized for different computational system components through guided experimentation, is studied. The routine is installed for each node in the cluster, and the information generated from individual installations may be used for a hierarchical installation in a cluster. The basic matrix-matrix multiplication may, in turn, be used inside higher-level routines, which delegate their efficient execution to the optimization of the lower-level routine. Experimental results are satisfactory in different multicore+multiGPU and multicore+multiMIC systems, so the guided search of execution configurations for satisfactory execution times proves to be a useful tool for heterogeneous systems, where the complexity of the system makes correct use of highly efficient routines and libraries difficult.
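A minimal sketch of the guided-experimentation idea follows: during an installation phase, a candidate set of configurations (here, CPU/coprocessor work splits) is timed for several problem sizes and the best one is recorded for later reuse. This is illustrative only; run_gemm is a hypothetical callback, not the paper's installation routine.

```python
# Illustrative autotuning sketch: empirically select the best work split
# per matrix size, as a guided-experimentation installation phase would.
import time

def time_once(run_gemm, n, frac):
    start = time.perf_counter()
    run_gemm(n, frac)                  # multiply two n x n matrices with the given split
    return time.perf_counter() - start

def autotune(run_gemm, sizes, cpu_fractions=(0.0, 0.25, 0.5, 0.75, 1.0), reps=3):
    """run_gemm(n, cpu_fraction) gives cpu_fraction of the rows to the CPU
    and the rest to the coprocessor. Returns {n: best cpu_fraction} for
    reuse by higher-level routines built on this kernel."""
    best = {}
    for n in sizes:
        timings = {frac: min(time_once(run_gemm, n, frac) for _ in range(reps))
                   for frac in cpu_fractions}
        best[n] = min(timings, key=timings.get)
    return best

# Example: table = autotune(run_gemm=my_gemm, sizes=[1024, 2048, 4096])
```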
In this study, we provide an extensive survey of a wide spectrum of scheduling methods for multitasking among graphics processing unit (GPU) computing tasks. We then design several schedulers and explain in detail the selected methods we have developed to implement our scheduling strategies. Next, we compare the performance of the schedulers on various workloads running on Fermi and Kepler architectures and arrive at the following major conclusions: (1) Small kernels benefit from running kernels concurrently. (2) The combination of small kernels, high-priority kernels with longer runtimes, and lower-priority kernels with shorter runtimes benefits from a CPU scheduler that dynamically changes kernel order on the Fermi architecture. (3) Because of limitations of existing GPU architectures, CPU schedulers currently outperform their GPU counterparts. We also provide results and observations obtained from implementing and evaluating our schedulers on the NVIDIA Jetson TX1 system-on-chip architecture. We observe that although the TX1 has the newer Maxwell architecture, the mechanism used for scheduler timings behaves differently on the TX1 than on Kepler, leading to incorrect timings. In this paper, we describe our methods that allow us to report correct timings for CPU schedulers running on the TX1. Finally, we propose new research directions involving the investigation of additional scheduling strategies.
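The sketch below illustrates, in a hedged way, what a CPU-side scheduler that dynamically reorders pending kernels might look like: kernels are kept in a priority queue so that high-priority, long-running kernels are issued before lower-priority, shorter ones. It is not the paper's scheduler; the class and its methods are illustrative.

```python
# Hedged sketch of a CPU-side kernel launcher with dynamic reordering.
import heapq
import itertools

class KernelScheduler:
    def __init__(self):
        self._queue = []
        self._tie = itertools.count()   # keeps FIFO order among equal entries

    def submit(self, launch_fn, priority, est_runtime):
        """Higher priority first; among equal priorities, longer kernels first."""
        heapq.heappush(self._queue,
                       (-priority, -est_runtime, next(self._tie), launch_fn))

    def drain(self):
        """Issue all pending kernels in the dynamically chosen order."""
        while self._queue:
            *_, launch_fn = heapq.heappop(self._queue)
            launch_fn()   # e.g. enqueue the kernel on a CUDA stream

# sched = KernelScheduler()
# sched.submit(lambda: launch_small_kernel(), priority=1, est_runtime=0.2)
# sched.submit(lambda: launch_big_kernel(),   priority=2, est_runtime=5.0)
# sched.drain()
```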
The JPEG format employs Huffman codes to compress the entropy data of an image. Huffman codewords are of variable length, which makes parallel entropy decoding a difficult problem: to determine the start position of a codeword in the bitstream, the previous codeword must be decoded first. We present JParEnt, a new approach to parallel entropy decoding for JPEG decompression on heterogeneous multicores. JParEnt conducts JPEG decompression in two steps: (1) an efficient sequential scan of the entropy data on the CPU to determine the start positions (boundaries) of coefficient blocks in the bitstream, followed by (2) a parallel entropy decoding step on the graphics processing unit (GPU). The block boundary scan constitutes a reinterpretation of the Huffman-coded entropy data to determine codeword boundaries in the bitstream. We introduce a dynamic workload partitioning scheme to account for GPUs of low compute power relative to the CPU. This configuration has become common with the advent of SoCs with integrated graphics processors (IGPs). We leverage additional parallelism through pipelined execution across CPU and GPU. For systems providing a unified address space between CPU and GPU, we employ zero-copy to completely eliminate the data transfer overhead. Our experimental evaluation of JParEnt was conducted on six heterogeneous multicore systems: one server and two desktops with dedicated GPUs, one desktop with an IGP, and two embedded systems. For a selection of more than 1000 JPEG images, JParEnt outperforms the SIMD implementation of the libjpeg-turbo library by up to a factor of 4.3x, and the previously fastest JPEG decompression method for heterogeneous multicores by up to a factor of 2.2x. JParEnt's entropy data scan consumes 45% of the entropy decoding time of libjpeg-turbo on average. Given this new ratio for the sequential part of JPEG decompression, JParEnt achieves up to 97% of the maximum attainable speedup (95% on average). On the IGP-based desktop platform,
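To clarify the two-step structure, the following hedged sketch shows a sequential boundary scan followed by independent decoding of the resulting blocks. It is not JParEnt's implementation: a process pool stands in for the GPU, and scan_block_boundaries and decode_block are hypothetical callbacks.

```python
# Hedged sketch of the two-step idea: sequentially locate coefficient-block
# boundaries in the entropy data, then decode the blocks in parallel since
# each worker now knows its start position in the bitstream.
from concurrent.futures import ProcessPoolExecutor

def decompress_entropy(bitstream, scan_block_boundaries, decode_block, workers=8):
    # Step 1 (sequential, CPU): reinterpret the Huffman-coded entropy data
    # just enough to find where each coefficient block starts.
    boundaries = scan_block_boundaries(bitstream)          # e.g. [0, 153, 301, ...]
    spans = list(zip(boundaries, boundaries[1:] + [len(bitstream)]))

    # Step 2 (parallel; the GPU in JParEnt): decode every block independently.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(decode_block,
                               (bitstream[start:end] for start, end in spans)))
    return blocks
```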