Authors:
Sulzmann, Martin; Lam, Edmund S. L.
Programming, Logics and Semantics Group, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark
School of Computing, National University of Singapore, S16 Level 5, 3 Science Drive 2, Singapore 117543, Singapore
Multi-set constraint rewriting allows for a highly parallel computational model and has been used in a multitude of application domains such as constraint solving, agent specification, etc. Rewriting steps can be appli...
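As a rough illustration of the computational model described above, the following Python sketch applies CHR-style multi-set rewrite rules to a constraint store. The Rule/apply_step names and the rule format are hypothetical and not taken from the paper; a real system matches constraints with arguments and guards, and independent rule firings are what can be executed in parallel.

from collections import Counter

class Rule:
    """Simplified multi-set rewrite rule: if the head multiset is contained
    in the store, remove it and add the body (a simplification rule)."""
    def __init__(self, head, body):
        self.head = Counter(head)
        self.body = Counter(body)

    def applicable(self, store):
        return all(store[c] >= n for c, n in self.head.items())

def apply_step(store, rules):
    """Apply the first applicable rule once; returns True if a step was taken.
    Steps that consume disjoint constraints are independent and could be
    fired in parallel, which is the source of parallelism mentioned above."""
    for r in rules:
        if r.applicable(store):
            store.subtract(r.head)
            store.update(r.body)
            for c in [c for c, n in store.items() if n <= 0]:
                del store[c]          # drop constraints whose count hit zero
            return True
    return False

store = Counter({"a": 2, "b": 1})
rules = [Rule(["a", "a"], ["b"]),     # a, a ==> b
         Rule(["b", "b"], ["c"])]     # b, b ==> c
while apply_step(store, rules):
    pass
print(dict(store))                    # {'c': 1}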
ISBN:
(Print) 9781595937537
Grid networks are large distributed systems that share and virtualize heterogeneous resources. Quality of Service (QoS) is a key and complex issue for Grid services provisioning. Currently, most Grid networks offer best-effort (BE) services. Thus, QoS architectures initially developed for the Internet, such as DiffServ (DS), have been adapted to the Grid environment. Given the widespread reach of the Internet, many Grid networks will be deployed over this technology in the years to come. In this paper, we propose to compare two Flow-Aware Networking (FAN) architectures, mainly from the second generation (2GFAN). The purpose is to answer the question of which 2GFAN architecture performs better under Grid traffic. FAN is a promising alternative to DS for QoS provisioning in Internet networks. DS provides QoS differentiation through explicit packet marking and classification, whereas FAN consists of per-flow admission control and implicit flow differentiation through priority fair queuing. The main difference between the two 2GFAN architectures is the fair queuing algorithm. Thus, to the knowledge of the authors, this is the first time two priority per-flow fair queuing algorithms are compared under Grid traffic. A GridFTP session may be seen as a succession of parallel TCP flows with large volumes of data transfers. The metrics used are average delay, average goodput and average rejection rate.
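To make the FAN idea concrete, here is a deliberately small Python sketch combining implicit per-flow fair queuing with admission control. It is not either of the 2GFAN schedulers compared in the paper (which use priority fair queuing and measure fair rate and priority load); the class name and the active-flow threshold are illustrative assumptions only.

from collections import deque

class FairQueueWithAdmission:
    """Toy model of FAN-style queuing: implicit per-flow fair queuing plus
    admission control. Congestion is approximated here by the number of
    active flows; real 2GFAN schedulers estimate fair rate and priority
    load and serve flows with priority fair queuing."""

    def __init__(self, max_active_flows=4):
        self.queues = {}             # flow_id -> deque of packets
        self.rr = deque()            # round-robin order over active flows
        self.max_active_flows = max_active_flows
        self.rejected = 0

    def enqueue(self, flow_id, packet):
        if flow_id not in self.queues:
            if len(self.queues) >= self.max_active_flows:
                self.rejected += 1   # admission control: reject new flow
                return False
            self.queues[flow_id] = deque()
            self.rr.append(flow_id)
        self.queues[flow_id].append(packet)
        return True

    def dequeue(self):
        if not self.rr:
            return None
        flow_id = self.rr.popleft()
        packet = self.queues[flow_id].popleft()
        if self.queues[flow_id]:
            self.rr.append(flow_id)  # flow stays in the round-robin
        else:
            del self.queues[flow_id] # flow becomes inactive
        return packet

fq = FairQueueWithAdmission(max_active_flows=2)
print(fq.enqueue("flow-A", "pkt1"), fq.enqueue("flow-B", "pkt1"),
      fq.enqueue("flow-C", "pkt1"))  # True True False: third flow is rejected
print(fq.dequeue(), fq.dequeue())    # packets served in round-robin order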
ISBN:
(Print) 9781595939609
Low-Density Parity-Check (LDPC) codes are powerful error correcting codes (ECC). They have recently been adopted by several data communication standards such as DVB-S2 and WiMax. LDPCs are represented by bipartite graphs, also called Tanner graphs, and their decoding demands very intensive computation. For that reason, dedicated VLSI architectures have been investigated and developed over the last few years. This paper proposes a new approach for LDPC decoding on graphics processing units (GPUs). Efficient data structures and a new algorithm are proposed to represent the Tanner graph and to perform LDPC decoding according to the stream-based computing model. GPUs were programmed to efficiently implement the proposed algorithms by applying data-parallel intensive computing. Experimental results show that GPUs perform LDPC decoding nearly three orders of magnitude faster than modern CPUs. Moreover, they lead to the conclusion that GPUs, with their tremendous processing power, can be considered a consistent alternative to state-of-the-art hardware LDPC decoders.
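The paper's GPU data structures and stream kernels are not reproduced here, but the following Python/NumPy sketch of hard-decision bit-flipping decoding over a Tanner graph (given as a parity-check matrix) shows the kind of per-check and per-variable updates that become data-parallel kernels on a GPU. The function name and the (7,4) Hamming example are illustrative assumptions; production decoders typically use soft-decision min-sum or sum-product message passing instead.

import numpy as np

def bit_flip_decode(H, y, max_iters=50):
    """Hard-decision bit-flipping LDPC decoding over a Tanner graph.
    H: binary parity-check matrix (checks x variables), y: received hard
    bits. Plain CPU reference; the per-check (syndrome) and per-variable
    (unsatisfied-count) updates are the data-parallel steps."""
    x = y.copy()
    for _ in range(max_iters):
        syndrome = H.dot(x) % 2                 # one value per check node
        if not syndrome.any():
            return x, True                      # all parity checks satisfied
        # For every variable, count how many unsatisfied checks touch it.
        unsat = H.T.dot(syndrome)
        x = (x + (unsat == unsat.max())) % 2    # flip the worst variables
    return x, False

# Tiny example: a (7,4) Hamming code used as a stand-in Tanner graph.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
received = np.zeros(7, dtype=int)
received[2] ^= 1                                # single bit error
decoded, ok = bit_flip_decode(H, received)
print(ok, decoded)                              # True, all-zero codeword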
ISBN:
(Print) 9780898716474
The proceedings contain 134 papers. The topics discussed include: delaunay graphs of point sets in the plane with respect to axis-parallel rectangles;maintaining deforming surface meshes;two-phase greedy algorithms for some classes of combinatorial linear programs;yet another algorithm for dense max cut: go greedy;computing large matchings fast;distributed broadcast in unknown radio networks;the power of memory in randomized broadcasting;competitive queue management for latency sensitive packets;a nearly linear time algorithm for the half integral disjoint paths packing;nondecreasing paths in a weighted graph or: how to optimally read a train schedule;graph balancing: a special case of scheduling unrelated parallel machines;on the connectivity of dynamic random geometric graphs;balls and bins with structure: balanced allocations on hypergraphs;improved algorithms for orienteering and related problems;and deterministic random walks on regular trees.
ISBN:
(Print) 9781595939890
We present a novel distributed algorithm for the maximal independent set (MIS) problem. On growth-bounded graphs (GBG) our deterministic algorithm finishes in O(log* n) time, n being the number of nodes. In light of Linial's Omega(log* n) lower bound our algorithm is asymptotically optimal. Our algorithm answers prominent open problems in the ad hoc/sensor network domain. For instance, it solves the connected dominating set problem for unit disk graphs in O(log* n) time, exponentially faster than the state-of-the-art algorithm. With a new extension our algorithm also computes a Delta + 1 coloring in O(log* n) time, where Delta is the maximum degree of the graph.
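The paper's deterministic O(log* n) construction for growth-bounded graphs is not reproduced here; as a baseline illustration of the round-based distributed setting it improves upon, the Python sketch below runs a Luby-style randomized MIS (O(log n) rounds in expectation), in which every active node draws a random value each round and joins the MIS if it is a local minimum. All names are illustrative.

import random

def luby_mis(adj, seed=0):
    """Round-based randomized MIS (Luby-style) over a graph given as
    adj: dict node -> set of neighbours. Baseline illustration only; the
    paper's deterministic algorithm needs exponentially fewer rounds on
    growth-bounded graphs."""
    rng = random.Random(seed)
    active = set(adj)
    mis = set()
    while active:
        # Each active node picks a random value and "sends" it to neighbours.
        val = {v: rng.random() for v in active}
        # A node joins the MIS if it is a local minimum among active neighbours.
        winners = {v for v in active
                   if all(val[v] < val[u] for u in adj[v] if u in active)}
        mis |= winners
        # Winners and their neighbours become inactive.
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & active
        active -= removed
    return mis

# Small example: a 5-cycle.
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(luby_mis(adj))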
We consider the problem of frequent tree mining and present algorithms targeting emerging single-chip multiprocessor (CMP) architectures. We explore algorithmic designs that improve the memory performance of such algo...
We analyze the shortest remaining processing time (SRPT) algorithm with respect to the problem of scheduling n jobs with release times on m identical machines to minimize total flow time. It is known that SRPT is optimal if m = 1, but that SRPT has a worst-case approximation ratio of Theta(min(log n/m, log Delta)) for this problem, where Delta is the ratio of the length of the longest job to the length of the shortest job. It has previously been shown that SRPT is able to use faster machines to produce a schedule as good as an optimal algorithm using slower machines. We now show that SRPT optimally uses these faster machines with respect to the worst-case approximation ratio. That is, if SRPT is given machines that are s >= 2 - 1/m times as fast as those used by an optimal algorithm, SRPT's flow time is at most 1/s times the flow time incurred by the optimal algorithm. Clearly, no algorithm can offer a better worst-case guarantee, and we show that existing algorithms with performance guarantees similar to SRPT's without resource augmentation do not optimally use extra resources.
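A minimal discrete-time Python sketch of the SRPT rule on m identical machines, with a speed parameter to mimic the resource-augmentation setting, may help make the statement concrete. The function name, the time step, and the example jobs are assumptions; a faithful evaluation would use an event-driven, preemption-exact simulator.

def srpt_flow_time(jobs, m, speed=1.0, dt=0.01):
    """Approximate SRPT on m identical machines running at the given speed:
    at every step, the (at most m) released, unfinished jobs with the
    shortest remaining processing time are run. Returns total flow time.
    jobs: list of (release_time, processing_time)."""
    remaining = [p for _, p in jobs]
    release = [r for r, _ in jobs]
    finish = [None] * len(jobs)
    t = 0.0
    while any(f is None for f in finish):
        avail = [i for i in range(len(jobs))
                 if finish[i] is None and release[i] <= t]
        # Run the m available jobs with the least remaining work.
        for i in sorted(avail, key=lambda j: remaining[j])[:m]:
            remaining[i] -= speed * dt
            if remaining[i] <= 1e-9:
                finish[i] = t + dt
        t += dt
    return sum(finish[i] - release[i] for i in range(len(jobs)))

jobs = [(0.0, 3.0), (0.0, 1.0), (1.0, 1.0)]
base = srpt_flow_time(jobs, m=1, speed=1.0)
fast = srpt_flow_time(jobs, m=1, speed=2.0)   # resource augmentation: s = 2
print(base, fast)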
ISBN:
(Print) 9781424428366
Visualization, interaction, and simulation (VIS) constitute a class of applications that is growing in importance. This class includes applications such as graphics rendering, video encoding, simulation, and computer vision. These applications are ideally suited for accelerators because of their parallelizability and demand for high throughput. We compile a benchmark suite, VISBench, to serve as a proxy for this application class. We use VISBench to examine some important high-level decisions for an accelerator architecture. We propose a highly parallel base architecture. We examine the need for synchronization and data communication. We also examine GPU-style SIMD execution and find that a MIMD architecture usually performs better. Given these high-level choices, we use VISBench to explore the microarchitectural design space. We analyze area versus performance tradeoffs in designing individual cores and the memory hierarchy. We find that a design made of small, simple cores achieves much higher throughput than a general purpose uniprocessor. Further, we find that a limited amount of support for ILP within each core aids overall performance. We find that fine-grained multithreading improves performance, but only up to a point. We find that word-level (SSE-style) SIMD provides a poor performance to area ratio. Finally, we find that sufficient memory and cache bandwidth is essential to performance.
ISBN:
(Print) 9780898716474
We investigate the problem of monotonicity reconstruction, as defined in [3], in a parallel setting. We have oracle access to a nonnegative real-valued function f defined on domain [n]^d = {1, ..., n}^d. We would like to closely approximate f by a monotone function g. This should be done by a procedure (a filter) that, given as input a point x ∈ [n]^d, outputs the value of g(x), and runs in time that is highly sublinear in n. The procedure can (indeed must) be randomized, but we require that all of the randomness be specified in advance by a single short random seed. We construct such an implementation where the time and space per query are (log n)^O(1) and the size of the seed is polynomial in log n and d. Furthermore, the distance of the approximating function g from f is at most a constant multiple of the minimum distance of any monotone function from f. This implementation allows for parallelization: one can initialize many copies of the filter with the same short random seed, and they can autonomously handle queries while producing outputs that are consistent with the same approximating function g.
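The following greatly simplified Python sketch (for d = 1) only illustrates the query interface and the seed-consistency property emphasized above; the class name is hypothetical, the monotone answer is computed naively as a prefix maximum, and neither the polylogarithmic query time nor the constant-factor distance guarantee of the paper is reproduced.

import random

class MonotoneFilter:
    """Simplified illustration of a seeded filter answering point queries
    about a monotone function g approximating f on {1, ..., n}. Here
    g(x) = max(f(1), ..., f(x)), computed in linear time per query; the
    point is only that independently created copies answer consistently."""

    def __init__(self, f, n, seed):
        self.f = f
        self.n = n
        # The real construction derives all of its sampling decisions from
        # the seed; this sketch keeps it only to show the shared interface.
        self.rng = random.Random(seed)

    def query(self, x):
        return max(self.f(y) for y in range(1, x + 1))

f = lambda x: [3, 1, 4, 1, 5, 9, 2, 6][x - 1]
a = MonotoneFilter(f, 8, seed=42)
b = MonotoneFilter(f, 8, seed=42)          # second copy, same seed
assert [a.query(x) for x in range(1, 9)] == [b.query(x) for x in range(1, 9)]
print([a.query(x) for x in range(1, 9)])   # non-decreasing, hence monotone g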
ISBN:
(Print) 9780769533070
In this paper we analyze a dataflow architecture that maps efficiently onto modern FPGA architectures and is composed of communication channels which can be dynamically adapted to the algorithm's dataflow. The reconfiguration of the architecture's topology can be achieved within a single clock cycle while DSP operations are in progress. In order to maximize the bandwidth, the dataflow channel width is user-definable and can be chosen based on the application-specific requirements. Furthermore, the dataflow architecture can be efficiently mapped onto multi-FPGA platforms, increasing at the same time the overall communication bandwidth.
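As a purely behavioural illustration (Python, not HDL) of channels that can be re-routed between clock cycles while data keeps streaming, consider the sketch below; the class, the width mask, and the two-node example are hypothetical and do not describe the paper's architecture.

class DataflowFabric:
    """Behavioural sketch of a dataflow fabric: processing nodes connected
    by channels whose topology can be changed between clock cycles, with a
    user-definable channel width enforced by a bit mask."""

    def __init__(self, nodes, width_bits=16):
        self.nodes = nodes                  # name -> function on one input value
        self.mask = (1 << width_bits) - 1   # user-definable channel width
        self.links = {}                     # dst -> src (current topology)
        self.values = {name: 0 for name in nodes}

    def reconfigure(self, links):
        """Swap the channel topology; takes effect on the next cycle."""
        self.links = dict(links)

    def cycle(self, inputs):
        """One clock cycle: every node consumes the value on its input channel
        (or an external input) and produces a new, width-limited value."""
        new = {}
        for name, op in self.nodes.items():
            src = self.links.get(name)
            x = inputs.get(name, self.values.get(src, 0))
            new[name] = op(x) & self.mask
        self.values = new

fabric = DataflowFabric({"mul": lambda x: x * 3, "add": lambda x: x + 7},
                        width_bits=8)
fabric.reconfigure({"add": "mul"})          # route mul's output into add
fabric.cycle({"mul": 10})                   # feed 10 into mul
fabric.cycle({"mul": 20})                   # next cycle: add sees mul's 30
print(fabric.values)                        # {'mul': 60, 'add': 37}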