ISBN: 9781509025268 (print)
The dominance of multimedia traffic on the Internet, coupled with the growing number of MANET applications, has made Quality of Service (QoS) a major concern. The IEEE 802.11e enhanced distributed channel access (EDCA) mechanism was proposed as an enhancement to the 802.11 standard; it provides QoS at the MAC layer through a service differentiation scheme that favors high-priority traffic. However, IEEE 802.11e does not adequately address the handling of best-effort traffic flows in contention-based networks, which in turn degrades TCP performance. To improve best-effort traffic performance in 802.11e while maintaining high quality of service and maximizing system throughput, we propose a novel scheme called Adaptive Best Effort Traffic Scheduler for EDCA (ABET-EDCA). In this scheme, TCP packets are prioritized by dynamically adapting the contention window parameters. In addition, each traffic class monitors its MAC queue and computes its TXOP limit at runtime, which reduces delay and packet loss. We also adopt a cross-layer approach that exploits physical and MAC layer information to initiate corrective measures at the transport and network layers, further improving best-effort traffic performance. Simulations show a significant improvement in TCP performance in terms of goodput, delay, and throughput compared to EDCA, even under high load.
ISBN: 9781509036837 (print)
Dealing with asymmetry in the architecture raises a plethora of questions about scheduling task-parallel applications, for which early ad hoc strategies exist, embedded into asymmetry-conscious runtimes. In this paper we take a different path that addresses the complexity of the problem at the library level, via a few asymmetry-aware fundamental kernels, hiding the architecture heterogeneity from the task scheduler. For the specific domain of dense linear algebra, we show that this elegant solution delivers much higher performance than a naive approach based on an asymmetry-oblivious scheduler. Furthermore, this solution also outperforms an ad hoc asymmetry-aware scheduler furnished with sophisticated scheduling techniques.
ISBN: 9781509036837 (print)
Spectral clustering is one of the most popular graph clustering algorithms and achieves the best performance for many scientific and engineering applications. However, existing implementations in commonly used software platforms such as Matlab and Python do not scale well for many of the emerging Big Data applications. In this paper, we present a fast implementation of the spectral clustering algorithm on a CPU-GPU heterogeneous platform. Our implementation takes advantage of the computational power of the multi-core CPU and the massive multithreading and SIMD capabilities of GPUs. Given the input as data points in a high-dimensional space, we propose a parallel scheme to build a sparse similarity graph represented in a standard sparse representation format. We then compute the eigenvectors associated with the k smallest eigenvalues of the Laplacian matrix by using the reverse communication interfaces of the ARPACK software together with the cuSPARSE library, where k is typically very large. Moreover, we implement a very fast parallelized k-means algorithm on GPUs. Our implementation is shown to be significantly faster than the best known Matlab and Python implementations for each step. In addition, our algorithm scales to problems with a very large number of clusters.
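The pipeline described above (similarity graph, Laplacian, k smallest eigenvectors, k-means) can be sketched on the CPU as follows. This is only a minimal dense illustration, not the authors' CPU-GPU implementation: it assumes NumPy is available, substitutes `numpy.linalg.eigh` for the ARPACK/cuSPARSE eigensolver, uses the unnormalized Laplacian, and the function name `spectral_clustering`, the Gaussian kernel, and the parameters `sigma` and `cutoff` are all assumptions made for the example.

```python
import numpy as np

def spectral_clustering(points, k, sigma=1.0, cutoff=3.0, iters=20):
    """Toy dense spectral clustering; returns one cluster label per point."""
    x = np.asarray(points, dtype=float)

    # Step 1: similarity graph. Gaussian weights, zeroed beyond `cutoff`
    # so that the graph is sparse (kept dense here for brevity).
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w[d2 > cutoff ** 2] = 0.0
    np.fill_diagonal(w, 0.0)

    # Step 2: unnormalized graph Laplacian L = D - W (an assumption; the
    # abstract does not say which Laplacian variant is used).
    lap = np.diag(w.sum(axis=1)) - w

    # Step 3: eigenvectors of the k smallest eigenvalues. eigh returns
    # eigenvalues in ascending order, so the first k columns are the ones needed.
    _, vecs = np.linalg.eigh(lap)
    emb = vecs[:, :k]

    # Step 4: k-means on the rows of the embedding, with a deterministic
    # farthest-point initialization.
    centers = [emb[0]]
    while len(centers) < k:
        dist = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels.tolist()

if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(spectral_clustering(pts, k=2))
```

For large inputs the dense distance matrix and full eigendecomposition are the first things to replace, which is where a sparse format and an iterative eigensolver such as ARPACK come in.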
ISBN: 9781509021413 (print)
OpenMP plays a growing role as a portable programming model for harnessing on-node parallelism, yet existing data race checkers for OpenMP have high overheads and generate many false positives. In this paper, we propose ARCHER, the first OpenMP data race checker that achieves high accuracy, low overheads on large applications, and portability. ARCHER incorporates scalable happens-before tracking, exploits structured parallelism via combined static and dynamic analysis, and interfaces modularly with OpenMP runtimes. ARCHER significantly outperforms TSan and Intel® Inspector XE while providing the same or better precision. It has helped detect critical data races in the Hypre library, which is central to many projects at Lawrence Livermore National Laboratory and elsewhere.
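To make the idea of happens-before tracking concrete, the sketch below uses textbook vector clocks to flag two accesses to the same address as a race when at least one is a write and neither is ordered before the other. It is a generic illustration, not ARCHER's implementation; the class `RaceDetector` and its `fork`, `join`, and `access` methods are hypothetical names chosen for the example.

```python
def happens_before(earlier, later):
    """True if vector clock `earlier` is component-wise <= `later`."""
    return all(v <= later.get(t, 0) for t, v in earlier.items())

class RaceDetector:
    """Toy vector-clock race detector for fork/join parallelism."""

    def __init__(self):
        self.clocks = {}    # thread id -> vector clock (dict thread id -> int)
        self.history = {}   # address -> list of (thread id, clock snapshot, is_write)
        self.races = []     # list of (address, earlier thread, later thread)

    def _clock(self, tid):
        return self.clocks.setdefault(tid, {tid: 1})

    def fork(self, parent, child):
        # The child inherits everything the parent has seen so far.
        pc = self._clock(parent)
        self.clocks[child] = dict(pc)
        self.clocks[child][child] = 1
        pc[parent] += 1

    def join(self, parent, child):
        # The parent learns everything the child has seen.
        pc = self._clock(parent)
        for t, v in self._clock(child).items():
            pc[t] = max(pc.get(t, 0), v)
        pc[parent] += 1

    def access(self, tid, addr, is_write):
        now = dict(self._clock(tid))
        for other, clock, other_write in self.history.get(addr, []):
            conflicting = is_write or other_write
            if other != tid and conflicting and not happens_before(clock, now):
                self.races.append((addr, other, tid))
        self.history.setdefault(addr, []).append((tid, now, is_write))

if __name__ == "__main__":
    det = RaceDetector()
    det.fork(0, 1)
    det.fork(0, 2)
    det.access(1, "x", is_write=True)   # two sibling threads write x
    det.access(2, "x", is_write=True)   # unordered with the write above
    det.join(0, 1)
    det.join(0, 2)
    det.access(0, "x", is_write=False)  # ordered after both writes by the joins
    print(det.races)
```

Keeping a full access history per address is what makes naive checkers expensive; practical tools bound this state, which is one source of the overheads the abstract refers to.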
ISBN: 9781509021413 (print)
The increasing complexity of high performance computing systems places high demands on performance tools and human analysts because of the unmanageable volume of data gathered for performance analysis. A promising approach for reducing data volume is to classify data from multiple processes into groups of similar behavior, which aids in analyzing application performance and identifying hot spots. However, existing approaches for structural and temporal classification of performance data suffer from a lack of scalability or produce misleading results. To address this problem, we present a novel and effective structural similarity measure to efficiently classify data from parallel processes and introduce a method for efficient storage of the classified data. Using four examples, we show how existing performance analysis techniques benefit from our structural classification. Finally, we present a case study with 15 applications on up to 65,536 parallel processes that demonstrates the generality and scalability of our classification approach.
The evolution of massively parallel supercomputers makes two issues in particular palpable: load imbalance and the poor management of data locality in applications. Thus, with the increase of the number of cores an...
ISBN: 9781509036837 (print)
The maximum common subgraph of two graphs, G1 and G2, is the largest subgraph in G1 that is isomorphic to a subgraph in G2. Finding the maximum common subgraph of two given graphs is known to be an NP-complete problem. An exact solution can be found by an algorithm that transforms the maximum common subgraph problem into a maximal clique enumeration problem. However, as the size of the graphs increases, the solution space of the maximal clique enumeration problem grows combinatorially, so a serial solution to the computationally intensive problem of complete maximal clique enumeration is tedious. This paper presents a parallel approach that uses a graphics processing unit (GPU) to compute the maximum common subgraph of the given graphs. The parallel procedure achieves more than a tenfold improvement in computational performance. As an application of the proposed parallel maximum common subgraph algorithm, two new tools, LIGANDMATCHER and GRAPHSCREEN, are developed. These tools can be used to narrow down the large ligand search space to a small number of candidates in the screening phase of the drug discovery process.
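The reduction mentioned above can be sketched serially: build the modular product of the two graphs, enumerate its maximal cliques with the basic Bron-Kerbosch algorithm, and take the largest clique as a maximum common induced subgraph. This is a small CPU illustration under those assumptions, not the paper's GPU procedure; the function names and the dict-of-sets graph representation are choices made for the example.

```python
from itertools import product

def modular_product(g1, g2):
    """Modular product of two undirected graphs given as {node: set(neighbours)}.

    Vertices are pairs (a, b). Two pairs are adjacent when the nodes differ in
    both graphs and are either adjacent in both or non-adjacent in both.
    """
    nodes = list(product(g1, g2))
    adj = {v: set() for v in nodes}
    for a, b in nodes:
        for c, d in nodes:
            if a != c and b != d and ((c in g1[a]) == (d in g2[b])):
                adj[(a, b)].add((c, d))
    return adj

def bron_kerbosch(adj, r, p, x, out):
    """Basic Bron-Kerbosch: append every maximal clique of `adj` to `out`."""
    if not p and not x:
        out.append(set(r))
        return
    for v in list(p):
        bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v], out)
        p = p - {v}
        x = x | {v}

def maximum_common_subgraph(g1, g2):
    """Return a node mapping g1 -> g2 for one maximum common induced subgraph."""
    adj = modular_product(g1, g2)
    cliques = []
    bron_kerbosch(adj, set(), set(adj), set(), cliques)
    best = max(cliques, key=len)
    return dict(best)  # each clique vertex is a (g1 node, g2 node) pair

if __name__ == "__main__":
    # A triangle with a pendant vertex versus a plain triangle.
    g1 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    g2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    print(maximum_common_subgraph(g1, g2))
```

The product graph has |V1|·|V2| vertices, which is why the clique enumeration step dominates the cost and is the natural target for parallelization.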
ISBN: 9781509021413 (print)
The duality between graphs and matrices means that many common graph analyses can be expressed with primitives such as generalized sparse matrix-vector multiplication (SpMSpV) and sparse matrix-matrix multiplication (SpGEMM). Achieving high performance on these primitives is challenging due to limited arithmetic intensity, irregular memory accesses, and significant network communication requirements in the distributed setting. In this paper we implement four graph applications using GraphPad, our optimized multinode implementation of generalized linear algebra primitives such as SpMSpV and SpGEMM. GraphPad is flexible enough to accommodate multiple data layouts and partitioning strategies, and it incorporates communication optimizations. Our performance at scale can exceed that of CombBLAS by up to 40×. In addition to its performance in a distributed setting, GraphPad on a single node is within 2× of the performance of GraphMat, a high performance single-node graph framework, for four out of five benchmarks. We also show that our communication optimizations and flexibility are critical for good performance on both HPC clusters and commodity cloud platforms.
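As an illustration of the graph-matrix duality, the sketch below expresses breadth-first search as repeated generalized SpMSpV, where the usual multiply and add are replaced by logical AND and OR. It is a single-node toy using plain dictionaries, not GraphPad's API; the function names `spmspv` and `bfs_levels` and the column-oriented storage are assumptions made for the example.

```python
def spmspv(matrix, vector, add, mul):
    """Generalized sparse matrix times sparse vector.

    matrix: {column j: {row i: value}} (a CSC-like layout)
    vector: {index j: value}, holding only the nonzero entries
    add, mul: the two operations of the chosen semiring
    Only columns matching nonzero vector entries are touched.
    """
    result = {}
    for j, xj in vector.items():
        for i, aij in matrix.get(j, {}).items():
            term = mul(aij, xj)
            result[i] = add(result[i], term) if i in result else term
    return result

def bfs_levels(edges, source):
    """BFS distances from `source` in a directed graph, via repeated SpMSpV."""
    # Column u holds the out-neighbours of u, so one product advances the frontier.
    matrix = {}
    for u, v in edges:
        matrix.setdefault(u, {})[v] = True

    levels = {source: 0}
    frontier = {source: True}
    depth = 0
    while frontier:
        depth += 1
        reached = spmspv(matrix, frontier,
                         add=lambda a, b: a or b,
                         mul=lambda a, b: a and b)
        # Mask out vertices that were already visited.
        frontier = {v: True for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels

if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs_levels(edges, 0))
```

Swapping in a different pair of operations, for example `min` and `+`, turns the same loop into a step of single-source shortest paths, which is the flexibility that "generalized" refers to.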
In this paper, we propose a hybrid CPU+GPU data structure that optimizes search operations for frequently accessed search keys. This is based on the working-set structure due to Badiu et al. [1]. The main idea is to m...