ISBN:
(print) 9781450393850
There is no shortage of outlier detection (OD) algorithms in the literature, yet a vast body of them are designed for a single machine. With the increasing reality of already cloud-resident datasets comes the need for distributed OD techniques. This area, however, is not only understudied but also short of public-domain implementations for practical use. This paper aims to fill this gap: we design SPARX, a data-parallel OD algorithm suitable for shared-nothing infrastructures, which we specifically implement in Apache Spark. Through extensive experiments on three real-world datasets, with several billion points and millions of features, we show that existing open-source solutions fail to scale up, either with a large number of points or with high dimensionality, whereas SPARX yields scalable and effective performance. To facilitate practical use of OD on modern-scale datasets, we open-source SPARX under the Apache license.
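As a minimal sketch of the shared-nothing, data-parallel pattern the abstract describes (not the SPARX algorithm itself, whose internals are not given here), a simple z-score-based outlier scorer in PySpark might look as follows; the file paths, column name, and threshold are illustrative assumptions:

```python
# Hypothetical sketch of data-parallel outlier scoring in PySpark.
# NOT the SPARX algorithm; it only shows the shared-nothing pattern:
# per-partition work plus one small global aggregate.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("od-sketch").getOrCreate()
df = spark.read.parquet("points.parquet")  # hypothetical input path

# One distributed aggregation yields the global mean/stddev of feature "x".
stats = df.select(F.mean("x").alias("mu"), F.stddev("x").alias("sigma")).first()

# Score every point in parallel; flag |z| above a threshold as an outlier.
scored = (df.withColumn("z", (F.col("x") - stats.mu) / stats.sigma)
            .withColumn("is_outlier", F.abs(F.col("z")) > 3.0))
scored.write.parquet("outliers.parquet")
```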
Functional dependencies (FDs) play a very important role in many data management tasks such as schema normalization, data cleaning, and query optimization. Meanwhile, there are ever-increasing application demands for efficient FD discovery on large-scale datasets. Unfortunately, due to huge runtime and memory overhead, the existing single-machine FD discovery algorithms are inefficient for large-scale datasets. Recently, distributed data-parallel computing has become the de facto standard for large-scale data processing. However, it is challenging to design an efficient distributed FD discovery algorithm. In this paper, we present SmartFD, an efficient and scalable algorithm for distributed FD discovery. First, we propose a novel attribute-sorting-based algorithm framework. Next, to discover all the FDs grouped by a given attribute, we propose an efficient distributed algorithm, Attribute-centric Functional Dependency Discovery (AFDD). In AFDD, we design a Fast Sampling and Early Aggregation (FSEA) mechanism to improve the efficiency of distributed sampling and propose a memory-efficient index-based method for distributed FD validation. Moreover, AFDD employs an attribute-parallel method to accelerate the pruning and generation of candidate FDs. Furthermore, we propose an adaptive switching strategy between distributed sampling and distributed validation based on a unified time-based efficiency metric, and we employ a distributed-probing-based method to make the switching strategy more accurate. Experimental results on Apache Spark reveal that SmartFD outperforms the state-of-the-art single-machine algorithm HyFD and the existing distributed algorithm HFDD with 3.2×-44.9× and 2.5×-455.7× speedup, respectively. Moreover, SmartFD achieves good row scalability and column scalability. Additionally, SmartFD has sub-linear node scalability.
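To make the validation step concrete: an FD X → Y holds exactly when no X-group contains more than one distinct Y value, which is checkable with a single distributed aggregation. The PySpark sketch below illustrates only that check; it is not SmartFD's index-based validation, and the input path and column names are hypothetical:

```python
# Hedged sketch of distributed FD validation (not SmartFD's method):
# X -> Y holds on a relation iff grouping by X never yields more than
# one distinct Y value.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fd-check").getOrCreate()
df = spark.read.csv("relation.csv", header=True)  # hypothetical input

def fd_holds(df, lhs, rhs):
    # The FD is violated iff some lhs-group has >1 distinct rhs value.
    violations = (df.groupBy(*lhs)
                    .agg(F.countDistinct(rhs).alias("n_rhs"))
                    .where(F.col("n_rhs") > 1))
    return violations.limit(1).count() == 0

print(fd_holds(df, ["zip_code"], "city"))  # hypothetical columns
```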
Matrix multiplication is a dominant but very time-consuming operation in many big data analytic applications, so its performance optimization is an important and fundamental research issue. The performance of large-scale matrix multiplication on distributed data-parallel platforms is determined by both computation and IO costs. For existing matrix multiplication execution strategies, when the execution concurrency scales up above a threshold, execution performance deteriorates quickly because the increase in IO cost outweighs the decrease in computation cost. This paper presents a novel parallel execution strategy, CRMM (Concurrent Replication-based Matrix Multiplication), along with a parallel algorithm, Marlin, for large-scale matrix multiplication on data-parallel platforms. The CRMM strategy exploits higher execution concurrency for sub-block matrix multiplication at the same IO cost. To further improve the performance of Marlin, we also propose a number of novel system-level optimizations, including increasing the concurrency of local data exchange by calling the native library in batch, reducing the overhead of block matrix transformation, and reducing disk-heavy shuffle operations by exploiting the semantics of matrix computation. We have implemented Marlin as a library along with a set of related matrix operations on Spark and contributed Marlin to the open-source community. For large-sized matrix multiplication, Marlin outperforms existing systems including Spark MLlib, SystemML and SciDB, with about 1.29x, 3.53x and 2.21x speedup on average, respectively. An evaluation on a real-world DNN workload also indicates that Marlin outperforms the above systems by about 12.8x, 5.1x and 27.2x, respectively.
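For context, Spark's MLlib already exposes block-partitioned distributed multiplication, which is the style of baseline Marlin is compared against. A toy usage sketch follows (CRMM itself is not reproduced here; block contents are illustrative):

```python
# Illustrative use of Spark MLlib's BlockMatrix: each matrix is an RDD of
# ((blockRow, blockCol), sub-matrix) pairs, and multiply() performs the
# distributed sub-block multiplication (with a shuffle) that strategies
# like CRMM aim to make cheaper.
from pyspark.sql import SparkSession
from pyspark.mllib.linalg import Matrices
from pyspark.mllib.linalg.distributed import BlockMatrix

spark = SparkSession.builder.appName("blockmm").getOrCreate()
sc = spark.sparkContext

# Two 4x4 matrices stored as 2x2 grids of 2x2 dense blocks (column-major).
blocks_a = sc.parallelize([((i, j), Matrices.dense(2, 2, [1.0, 2.0, 3.0, 4.0]))
                           for i in range(2) for j in range(2)])
blocks_b = sc.parallelize([((i, j), Matrices.dense(2, 2, [5.0, 6.0, 7.0, 8.0]))
                           for i in range(2) for j in range(2)])

A = BlockMatrix(blocks_a, 2, 2)
B = BlockMatrix(blocks_b, 2, 2)
C = A.multiply(B)            # distributed block matrix multiplication
print(C.toLocalMatrix())     # collect the small result for inspection
```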
This work presents a general approach, coined the "Spy Element Method" (SEM), for parallelising workloads that run on manycore processors and feature dynamic dependencies between different data items, such as graph traversals, remeshing methods and particle simulations. In the SEM, appropriately defined entities, denominated "spy elements", inspect their neighbourhood and, without atomic memory operations, update in parallel all the dynamic data dependencies, including those between themselves, between the original objects of the current problem, and among entities of different kinds. Applying the SEM to meshless simulation models obviates the use of binning algorithms that rely on sorting or atomics, and concomitantly renders their implementation fully particle-centric. On the NVIDIA GeForce GTX 480, an optimised particle-based fluid simulator runs 1.4-2.3 times faster when binning aided by a state-of-the-art parallel sorter is eliminated and data updating is delegated to the SEM, and is at least 40 times faster than a highly optimised single-threaded version running on a single core of the Intel Xeon X5650 2.66 GHz. (c) 2012 Elsevier B.V. All rights reserved.
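For contrast, here is a minimal NumPy sketch of the conventional sort-based cell binning that the SEM eliminates; the grid resolution and particle count are illustrative, and the spy-element update itself is not reproduced:

```python
# Conventional sort-based cell binning for particle neighbour search --
# the baseline the SEM avoids. Particles are hashed to grid cells, then
# sorted by cell id so each cell's particles are contiguous; this sort is
# exactly the cost the spy-element update removes.
import numpy as np

positions = np.random.rand(10_000, 3).astype(np.float32)  # toy particles
cell_size = 0.05
grid_dim = int(np.ceil(1.0 / cell_size))

# Hash each particle to a linear cell index.
ijk = np.floor(positions / cell_size).astype(np.int64)
cell_id = (ijk[:, 0] * grid_dim + ijk[:, 1]) * grid_dim + ijk[:, 2]

# Sort particles by cell so neighbours can be found by range lookup.
order = np.argsort(cell_id, kind="stable")
sorted_ids = cell_id[order]
cell_start = np.searchsorted(sorted_ids, np.arange(grid_dim**3))
```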
The direct simulation Monte Carlo (DSMC) method is a computational technique for simulating fluid mechanics in the rarefied gas flow regime. It solves the Boltzmann equation numerically on an individual-particle basis. Accurate simulations typically require particle numbers in the range of hundreds of thousands to millions, and such large simulations take an inordinate amount of time when processed serially on central processing units (CPUs). In this paper we investigate data-parallel techniques on graphics processing units (GPUs) to execute very large scale DSMC simulations. We have designed and implemented Bird's method on a three-dimensional simulation domain that includes complex geometry interactions, and we have tested and verified the statistical and theoretical accuracy of our implementation. Our results show substantial performance improvements (nearly two orders of magnitude) over Bird's serial implementation without loss of accuracy.
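As a hedged toy illustration of why DSMC suits data-parallel hardware: the dominant per-cell collision step treats candidate pairs independently. The sketch below is a simplified, single-cell version of Bird-style (no-time-counter) pair selection with illustrative constants; it is not the paper's implementation:

```python
# Toy single-cell DSMC collision step (simplified no-time-counter scheme;
# constants are illustrative, not from the paper). Candidate pairs are
# independent, which is what makes the step map well onto GPUs.
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(0.0, 300.0, size=(200, 3))   # velocities in one cell (m/s)
n_candidates = 50                            # candidate-pair count
sigma = 1e-16                                # assumed collision cross-section
sigma_vr_max = sigma * 1200.0                # running (sigma * v_rel)_max

i = rng.integers(0, len(v), n_candidates)
j = rng.integers(0, len(v), n_candidates)
v_rel = np.linalg.norm(v[i] - v[j], axis=1)

# Accept each pair with probability sigma*v_rel / (sigma*v_rel)_max.
accept = rng.random(n_candidates) < (sigma * v_rel) / sigma_vr_max
for a, b in zip(i[accept], j[accept]):
    # Isotropic post-collision velocities about the pair's centre of mass.
    g = np.linalg.norm(v[a] - v[b])
    cm = 0.5 * (v[a] + v[b])
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)
    v[a], v[b] = cm + 0.5 * g * d, cm - 0.5 * g * d
```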
In this paper we present a set of graphics hardware accelerated algorithms to interactively evaluate the machinability of complex free-form surfaces. These algorithms work in image space and easily interface with all common formats available on CAD systems. The running time of these algorithms is independent of the complexity of the surface to be analyzed and depends only on the size of the projected image of the surface and the largest available tool head. Interactive speed is achieved through clever use of data-parallel techniques that map nicely onto the programming model of modern programmable graphics processing units. We demonstrate a method for pre-calculating and storing the machinability of a surface within a texture to further reduce rendering costs. The algorithms are implemented and tested on a complex set of parts and their performance has been analyzed. (C) 2010 Elsevier Ltd. All rights reserved.
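To suggest the flavour of such an image-space test (the paper's GPU shaders are not reproduced here), the following NumPy sketch rasterizes a surface to a height map and checks where a flat-end tool of radius r could touch it without gouging the neighbourhood; the surface data, radius, and tolerance are assumptions:

```python
# Hedged image-space sketch: a pixel of a rasterized height map is deemed
# reachable by a flat-end tool iff nothing under the circular tool
# footprint rises above the surface point itself. A grey-scale dilation
# (moving maximum) plays the role of the tool footprint test.
import numpy as np
from scipy.ndimage import grey_dilation

height = np.random.rand(256, 256).astype(np.float32)  # toy rasterized surface

r = 5  # assumed tool radius in pixels
yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
footprint = (xx**2 + yy**2) <= r**2

# Highest surface point under the tool footprint, per pixel.
local_max = grey_dilation(height, footprint=footprint)

# Reachable iff the footprint maximum does not exceed the surface point
# (within a small tolerance).
machinable = local_max <= height + 1e-4
print(f"machinable fraction: {machinable.mean():.2%}")
```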
ISBN:
(print) 9781424453696
As computational performance continues to scale in chip multiprocessors with increasing numbers of cores, the gap between the available off-chip bandwidth and that which is required to appropriately feed the processors continues to widen under current architectures. For many high-performance computing applications, the bandwidth available for both on- and off-chip communications can play a vital role in efficient execution due to the use of data-parallel or data-centric algorithms. Electronic interconnected systems are increasingly bound by their communications infrastructure and the associated power dissipation of high-bandwidth data movement. Recent advances in chip-scale silicon photonic technologies have created the potential for developing optical interconnection networks that can offer highly energy efficient communications and significantly improve computing performance-per-Watt. This paper examines the design and performance of photonic networks-on-chip architectures that support both on-chip communication and off-chip memory access in an energy efficient manner.
ISBN:
(print) 9781605581156
Modern microprocessors are becoming increasingly parallel devices, and GPUs are at the leading edge of this trend. Designing parallel algorithms for manycore chips like the GPU can present interesting challenges, particularly for computations on sparse data structures. One particularly common example is the collection of sparse matrix solvers and combinatorial graph algorithms that form the core of many physical simulation techniques. Although seemingly irregular, these operations can often be implemented with data-parallel operations that map very well to massively parallel processors.
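The classic instance of such a "seemingly irregular but data-parallel" operation is sparse matrix-vector multiply in CSR format. A small Python sketch, written row-parallel the way a GPU kernel would assign one thread per row (the data is illustrative):

```python
# CSR sparse matrix-vector multiply, structured so every row is an
# independent unit of work -- the mapping a GPU kernel would exploit by
# assigning one thread (or warp) per row.
import numpy as np

# CSR storage: nonzero values, their column indices, and row pointers.
data    = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
indices = np.array([0, 2, 1, 0, 2])
indptr  = np.array([0, 2, 3, 5])   # 3 rows
x = np.array([1.0, 2.0, 3.0])

y = np.zeros(len(indptr) - 1)
for row in range(len(y)):                    # each iteration is independent
    lo, hi = indptr[row], indptr[row + 1]
    y[row] = data[lo:hi] @ x[indices[lo:hi]]
print(y)  # [ 70.  60. 190.]
```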