Search Results - Inner Mongolia University Library

arXiv, 2017

Authors: Ben-David, Naama; Blelloch, Guy E.; Fineman, Jeremy T.; Gibbons, Phillip B.; Gu, Yan; McGuffey, Charles; Shun, Julian. Carnegie Mellon University; Georgetown University; UC Berkeley

The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an o(n)-sized implicit decomposition of a bounded-degree graph G on n nodes, which combined with read-only access to G enables fast answers to connectivity and biconnectivity queries on G. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on m edges, we also provide the first o(m) writes and O(m) operations parallel algorithms for connectivity and biconnectivity. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry. Copyright © 2017, The Authors. All rights reserved.

Keywords: parallel algorithms

A Laboratory Based Course on GPU Programming: Methods, Practices, and Lessons

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Author: Jawwad Ahmed Shamsi, Systems Research Laboratory, Computer Science, FAST National University of Computer and Emerging Sciences, Karachi, Pakistan

Technological advancements have necessitated the need for effectively teaching GPU computing. This need has been inspired by the increasing pattern of utilizing parallel computing and by the growing utilization of GPUs for computationally intensive tasks. This paper is motivated to address the above mentioned need. The paper describes a semester-long course on CUDA programming. The course has significant emphasis on developing practical hands-on skills, building skills for parallel algorithm design and implementation, and utilizing GPUs for solving computationally expensive problems. The paper explains the goals of the course and elaborates on course contents and students' assessments. Student feedback reveals effective learning and improved utilization of GPUs by students. This paper is useful for the community members who would like to teach GPU programming as an elective course in parallel computing. The course can either be offered at the senior undergraduate level or at the graduate level.

Keywords: Graphics processing units; Programming; Parallel algorithms; Computer architecture; Heuristic algorithms; Education

More iterations per second, same quality – Why asynchronous algorithms may drastically outperform traditional ones

arXiv, 2017

Authors: Hannah, Robert; Yin, Wotao. Department of Mathematics, University of California, Los Angeles, CA 90095, United States

In this paper, we consider the convergence of a very general asynchronous-parallel algorithm called ARock [1], that takes many well-known asynchronous algorithms as special cases (gradient descent, proximal gradient, Douglas Rachford, ADMM, etc.). In asynchronous-parallel algorithms, the computing nodes simply use the most recent information that they have access to, instead of waiting for a full update from all nodes in the system. This means that nodes do not have to waste time waiting for information, which can be a major bottleneck, especially in distributed systems. When the system has p nodes, asynchronous algorithms may complete Θ(ln(p)) more iterations than synchronous algorithms in a given time period ("more iterations per second"). Although asynchronous algorithms may compute more iterations per second, there is error associated with using outdated information. How many more iterations in total are needed to compensate for this error is still an open question. The main results of this paper aim to answer this question. We prove, loosely, that as the size of the problem becomes large, the number of additional iterations that asynchronous algorithms need becomes negligible compared to the total number ("same quality" of the iterations). Taking these facts together, our results provide solid evidence of the potential of asynchronous algorithms to vastly speed up certain distributed computations. Copyright © 2017, The Authors. All rights reserved.

Keywords: parallel algorithms

Concurrent Reflective Abstract State Machines

International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

Author: Klaus-Dieter Schewe, Christian Doppler Laboratory for Client-Centric Cloud Computing, Linz, Austria

ISBN: (print) 9781538626276

The core of a distributed system can be characterised by autonomously acting agents, where each agent executes its own program, uses shared resources and communicates with the others, but otherwise is totally oblivious to the behaviour of the other agents. In a distributed adaptive system (DAS) agents may change their programs, enter or leave the collection at any time thereby changing the behaviour of the overall system. The behavioural theory of DAS provides an axiomatic definition plus a proof that concurrent reflective abstract state machines (crASMs) capture all systems stipulated by the axioms. In this paper we take a closer look into crASMs emphasising the tree background structure that is needed for handling the manipulation of self-representations.

Keywords: Silicon; Parallel algorithms; Vocabulary; Adaptive systems; Adaptation models; Linguistics; Concurrent computing

ParallelCharMax: An Effective Maximal Frequent Itemset Mining Algorithm Based on MapReduce Framework

ACS/IEEE International Conference on Computer Systems and Applications

Authors: Rania Mkhinini Gahar; Olfa Arfaoui; Minyar Sassi Hidri; Nejib Ben Hadj-Alouane. University of Tunis El Manar, National Engineering School of Tunis, Tunis, Tunisia; Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia

Nowadays, the explosive growth in data collection in business and scientific areas has created the need to analyze and mine useful knowledge residing in these data. The recourse to data mining techniques seems to be inescapable in order to extract useful and novel patterns/models from large datasets. In this context, frequent itemsets (patterns) play an essential role in many data mining tasks that try to find interesting patterns from datasets. However, conventional approaches for mining frequent itemsets in the Big Data era encounter significant challenges when computing power and memory space are limited. This paper proposes an efficient distributed frequent itemset mining algorithm, called ParallelCharMax, that is based on a powerful sequential algorithm, called Charm, and computes the maximal frequent itemsets that are considered perfect summaries of the frequent ones. The proposed algorithm has been implemented using the MapReduce framework. The experimental component of the study shows the efficiency and the performance of the proposed algorithm compared with well-known algorithms such as MineWithRounds and HMBA.

Keywords: Itemsets; Data mining; Parallel algorithms; Task analysis; Partitioning algorithms; Classification algorithms

Hybrid computing for intra prediction in HEVC

Civil-Comp Proceedings, 2017, vol. 111

Authors: Galiano, V.; Herranz, V.; Migallon, H.; Lopez-Granado, O.; Pinol, P.; Malumbres, M.P. Miguel Hernández University, Elche, Spain

The HEVC video coding standard designed by the Joint Collaborative Team on Video Coding requires nearly 70% more time than the previous standard H.264/AVC to encode a video sequence, because it is computationally more complex than its predecessor. We can take advantage of many-core architectures to reduce the total coding time, and thus, in this paper, we propose the use of a hybrid architecture where intensive computing of intra-picture prediction is performed in GPUs. We have analyzed the sequential intra prediction algorithm in HEVC and we propose a parallel architecture where the sequential thread can be highly parallelized in order to reduce the total coding time. © Civil-Comp Press, 2017.

Keywords: Video signal processing; Codes (symbols); Forecasting; Graphics processing unit; Image coding; Parallel algorithms; Parallel architectures; Program processors; Collaborative teams; HEVC; Hybrid architectures; Intensive computing; Intra Prediction; Intra prediction algorithms; Many core architecture; Video coding standard

A parallel algorithm for determining the communication radius of an automatic light trap based on balltree structure

8th International Conference on Knowledge and Systems Engineering (KSE)

Authors: Giang Nguyen Thi Phuong Huong Hoang Luong Tai Huu Pham Hiep Xuan Huynh Ind Univ Ho Chi Minh City Ho Chi Minh City Vietnam Cantho Univ CUSC Can Tho Vietnam Cantho Univ DREAM CTU IRD CICT Can Tho Vietnam

ISBN: (print) 9781467389297

The communication radius of an automatic light trap surveillance network characterizes how well an area is monitored or tracked by automatic light traps. Connectivity is an important requirement that shows how nodes in an automatic BPH light trap surveillance network can effectively communicate. In this paper, we propose a new approach to determine the communication radius of an automatic light trap based on the balltree structure. This approach provides a parallel algorithm for implementing the balltree structure (CudaBalltree) and determining the communication radius of an automatic light trap on the NVIDIA CUDA platform.

Keywords: Graphics processing units parallel algorithms Temperature sensors Surveillance Cameras parallel algorithms Temperature Sensor Device Component Cameras Traps Graphics Processing Unit AUTOMATIC Radius Radius Light Communication Research

Parallel Algorithms for Bayesian Networks Structure Learning with Applications to Systems Biology

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Author: Olga Nikolova, Bioinformatics and Computational Biology, Department of Computer Engineering, Iowa State University, Ames, IA, USA

Bayesian networks (BN) are probabilistic graphical models which are widely utilized in modeling complex biological interactions in the cell. Learning the structure of a BN is an NP-hard problem and existing exact and heuristic solutions do not scale to large enough domains to allow for meaningful modeling of many biological processes. In this work, we present efficient parallel algorithms which push the scale of both exact and heuristic BN structure learning. We demonstrate the applicability of our methods by implementations on an IBM Blue Gene/L and an AMD Opteron cluster, and discuss their significance for future applications to systems biology.

Keywords: Parallel algorithms; Hypercubes; Program processors; Bayesian methods; Lattices; Computational modeling; Markov processes

Performance comparison between state-of-the-art point-cloud based collision detection approaches on the CPU and GPU

4th IFAC Symposium on Telematics Applications (TA)

Authors: Schauer, Johannes; Bedkowski, Janusz; Majek, Karol; Nuechter, Andreas. Julius Maximilians Univ Wurzburg, Informat Robot & Telemat 7, D-97074 Wurzburg, Germany; Inst Math Machines, Ul Krzywickiego 34, PL-02078 Warsaw, Poland

We present two fundamentally different approaches to detect collisions between two point clouds and compare their performance on multiple datasets. A collision between points happens if they are closer to each other than a given threshold radius. One approach utilizes the main CPU with a k-d tree data structure to efficiently carry out fixed range searches around points in 3D while the other mainly executes on a GPU using a regular grid decomposition technique implemented in the CUDA framework. We will show how massively parallel 3D range searches on a grid-based data structure on a GPU perform about as well as a tree-based approach on the CPU with orders of magnitude less parallelization. We also show how each method scales with varying input sizes and how their relative performance depends on the spatial structure of the input data. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: k-d tree; CUDA; parallel algorithms; 3D point clouds; regular grid decomposition
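
The collision criterion in the abstract above (two points collide if they are closer than a threshold radius) can be illustrated with a small regular grid search. This is only a minimal sequential sketch in Python, not the authors' CUDA implementation; the function names `build_grid` and `colliding_points` and the choice of a cell side equal to the radius are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math


def build_grid(points, cell):
    """Bucket 3D points into cubic cells with side length `cell`."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / cell) for c in p)
        grid[key].append(p)
    return grid


def colliding_points(cloud_a, cloud_b, radius):
    """Return the points of cloud_a closer than `radius` to some point of cloud_b."""
    grid = build_grid(cloud_b, radius)
    r2 = radius * radius
    hits = []
    for p in cloud_a:
        key = tuple(math.floor(c / radius) for c in p)
        found = False
        # With cell side == radius, every candidate lies in the 27 surrounding cells.
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            for q in grid.get(neighbour, ()):
                if sum((a - b) ** 2 for a, b in zip(p, q)) < r2:
                    found = True
                    break
            if found:
                break
        if found:
            hits.append(p)
    return hits
```

Each query point inspects a fixed number of cells independently of the others, which is the property that makes this kind of search a natural fit for one-thread-per-point execution on a GPU.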

A Parallelized Method for Continuous-Time Models with Dependence on Calculation Order

8th IFAC Symposium on Advances in Automotive Control (AAC)

Authors: Sata, Kota; Azuma, Shun-ichi; Ohata, Akira. Toyota Motor Co Ltd, Adv Unit Management System Dev Div, Shizuoka, Japan; Kyoto Univ, Dept Syst Sci, Kyoto, Japan; Technova Inc, Tokyo, Japan

The advancement of the engine control increases the amount of computation. The production ECU (Electronic Control Unit), which is made of single-core architecture, cannot have a higher clock speed. Using multi- / many-core architecture is the only way to decrease execution time. However, when implementing the engine control software, various problems occur in utilization of the multi- / many-core ECU. One of the biggest problems is sequential structure of control software because the software can only execute with one core on the multi- / many-core ECU. The purpose of this paper is to describe the parallelized control design method, which has decomposed sequential structure and decreases execution time in the embedded multi- / many-core production ECU. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: automotive control; control design; execution times; electronic control units; internal combustion engine; parallel algorithms
