A set of spanning trees in a graph is said to be independent (ISTs for short) if all the trees are rooted at the same node r and, for any other node v, the paths from v to r in any two trees are node-disjoint except for the two end nodes v and r. It was conjectured that for any n-connected graph there exist n ISTs rooted at an arbitrary node. Let N be the number of nodes in the n-dimensional Möbius cube MQ_n. Recently, for constructing n ISTs rooted at an arbitrary node of MQ_n, Cheng et al. (Comput J 56(11):1347-1362, 2013) and (J Supercomput 65(3):1279-1301, 2013) proposed a sequential algorithm and a parallel algorithm, respectively. However, the former algorithm is executed in a recursive fashion and is therefore hard to parallelize. Although the latter algorithm can construct the n ISTs simultaneously, the construction of each individual spanning tree is not fully parallelized. In this paper, we present a non-recursive and fully parallelized approach that constructs n ISTs rooted at an arbitrary node of MQ_n using the N nodes of MQ_n as processors. In particular, we derive useful properties from the description of the paths in the ISTs, which make the proof of independence easier than ever before.
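The independence condition above can be checked mechanically. The sketch below is illustrative only (the trees are a toy example, not the ISTs constructed in the paper): two spanning trees, given as parent maps rooted at the same node, are independent exactly when every pair of root-paths shares only its end nodes.

```python
def path_to_root(parent, v, r):
    """Return the node sequence from v up to the root r."""
    path = [v]
    while path[-1] != r:
        path.append(parent[path[-1]])
    return path

def independent(t1, t2, r):
    """True iff, for every node v != r, the two v-to-r paths
    share only their end nodes v and r."""
    for v in t1:
        p1 = set(path_to_root(t1, v, r)) - {v, r}  # internal nodes only
        p2 = set(path_to_root(t2, v, r)) - {v, r}
        if p1 & p2:
            return False
    return True

# Two spanning trees of the 4-cycle 0-1-2-3-0 rooted at 0: one routes
# each node clockwise, the other counter-clockwise, so the internal
# nodes of corresponding paths never coincide.
t1 = {1: 0, 2: 1, 3: 2}   # parent map: v -> parent(v)
t2 = {3: 0, 2: 3, 1: 2}
print(independent(t1, t2, 0))
```

A tree is trivially not independent with itself, since any path of length at least two shares its internal nodes.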
ISBN: (Print) 9781509021949
Centrality is an important measure for identifying the most important actors in a network. This paper discusses the various centrality measures used in Social Network Analysis. These measures are tested on complex real-world social network data sets, such as video sharing networks, a social interaction network, and co-authorship networks, to examine their behavior. We carry out a correlation analysis of these centralities and plot the results to recommend when each centrality measure should be used. Additionally, we introduce a new centrality measure, Cohesion Centrality, based on the cohesiveness of a graph, develop a sequential algorithm for it, and further devise a parallel algorithm to implement it.
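Two of the standard measures and the correlation analysis can be sketched compactly. This is an illustrative stand-in, not the paper's implementation: degree and closeness centrality on a small undirected graph, followed by the Pearson correlation between the two measures over all nodes.

```python
from math import sqrt
from collections import deque

adj = {  # small example network as an adjacency map
    'a': {'b', 'c'}, 'b': {'a', 'c', 'd'},
    'c': {'a', 'b'}, 'd': {'b', 'e'}, 'e': {'d'},
}

def degree_centrality(adj):
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}

def closeness_centrality(adj):
    n, out = len(adj), {}
    for s in adj:                       # BFS from each node
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        out[s] = (n - 1) / sum(dist.values())
    return out

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

deg = degree_centrality(adj)
clo = closeness_centrality(adj)
nodes = sorted(adj)
r = pearson([deg[v] for v in nodes], [clo[v] for v in nodes])
```

A high correlation between two measures suggests they rank actors similarly, so the cheaper one may suffice; plotting such correlations is the basis of the paper's recommendations.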
ISBN: (Print) 9781509028719
In this paper a speculative computation method for IEC 61499 function block (FB) systems is proposed to increase the level of parallelism when executing an FB system, and thus to improve the system's performance and reduce its response time to input events. Data and control dependencies in FB systems are identified and defined as a basis for organizing the speculative execution of FB algorithms. A simulation model of FB systems with speculative execution, based on timed stochastic Petri nets, is considered. In addition, the paper discusses the results of simulation experiments conducted in CPN Tools.
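The core idea of speculative FB execution can be sketched as follows. All names here are illustrative, not the IEC 61499 API or the paper's method: the algorithm for a predicted next input event runs ahead of time on a copy of the block's state; the result is committed only if the prediction matches the event that actually arrives, and discarded otherwise.

```python
def run_block(state, event, algorithms, predict):
    """Execute one FB step with speculation on the next input event."""
    guess = predict(state)                        # predicted next event
    speculative = algorithms[guess](dict(state))  # run ahead on a copy
    if event == guess:
        return speculative                        # hit: commit precomputed result
    return algorithms[event](dict(state))         # miss: discard, recompute

# Toy event-to-algorithm mapping for a counter block.
algorithms = {
    'INC': lambda s: {**s, 'count': s['count'] + 1},
    'RST': lambda s: {**s, 'count': 0},
}
state = {'count': 5}
state = run_block(state, 'INC', algorithms, lambda s: 'INC')  # prediction hit
state = run_block(state, 'RST', algorithms, lambda s: 'INC')  # prediction miss
```

In a real deployment the speculative call would run concurrently with the wait for the event, so a hit hides the algorithm's latency entirely; the data and control dependencies the paper identifies determine which algorithms may safely run ahead.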
ISBN: (Print) 9781467398046
Remote memory access has lower bandwidth and higher latency than local memory access in the Cache Coherent Non-Uniform Memory Access (cc-NUMA) architecture. This is especially true in cc-NUMA platforms where computing nodes are connected by a network, whose latency and bandwidth are much worse than those of the HyperTransport (HT) and PCI Express (PCI-E) buses. In order to enhance the performance of applications, a Hybrid parallel Framework for Computation-intensive Applications (HPFCA) was proposed. Task distribution, data storage, multicore parallelism, and kernel optimization are discussed in the HPFCA. An "MPI+OpenMP/Pthreads" mechanism is used for multi-node platforms: MPI for distributed-memory parallelism, and OpenMP/Pthreads for shared-memory parallelism. Moreover, GEMM and FFT, representatives of computation-intensive applications on the Godson-3B, were studied, and their parallel algorithms were optimized according to the HPFCA. Finally, experimental results demonstrated that HPFCA can deliver ideal performance on the Godson-3B.
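The two-level "MPI+OpenMP/Pthreads" split can be sketched structurally for GEMM. This is a Python stand-in, not the HPFCA code: the outer loop over `rank` stands in for MPI ranks each owning a block of rows of A, and the inner thread pool stands in for shared-memory (OpenMP/Pthreads) parallelism within a node.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A_rows, B):
    """Per-thread kernel: multiply a block of rows of A by B."""
    p = len(B[0])
    return [[sum(a[k] * B[k][j] for k in range(len(a))) for j in range(p)]
            for a in A_rows]

def hybrid_gemm(A, B, ranks=2, threads=2):
    n = len(A)
    block = (n + ranks - 1) // ranks
    C = []
    for rank in range(ranks):                       # "distributed" (MPI) level
        rows = A[rank * block:(rank + 1) * block]   # rank-local row block
        chunk = max(1, (len(rows) + threads - 1) // threads)
        slices = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
        with ThreadPoolExecutor(threads) as pool:   # "shared-memory" level
            parts = pool.map(matmul_rows, slices, [B] * len(slices))
        for part in parts:
            C.extend(part)
    return C

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [0, 1]]
```

In the real framework the outer level is genuine message passing between nodes, so the row blocks never leave their owners; the point of the split is that intra-node traffic stays on HT/PCI-E while only boundary data crosses the slower network.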
ISBN: (Print) 9781467388153
Applications running on clusters of shared-memory computers are often implemented using OpenMP+MPI. Productivity can be vastly improved using task-based programming, a paradigm in which the user expresses the data and control-flow relations between tasks, offering the runtime maximal freedom to place and schedule tasks. While productivity is increased, high-performance execution remains challenging: the implementation of parallel algorithms typically requires specific task placement and communication strategies to reduce internode communication and exploit data locality. In this work, we present a new macro-dataflow programming environment for distributed-memory clusters, based on the Intel Concurrent Collections (CnC) runtime. Our language extensions let the user define virtual topologies, task mappings, task-centric data placement, task and communication scheduling, etc. We introduce a compiler that automatically generates Intel CnC C++ runtime code, with key automatic optimizations including task coarsening and coalescing. We experimentally validate our approach on a variety of scientific computations, demonstrating both productivity and performance.
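A minimal sketch of the macro-dataflow idea (illustrative only, not the Intel CnC API): tasks declare which results they consume, and the runtime may execute any task whose inputs are available, in any order. It is exactly this freedom that the paper's language extensions then constrain with topologies, mappings, and scheduling hints.

```python
def run_dataflow(tasks, deps):
    """tasks: name -> fn(*dep_results); deps: name -> list of input names."""
    done, order = {}, []
    ready = [t for t in tasks if not deps[t]]
    while ready:
        t = ready.pop()                       # runtime picks any ready task
        done[t] = tasks[t](*[done[d] for d in deps[t]])
        order.append(t)
        ready += [u for u in tasks if u not in done and u not in ready
                  and all(d in done for d in deps[u])]
    return done, order

# A tiny task graph: 'sort' and 'sum' both depend on 'load' and may run
# in either order; 'report' joins their results.
tasks = {
    'load':   lambda: [3, 1, 2],
    'sort':   lambda xs: sorted(xs),
    'sum':    lambda xs: sum(xs),
    'report': lambda s, m: (s, m),
}
deps = {'load': [], 'sort': ['load'], 'sum': ['load'], 'report': ['sort', 'sum']}
results, order = run_dataflow(tasks, deps)
```

Task coarsening and coalescing, two of the paper's automatic optimizations, amount to merging several such fine-grained nodes into one before execution so that scheduling overhead is paid once per group.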
This paper continues to develop a fault-tolerant extension of the sparse grid combination technique recently proposed in [B. Harding and M. Hegland, ANZIAM J. Electron. Suppl., 54 (2013), pp. C394-C411]. This approach to fault tolerance is novel for two reasons: first, the combination technique adds an additional level of parallelism, and second, it provides algorithm-based fault tolerance so that solutions can still be recovered if failures occur during computation. Previous work indicates how the combination technique may be adapted for a small number of faults. In this paper we develop a generalization of the combination technique in which arbitrary collections of coarse approximations may be combined to obtain an accurate approximation. A general fault-tolerant combination technique for large numbers of faults is a natural consequence of this work. Using a renewal model for the time between faults on each node of a high-performance computer, we also provide bounds on the expected error for interpolation with this algorithm in the presence of faults. Numerical experiments solving the scalar advection PDE demonstrate that the algorithm is resilient to faults in a real application. We observe that the time to solution is not significantly affected by the presence of (simulated) faults, and that the expected error increases with the number of faults but remains relatively small even for high fault rates. A comparison with traditional checkpoint-restart methods applied to the combination technique shows that our approach is highly scalable with respect to the number of faults.
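The combination of arbitrary collections of coarse grids can be sketched through the standard coefficient formula for a downward-closed index set (an assumption here; the paper's generalization has its own derivation): grid i receives coefficient c_i equal to the sum of (-1)^|z| over z in {0,1}^d with i+z still in the set. Dropping a failed grid (a maximal index) and recomputing the c_i is the algorithm-based fault tolerance in miniature.

```python
from itertools import product

def coefficients(I):
    """Combination coefficients for a downward-closed set I of grid indices."""
    I = set(I)
    d = len(next(iter(I)))
    return {i: sum((-1) ** sum(z)
                   for z in product((0, 1), repeat=d)
                   if tuple(a + b for a, b in zip(i, z)) in I)
            for i in I}

n = 4
# Classical 2D level-n combination: all grids with i + j <= n.
full = {(i, j) for i in range(n + 1) for j in range(n + 1) if i + j <= n}
c = coefficients(full)          # +1 on level n, -1 on level n-1, 0 below

faulted = full - {(2, 2)}       # grid (2,2) lost to a fault
c_ft = coefficients(faulted)    # recombine the surviving grids
```

In both cases the coefficients sum to 1, so the recombined solution remains a consistent approximation; only its accuracy degrades gracefully with the lost grid.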
Single-thread algorithms for global optimization differ in how computational effort is allocated between exploitation and exploration. This allocation ultimately determines overall performance. For example, if too little emphasis is put on exploration, the globally optimal solution may not be identified. Increasing the allocation of computational effort to exploration improves the chances of identifying a globally optimal solution, but it also slows down convergence. Thus, in a single-thread implementation of model-based search, exploration and exploitation are substitutes. In this paper we propose a new algorithmic design for global optimization based upon multiple interacting threads. In this design, each thread implements a model-based search in which the allocation of effort between exploration and exploitation does not vary over time. Threads interact through a simple acceptance-rejection rule that prevents duplication of search effort. We show that the proposed design provides a speedup that increases with the number of threads. Thus, in the proposed algorithmic design, exploration is a complement to, rather than a substitute for, exploitation.
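The design can be caricatured in a toy simulation (illustrative only, not the paper's algorithm): each "thread" has a fixed exploration probability, and a shared set of visited points implements the acceptance-rejection rule, so a proposal already examined by any thread is rejected rather than re-evaluated.

```python
import random

def interacting_search(f, lo, hi, rates, steps, seed=0):
    """Maximize f over the integers [lo, hi]; each entry of `rates`
    is one thread's fixed exploration probability."""
    rng = random.Random(seed)
    best = rng.randint(lo, hi)
    visited = {best}
    for _ in range(steps):
        for p_explore in rates:                 # one proposal per thread
            if rng.random() < p_explore:
                x = rng.randint(lo, hi)         # exploration: global draw
            else:                               # exploitation: step from best
                x = min(hi, max(lo, best + rng.choice((-1, 1))))
            if x in visited:
                continue                        # acceptance-rejection: no duplicates
            visited.add(x)
            if f(x) > f(best):                  # best tracks all evaluated points
                best = x
    return best

f = lambda x: -(x - 23) ** 2        # unique global maximum at x = 23
best = interacting_search(f, 0, 30, rates=(0.9, 0.1), steps=500)
```

Because rejected proposals cost nothing, an exploratory thread effectively accelerates the exploitative ones instead of competing with them, which is the complementarity the abstract describes.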
Fuzzy clustering allows an object to belong to multiple clusters and represents the affiliation of objects to clusters by memberships. It is extended to fuzzy coclustering by assigning membership functions to both objects and features. In this paper we propose a new fuzzy triclustering (FTC) algorithm for the automatic categorization of three-dimensional data collections. FTC specifies a membership function for each dimension and is able to generate fuzzy clusters simultaneously along all three dimensions. FTC thus divides a three-dimensional cube into many small blocks, which should be triclusters with strong coherent bonding among their members. Experimental studies on MovieLens demonstrate the strength of FTC in terms of accuracy compared to some recent popular fuzzy clustering and coclustering approaches.
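The membership machinery such methods build on can be illustrated with the standard fuzzy c-means update, shown here as a stand-in (FTC's own tricluster updates, which assign memberships along each of the three dimensions, are in the paper): u[i][k] is object i's degree of belonging to cluster k, and each row sums to 1.

```python
def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership update for 1-D points; m > 1 is the fuzzifier."""
    u = []
    for x in points:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:                         # point coincides with a center
            u.append([1.0 if dk == 0.0 else 0.0 for dk in d])
            continue
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        row = [1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in d) for dk in d]
        u.append(row)
    return u

u = memberships([0.0, 0.4, 1.0], centers=[0.0, 1.0])
```

A point between two centers receives graded membership in both, which is exactly the behavior that lets a tricluster claim partial ownership of an object, a feature, or a slice of the third dimension.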
In this work, we present a parallel implementation of the Singular Value Decomposition (SVD) method on Graphics Processing Units (GPUs) using the CUDA programming model. Our approach is based on an iterative parallel version of the QR factorization by means of Givens plane rotations using the Sameh and Kuck scheme. The parallel algorithm is driven by an outer loop executed on the CPU, and the thread and block configuration is organized so as to use shared memory and avoid repeated accesses to global memory. In addition, the main kernel provides coalesced accesses to global memory through contiguous indices. As a case study, we consider the application of the SVD in the Overcomplete Local Principal Component Analysis (OLPCA) algorithm for denoising Diffusion Weighted Imaging (DWI) data. Our results show significant performance improvements with respect to the CPU version, which encourages its use in this expensive application.
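The building block of this approach is the Givens plane rotation that zeroes one entry at a time. The CPU sketch below is illustrative, not the paper's kernel: rotations are applied sequentially to reduce a small matrix to upper-triangular R, whereas the Sameh and Kuck ordering lets rotations acting on disjoint row pairs run concurrently on the GPU.

```python
from math import hypot

def givens_qr(A):
    """Reduce A (m x n, m >= n) to upper-triangular R via Givens rotations."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    for j in range(n):
        for i in range(m - 1, j, -1):      # zero A[i][j] against row i-1
            a, b = A[i - 1][j], A[i][j]
            r = hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r            # rotation [c s; -s c] on rows i-1, i
            for k in range(n):
                t = c * A[i - 1][k] + s * A[i][k]
                A[i][k] = -s * A[i - 1][k] + c * A[i][k]
                A[i - 1][k] = t
            A[i][j] = 0.0                  # clean up rounding
    return A

R = givens_qr([[2.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
```

Each rotation touches exactly two rows, which is why shared memory can hold the working rows and why disjoint row pairs in the Sameh and Kuck schedule can be rotated in parallel without conflicts.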
High computational requirements of current problems have driven most research towards efficient processing formulations that require the use of multiple interconnected processors; this is the foundation of the para...