A set of spanning trees in a graph is said to be independent (ISTs for short) if all the trees are rooted at the same node r and, for any other node v, the paths from v to r in any two trees are node-disjoint except for the two end nodes v and r. It was conjectured that for any n-connected graph there exist n ISTs rooted at an arbitrary node. Let N be the number of nodes in the n-dimensional Möbius cube MQ_n. Recently, for constructing n ISTs rooted at an arbitrary node of MQ_n, Cheng et al. (Comput J 56(11):1347-1362, 2013) and (J Supercomput 65(3):1279-1301, 2013) proposed a sequential algorithm and a parallel algorithm, respectively. However, the former algorithm is executed in a recursive fashion and is therefore hard to parallelize. Although the latter algorithm can construct the n ISTs simultaneously, the construction of each individual spanning tree is not fully parallelized. In this paper, we present a non-recursive and fully parallelized approach that constructs n ISTs rooted at an arbitrary node of MQ_n using the N nodes of MQ_n as processors. In particular, we derive useful properties from the description of the paths in the ISTs, which make the proof of independence easier than ever before.
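The independence condition above can be checked mechanically. The sketch below is illustrative only (the trees are a toy example, not the ISTs constructed in the paper): two spanning trees, given as parent maps rooted at the same node, are independent exactly when every pair of root-paths shares only its end nodes.

```python
def path_to_root(parent, v, r):
    """Return the node sequence from v up to the root r."""
    path = [v]
    while path[-1] != r:
        path.append(parent[path[-1]])
    return path

def independent(t1, t2, r):
    """True iff, for every node v != r, the two v-to-r paths
    share only their end nodes v and r."""
    for v in t1:
        p1 = set(path_to_root(t1, v, r)) - {v, r}  # internal nodes only
        p2 = set(path_to_root(t2, v, r)) - {v, r}
        if p1 & p2:
            return False
    return True

# Two spanning trees of the 4-cycle 0-1-2-3-0 rooted at 0: one routes
# each node clockwise, the other counter-clockwise, so the internal
# nodes of corresponding paths never coincide.
t1 = {1: 0, 2: 1, 3: 2}   # parent map: v -> parent(v)
t2 = {3: 0, 2: 3, 1: 2}
print(independent(t1, t2, 0))
```

A tree is trivially not independent with itself, since any path of length at least two shares its internal nodes.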
ISBN: (Print) 9781509021949
Centrality is an important measure for identifying the most important actors in a network. This paper discusses the various centrality measures used in Social Network Analysis. These measures are tested on complex real-world social network data sets, such as video sharing networks, a social interaction network, and co-authorship networks, to examine their behavior. We carry out a correlation analysis of these centralities and plot the results to recommend when each centrality measure should be used. Additionally, we introduce a new centrality measure, Cohesion Centrality, based on the cohesiveness of a graph, develop a sequential algorithm for it, and further devise a parallel algorithm to implement it.
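Two of the standard measures and the correlation analysis can be sketched compactly. This is an illustrative stand-in, not the paper's implementation: degree and closeness centrality on a small undirected graph, followed by the Pearson correlation between the two measures over all nodes.

```python
from math import sqrt
from collections import deque

adj = {  # small example network as an adjacency map
    'a': {'b', 'c'}, 'b': {'a', 'c', 'd'},
    'c': {'a', 'b'}, 'd': {'b', 'e'}, 'e': {'d'},
}

def degree_centrality(adj):
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}

def closeness_centrality(adj):
    n, out = len(adj), {}
    for s in adj:                       # BFS from each node
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        out[s] = (n - 1) / sum(dist.values())
    return out

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

deg = degree_centrality(adj)
clo = closeness_centrality(adj)
nodes = sorted(adj)
r = pearson([deg[v] for v in nodes], [clo[v] for v in nodes])
```

A high correlation between two measures suggests they rank actors similarly, so the cheaper one may suffice; plotting such correlations is the basis of the paper's recommendations.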
ISBN: (Print) 9781509028719
In this paper a speculative computation method for IEC 61499 function block (FB) systems is proposed to increase the level of parallelism when executing an FB system, and thus to improve the system's performance and reduce its response time to input events. Data and control dependencies in FB systems are identified and defined as a basis for organizing the speculative execution of FB algorithms. A simulation model of FB systems with speculative execution, based on timed stochastic Petri nets, is considered. In addition, the paper discusses the results of simulation experiments conducted in CPN Tools.
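The core idea of speculative FB execution can be sketched as follows. All names here are illustrative, not the IEC 61499 API or the paper's method: the algorithm for a predicted next input event runs ahead of time on a copy of the block's state; the result is committed only if the prediction matches the event that actually arrives, and discarded otherwise.

```python
def run_block(state, event, algorithms, predict):
    """Execute one FB step with speculation on the next input event."""
    guess = predict(state)                        # predicted next event
    speculative = algorithms[guess](dict(state))  # run ahead on a copy
    if event == guess:
        return speculative                        # hit: commit precomputed result
    return algorithms[event](dict(state))         # miss: discard, recompute

# Toy event-to-algorithm mapping for a counter block.
algorithms = {
    'INC': lambda s: {**s, 'count': s['count'] + 1},
    'RST': lambda s: {**s, 'count': 0},
}
state = {'count': 5}
state = run_block(state, 'INC', algorithms, lambda s: 'INC')  # prediction hit
state = run_block(state, 'RST', algorithms, lambda s: 'INC')  # prediction miss
```

In a real deployment the speculative call would run concurrently with the wait for the event, so a hit hides the algorithm's latency entirely; the data and control dependencies the paper identifies determine which algorithms may safely run ahead.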
ISBN: (Print) 9781467398046
Remote memory access has lower bandwidth and higher latency than local memory access in the Cache Coherent Non-Uniform Memory Access (cc-NUMA) architecture. This is especially true in cc-NUMA platforms where computing nodes are connected by a network, whose latency and bandwidth are much worse than those of the HyperTransport (HT) and PCI Express (PCI-E) buses. In order to enhance the performance of applications, a Hybrid parallel Framework for Computation-intensive Applications (HPFCA) was proposed. Task distribution, data storage, multicore parallelism, and kernel optimization are discussed in the HPFCA. An "MPI+OpenMP/Pthreads" mechanism is used for multi-node platforms: MPI for distributed-memory parallelism, and OpenMP/Pthreads for shared-memory parallelism. Moreover, GEMM and FFT, representatives of computation-intensive applications on the Godson-3B, were studied, and their parallel algorithms were optimized according to the HPFCA. Finally, experimental results demonstrated that HPFCA can deliver ideal performance on the Godson-3B.
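The two-level "MPI+OpenMP/Pthreads" split can be sketched structurally for GEMM. This is a Python stand-in, not the HPFCA code: the outer loop over `rank` stands in for MPI ranks each owning a block of rows of A, and the inner thread pool stands in for shared-memory (OpenMP/Pthreads) parallelism within a node.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A_rows, B):
    """Per-thread kernel: multiply a block of rows of A by B."""
    p = len(B[0])
    return [[sum(a[k] * B[k][j] for k in range(len(a))) for j in range(p)]
            for a in A_rows]

def hybrid_gemm(A, B, ranks=2, threads=2):
    n = len(A)
    block = (n + ranks - 1) // ranks
    C = []
    for rank in range(ranks):                       # "distributed" (MPI) level
        rows = A[rank * block:(rank + 1) * block]   # rank-local row block
        chunk = max(1, (len(rows) + threads - 1) // threads)
        slices = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
        with ThreadPoolExecutor(threads) as pool:   # "shared-memory" level
            parts = pool.map(matmul_rows, slices, [B] * len(slices))
        for part in parts:
            C.extend(part)
    return C

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [0, 1]]
```

In the real framework the outer level is genuine message passing between nodes, so the row blocks never leave their owners; the point of the split is that intra-node traffic stays on HT/PCI-E while only boundary data crosses the slower network.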
ISBN: (Print) 9781467388153
Applications running on clusters of shared-memory computers are often implemented using OpenMP+MPI. Productivity can be vastly improved using task-based programming, a paradigm in which the user expresses the data and control-flow relations between tasks, offering the runtime maximal freedom to place and schedule tasks. While productivity is increased, high-performance execution remains challenging: the implementation of parallel algorithms typically requires specific task placement and communication strategies to reduce internode communication and exploit data locality. In this work, we present a new macro-dataflow programming environment for distributed-memory clusters, based on the Intel Concurrent Collections (CnC) runtime. Our language extensions let the user define virtual topologies, task mappings, task-centric data placement, task and communication scheduling, etc. We introduce a compiler that automatically generates Intel CnC C++ runtime code, with key automatic optimizations including task coarsening and coalescing. We experimentally validate our approach on a variety of scientific computations, demonstrating both productivity and performance.
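A minimal sketch of the macro-dataflow idea (illustrative only, not the Intel CnC API): tasks declare which results they consume, and the runtime may execute any task whose inputs are available, in any order. It is exactly this freedom that the paper's language extensions then constrain with topologies, mappings, and scheduling hints.

```python
def run_dataflow(tasks, deps):
    """tasks: name -> fn(*dep_results); deps: name -> list of input names."""
    done, order = {}, []
    ready = [t for t in tasks if not deps[t]]
    while ready:
        t = ready.pop()                       # runtime picks any ready task
        done[t] = tasks[t](*[done[d] for d in deps[t]])
        order.append(t)
        ready += [u for u in tasks if u not in done and u not in ready
                  and all(d in done for d in deps[u])]
    return done, order

# A tiny task graph: 'sort' and 'sum' both depend on 'load' and may run
# in either order; 'report' joins their results.
tasks = {
    'load':   lambda: [3, 1, 2],
    'sort':   lambda xs: sorted(xs),
    'sum':    lambda xs: sum(xs),
    'report': lambda s, m: (s, m),
}
deps = {'load': [], 'sort': ['load'], 'sum': ['load'], 'report': ['sort', 'sum']}
results, order = run_dataflow(tasks, deps)
```

Task coarsening and coalescing, two of the paper's automatic optimizations, amount to merging several such fine-grained nodes into one before execution so that scheduling overhead is paid once per group.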
This paper continues to develop a fault-tolerant extension of the sparse grid combination technique recently proposed in [B. Harding and M. Hegland, ANZIAM J. Electron. Suppl., 54 (2013), pp. C394-C411]. This approach to fault tolerance is novel for two reasons: first, the combination technique adds an additional level of parallelism, and second, it provides algorithm-based fault tolerance so that solutions can still be recovered if failures occur during computation. Previous work indicates how the combination technique may be adapted for a small number of faults. In this paper we develop a generalization of the combination technique in which arbitrary collections of coarse approximations may be combined to obtain an accurate approximation. A general fault-tolerant combination technique for large numbers of faults is a natural consequence of this work. Using a renewal model for the time between faults on each node of a high-performance computer, we also provide bounds on the expected error for interpolation with this algorithm in the presence of faults. Numerical experiments solving the scalar advection PDE demonstrate that the algorithm is resilient to faults in a real application. We observe that the time to solution is not significantly affected by the presence of (simulated) faults, and that the expected error increases with the number of faults but remains relatively small even for high fault rates. A comparison with traditional checkpoint-restart methods applied to the combination technique shows that our approach is highly scalable with respect to the number of faults.
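The combination of arbitrary collections of coarse grids can be sketched through the standard coefficient formula for a downward-closed index set (an assumption here; the paper's generalization has its own derivation): grid i receives coefficient c_i equal to the sum of (-1)^|z| over z in {0,1}^d with i+z still in the set. Dropping a failed grid (a maximal index) and recomputing the c_i is the algorithm-based fault tolerance in miniature.

```python
from itertools import product

def coefficients(I):
    """Combination coefficients for a downward-closed set I of grid indices."""
    I = set(I)
    d = len(next(iter(I)))
    return {i: sum((-1) ** sum(z)
                   for z in product((0, 1), repeat=d)
                   if tuple(a + b for a, b in zip(i, z)) in I)
            for i in I}

n = 4
# Classical 2D level-n combination: all grids with i + j <= n.
full = {(i, j) for i in range(n + 1) for j in range(n + 1) if i + j <= n}
c = coefficients(full)          # +1 on level n, -1 on level n-1, 0 below

faulted = full - {(2, 2)}       # grid (2,2) lost to a fault
c_ft = coefficients(faulted)    # recombine the surviving grids
```

In both cases the coefficients sum to 1, so the recombined solution remains a consistent approximation; only its accuracy degrades gracefully with the lost grid.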
Single-thread algorithms for global optimization differ in how computational effort is allocated between exploitation and exploration. This allocation ultimately determines overall performance. For example, if too little emphasis is put on exploration, the globally optimal solution may not be identified. Increasing the allocation of computational effort to exploration improves the chances of identifying a globally optimal solution, but it also slows down convergence. Thus, in a single-thread implementation of model-based search, exploration and exploitation are substitutes. In this paper we propose a new algorithmic design for global optimization based upon multiple interacting threads. In this design, each thread implements a model-based search in which the allocation of effort between exploration and exploitation does not vary over time. Threads interact through a simple acceptance-rejection rule that prevents duplication of search effort. We show that the proposed design provides a speedup that increases with the number of threads. Thus, in the proposed algorithmic design, exploration is a complement to, rather than a substitute for, exploitation.
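The design can be caricatured in a toy simulation (illustrative only, not the paper's algorithm): each "thread" has a fixed exploration probability, and a shared set of visited points implements the acceptance-rejection rule, so a proposal already examined by any thread is rejected rather than re-evaluated.

```python
import random

def interacting_search(f, lo, hi, rates, steps, seed=0):
    """Maximize f over the integers [lo, hi]; each entry of `rates`
    is one thread's fixed exploration probability."""
    rng = random.Random(seed)
    best = rng.randint(lo, hi)
    visited = {best}
    for _ in range(steps):
        for p_explore in rates:                 # one proposal per thread
            if rng.random() < p_explore:
                x = rng.randint(lo, hi)         # exploration: global draw
            else:                               # exploitation: step from best
                x = min(hi, max(lo, best + rng.choice((-1, 1))))
            if x in visited:
                continue                        # acceptance-rejection: no duplicates
            visited.add(x)
            if f(x) > f(best):                  # best tracks all evaluated points
                best = x
    return best

f = lambda x: -(x - 23) ** 2        # unique global maximum at x = 23
best = interacting_search(f, 0, 30, rates=(0.9, 0.1), steps=500)
```

Because rejected proposals cost nothing, an exploratory thread effectively accelerates the exploitative ones instead of competing with them, which is the complementarity the abstract describes.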
Fuzzy clustering allows an object to belong to multiple clusters and represents the affiliation of objects to clusters by memberships. It is extended to fuzzy coclustering by assigning membership functions to both objects and features. In this paper we propose a new fuzzy triclustering (FTC) algorithm for the automatic categorization of three-dimensional data collections. FTC specifies a membership function for each dimension and is able to generate fuzzy clusters simultaneously along all three dimensions. FTC thus divides a three-dimensional cube into many small blocks, which should be triclusters with strong coherent bonding among their members. Experimental studies on MovieLens demonstrate the strength of FTC in terms of accuracy compared to some recent popular fuzzy clustering and coclustering approaches.
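The membership machinery such methods build on can be illustrated with the standard fuzzy c-means update, shown here as a stand-in (FTC's own tricluster updates, which assign memberships along each of the three dimensions, are in the paper): u[i][k] is object i's degree of belonging to cluster k, and each row sums to 1.

```python
def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership update for 1-D points; m > 1 is the fuzzifier."""
    u = []
    for x in points:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:                         # point coincides with a center
            u.append([1.0 if dk == 0.0 else 0.0 for dk in d])
            continue
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        row = [1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in d) for dk in d]
        u.append(row)
    return u

u = memberships([0.0, 0.4, 1.0], centers=[0.0, 1.0])
```

A point between two centers receives graded membership in both, which is exactly the behavior that lets a tricluster claim partial ownership of an object, a feature, or a slice of the third dimension.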
In this work, we present a parallel implementation of the Singular Value Decomposition (SVD) method on Graphics Processing Units (GPUs) using the CUDA programming model. Our approach is based on an iterative parallel version of the QR factorization by means of Givens plane rotations using the Sameh and Kuck scheme. The parallel algorithm is driven by an outer loop executed on the CPU, and the thread and block configuration is organized so as to use shared memory and avoid repeated accesses to global memory. In addition, the main kernel provides coalesced accesses to global memory through contiguous indices. As a case study, we consider the application of the SVD in the Overcomplete Local Principal Component Analysis (OLPCA) algorithm for denoising Diffusion Weighted Imaging (DWI) data. Our results show significant performance improvements with respect to the CPU version, which encourages its use in this expensive application.
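The building block of this approach is the Givens plane rotation that zeroes one entry at a time. The CPU sketch below is illustrative, not the paper's kernel: rotations are applied sequentially to reduce a small matrix to upper-triangular R, whereas the Sameh and Kuck ordering lets rotations acting on disjoint row pairs run concurrently on the GPU.

```python
from math import hypot

def givens_qr(A):
    """Reduce A (m x n, m >= n) to upper-triangular R via Givens rotations."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    for j in range(n):
        for i in range(m - 1, j, -1):      # zero A[i][j] against row i-1
            a, b = A[i - 1][j], A[i][j]
            r = hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r            # rotation [c s; -s c] on rows i-1, i
            for k in range(n):
                t = c * A[i - 1][k] + s * A[i][k]
                A[i][k] = -s * A[i - 1][k] + c * A[i][k]
                A[i - 1][k] = t
            A[i][j] = 0.0                  # clean up rounding
    return A

R = givens_qr([[2.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
```

Each rotation touches exactly two rows, which is why shared memory can hold the working rows and why disjoint row pairs in the Sameh and Kuck schedule can be rotated in parallel without conflicts.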
High computational requirements of current problems have driven most research towards efficient processing formulations that require the use of multiple interconnected processors; this is the foundation of the para...