The prototyping and development of computational codes for biological models, built from reliable, efficient, and portable building blocks, allow the simulation of real cerebral behaviours and the validation of theories and experiments. A critical issue is tuning a model through many numerical simulations with the aim of reproducing real scenarios. This requires a huge amount of computational resources to assess the impact of the parameters that influence the neuronal response. In this paper, we describe how parallel tools are adopted to simulate the so-called depolarization block of a CA1 pyramidal cell of the hippocampus. High-performance computing techniques are adopted in order to achieve a more efficient model simulation. Finally, we analyse the performance of this neural model, investigating its scalability and the benefits obtained on multi-core and on parallel and distributed architectures.
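The parameter-tuning workload described above parallelizes naturally, since each simulation is independent. A minimal sketch with the standard library follows; the toy firing-rate rule is an illustrative stand-in, not the paper's CA1 pyramidal-cell model.

```python
# Hypothetical sketch: a parameter sweep for model tuning, parallelized
# across cores. The "simulation" below is a toy stand-in for one
# numerical run of the neuron model.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def simulate(params):
    """Toy stand-in for one numerical simulation of the neuron model."""
    g_na, g_k = params
    # In this made-up rule, a depolarization block occurs when sodium
    # conductance overwhelms potassium conductance.
    blocked = g_na > 2.0 * g_k
    return params, blocked

def sweep(g_na_values, g_k_values):
    """Run every (g_na, g_k) combination, one simulation per task."""
    grid = list(product(g_na_values, g_k_values))
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(simulate, grid))
```

Each grid point maps to one independent task, so wall-clock time scales with the number of available cores, which is the same embarrassing parallelism the abstract exploits.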
Higher speed is the eternal pursuit of any chemometric algorithm. To take full advantage of the computing resources of multi-core processors (now prevalent in personal computers) and to accelerate the time-consuming algorithms in chemometrics, a novel multi-core computing method is introduced. Leave-one-out cross-validation is taken as an example to show the powerful capability of multi-core computing. The comparison results show that the execution time drops rapidly as the number of computing cores increases, which demonstrates that multi-core computing is a promising tool for solving computing-intensive and data-intensive problems in chemometrics. (C) 2008 Published by Elsevier B.V.
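Leave-one-out cross-validation is a natural fit for multi-core execution because each held-out sample defines an independent train/predict task. A minimal sketch, assuming a toy 1-nearest-neighbour predictor in place of the paper's chemometric models:

```python
# Multi-core leave-one-out cross-validation. The 1-NN predictor and the
# tiny dataset are illustrative assumptions; a chemometric model such as
# PLS would slot into loo_error() the same way.
from concurrent.futures import ProcessPoolExecutor

DATA = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]

def loo_error(i):
    """Train on all samples except i, predict sample i, return squared error."""
    x_i, y_i = DATA[i]
    train = [p for j, p in enumerate(DATA) if j != i]
    # 1-NN prediction: copy the label of the nearest remaining sample.
    _, y_hat = min(train, key=lambda p: abs(p[0] - x_i))
    return (y_hat - y_i) ** 2

def loocv_mse(workers=None):
    """Distribute the n independent folds over the available cores."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(loo_error, range(len(DATA))))
    return sum(errors) / len(errors)
```

Since the folds never communicate, execution time should fall almost linearly with core count until the per-fold cost no longer dominates scheduling overhead, matching the trend the abstract reports.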
As one of the key steps in the processing of airborne light detection and ranging (LiDAR) data, filtering often consumes a huge amount of time and physical memory. Conventional sequential algorithms are often inefficient in filtering massive point clouds, due to their huge computational cost and Input/Output (I/O) bottlenecks. The progressive TIN (Triangulated Irregular Network) densification (PTD) filter is a commonly employed iterative method that mainly consists of the TIN generation and the judging functions. However, the better quality of the progressive process comes at the cost of increased computing time. Fortunately, it is possible to take advantage of state-of-the-art multi-core computing facilities to speed up this computationally intensive task. A streaming framework for filtering point clouds by encapsulating the PTD filter into independent computing units is proposed in this paper. By overlapping multiple computing units with I/O events, the efficiency of the proposed method is greatly improved. More importantly, this framework is adaptable to many filters. Experiments suggest that the proposed streaming PTD (SPTD) improves the performance of massive point-cloud processing and alleviates the I/O bottlenecks. The experiments also demonstrate that the SPTD allows the quick processing of massive point clouds with better adaptability. In a 12-core environment, the SPTD gains a speedup of 7.0 for filtering 249 million points.
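The overlap of I/O and computation can be sketched as a producer-consumer pipeline: a reader thread streams tiles into a bounded queue while worker threads filter them. The placeholder elevation threshold below stands in for the actual PTD filter, and the tile format is a made-up list of (x, y, z) points.

```python
# Schematic streaming framework: an I/O thread feeds independent
# computing units (tiles) into a queue, and workers filter them so that
# computation overlaps reading. The threshold "filter" is a placeholder.
import queue
import threading

def read_tiles(tiles, q, n_workers):
    for tile in tiles:          # stands in for reading chunks from disk
        q.put(tile)
    for _ in range(n_workers):  # one end-of-stream sentinel per worker
        q.put(None)

def filter_tile(tile, z_max=2.0):
    """Placeholder ground filter: keep points with elevation below z_max."""
    return [p for p in tile if p[2] < z_max]

def streaming_filter(tiles, n_workers=4):
    q, results, lock = queue.Queue(maxsize=8), [], threading.Lock()

    def worker():
        while (tile := q.get()) is not None:
            ground = filter_tile(tile)
            with lock:
                results.append(ground)

    reader = threading.Thread(target=read_tiles, args=(tiles, q, n_workers))
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in [reader, *workers]:
        t.start()
    for t in [reader, *workers]:
        t.join()
    return results
```

The bounded queue caps memory use for massive clouds, and because tiles are independent units, any per-tile filter can replace `filter_tile`, mirroring the framework's claimed adaptability.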
This paper presents an efficient and scalable experimental environment for distributed execution of replicated simulators. By taking a performance-centered approach, the proposed technique makes the best use of distributed hardware resources for faster data collection. Accordingly, the primary contribution of this work is to describe how the environment improves scalability and utilizes distributed hardware resources efficiently. To do this, we suggest a new concept of single simulation, multiple scenarios and propose a distributed-execution simulation framework covering three aspects: (1) a layered architecture model design; (2) definitions of the protocols by which the layers interact; and (3) the framework implementation. The proposed model architecture and protocol definitions guarantee straightforward structural scalability and efficient, load-balanced utilization of hardware resources. Moreover, the framework runs simulations automatically, without extra work from users. To demonstrate the efficiency of the proposed framework, we performed three extensive experiments with different models, that is, different systems. The experimental results show that simulation performance increases proportionally with the number of hardware resources, minimizing the framework's utilization overhead.
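The load-balanced assignment of scenarios to replicated simulators can be illustrated with a simple greedy policy: each scenario goes to the currently least-loaded host. The host names, scenario costs, and the greedy policy itself are illustrative assumptions, not the paper's protocol.

```python
# Sketch of "single simulation, multiple scenarios": one simulation model
# is replicated across hosts, and scenarios are dispatched to the
# least-loaded host so hardware is used evenly.
import heapq

def assign_scenarios(hosts, scenarios):
    """Greedy load balancing: each scenario goes to the least-loaded host.

    hosts: list of host names; scenarios: list of (name, estimated_cost).
    """
    heap = [(0.0, name) for name in sorted(hosts)]
    heapq.heapify(heap)
    assignment = {}
    for scenario, cost in scenarios:
        load, name = heapq.heappop(heap)   # least-loaded host so far
        assignment[scenario] = name
        heapq.heappush(heap, (load + cost, name))
    return assignment
```

With independent scenarios and even loads, throughput grows roughly in proportion to the number of hosts, which is the scaling behaviour the experiments report.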
Gaussian elimination is used in many applications and in particular in the solution of systems of linear equations. This paper presents mathematical performance models and analysis of four parallel Gaussian elimination methods (precisely, the Original method and the new Meet in the Middle (MiM) algorithms, and their variants with SIMD vectorization) on multi-core systems. Analytical performance models of the four methods are formulated and presented, followed by evaluations of these models with modern multi-core systems' operation latencies. Our results reveal that the four methods generally exhibit good performance scaling with increasing matrix size and number of cores. SIMD vectorization only makes a large difference in performance for low numbers of cores. For a large matrix size (n >= 16K), the performance difference between the MiM and Original methods falls from 16x with four cores to 4x with 16K cores. The efficiencies of all four methods are low with 1K cores or more, highlighting a major problem of multi-core systems, where the network-on-chip and memory latencies are too high relative to basic arithmetic operations. Thus, Gaussian elimination can greatly benefit from the resources of multi-core systems, but higher performance gains can be achieved if multi-core systems can be designed with lower memory-operation, synchronization, and interconnect-communication latencies, requirements of utmost importance and challenge in the exascale computing age. (C) 2013 Production and hosting by Elsevier B.V. on behalf of King Saud University.
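The shape of such an analytical model can be sketched as compute time divided across p cores plus a per-step synchronization term that grows with the core count. The latency constants and the square-root barrier-cost model below are illustrative assumptions, not the paper's fitted values.

```python
# Toy analytical performance model in the spirit of the paper: parallel
# Gaussian-elimination time = (FLOPs / p) * t_flop + per-step sync cost.
# All constants are made up for illustration.
def ge_time(n, p, t_flop=1e-9, t_sync=1e-6):
    flops = (2.0 / 3.0) * n ** 3       # leading-order FLOP count of GE
    compute = flops * t_flop / p       # ideal even split over p cores
    sync = (n - 1) * t_sync * p ** 0.5 # assumed barrier cost per pivot step
    return compute + sync

def speedup(n, p, **kw):
    return ge_time(n, 1, **kw) / ge_time(n, p, **kw)

def efficiency(n, p, **kw):
    return speedup(n, p, **kw) / p
```

Even this crude model reproduces the abstract's qualitative finding: efficiency is near 1 at small core counts but collapses once the synchronization term, which grows with p, dominates the shrinking per-core compute time.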
ISBN (print): 9781479932467
This paper addresses performance issues encountered in parallel functional gate-level simulation executed on a multi-core machine. It demonstrates that a straightforward application of multi-core simulation on a multi-core machine does not improve simulation performance. This is due to unbalanced partitioning, lack of sufficient concurrency in the design partitions, overhead due to communication between partitions, and synchronization overhead imposed by the simulator. We propose, implement, and automate a generic (partitioning-independent) prediction-based solution to eliminate or minimize communication and synchronization overhead in an event-driven functional gate-level simulation on a multi-core machine. We demonstrate the speedup obtained with this method on a set of real open-source designs.
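The gain from prediction can be illustrated by counting synchronizations: a conservative simulator synchronizes partitions every cycle, while a prediction-based one synchronizes only at cycles where cross-partition activity is predicted, paying a corrective sync on mispredictions. The event sets and the counting model are made up for illustration; they are not the paper's algorithm.

```python
# Schematic cost model for prediction-based synchronization in a
# partitioned event-driven simulation.
def simulate_syncs(n_cycles, actual_events, predicted_events):
    """Count syncs: predicted cycles sync eagerly; a cycle with
    unpredicted cross-partition activity forces a corrective sync."""
    predicted = set(predicted_events)
    syncs = mispredictions = 0
    for cycle in range(n_cycles):
        if cycle in predicted:
            syncs += 1
        elif cycle in actual_events:
            syncs += 1
            mispredictions += 1
    return syncs, mispredictions
```

With accurate prediction, the sync count drops from n_cycles to roughly the number of cycles with genuine boundary activity, which is where the claimed overhead reduction comes from.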
ISBN (print): 9781467309219; 9781467309202
Interrupt affinitization of network interface cards (NICs) is a fundamental configuration that determines which CPU cores process which packets on multi-core platforms. In this paper, we propose a simple port-configuration-assisted scheme to attain an optimal affinitization for packet-forwarding applications. Experiments ranging from bridging, routing, and flow tracking to deep packet inspection are conducted to show the performance impact of different affinitization approaches. As a result, our proposed scheme achieves the same performance level as the best fixed affinitization scheme. In addition, the effectiveness of interrupt balancing is demonstrated, making our scheme superior to the widely deployed irqbalance under varying network settings.
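On Linux, affinitization boils down to writing a CPU bitmask to /proc/irq/<irq>/smp_affinity for each NIC queue's IRQ. The per-port queue-to-core mapping policy below is an illustrative assumption, not the paper's scheme; only the bitmask format and the /proc interface are standard.

```python
# Sketch of port-assisted interrupt affinitization: compute a CPU bitmask
# per NIC queue, grouping each port's queues onto consecutive cores.
def affinity_mask(core):
    """Hex bitmask (as written to /proc/irq/<irq>/smp_affinity) that pins
    an IRQ to a single CPU core."""
    return format(1 << core, "x")

def plan_affinity(ports, queues_per_port):
    """Assign each (port, queue) its own core; the layout is hypothetical."""
    plan = {}
    for port in range(ports):
        for q in range(queues_per_port):
            core = port * queues_per_port + q
            plan[(port, q)] = affinity_mask(core)
    return plan

# Applying a plan requires root, e.g. (not executed here):
#   with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
#       f.write(mask)
```

Pinning each queue's IRQ to a dedicated core avoids the cache-line migration and uneven load that a naive or constantly rebalanced assignment can cause.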
In this work, we discuss a family of parallel implicit time integrators for multi-core and potentially multi-node or multi-GPGPU systems. The method is an extension of Revisionist Integral Deferred Correction (RIDC) by Christlieb, Macdonald and Ong (SISC, 2010), which constructed parallel explicit time integrators. The key idea is to rewrite the defect-correction framework so that, after initial startup costs, each correction loop can be lagged behind the previous correction loop in a manner that facilitates running the predictor and correctors in parallel. In this paper, we show that RIDC provides a framework to use p cores to generate a pth-order implicit solution to an initial value problem (IVP) in approximately the same wall-clock time as a single-core backward Euler implementation (p <= 12). The construction, convergence and stability of the schemes are presented, along with supporting numerical evidence.
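The order-raising mechanism can be seen in a serial sketch of one implicit deferred-correction pass for the linear test problem y' = lam*y, where each backward-Euler solve has a closed form. In RIDC proper, the predictor and each corrector would run on separate cores with the correctors lagging a few steps behind; this sketch runs the levels sequentially to keep it minimal.

```python
# Serial sketch of implicit deferred correction for y' = lam*y, y(0) = y0.
# Level 0 is a backward-Euler predictor (order 1); one correction sweep
# with trapezoidal quadrature of the level-0 solution yields order 2.
def ridc_levels(lam, y0, t_end, n):
    h = t_end / n
    pred = [y0]                       # level 0: backward Euler
    for _ in range(n):
        # implicit solve y_{i+1} = y_i + h*lam*y_{i+1}, closed form here
        pred.append(pred[-1] / (1 - h * lam))
    corr = [y0]                       # level 1: one deferred-correction sweep
    for i in range(n):
        # error-equation update: implicit term minus the predictor's
        # implicit term, plus trapezoidal quadrature of the predictor
        rhs = (corr[i] - h * lam * pred[i + 1]
               + 0.5 * h * lam * (pred[i] + pred[i + 1]))
        corr.append(rhs / (1 - h * lam))
    return pred[-1], corr[-1]
```

Halving the step size cuts the predictor's error roughly in half (order 1) but cuts the corrected solution's error by about four (order 2); in full RIDC, each additional corrector core raises the order by one more.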
ISBN (print): 9781450305006
Given the recent emergence of multi-core and distributed computing that is transforming mainstream application areas in industry, demand is rising for teaching more parallelism and concurrency in CS curricula. We argue for teaching these topics incrementally in CS courses at all undergraduate levels, and propose a comprehensive approach involving flexible teaching modules with experiential programming exercises, technical and instructor supplementary materials, and an online community of educators to support adopters and module contributors. Progress on developing these materials and online resources is reported.