Scalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the Web. Scalable big data analysis today can be achieved by parallel implementations that exploit the computing and storage facilities of high performance computing (HPC) systems and clouds, whereas in the near future Exascale systems will be used to implement extreme-scale data analysis. Here we discuss how clouds currently support the development of scalable data mining solutions, and we outline and examine the main challenges to be addressed and solved for implementing innovative data analysis applications on Exascale systems.
In this work, novel circuits based on memristors for implementing electronic synapses and artificial neurons are designed. First, two simple synaptic circuits for implementing weighting calculations in voltage and current modes using twin memristors are proposed. The synaptic weighting operation is defined as a difference function between the twin memristors, which can be adjusted in either direction by applying programmed signals and can realize positive, zero, and negative synaptic weights. Second, two neuron circuits using the proposed memristor synapses, in which parallel computing and programming can be achieved, are designed. Finally, the performance of the proposed memristor synapses and neuron circuits, such as weight programming, neuron computing, and parallel operation, is analyzed through PSpice simulations. (C) 2018 Elsevier B.V. All rights reserved.
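The twin-memristor weighting scheme described above can be made concrete with a toy numerical model. This is only an illustrative sketch, not the paper's analog circuits (which are evaluated in PSpice): the weight is modeled as the difference between the conductances of the two memristors, so it can be positive, zero, or negative, and a neuron simply sums the currents of its parallel synapses. All function names and the unit conventions here are hypothetical.

```python
# Toy numerical model of a twin-memristor synapse (illustrative only).
# The synaptic weight is the difference between the conductances of the
# two memristors, so it can take positive, zero, or negative values.

def synapse_current(v_in, g_plus, g_minus):
    """Output of a voltage-mode twin-memristor synapse.

    weight = g_plus - g_minus (arbitrary conductance units);
    output current i = weight * v_in.
    """
    return (g_plus - g_minus) * v_in

def neuron_output(voltages, g_pairs):
    """A neuron summing the currents of its parallel synapses."""
    return sum(synapse_current(v, gp, gm)
               for v, (gp, gm) in zip(voltages, g_pairs))

# Example: one excitatory synapse (weight +1) and one inhibitory
# synapse (weight -2) whose contributions cancel exactly.
v = [1.0, 0.5]                    # input voltages
g = [(2.0, 1.0), (1.0, 3.0)]      # (g_plus, g_minus) per synapse
print(neuron_output(v, g))        # (2-1)*1.0 + (1-3)*0.5 = 0.0
```

Programming a weight then amounts to adjusting the two conductances in opposite directions, which is the "adjusted in reverse by applying programmed signals" behavior the abstract describes.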
The numerical nonreproducibility in parallel molecular dynamics (MD) simulations, which stems from the non-associative accumulation of floating-point data, poses great challenges for development, debugging, and validation. The most common solutions to this problem are using a high-precision data type or sorting operations, but these solutions come with significant computational overhead. This paper analyzes the sources of nonreproducibility in parallel MD simulations in detail. Two general solutions, namely, sorting by force component value and using an 80-bit long double data type, are implemented and evaluated in LAMMPS. To reduce the computational cost, a full-list-based method with the operation order sorted by particle distance is proposed, inspired by the spatial characteristics of MD simulations. An experiment on a system with constant-energy dynamics shows that the new method can ensure reproducibility at any degree of parallelism with an extra 50% computational overhead. (C) 2019 Published by Elsevier B.V.
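The root cause named above, non-associative floating-point accumulation, can be demonstrated in a few lines: the same three operands summed in a different grouping (as happens when partial sums are combined across threads or MPI ranks in a different order) give different results. The remedy shown afterwards, an exactly rounded summation, mirrors in spirit the paper's order-fixing solutions, though the paper itself uses sorting and extended precision rather than this particular function.

```python
import math

# Floating-point addition is not associative: the grouping of the same
# three operands changes the result, which is why parallel reductions
# that combine partial sums in nondeterministic order are irreproducible.
big = 1e100

s1 = (big + 1.0) - big   # 1.0 is far below one ulp of 1e100 and is lost
s2 = (big - big) + 1.0   # same operands, different grouping
print(s1, s2)            # 0.0 1.0

# An exactly rounded summation removes the order dependence entirely:
print(math.fsum([big, 1.0, -big]))   # 1.0, regardless of operand order
```

`math.fsum` tracks exact partial sums internally, so every permutation of the input list yields the same correctly rounded result; the trade-off, as with the paper's sorted-order and 80-bit approaches, is extra computational cost per addition.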
Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems r...
We present an OpenACC-based parallel implementation of stochastic algorithms for simulating biochemical reaction networks on modern GPUs (graphics processing units). To investigate the effectiveness of using OpenACC for leveraging the massive hardware parallelism of the GPU architecture, we carefully apply OpenACC's language constructs and mechanisms to implement a parallel version of the stochastic simulation algorithm on the GPU. Comparing our OpenACC implementation with both the NVidia CUDA and the CPU-based implementations, we report our initial experiences with OpenACC's performance and programming productivity in the context of GPU-accelerated scientific computing.
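For readers unfamiliar with the algorithm being parallelized: in this context the "stochastic simulation algorithm" is standardly Gillespie's direct method, which draws exponentially distributed waiting times and picks the next reaction in proportion to its propensity. The sketch below is a minimal serial version for illustration only (the paper's contribution is running many such trajectories in parallel on the GPU); the function and parameter names are this sketch's own.

```python
import math
import random

def gillespie_direct(x, reactions, propensity, t_end, rng=random.Random(0)):
    """Minimal serial Gillespie direct method (illustrative sketch).

    x          -- list of species counts (modified in place)
    reactions  -- list of state-change vectors, one per reaction
    propensity -- function (state, j) -> rate of reaction j in that state
    """
    t = 0.0
    while t < t_end:
        a = [propensity(x, j) for j in range(len(reactions))]
        a0 = sum(a)
        if a0 == 0.0:
            break                                 # no reaction can fire
        t += -math.log(1.0 - rng.random()) / a0   # exponential waiting time
        r = rng.random() * a0                     # choose j with prob a_j/a0
        j, acc = 0, a[0]
        while acc < r:
            j += 1
            acc += a[j]
        for i, d in enumerate(reactions[j]):      # apply state change
            x[i] += d
    return x

# Example: irreversible decay A -> 0 with propensity 0.5 * [A].
final = gillespie_direct([100], [[-1]], lambda s, j: 0.5 * s[0], t_end=50.0)
print(final)
```

Each trajectory is an independent stream of random draws, which is exactly what makes the method embarrassingly parallel across GPU threads, as long as each thread gets its own random number stream.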
Since it was first introduced in 2008 with the 1.0 specification, OpenCL has steadily evolved over the past decade to increase its support for heterogeneous parallel systems. In this paper, we accelerate stochastic simulation of biochemical reaction networks on modern GPUs (graphics processing units) by means of the OpenCL programming language. In implementing the OpenCL version of the stochastic simulation algorithm, we carefully apply its data-parallel execution model to exploit the hardware parallelism of modern GPUs. To evaluate our OpenCL implementation, we perform a comparative performance analysis against a CPU-based cluster implementation and an NVidia CUDA implementation. In addition to this initial report on the performance of OpenCL on GPUs, we also discuss the applicability and programmability of OpenCL in the context of GPU-based scientific computing.
Brain strokes are one of the leading causes of disability and mortality in adults in developed countries. Ischemic stroke (85% of total cases) and hemorrhagic stroke (15%) must be treated with opposing therapies, and thus the nature of the stroke must be determined quickly in order to apply the appropriate treatment. Recent studies in biomedical imaging have shown that strokes produce variations in the complex electric permittivity of brain tissues, which can be detected by means of microwave tomography. Here, we present synthetic results obtained with an experimental microwave tomography-based portable system for the early detection and monitoring of brain strokes. The determination of the electric permittivity first requires the solution of a coupled forward-inverse problem. We make use of massively parallel computation based on a domain decomposition method and of regularization techniques for the optimization. Synthetic data are obtained from electromagnetic simulations corrupted by noise derived from the measurement errors of the experimental imaging system. The results demonstrate that hemorrhagic strokes can be detected with microwave systems when the proposed reconstruction algorithm with edge-preserving regularization is applied.
Saccadic eye movements move the high-resolution fovea to point at regions of interest. Saccades can only be generated serially (i.e., one at a time). However, what remains unclear is the extent to which saccades are programmed in parallel (i.e., a series of such movements can be planned together) and how far ahead such planning occurs. In the current experiment, we investigate this issue with a saccade-contingent preview paradigm. Participants were asked to execute saccadic eye movements in response to seven small circles presented on a screen. The extent to which participants were given prior information about target locations was varied on a trial-by-trial basis: participants were aware of the location of the next target only, or of the next three, five, or all seven targets. New targets were added to the display during the saccade to the next target in the sequence. The overall time taken to complete the sequence decreased as more targets were made available, up to all seven targets. This resulted from a reduction in the number of saccades executed and a reduction in their latencies. Surprisingly, these results suggest that, when faced with a demand to saccade to a large number of target locations, saccade preparation for all target locations is carried out in parallel.
A fundamental problem in parallel and distributed processing is the partial serialization that is imposed by the need for mutually exclusive access to common resources. In this article, we investigate the problem of optimally scheduling (in terms of makespan) a set of jobs, where each job consists of the same number L of unit-duration tasks, and each task either accesses exclusively one resource from a given set of resources or accesses a fully shareable resource. We develop and establish the optimality of a fast polynomial-time algorithm that finds a schedule with the shortest makespan for any number of jobs and any number of resources for the case L = 2. In the notation commonly used for job-shop scheduling problems, this result means that the problem J | d_ij = 1, n_j = 2 | C_max is polynomially solvable, adding to the polynomial solutions known for the problems J2 | n_j <= 2 | C_max and J2 | d_ij = 1 | C_max (whereas other closely related versions, such as J2 | n_j <= 3 | C_max, J2 | d_ij in {1,2} | C_max, J3 | d_ij = 1 | C_max, and J | d_ij = 1, n_j <= 3 | C_max, are all known to be NP-complete). For the general case L > 2 (i.e., for the job-shop problem J | d_ij = 1, n_j = L > 2 | C_max), we present a competitive heuristic and provide experimental comparisons with other heuristic versions and, when possible, with the ideal integer linear programming formulation.
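The scheduling model above can be made concrete with a simple greedy list-scheduling simulation. To be clear, this is not the paper's optimal L = 2 algorithm nor its competitive heuristic, just an illustrative sketch of the problem setting: jobs are sequences of unit-duration tasks, each task either claims one exclusive resource or uses a fully shareable one (encoded here as `None`), and the makespan is the number of time steps until all jobs finish.

```python
# Illustrative greedy scheduler for the job model in the abstract
# (NOT the paper's algorithm). Each time step, every job tries to run
# its next unit task; an exclusive resource serves one task per step.

def greedy_makespan(jobs):
    """jobs: list of task sequences, e.g. [['r1', None], ['r1', 'r2']].

    Returns the number of unit time steps until all jobs complete.
    """
    progress = [0] * len(jobs)      # index of the next task of each job
    time = 0
    while any(p < len(j) for p, j in zip(progress, jobs)):
        busy = set()                # exclusive resources claimed this step
        for k, job in enumerate(jobs):
            if progress[k] >= len(job):
                continue            # job already finished
            res = job[progress[k]]
            if res is None:         # fully shareable resource: always runs
                progress[k] += 1
            elif res not in busy:   # exclusive resource: first claimant wins
                busy.add(res)
                progress[k] += 1
        time += 1
    return time

# Three jobs with L = 2 contending for exclusive resource 'r1'.
# Four unit tasks need 'r1', so no schedule can beat a makespan of 4.
print(greedy_makespan([['r1', 'r1'], ['r1', None], [None, 'r1']]))  # 4
```

In this small instance the greedy schedule happens to match the lower bound given by the busiest resource; in general, greedy list scheduling is only a heuristic, which is precisely why the paper's polynomial-time optimal algorithm for L = 2 is of interest.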
Parallel hardware is today's reality, and language extensions that ease exploiting its promised performance flourish. For most mainstream languages, one or more tailored solutions exist that address the specific need...