ISBN (print): 9781538643013
Artificial neural networks are inspired by biological neural networks, which are formed by many real neurons with spiking activity. It is important to simulate this spiking activity under different conditions. It is well known that the Hodgkin-Huxley (HH) equations can be used for such simulations. However, the conductances of the ion channels in these equations, which the simulation requires, are usually unknown. In this paper, we develop a parallel genetic algorithm, together with a visual software tool, to estimate these conductances. By fitting experimental data, we show that when the number of individuals in the genetic algorithm exceeds 2000, the 5th generation already yields a near-optimal solution and a good fit.
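The fitting loop described above can be illustrated with a small sketch. It is not the authors' tool: it assumes the textbook squid-axon Hodgkin-Huxley rate functions and reversal potentials, a forward-Euler integrator, a synthetic target trace generated from the textbook conductances (g_Na = 120, g_K = 36, g_L = 0.3 mS/cm²) in place of experimental data, and hypothetical search bounds `BOUNDS`. Fitness is evaluated for the whole population at once with NumPy, standing in for the parallel evaluation in the paper; the GA operators (truncation selection, blend crossover, Gaussian mutation, elitism) are illustrative choices, not the paper's.

```python
import numpy as np

# Textbook squid-axon HH constants (mV, ms, uF/cm^2); rest near -65 mV.
E_NA, E_K, E_L, C_M = 50.0, -77.0, -54.387, 1.0
# Hypothetical search ranges for (g_Na, g_K, g_L) in mS/cm^2.
BOUNDS = np.array([[50.0, 200.0], [10.0, 60.0], [0.05, 1.0]])

def simulate(g, i_ext=10.0, t_ms=15.0, dt=0.01):
    """Forward-Euler HH simulation for a batch of (g_Na, g_K, g_L) rows; returns V traces."""
    g = np.atleast_2d(np.asarray(g, dtype=float))
    p = g.shape[0]
    v = np.full(p, -65.0)
    m, h, n = np.full(p, 0.05), np.full(p, 0.6), np.full(p, 0.32)  # ~resting gate values
    steps = int(round(t_ms / dt))
    trace = np.empty((p, steps))
    with np.errstate(all="ignore"):  # extreme candidates may blow up; scored as inf below
        for k in range(steps):
            a_m = 0.1 * (v + 40.0) / (1.0 - np.exp(-(v + 40.0) / 10.0))
            b_m = 4.0 * np.exp(-(v + 65.0) / 18.0)
            a_h = 0.07 * np.exp(-(v + 65.0) / 20.0)
            b_h = 1.0 / (1.0 + np.exp(-(v + 35.0) / 10.0))
            a_n = 0.01 * (v + 55.0) / (1.0 - np.exp(-(v + 55.0) / 10.0))
            b_n = 0.125 * np.exp(-(v + 65.0) / 80.0)
            i_ion = (g[:, 0] * m**3 * h * (v - E_NA)
                     + g[:, 1] * n**4 * (v - E_K)
                     + g[:, 2] * (v - E_L))
            v = v + dt * (i_ext - i_ion) / C_M
            m = m + dt * (a_m * (1.0 - m) - b_m * m)
            h = h + dt * (a_h * (1.0 - h) - b_h * h)
            n = n + dt * (a_n * (1.0 - n) - b_n * n)
            trace[:, k] = v
    return trace

def errors(pop, target):
    """Mean squared voltage error of every individual against the target trace."""
    err = np.mean((simulate(pop) - target) ** 2, axis=1)
    return np.where(np.isfinite(err), err, np.inf)

def fit_conductances(target, pop_size=32, generations=12, elite=2, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, 3))
    history = []
    for _ in range(generations):
        err = errors(pop, target)
        order = np.argsort(err)
        pop, err = pop[order], err[order]
        history.append(float(err[0]))
        parents = pop[: pop_size // 2]                      # truncation selection
        k = pop_size - elite
        i, j = rng.integers(len(parents), size=k), rng.integers(len(parents), size=k)
        w = rng.random((k, 1))
        children = w * parents[i] + (1.0 - w) * parents[j]  # blend crossover
        children += rng.normal(0.0, 0.05, children.shape) * (hi - lo)  # Gaussian mutation
        pop = np.vstack([pop[:elite], np.clip(children, lo, hi)])      # keep elites
    err = errors(pop, target)
    best = int(np.argmin(err))
    history.append(float(err[best]))
    return pop[best], history
```

Typical use is `target = simulate([120.0, 36.0, 0.3])[0]` followed by `best, history = fit_conductances(target)`. Because the elites are carried over unchanged, the best error per generation can only stay level or fall. The population here is far smaller than the 2000+ individuals reported above, so the sketch only demonstrates the mechanics.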
ISBN (digital): 9781728147437
ISBN (print): 9781728147444
With the increasing performance requirements of on-chip network data interaction, traditional Direct Memory Access (DMA) often achieves low multi-module collaboration efficiency because of competition for bus control, which reduces bus throughput. Targeting the working characteristics of inter-core communication and data interaction among multiple modules, this paper designs a packet transmission module, Bi-Transfer, that supports bidirectional data flow between multiple modules. It also implements chained transmission of input and output data in parallel, with various configuration modes and flexible data scheduling modes. The module can perform data movement, inter-core communication, task management and similar functions, and it uses descriptors to link payload data packets so that the data interaction mode of the on-chip network is configured uniformly. In the experimental stage, function planning and code design are carried out first; then the timing simulation and data recording are described. Finally, the performance characteristics of this new data interaction module are discussed. Statistical observations show that increasing the number of channels in the module can significantly increase bus bandwidth. Under the given clock frequency, bus width and data transmission requirements, the bus operating bandwidth in four-channel mode reaches up to 4690 MB/s, nearly 400 MB/s more than the bandwidth of an ordinary DMA bus. With fewer channels, the module still meets the data scheduling requirements of most applications and provides the data-movement function of ordinary DMA.
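Descriptor chaining, as described above, lets a channel walk a linked list of transfer descriptors without further set-up between transfers. The sketch below models that idea in software under loud assumptions: the `Descriptor` fields, the round-robin burst arbitration and the byte-array "memory" are hypothetical and are not Bi-Transfer's descriptor format or bus protocol; the sketch also makes no attempt to model bandwidth.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Descriptor:
    src: int                             # source byte offset (hypothetical field)
    dst: int                             # destination byte offset
    length: int                          # number of bytes to move
    next: Optional["Descriptor"] = None  # link to the next descriptor in the chain

def build_chain(segments: List[Tuple[int, int, int]]) -> Optional[Descriptor]:
    """Link (src, dst, length) segments into a singly linked descriptor chain."""
    head = None
    for src, dst, length in reversed(segments):
        head = Descriptor(src, dst, length, head)
    return head

def run_channels(memory: bytearray, chains: List[Optional[Descriptor]],
                 burst: int = 4) -> List[int]:
    """Grant the shared bus round-robin, one burst per active channel per turn.

    Returns the channel id of every burst granted, in order.
    """
    state = [(ch, head, 0) for ch, head in enumerate(chains) if head is not None]
    grants: List[int] = []
    while state:
        next_state = []
        for ch, desc, done in state:
            n = min(burst, desc.length - done)
            s, d = desc.src + done, desc.dst + done
            memory[d:d + n] = memory[s:s + n]   # move one burst
            grants.append(ch)
            done += n
            if done == desc.length:             # descriptor finished: follow the link
                desc, done = desc.next, 0
            if desc is not None:
                next_state.append((ch, desc, done))
        state = next_state
    return grants
```

A single channel given `build_chain([(0, 32, 8), (8, 40, 8)])` walks both descriptors in turn. Splitting the same work across two channels interleaves their bursts on the shared bus.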
ISBN (print): 9781728106465
With increasingly easy access to the Internet, e-commerce and social media portals have emerged, and there has been a huge surge in human sentiment expressed as customer reviews and feedback on these platforms. According to one survey, approximately 2,500,000 terabytes of data are created every day, and 90% of the data that exists today was created in the past two years alone. This huge amount of data is called big data. The main problem with it is that it is unstructured, so it must first be processed with various methods before information can be extracted. This paper proposes a Human Sentiment Analysis Model (HSAM) that can perform sentiment analysis on any given data set.
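HSAM's internals are not described in the abstract, so the sketch below substitutes a deliberately simple lexicon-based scorer to show what sentiment analysis over a data set of reviews means at the smallest scale. The word lists and the one-word negation rule are toy assumptions, not part of HSAM; practical systems use much larger lexicons or trained classifiers.

```python
import re
from typing import Dict, List

POSITIVE = {"good", "great", "excellent", "love", "fast"}   # toy lexicon (assumption)
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}
NEGATORS = {"not", "never", "no"}

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())

def score(text: str) -> int:
    """Sum +1/-1 per sentiment word; a negator flips the next sentiment word only."""
    total, negate = 0, False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = True
            continue
        value = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if value:
            total += -value if negate else value
            negate = False
    return total

def label(text: str) -> str:
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

def analyze(reviews: List[str]) -> Dict[str, int]:
    """Count sentiment labels over a whole data set of reviews."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for review in reviews:
        counts[label(review)] += 1
    return counts
```

For example, `label("Not good at all")` flips the score of "good" and returns `"negative"`.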
ISBN (print): 9783319780245; 9783319780238
The reduction of a general dense square matrix to Hessenberg form is a well-known first step in many standard eigenvalue solvers. Although parallel algorithms exist, Hessenberg reduction is one of the bottlenecks in AED (aggressive early deflation), a main part of state-of-the-art software for the distributed multishift QR algorithm. We propose a new NUMA-aware algorithm that fits the context of the QR algorithm and evaluate the sensitivity of its algorithmic parameters. The proposed algorithm is faster than LAPACK for all problem sizes and faster than ScaLAPACK for the relatively small problem sizes typical of AED.
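For reference, the classical unblocked, sequential Householder reduction to Hessenberg form can be sketched as below. It is not the NUMA-aware algorithm proposed in the paper; it only shows the similarity transformation that blocked and parallel algorithms reorganize. It assumes NumPy and a real square input.

```python
import numpy as np

def hessenberg(a):
    """Reduce a real square matrix to upper Hessenberg form with Householder reflectors.

    Returns (H, Q) with Q orthogonal and Q.T @ A @ Q == H up to rounding.
    """
    h = np.array(a, dtype=float)
    n = h.shape[0]
    q = np.eye(n)
    for k in range(n - 2):
        x = h[k + 1:, k].copy()
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x
        v[0] -= alpha                     # v = x - alpha * e1
        vnorm = np.linalg.norm(v)
        if vnorm == 0.0:
            continue                      # already zero below the subdiagonal
        v /= vnorm
        # Apply P = I - 2 v v^T from the left and from the right (similarity transform).
        h[k + 1:, :] -= 2.0 * np.outer(v, v @ h[k + 1:, :])
        h[:, k + 1:] -= 2.0 * np.outer(h[:, k + 1:] @ v, v)
        # Accumulate Q = P_0 P_1 ... P_{n-3}.
        q[:, k + 1:] -= 2.0 * np.outer(q[:, k + 1:] @ v, v)
    return h, q
```

Because every reflector is applied from both sides, H has the same eigenvalues as A, which is why this reduction can precede the QR iterations.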
Author: Zafari, Afshin, Uppsala Univ, Div Comp Sci, Dept Informat Technol, Lagerhyddsvagen 2, S-75237 Uppsala, Sweden
ISBN (print): 9783319780245; 9783319780238
Task-based parallel programming has shown competitive outcomes in many aspects of parallel programming, such as efficiency, performance, productivity and scalability. Different software development frameworks use different approaches to provide these outcomes to the programmer while keeping the underlying hardware architecture transparent. However, because programs are not portable between these frameworks, choosing one framework over another remains a critical decision for a programmer concerned with the expandability, adaptivity, maintainability and interoperability of programs. In this work, we propose a unified programming interface that a programmer can use to work with different task-based parallel frameworks transparently. In this approach, we abstract the common concepts of task-based parallel programming and provide them to the programmer through a single programming interface that is uniform across all frameworks. We have tested the interface by running programs that implement matrix operations within frameworks optimized for shared-memory architectures, distributed-memory architectures and accelerators, while the cooperation between frameworks is configured externally, with no need to modify the programs. Further possible extensions of the interface and potential future research are also described.
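The idea of one task interface with interchangeable backends can be sketched minimally as follows. `TaskBackend`, `SequentialBackend`, `ThreadBackend` and `matvec` are hypothetical names, and Python's `concurrent.futures` thread pool stands in for a real task-based runtime; the paper's actual interface and the frameworks it targets are not reproduced here.

```python
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

class TaskBackend(ABC):
    """Hypothetical uniform task interface: submit work, then wait for its result."""

    @abstractmethod
    def submit(self, fn: Callable[..., Any], *args: Any) -> Any: ...

    @abstractmethod
    def result(self, handle: Any) -> Any: ...

class SequentialBackend(TaskBackend):
    def submit(self, fn, *args):
        return fn(*args)          # run immediately; the handle is the value itself

    def result(self, handle):
        return handle

class ThreadBackend(TaskBackend):
    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, fn, *args) -> Future:
        return self._pool.submit(fn, *args)

    def result(self, handle: Future):
        return handle.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)

def matvec(backend: TaskBackend, a: Sequence[Sequence[float]],
           x: Sequence[float]) -> List[float]:
    """Written once against TaskBackend: one task per row, no backend-specific code."""
    def row_dot(row):
        return sum(r * xi for r, xi in zip(row, x))
    handles = [backend.submit(row_dot, row) for row in a]
    return [backend.result(h) for h in handles]
```

Swapping the backend object is the only change needed to run the same `matvec` on a different runtime, which mirrors the externally configured cooperation described above at toy scale.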
ISBN (print): 9780791851517
Multi-objective optimization is a powerful tool that has been successfully applied in many fields but has seen minimal use in the design and development of nuclear power plant systems. When applied to design, multi-objective optimization involves manipulating key design parameters to develop optimal designs. These design parameters include continuous and/or discrete variables and represent the physical design specifications. They are varied across a specific design space to satisfy a number of objective functions, representing goals for both system design and performance, which conflict and cannot be combined into a single objective function. In this paper, a non-dominated sorting genetic algorithm (NSGA) and parallel processing in Python 3 were used to optimize the design of the passive endothermic reaction cooling system (PERCS) model developed in RELAP5/MOD 3.3. This system has been proposed as a retrofit for currently operating light water reactors (LWRs) and is designed to remove decay heat from the reactor core via the endothermic decomposition of magnesium carbonate (MgCO3) and natural circulation of the reactor coolant. The PERCS design is currently a shell-and-tube heat exchanger, with the coolant flowing through the tube side and MgCO3 on the shell side. During a station blackout (SBO), the PERCS initially keeps the reactor core outlet temperature from exceeding 635 K and then reduces it to below 620 K for 30 days. The optimization of the PERCS was performed with three different objectives: (1) minimization of equipment costs, (2) minimization of the deviation of the core outlet temperature during an SBO from its normal-operation steady-state value, and (3) minimization of the fractional consumption of MgCO3, a metric that is measurable and directly related to the operating time of the PERCS. The manipulated parameters of the optimization include the radius of the PERCS shell, the pitch, hydraulic diameter, thickness and length of the PER
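The non-dominated sorting step at the heart of NSGA-style algorithms can be sketched as follows. This uses the fast non-dominated sorting procedure popularized by NSGA-II; the abstract does not say which NSGA variant was used. In the example, the objective tuples are made-up stand-ins for (equipment cost, temperature deviation, MgCO3 consumption), all to be minimized.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b (minimization): no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return Pareto fronts as lists of indices; front 0 is the non-dominated set."""
    n = len(points)
    dominated: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    count = [0] * n                                       # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        nxt: List[int] = []
        for p in fronts[-1]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:         # every dominator already placed in a front
                    nxt.append(q)
        fronts.append(nxt)
    return fronts[:-1]                    # drop the trailing empty front
```

For the toy designs `[(1, 5, 0.2), (2, 4, 0.3), (3, 6, 0.4), (1, 5, 0.3)]` the fronts are `[[0, 1], [3], [2]]`. In a full GA, these fronts (plus a diversity measure) drive selection.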
ISBN (print): 9789811086366; 9789811086359
In this paper, we extend the use of the CORDIC algorithm from the trigonometric mode, which has long been its primary use, to the hyperbolic mode. We implement radix-4 hyperbolic CORDIC in vectoring mode for fast and efficient computation of the square root of a number. The simulation results are obtained on an FPGA platform. Based on these results, a comparison of radix-4 CORDIC with radix-2 CORDIC is also presented.
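To make the vectoring-mode square root concrete, here is a floating-point sketch of the simpler radix-2 hyperbolic CORDIC; the paper's radix-4 variant is not shown. It relies on the identity (w + 1/4)² - (w - 1/4)² = w: driving y to zero leaves x proportional to √w. Iterations 4 and 13 are repeated, as hyperbolic CORDIC requires for convergence. The gain is accumulated inside the loop for clarity, whereas hardware would use a precomputed constant and shifts instead of multiplications. The input range (roughly 0.03 to 2) is an assumption of this sketch; other inputs would first be scaled by a power of 4.

```python
import math

def cordic_sqrt(w: float, n: int = 32) -> float:
    """sqrt(w) via radix-2 hyperbolic CORDIC in vectoring mode (about 0.03 <= w <= 2)."""
    # Shift sequence i = 1..n, with 4, 13, 40, ... repeated so the iteration converges.
    shifts = []
    for i in range(1, n + 1):
        shifts.append(i)
        if i in (4, 13, 40):
            shifts.append(i)
    x, y = w + 0.25, w - 0.25              # x^2 - y^2 == w
    gain = 1.0
    for i in shifts:
        t = 2.0 ** -i
        d = -1.0 if y >= 0 else 1.0          # rotate toward y = 0
        x, y = x + d * y * t, y + d * x * t  # hyperbolic micro-rotation
        gain *= math.sqrt(1.0 - t * t)       # each step scales x^2 - y^2 by (1 - t^2)
    return x / gain
```

Each micro-rotation multiplies x² - y² by exactly (1 - 2⁻²ⁱ), so once y is near zero, x equals the accumulated gain times √w.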
ISBN (print): 9781728106465
The use of computational clusters for large-scale Big Data processing has attracted attention as a technology trend because of its time efficiency. A modern cluster, equipped with the latest multi- and many-core distributed shared architectures, a high-speed interconnect and a file system, ensures high performance through message-passing and multi-threading parallel approaches, and it also handles batch, micro-batch and stream processing of high-dimensional massive datasets. However, running data-intensive Big Data applications on a compute-centric cluster poses performance challenges because of several runtime overheads. To alleviate these bottlenecks and exploit the full potential of the cluster, this paper presents a state-of-the-practice, performance-oriented technical analysis covering all relevant aspects, in the context of terascale Big Data processing on the TeraFLOPS cluster PARAM-Kanchenjunga. It identifies the major factors that influence performance, that is, the sources of overhead related to computation, communication or IPC, memory, I/O contention, scheduling, load imbalance, synchronization, latency and network jitter, and determines their impact. Because existing approaches were found insufficient, advanced methods with a variety of alternatives, such as RDMA-enabled libraries, PFS, MPI-integrated extensions, loop tiling and hybrid parallelization, are proposed for optimization to achieve possible speedup. This paper will assist in preparing performance-aware designs of experiments and performance models.
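Of the optimizations listed above, loop tiling is the easiest to show in a few lines. The sketch below tiles a dense matrix multiply so that each block of `b` and `c` is reused while it would still be resident in cache. The tile size is an arbitrary assumption, and in pure Python the interpreter overhead hides any cache benefit. The sketch therefore only illustrates the loop transformation; in a compiled language, the same structure is where a speedup would come from.

```python
from typing import List

Matrix = List[List[float]]

def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference product for checking the tiled version."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matmul_tiled(a: Matrix, b: Matrix, tile: int = 32) -> Matrix:
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):            # tile over rows of a / c
        for kk in range(0, m, tile):        # tile over the shared dimension
            for jj in range(0, p, tile):    # tile over columns of b / c
                for i in range(ii, min(ii + tile, n)):
                    a_i, c_i = a[i], c[i]
                    for k in range(kk, min(kk + tile, m)):
                        a_ik, b_k = a_i[k], b[k]
                        for j in range(jj, min(jj + tile, p)):
                            c_i[j] += a_ik * b_k[j]
    return c
```

The result is identical to the untiled product; only the order in which elements are visited changes.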
ISBN (print): 9781538636596
Sparse matrices are widely used in graph and data analytics, machine learning, and engineering and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator targeted at applications that involve large sparse matrices. OuterSPACE is a highly scalable, energy-efficient, reconfigurable design consisting of massively parallel Single Program, Multiple Data (SPMD)-style processing units, distributed memories, high-speed crossbars and High Bandwidth Memory (HBM). We identify redundant memory accesses to non-zeros as a key bottleneck in traditional sparse matrix-matrix multiplication algorithms. To ameliorate this, we implement an outer-product-based matrix multiplication technique that eliminates redundant accesses by decoupling multiplication from accumulation. We demonstrate that traditional architectures, because of limitations in their memory hierarchies and in their ability to harness parallelism in the algorithm, are unable to take advantage of this reduction without incurring significant overheads. OuterSPACE is designed specifically to overcome these challenges. We simulate the key components of our architecture using gem5 on a diverse set of matrices from the University of Florida's SuiteSparse collection and the Stanford Network Analysis Project, and show a mean speedup of 7.9x over the Intel Math Kernel Library on a Xeon CPU, and of 13.0x against cuSPARSE and 14.0x against CUSP when run on an NVIDIA K40 GPU, while achieving an average throughput of 2.9 GFLOPS within a 24 W power budget in an area of 87 mm².
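The outer-product formulation above computes C = A·B as the sum over k of (column k of A) times (row k of B). In the sketch below, the multiply phase reads each nonzero of A and B once and only emits partial products; a separate merge phase accumulates them. This is a plain Python illustration of that dataflow, with an assumed dictionary-based sparse format. It says nothing about OuterSPACE's hardware, memory layout or performance.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sparse = Dict[int, List[Tuple[int, float]]]

def spgemm_outer(a_cols: Sparse, b_rows: Sparse) -> Dict[int, Dict[int, float]]:
    """C = A @ B as a sum of outer products of A's columns and B's rows.

    a_cols[k] lists (i, a_ik) for the nonzeros in column k of A (CSC-like);
    b_rows[k] lists (j, b_kj) for the nonzeros in row k of B (CSR-like).
    """
    # Multiply phase: every nonzero of A and B is read once; no accumulation yet.
    partial: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for k, col in a_cols.items():
        for i, a_ik in col:
            for j, b_kj in b_rows.get(k, []):
                partial[i].append((j, a_ik * b_kj))
    # Merge phase: sum the partial products that land on the same (i, j).
    c: Dict[int, Dict[int, float]] = {}
    for i, items in partial.items():
        acc: Dict[int, float] = defaultdict(float)
        for j, v in items:
            acc[j] += v
        c[i] = dict(sorted(acc.items()))
    return c
```

For A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]], the result holds only the nonzeros of [[0, 4], [15, 8]], keyed by row and then column.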
ISBN (print): 9783319499550
The proceedings contain 30 papers. The special focus in this conference is on TAPEMS 2016 and BigTrust 2016. The topics include: an in-memory event tracing extension to the Open Trace Format 2; network-aware optimization of MPDATA on homogeneous multi-core clusters with heterogeneous network; formalizing data locality in task parallel applications; improving the energy efficiency of evolutionary multi-objective algorithms; a parallel model for heterogeneous cluster; comparative analysis of OpenACC compilers; traffic sign recognition based on parameter-free detector and multi-modal representation; reversible data hiding using non-local means prediction; secure data access in Hadoop using elliptic curve cryptography; statistical analysis of CCM.M-K1 international comparison based on Monte Carlo method; redundancy elimination in the ExaStencils code generator; a dataflow IR for memory efficient RIPL compilation to FPGAs; exploring a distributed iterative reconstructor based on split Bregman using PETSc; implementation of the beamformer algorithm for the NVIDIA Jetson; efficiency of GPUs for relational database engine processing; GeoCon: a middleware for location-aware ubiquitous applications; and cellular ANTomata as engines for highly parallel pattern processing.