ISBN: (print) 9783319495835; 9783319495828
In this paper we evaluate a new coalesced data and kernel scheme used to reduce the execution costs of cardiac simulations that run on multi-GPU environments. The new scheme was tested on an important part of the simulator, the solution of the systems of Ordinary Differential Equations (ODEs). The results show that the proposed scheme is very effective: the execution time to solve the systems of ODEs on the multi-GPU environment was halved compared to a scheme that does not implement the proposed data and kernel coalescing. As a result, cardiac simulations ran 25% faster overall.
ISBN: (print) 9783319499567; 9783319499550
The Monte Carlo method is applied to the processing of the measurement results of CCM.M-K1. This method can overcome the limitations that apply in certain cases to the method described in the GUM. The CCM.M-K1 measurement results are introduced and analyzed, and the commercial software @RISK was used to perform the numerical simulation. The result was compared with the final report of CCM.M-K1, which showed that the differences between the two were negligible.
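The Monte Carlo approach to uncertainty evaluation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure and not the @RISK workflow: the measurement model, the input distributions, and all numbers are hypothetical and carry arbitrary units. Input quantities are sampled from their assumed distributions, pushed through the model, and the output sample is summarized by its mean, standard deviation, and a 95% coverage interval.

```python
import random
import statistics


def monte_carlo_uncertainty(model, samplers, trials=100_000, seed=0):
    """Propagate input distributions through `model` by random sampling.

    `samplers` is a list of functions that each draw one input value
    from a given random.Random instance.
    """
    rng = random.Random(seed)
    outputs = sorted(
        model(*(draw(rng) for draw in samplers)) for _ in range(trials)
    )
    mean = statistics.fmean(outputs)
    std = statistics.stdev(outputs)
    # Probabilistically symmetric 95% coverage interval from the sorted sample.
    low = outputs[int(0.025 * trials)]
    high = outputs[int(0.975 * trials) - 1]
    return mean, std, (low, high)


# Hypothetical measurement model: result = reading + correction.
def model(reading, correction):
    return reading + correction


samplers = [
    lambda rng: rng.gauss(0.0, 3.0),     # assumed normal input, std 3
    lambda rng: rng.uniform(-2.0, 2.0),  # assumed rectangular input, half-width 2
]

mean, std, interval = monte_carlo_uncertainty(model, samplers)
print(f"mean={mean:.3f} std={std:.3f} interval=({interval[0]:.2f}, {interval[1]:.2f})")
```

For this linear model the sampled standard deviation should agree with the analytic combination sqrt(3^2 + 2^2/3), which is about 3.21. The sampling approach becomes more useful when the model is nonlinear or the output distribution is clearly non-normal.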
ISBN: (print) 9781467396806
This paper presents the physical implementation of the digital part of an OFDM-based baseband modem for point-to-multipoint fixed broadband wireless access (FBWA) solutions. It is compliant with the corresponding IEEE 802.16 standard and compatible with a fixed WiMAX profile. The adopted realization approach is based on an array of processing elements belonging to a class of computing systems characterized by hundreds of embedded processing elements and memories (massively parallel processor arrays). The approach offers the performance, computational density, and programmability needed for the implementation of modern wireless communication systems.
ISBN: (print) 9781509028252
Computer scientists and programmers face the difficulty of improving the scalability of their applications while using conventional programming techniques only. As a baseline hypothesis of this paper we assume that an advanced runtime system can be used to take full advantage of the available parallel resources of a machine in order to achieve the highest parallelism possible. In this paper we present the capabilities of HPX, a distributed runtime system for parallel applications of any scale, to achieve the best possible scalability through asynchronous task execution [1]. OP2 is an active library which provides a framework for the parallel execution of unstructured grid applications on different multi-core/many-core hardware architectures [2]. OP2 generates code which uses OpenMP for loop parallelization within an application code for both single-threaded and multi-threaded machines. In this work we modify the OP2 code generator to target HPX instead of OpenMP, i.e. we port the parallel simulation backend of OP2 to utilize HPX. We compare the performance results of the different parallelization methods using HPX and OpenMP for loop parallelization within the Airfoil application. The results of strong scaling and weak scaling tests for the Airfoil application on one node with up to 32 threads are presented. Using HPX for parallelization of OP2 gives an improvement in performance of 5%-21%. By modifying the OP2 code generator to use HPX's parallel algorithms, we observe scaling improvements of about 5% compared to OpenMP. To fully exploit the potential of HPX, we adapted the OP2 API to expose a future- and dataflow-based programming model and applied this technique to parallelize the same Airfoil application. We show that the dataflow-oriented programming model, which automatically creates an execution tree representing the algorithmic data dependencies of our application, improves the overall scaling results by about 21% compared to OpenMP. Our results show
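The future and dataflow idea described in this abstract can be illustrated with a small sketch. It uses Python's standard `concurrent.futures` module as a stand-in and is not HPX or OP2 code; the `dataflow` helper is a hypothetical name introduced here. Each task is expressed in terms of the futures it consumes, so the dependency graph is implied by the data rather than by explicit barriers.

```python
from concurrent.futures import ThreadPoolExecutor


def dataflow(pool, fn, *inputs):
    """Hypothetical helper: run `fn` on the results of the input futures.

    Simplification: the worker thread blocks until its inputs are ready,
    so the pool must have enough workers for every task in the chain.
    """
    return pool.submit(lambda: fn(*(f.result() for f in inputs)))


with ThreadPoolExecutor(max_workers=4) as pool:
    a = pool.submit(lambda: 2)                    # independent task
    b = pool.submit(lambda: 3)                    # independent task
    c = dataflow(pool, lambda x, y: x + y, a, b)  # depends on a and b
    d = dataflow(pool, lambda x: x * 10, c)       # depends on c
    result = d.result()

print(result)
```

Tasks `a` and `b` may run concurrently, while `c` and `d` execute only once their inputs exist. No explicit synchronization point separates the stages, which is the property the abstract attributes to the dataflow-oriented model.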
ISBN: (print) 9783319499567; 9783319499550
In this paper we present an approach to the parallel simulation of the heart's electrical activity using the finite element method with the help of the FEniCS automated scientific computing framework. FEniCS allows scientific software development using near-mathematical notation and provides automatic parallelization on MPI clusters. We implemented the ten Tusscher-Panfilov (TP06) cell model of cardiac electrical activity. The scalability of the implementation was tested using up to 240 CPU cores, and a 95-fold speedup was achieved. We evaluated various combinations of the parallel Krylov linear solvers and the preconditioners available in FEniCS. The best performance was provided by the conjugate gradient and biconjugate gradient stabilized solvers with the successive over-relaxation preconditioner. Since the FEniCS-based implementation of the TP06 model uses notation close to the mathematical one, it can be utilized by computational mathematicians, biophysicists, and other researchers without extensive parallel computing skills.
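The scalability figures quoted in this abstract can be put in context with the usual definition of parallel efficiency, speedup divided by core count. The short sketch below only restates the two numbers from the abstract (95-fold speedup, 240 cores); the function name is an assumption introduced for illustration.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


efficiency = parallel_efficiency(95, 240)
print(f"parallel efficiency: {efficiency:.1%}")
```

Under this definition the reported run reaches roughly 40% of ideal linear scaling at 240 cores.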
ISBN: (print) 9789897581939
In today's data-driven world, economy and research depend on the analysis of empirical datasets to guide decision making. These applications often encompass a rich variety of data types and special-purpose processing models. We believe that the database system of the future will integrate flexible processing and storage of a variety of data types in a scalable and integrated end-to-end solution. In this paper, we propose a database system architecture that is designed from the core to support these goals. In the discussion we focus especially on the multi-domain programming concept of the proposed architecture, which exploits domain-specific knowledge to guide compiler-based optimization.
ISBN: (print) 9781509051465
The aim of this paper is to present a new distributed computing middleware for High Performance Computing (HPC) based on cloud micro-services. The great challenge is to maintain the scalability and efficiency of a massively parallel and distributed computational system when the volume of big data processed by its applications grows substantially. The proposed middleware implements a new cooperative micro-services teamwork model for massively parallel and distributed computing. This model is constituted by distributed micro-services acting as Micro-service Virtual Processing Units (MsVPUs), with an integrated load balancing service and an AMQP communication protocol, which together enable HPC. The paper presents the proposed distributed computational scheme and its integrated middleware, accompanied by some experimental results.
ISBN: (print) 9783319321493; 9783319321486
The Particle-In-Cell (PIC) method is effectively used in many scientific simulation codes. In order to optimize the performance of the PIC approach, data locality is required. This relies on efficient sorting algorithms. We present a bucket sort algorithm with a small memory footprint for the PIC method targeting Graphics Processing Units (GPUs). The performance of our sorting algorithm increases with the amount of storage provided and with the orderliness of the particles. For our application, where particles are presorted, it performs better and requires less memory than other sorting algorithms in the literature. The overall PIC algorithm performs best when the sorting is applied.
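To make the idea of sorting particles by cell concrete, here is a minimal sequential sketch of a counting-based bucket sort. It is a generic textbook variant written for illustration, not the low-memory GPU algorithm the abstract describes; the function name and the sample data are assumptions. Particles that share a cell end up adjacent in the resulting order, which is what gives the PIC loops their data locality.

```python
from itertools import accumulate


def bucket_sort_particles(cell_ids, num_cells):
    """Return (order, offsets) grouping particle indices by cell.

    order[k] is the index of the particle placed at sorted position k;
    offsets[c] is the first sorted position belonging to cell c.
    The sort is stable: particles in one cell keep their relative order.
    """
    # Pass 1: count how many particles fall into each cell (bucket).
    counts = [0] * num_cells
    for c in cell_ids:
        counts[c] += 1

    # Exclusive prefix sum turns counts into bucket start offsets.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: scatter each particle index into the next free slot of its bucket.
    cursor = offsets[:]
    order = [0] * len(cell_ids)
    for i, c in enumerate(cell_ids):
        order[cursor[c]] = i
        cursor[c] += 1
    return order, offsets


cell_ids = [2, 0, 1, 2, 0, 1]  # hypothetical cell index of each particle
order, offsets = bucket_sort_particles(cell_ids, num_cells=3)
sorted_cells = [cell_ids[i] for i in order]
print(order, offsets, sorted_cells)
```

This version needs a full-size output array plus one counter per cell. Reducing that extra storage, and exploiting particles that are already nearly ordered, is the kind of trade-off the abstract reports for its GPU algorithm.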
ISBN: (print) 9781467396806
The proceedings contain 70 papers. The topics discussed include: effect of the memristor threshold current on memristor-based min-max circuits; STBC-OFDM communication systems with sub-sampling support; parallel application placement onto 3-D reconfigurable architectures; optimized built-in self-calibration of RF SoCs; Opamp-based synthesis of a fractional order switched system; designing Moore FSM with unstandard representation of state codes; efficient baseband modem physical implementation for fixed broadband wireless access networks; designing LUT-based Mealy FSM with transformation of collections of output functions; wideband common gate LNA with novel input matching technique; miniature high resolution FMCW SAR system; development of the coincidence sorter for the INSIDE PET system; the VINEYARD project: versatile integrated accelerator-based heterogeneous data centres; voltage control of single-phase induction motors using asymmetrical PWM and fuzzy logic; PWL function-based model of a beta cell; and a novel signal processing method based on the frequency modality for intra-body medical instrument tracking.
ISBN: (print) 9781467396806
This paper presents an overview of the ATLAS Fast TracKer (FTK) processor, reporting the design of the system, its expected performance, and the current integration status. The FTK is an upgrade of the trigger system of the ATLAS experiment. The system is designed to reduce the event rate from the proton-proton collisions occurring at 40 MHz to about 1 kHz for the expected LHC luminosity (2 × 10^34 cm^-2 s^-1). To achieve this selection rate, the FTK system must make intensive use of particle tracking. For this purpose, a dedicated hardware tracker has been designed: the FTK processor. To achieve the required performance, FTK uses a combination of custom VLSI chips and latest-generation FPGAs, all embedded in dedicated boards, and it exploits a fully parallel architecture. FTK provides track reconstruction based on the full silicon (inner) detector, with resolution comparable to the offline reconstruction and a latency of approximately 100 µs.