ISBN: (print) 9781467396806
This paper presents the physical implementation of the digital part of an OFDM-based baseband modem for point-to-multipoint fixed broadband wireless access (FBWA) solutions. It is compliant with the corresponding IEEE 802.16 standard and compatible with a fixed WiMAX profile. The adopted realization approach is based on an array of processing elements belonging to a class of computing systems characterized by hundreds of embedded processing elements and memories (massively parallel processor arrays). The approach offers the performance, computational density, and programmability needed to implement modern wireless communication systems.
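The core transmit step of an OFDM modem maps data symbols onto subcarriers, applies an inverse FFT, and prepends a cyclic prefix. The following is a minimal NumPy sketch of that step, assuming a 256-point FFT and a 1/4 cyclic-prefix ratio (values we take from the 802.16 OFDM PHY as an assumption). It loads data onto every subcarrier and ignores pilots, guard bands, coding, and everything else the paper's hardware handles; all names are hypothetical.

```python
import numpy as np

N_FFT = 256        # FFT size (assumed, after the 802.16 OFDM PHY)
CP_RATIO = 1 / 4   # cyclic-prefix fraction (one possible value; assumed)

def qpsk_map(bits: np.ndarray) -> np.ndarray:
    """Map bit pairs to unit-energy QPSK symbols (bit 0 -> +1, bit 1 -> -1 per axis)."""
    b = bits.reshape(-1, 2)
    i = 1 - 2 * b[:, 0]
    q = 1 - 2 * b[:, 1]
    return (i + 1j * q) / np.sqrt(2)

def ofdm_modulate(symbols: np.ndarray) -> np.ndarray:
    """IFFT one block of N_FFT subcarrier symbols and prepend a cyclic prefix.

    All subcarriers carry data here; a real PHY reserves guard, DC and pilot tones.
    """
    assert symbols.size == N_FFT
    time = np.fft.ifft(symbols) * np.sqrt(N_FFT)   # unitary scaling preserves energy
    cp_len = int(N_FFT * CP_RATIO)
    return np.concatenate([time[-cp_len:], time])  # copy the tail to the front

def ofdm_demodulate(samples: np.ndarray) -> np.ndarray:
    """Drop the cyclic prefix and FFT back to subcarrier symbols."""
    cp_len = int(N_FFT * CP_RATIO)
    return np.fft.fft(samples[cp_len:]) / np.sqrt(N_FFT)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=2 * N_FFT)
tx = ofdm_modulate(qpsk_map(bits))
rx = ofdm_demodulate(tx)
print(tx.size)                           # 320 samples = 256 + 64 prefix
print(np.allclose(rx, qpsk_map(bits)))   # True over an ideal channel
```

The cyclic prefix is what lets a receiver treat a multipath channel as a per-subcarrier multiplication, which is why the FFT/IFFT pair dominates the computational load of such a modem.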
ISBN: (print) 9783319499567; 9783319499550
In this paper we present an approach to the parallel simulation of the heart's electrical activity using the finite element method with the help of the FEniCS automated scientific computing framework. FEniCS allows scientific software to be developed in near-mathematical notation and provides automatic parallelization on MPI clusters. We implemented the ten Tusscher-Panfilov (TP06) cell model of cardiac electrical activity. The scalability of the implementation was tested on up to 240 CPU cores, and a 95-fold speedup was achieved. We evaluated various combinations of the parallel Krylov linear solvers and preconditioners available in FEniCS. The best performance was provided by the conjugate gradient (CG) and biconjugate gradient stabilized (BiCGStab) solvers with the successive over-relaxation (SOR) preconditioner. Since the FEniCS-based implementation of the TP06 model uses notation close to mathematical notation, it can be used by computational mathematicians, biophysicists, and other researchers without extensive parallel computing skills.
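The best-performing combination reported above pairs conjugate gradients with SOR preconditioning. Because CG needs a symmetric preconditioner, the minimal NumPy sketch below uses the symmetric variant (SSOR) on a small dense SPD system. It is a generic illustration, not the FEniCS/TP06 code; the function names and the relaxation factor `omega = 1.5` are our own choices.

```python
import numpy as np

def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for the SSOR preconditioner
    M = (D + omega*L) D^{-1} (D + omega*U) / (omega * (2 - omega)).
    Dense solves stand in for the forward/backward triangular sweeps."""
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)
    U = np.triu(A, 1)
    y = np.linalg.solve(D + omega * L, omega * (2 - omega) * r)  # forward sweep
    return np.linalg.solve(D + omega * U, D @ y)                 # backward sweep

def pcg(A, b, omega=1.0, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = ssor_apply(A, r, omega)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):  # relative residual test
            return x, k + 1
        z = ssor_apply(A, r, omega)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D Laplacian (SPD)
b = np.ones(n)
x, iters = pcg(A, b, omega=1.5)
print(np.allclose(A @ x, b))  # True
```

SSOR with 0 < omega < 2 is positive definite whenever A is, which is what keeps the CG recurrences valid; plain (non-symmetric) SOR is more naturally paired with BiCGStab.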
ISBN: (print) 9789897581939
In today's data-driven world, the economy and research depend on the analysis of empirical datasets to guide decision making. These applications often encompass a rich variety of data types and special-purpose processing models. We believe the database system of the future will integrate flexible processing and storage of a variety of data types in a scalable, integrated end-to-end solution. In this paper, we propose a database system architecture that is designed from the core to support these goals. In the discussion we focus especially on the multi-domain programming concept of the proposed architecture, which exploits domain-specific knowledge to guide compiler-based optimization.
ISBN: (print) 9781509051465
The aim of this paper is to present a new distributed computing middleware for High Performance Computing (HPC) based on cloud micro-services. The great challenge is to maintain the scalability and efficiency of a massively parallel and distributed computational system as the volume of big data processed by its applications grows substantially. To this end, the proposed middleware implements a new cooperative team-work model of micro-services for massively parallel and distributed computing. This model consists of distributed micro-services acting as Micro-service Virtual Processing Units (MsVPUs), with an integrated load-balancing service and the AMQP communication protocol, which together provide HPC capability. The paper presents the proposed distributed computational scheme and its integrated middleware, accompanied by some experimental results.
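The abstract does not say how the load-balancing service assigns work to MsVPUs. As a purely hypothetical illustration, a least-loaded dispatcher can keep the units in a min-heap keyed by accumulated work. The sketch below uses plain Python in place of AMQP messaging, and the class `LeastLoadedBalancer` and its methods are invented for this example.

```python
import heapq

class LeastLoadedBalancer:
    """Assign each task to the worker with the smallest accumulated load.

    Workers stand in for hypothetical MsVPUs; no messaging is modeled.
    """

    def __init__(self, worker_ids):
        # Heap entries are (current_load, worker_id); ties break on worker_id.
        self._heap = [(0.0, w) for w in worker_ids]
        heapq.heapify(self._heap)

    def dispatch(self, task_cost):
        load, worker = heapq.heappop(self._heap)        # least-loaded worker
        heapq.heappush(self._heap, (load + task_cost, worker))
        return worker

    def loads(self):
        return {w: load for load, w in self._heap}

lb = LeastLoadedBalancer(["vpu0", "vpu1", "vpu2"])
assignment = [lb.dispatch(c) for c in [5, 3, 2, 4, 1]]
print(assignment)  # ['vpu0', 'vpu1', 'vpu2', 'vpu2', 'vpu1']
print(lb.loads())  # {'vpu0': 5.0, 'vpu1': 4.0, 'vpu2': 6.0}
```

Each dispatch costs O(log W) for W workers; in a real deployment the task costs would be estimates, and the dispatcher would publish to a per-unit queue rather than return a name.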
ISBN: (print) 9783319321493; 9783319321486
The Particle-In-Cell (PIC) method is used effectively in many scientific simulation codes. Optimizing the performance of the PIC approach requires data locality, which in turn relies on efficient sorting algorithms. We present a bucket sort algorithm with a small memory footprint for the PIC method, targeting Graphics Processing Units (GPUs). The performance of our sorting algorithm increases with the amount of storage provided and with the orderliness of the particles. For our application, where particles are presorted, it performs better and requires less memory than other sorting algorithms in the literature. The overall PIC algorithm performs best when the sorting is applied.
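For orientation, the sketch below shows the textbook way of bucketing particles by cell: a histogram of cell indices, an exclusive prefix sum giving each bucket's start, and a scatter. It is not the paper's low-memory algorithm; the function name is hypothetical, and the serial loop stands in for what a GPU would do with atomic counters.

```python
import numpy as np

def sort_particles_by_cell(cell_idx: np.ndarray, n_cells: int) -> np.ndarray:
    """Return a permutation that groups particles by cell (counting sort).

    Pattern: histogram -> exclusive prefix sum -> scatter.
    """
    counts = np.bincount(cell_idx, minlength=n_cells)        # particles per cell
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))   # bucket start positions
    perm = np.empty(cell_idx.size, dtype=np.int64)
    cursor = offsets.copy()
    for p, c in enumerate(cell_idx):
        # Serial scatter keeps the sort stable; a GPU version would use
        # atomic increments on cursor[c] and lose within-cell ordering.
        perm[cursor[c]] = p
        cursor[c] += 1
    return perm

cells = np.array([2, 0, 1, 2, 0])
perm = sort_particles_by_cell(cells, n_cells=3)
print(perm)         # [1 4 2 0 3]
print(cells[perm])  # [0 0 1 2 2]
```

This baseline needs a second particle-sized array for the permutation, which is the kind of memory cost a small-footprint, presorted-aware bucket sort tries to avoid.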
ISBN: (print) 9781467396806
The proceedings contain 70 papers. The topics discussed include: effect of the memristor threshold current on memristor-based min-max circuits; STBC-OFDM communication systems with sub-sampling support; parallel application placement onto 3-D reconfigurable architectures; optimized built-in self-calibration of RF SoCs; Opamp-based synthesis of a fractional order switched system; designing Moore FSM with unstandard representation of state codes; efficient baseband modem physical implementation for fixed broadband wireless access networks; designing LUT-based Mealy FSM with transformation of collections of output functions; wideband common gate LNA with novel input matching technique; miniature high resolution FMCW SAR system; development of the coincidence sorter for the INSIDE PET system; the VINEYARD project: versatile integrated accelerator-based heterogeneous data centres; voltage control of single-phase induction motors using asymmetrical PWM and fuzzy logic; PWL function-based model of a beta cell; and a novel signal processing method based on the frequency modality for intra-body medical instrument tracking.
ISBN: (print) 9781467396806
This paper presents an overview of the ATLAS Fast TracKer (FTK) processor, reporting the design of the system, its expected performance, and the current integration status. The FTK is an upgrade of the trigger system of the ATLAS experiment. The system is designed to reduce the event rate from the proton-proton collisions occurring at 40 MHz to about 1 kHz for the expected LHC luminosity (2×10^34 cm^-2 s^-1). To achieve this selection rate, the FTK system must make intensive use of particle tracking. For this purpose, a dedicated hardware tracker has been designed: the FTK processor. To achieve the required performance, FTK uses a combination of custom VLSI chips and latest-generation FPGAs, all embedded in dedicated boards, and exploits a fully parallel architecture. FTK provides track reconstruction based on the full silicon (inner) detector, with resolution comparable to the offline reconstruction and a latency of approximately 100 μs.
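As a quick consistency check on the figures quoted above (our own arithmetic, not a result from the paper), the required rate reduction and the latency expressed in 40 MHz collision periods are:

```latex
\frac{40\ \text{MHz}}{1\ \text{kHz}}
  = \frac{4\times10^{7}\ \text{s}^{-1}}{10^{3}\ \text{s}^{-1}}
  = 4\times10^{4},
\qquad
\frac{100\ \mu\text{s}}{1/(40\ \text{MHz})}
  = \left(10^{-4}\ \text{s}\right)\left(4\times10^{7}\ \text{s}^{-1}\right)
  = 4\times10^{3}.
```

So roughly one event in 40,000 is retained, and a 100 μs latency spans about 4,000 of the 25 ns intervals between collisions, which is why FTK's track finding has to run as a deeply parallel pipeline.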
ISBN: (print) 9783319499567; 9783319499550
Task-based programming provides programmers with an intuitive abstraction for expressing parallelism, and gives runtimes the flexibility to adapt scheduling and load balancing to the hardware. Although many profiling tools have been developed to understand these characteristics, the interplay between task scheduling and data reuse in the cache hierarchy has not been explored. These interactions are particularly intriguing because of the flexibility task-based runtimes have in scheduling tasks, which may allow them to improve cache behavior. This work presents StatTask, a novel statistical cache model that can predict cache behavior for arbitrary task schedules and cache sizes from a single execution, without programmer annotations. StatTask enables, for the first time, fast and accurate modeling of data locality in task-based applications. We demonstrate the potential of this new analysis for scheduling by examining applications from the BOTS benchmark suite and identifying several important opportunities for reuse-aware scheduling.
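StatTask itself is a statistical model built on sampled reuse information; the sketch below shows only the underlying idea in its simplest, exact form. For a fully associative LRU cache, an access hits if and only if the number of distinct addresses touched since the previous access to the same address (its stack distance) is smaller than the cache size, so one pass over a trace gives miss counts for every cache size. The function names and toy traces are ours, and the two task orders are a hypothetical illustration of reuse-aware scheduling.

```python
from collections import Counter

def stack_distances(trace):
    """Yield the LRU stack distance of each access (None for a cold miss)."""
    stack = []  # most recently used address at the end
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            yield len(stack) - 1 - pos  # distinct addresses touched since last use
            stack.pop(pos)
        else:
            yield None
        stack.append(addr)

def lru_misses(trace, cache_lines):
    """Misses of a fully associative LRU cache holding `cache_lines` addresses."""
    return sum(1 for d in stack_distances(trace) if d is None or d >= cache_lines)

def miss_curve(trace, sizes):
    """Misses for several cache sizes from a single distance histogram."""
    hist = Counter(stack_distances(trace))
    cold = hist.pop(None, 0)
    return {s: cold + sum(n for d, n in hist.items() if d >= s) for s in sizes}

print(miss_curve(["A", "B", "C", "A", "B", "C"], [2, 3]))  # {2: 6, 3: 3}

# Two hypothetical schedules of the same tasks; t1 and t3 touch the same data.
t1, t2, t3 = ["a", "b"], ["c", "d"], ["a", "b"]
print(lru_misses(t1 + t2 + t3, 2))  # 6
print(lru_misses(t1 + t3 + t2, 2))  # 4 (reuse-aware order)
```

The list-based stack makes each access O(depth), which is fine for illustration; practical tools replace exact tracking with sampling, which is where a statistical model earns its speed.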
ISBN: (print) 9783319499567; 9783319499550
The rapid growth of supercomputer technologies has become a driver for the development of the natural sciences. Most discoveries in astronomy, elementary particle physics, the design of new materials, and DNA research are connected with numerical simulation and supercomputers. Supercomputer simulation has become an important tool for processing the great volume of observational and experimental data accumulated by mankind. Modern scientific challenges make research on computer systems and scientific software design more pressing than ever. The architecture of future exascale systems is still being discussed. Nevertheless, it is necessary to develop algorithms and software for such systems right now: software capable of using tens and hundreds of thousands of processors and of transmitting and storing large volumes of data. In the present work, a technology for developing such algorithms and software is proposed. As an example of its use, the software development process is considered for several problems of astrophysics.
ISBN: (print) 9783319321493; 9783319321486
We study the performance of dense symmetric indefinite factorizations (Bunch-Kaufman and Aasen's algorithms) on multicore CPUs with a Graphics Processing Unit (GPU). Though such algorithms are needed in many scientific and engineering simulations, obtaining high performance of the factorization on the GPU is difficult, because the pivoting required to ensure numerical stability leads to frequent synchronizations and irregular data accesses. As a result, until recently there had not been any implementation of these algorithms on hybrid CPU/GPU architectures. To improve their performance on the hybrid architecture, we explore different techniques to reduce the expensive communication and synchronization between the CPU and GPU, or on the GPU. We also study the performance of an LDL^T factorization with no pivoting combined with a preprocessing technique based on Random Butterfly Transformations. Though such transformations provide only probabilistic guarantees of numerical stability, they avoid pivoting and achieve high performance on the GPU.
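The butterfly preprocessing can be illustrated on the CPU with NumPy: apply a random butterfly U to form U^T A U, which stays symmetric, factor it as L D L^T with no pivoting, and map the solution back. This is a minimal sketch under our own assumptions (a single-level butterfly, even n, random diagonal entries exp(r/10) with r uniform in [-1/2, 1/2], dense solves in place of triangular kernels). It says nothing about the paper's GPU kernels, and like the transformation itself it only makes a zero pivot unlikely, not impossible.

```python
import numpy as np

def random_butterfly(n, rng):
    """Single-level butterfly B = [[R, S], [R, -S]] / sqrt(2), R and S random diagonal."""
    assert n % 2 == 0
    h = n // 2
    R = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    S = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    return np.block([[R, S], [R, -S]]) / np.sqrt(2)

def ldlt_nopivot(A):
    """Factor symmetric A = L diag(d) L^T with unit lower-triangular L, no pivoting."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

def rbt_solve(A, b, rng):
    """Solve A x = b via (U^T A U) y = U^T b, x = U y."""
    U = random_butterfly(A.shape[0], rng)
    Ar = U.T @ A @ U                 # still symmetric
    L, d = ldlt_nopivot(Ar)
    z = np.linalg.solve(L, U.T @ b)  # forward substitution
    y = np.linalg.solve(L.T, z / d)  # diagonal scaling, then backward substitution
    return U @ y

# |i - j| matrix: symmetric indefinite with zero diagonal, so LDL^T without
# pivoting would divide by zero at the very first step on A itself.
A = np.abs(np.subtract.outer(np.arange(4), np.arange(4))).astype(float)
b = np.arange(1.0, 5.0)
x = rbt_solve(A, b, np.random.default_rng(0))
print(np.allclose(A @ x, b))  # True
```

Because U is applied by a fixed, regular pattern of scalings and additions, the preprocessing parallelizes well, which is the property that makes it attractive compared with data-dependent pivot searches.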