The paper presents the use of a membrane computing model for specifying a synthetic biology pulse generator example, discusses simulation results produced by the tools associated with this model, and compares their performance. The results show the potential of the simulation approach over other analysis tools such as model checkers.
Graphics Processing Units (GPUs) have a large number of cores to speed up graphical computations, and they are used in a wide range of general-purpose applications that require high performance. In this paper, GPU computing is exploited to model signal propagation and interference in large RFID systems, which are a promising solution for achieving pervasive computing since they offer automatic object identification. The speedup of the parallel algorithm is evaluated with respect to a sequential version. Two popular frameworks for general-purpose computing on GPUs, CUDA and OpenCL, are considered in the comparison, and distinct implementations are provided for them, highlighting their differences in code optimization and performance.
ISBN: 9781479901036 (print)
We present a new high speed cycle-approximate simulator, addressing an important, neglected category of multicore systems: deeply-embedded cache-incoherent MPSoCs. We take advantage of the unique properties of these systems to increase the parallelism of the simulation. In doing so we achieve performance not possible using previous simulation techniques, without compromising the accuracy of the results. We present quantitative performance results across a large range of simulated NoC designs, comprising 1 to 64 cores. On average we simulate at 5.9 MIPS, with simulation speeds reaching 373 MIPS in the best case. Comparing against FPGA implementations we demonstrate that the simulator manages this with an average timing error of only 2.1%.
ISBN: 9781479901036 (print)
Task graphs provide an efficient model of computation for specification, analysis, and implementation of concurrent applications. In this paper, we present a novel approach for mapping the class of series-parallel task graphs onto multi-core architectures based on pattern matching. Both the topology of the graph and the state of the tasks are encoded as a stream of tokens, which is iteratively rewritten at multiple positions in parallel. Hence, our technique is most useful for compute-intensive applications that must adapt to frequently varying and unpredictable workload at runtime. Several complex examples have been evaluated on a multi-core architecture and the experimental results show the effectiveness of our approach.
We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in a cluster configuration, for the simulation of two physical systems. As a first benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model using the over-relaxation algorithm. The second application is a reactive fluid-dynamics problem for which we solve the full set of compressible Navier-Stokes equations without resorting to a turbulence model. The results show that the performance of an Intel MIC changes dramatically depending on (apparently) minor details. Another issue is that obtaining reasonable scalability with the Intel Phi coprocessor in a cluster configuration requires the so-called offload mode, which reduces the performance of the single system. All source codes are provided for inspection and for double-checking the results.
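For readers unfamiliar with the first benchmark, the following is a minimal sketch of a single over-relaxation update for one Heisenberg spin. It is not the authors' code: the function name `overrelax_spin` and the tuple representation are assumptions made for illustration, and the local field is assumed to be already computed from the neighbouring spins and couplings. The update reflects the spin about its local field, which leaves both the spin length and the local energy (the dot product of spin and field) unchanged.

```python
def overrelax_spin(spin, field):
    """Reflect a 3-component spin about its local field.

    spin  -- (sx, sy, sz), assumed to be a unit vector
    field -- (hx, hy, hz), assumed to be the precomputed sum of J_ij * s_j
             over the neighbours of this spin (hypothetical input layout)
    """
    hh = sum(h * h for h in field)
    if hh == 0.0:
        # No local field: any direction has the same energy, keep the spin.
        return tuple(spin)
    # s' = 2 (s.h / h.h) h - s
    scale = 2.0 * sum(s * h for s, h in zip(spin, field)) / hh
    return tuple(scale * h - s for s, h in zip(spin, field))


if __name__ == "__main__":
    new_spin = overrelax_spin((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
    print(new_spin)
```

Because the update is deterministic and needs only a few multiply-add operations per spin, the time per spin update is a natural throughput metric for comparing architectures.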
Long Term Evolution (LTE) of UMTS Terrestrial Radio Access and Radio Access Network is a Fourth Generation (4G) wireless broadband technology that provides backward compatibility with 2G (Second Generation) and 3G (Third Generation) technologies. LTE delivers high data rates and low latency at reduced cost. Its frame structure supports Time Division Duplexing (TDD) and Frequency Division Duplexing (FDD). Broadcast information in LTE is divided into two categories: the Master Information Block (MIB), which consists of a limited number of the most frequently transmitted parameters essential for initial access to the cell and is carried on the Physical Broadcast Channel (PBCH), and the System Information Blocks (SIBs), which at the physical layer are multiplexed together with the transmitted unicast data. The main objective of this paper is the realization of receiver architectures for the PBCH in LTE considering Single Input Single Output (SISO), Multiple Input Single Output (MISO), Single Input Multiple Output (SIMO), and Multiple Input Multiple Output (MIMO) configurations. The receiver processing steps involve channel de-estimation, demodulation, and minimum mean square error (MMSE) processing; the minimum value among the received data is calculated using a comparator at the receiver side. Two VLSI DSP methods are applied: folding, to reduce the number of resource elements required, and superscalar processing, to reduce the delay. The results of the direct, folding, and superscalar methods are compared. Based on simulation and implementation, the results are discussed in terms of Register Transfer Level (RTL) design, the Field Programmable Gate Array (FPGA) editor, power estimation, and resource estimation. ModelSim is used to simulate the Verilog code of all PBCH modules. The PlanAhead 13.4 tool is used for synthesis and implementation of the architecture on a Virtex-5 xc5vlx50tff1136-1 device.
ISBN: 9781479901036 (print)
As Multi-Processor Systems-on-Chip (MPSoC) architectures become more and more complex, Design Space Exploration (DSE) becomes the only viable solution for finding the Pareto-optimal designs. To evaluate each solution with a real dataset, DSE has to simulate the design under test, which is modeled as a Virtual Platform, usually written in SystemC. However, simulation is a very slow task that includes non-productive time periods such as system initialization, and re-compiling the platform also imposes a significant overhead. In this paper, a Process-based Reconfigurable Module is used to bypass the non-productive parts of the simulation, thus accelerating it. The effectiveness of the proposed methodology is demonstrated with a series of computationally intensive multimedia applications, where the simulation time improvements reach 34% on average.
ISBN: 9781479901036 (print)
In this paper we evaluate the promise held by low-power GPUs for non-graphics workloads that arise in embedded systems. To this end, we map and implement five benchmarks, which find utility in very different application domains, on an embedded GPU. Our results show that, apart from accelerated performance, embedded GPUs are also promising because of their energy efficiency, which is an important design goal for battery-driven mobile devices. We show that adopting the same optimization strategies as those used for programming high-end GPUs might lead to worse performance on embedded GPUs. This is due to restricted features of embedded GPUs, such as limited or no user-defined memory, a small instruction set, and a limited number of registers, among others. We propose techniques to overcome such challenges, e.g., by distributing the workload between GPUs and multi-core CPUs, in the spirit of heterogeneous computation.
The main objective of this paper is to implement a multiplier for high-speed and low-energy applications. Multipliers are the building blocks of high-performance systems such as FIR filters and digital signal processors, in which speed is the dominating factor. Many multiplier architectures have been developed to increase the speed of arithmetic. The Booth algorithm is the most effective algorithm used for fast performance. This work introduces a high-performance multiplier that uses the modified radix-4 Booth algorithm with a Redundant Binary Adder to achieve high speed. A comparative study of different Booth algorithms in terms of power consumption, delay, area, energy, and energy-delay product is also presented. All the circuits are simulated in the Cadence simulation tool using 180 nm technology. The experimental results show that the proposed Booth multiplier achieves high speed, low energy, and a low energy-delay product compared to existing Booth multipliers.
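The modified radix-4 Booth recoding mentioned in this abstract can be illustrated with a short software sketch. This is only a behavioural model under stated assumptions, not the paper's circuit: the function name `booth_radix4_multiply` and the `bits` parameter are hypothetical, and partial products are accumulated with ordinary integer addition rather than a Redundant Binary Adder. The multiplier is scanned in overlapping 3-bit groups, each mapped to a digit in {-2, -1, 0, 1, 2}, so only half as many partial products are needed as in bit-by-bit multiplication.

```python
# Radix-4 Booth digit for each overlapping triplet (y[2i+1], y[2i], y[2i-1]).
BOOTH_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Multiply two signed integers using radix-4 Booth recoding.

    `multiplier` must fit in a `bits`-bit two's complement word and
    `bits` must be even (both are assumptions of this sketch).
    """
    if bits <= 0 or bits % 2 != 0:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given word size")

    # Two's complement bit pattern, with the implicit y[-1] = 0 appended.
    pattern = (multiplier & ((1 << bits) - 1)) << 1

    product = 0
    for i in range(bits // 2):
        triplet = (pattern >> (2 * i)) & 0b111
        # Each partial product is 0, +/-1x or +/-2x the multiplicand,
        # weighted by 4**i.
        product += (BOOTH_DIGIT[triplet] * multiplicand) << (2 * i)
    return product


if __name__ == "__main__":
    print(booth_radix4_multiply(7, -3))
```

In hardware, the digits ±1 and ±2 correspond to a pass-through or a one-bit shift of the multiplicand with optional negation, which is why the recoding reduces the number of partial products without requiring a general multiplier per group.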