Emerging digital television applications and conventional MPSoC architectures face drastically increasing performance and flexibility requirements. To display high-quality images on display devices, several image processing algorithms must be applied. However, these algorithms are nonstandard and change from case to case, so it is difficult to achieve real-time processing with a general-purpose processor or DSP. In this paper, we present a reconfigurable Application Specific Instruction-set Processor (ASIP) that can perform several image processing algorithms using the same data path. It can complete several 1D filtering operations within 8 cycles/pixel, achieving 16 times higher performance than a conventional RISC processor. The performance of this ASIP meets the requirements of Full HD (1920×1080) applications.
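To see what "meets the requirements of Full HD" implies in cycles, the budget can be worked out directly. This is a back-of-the-envelope sketch that assumes 30 frames/s, one pixel in flight at a time, and that the 8 cycles/pixel figure covers the whole filtering chain; the abstract states none of these, and the function name is hypothetical.

```python
def required_clock_hz(width: int, height: int, fps: int, cycles_per_pixel: int) -> int:
    """Clock rate needed to process every pixel of every frame serially.

    Assumes no pixel-level parallelism: each pixel costs cycles_per_pixel cycles.
    """
    pixels_per_second = width * height * fps
    return pixels_per_second * cycles_per_pixel


# Full HD at an assumed 30 frames/s.
asip_hz = required_clock_hz(1920, 1080, 30, 8)        # ASIP: 8 cycles/pixel
risc_hz = required_clock_hz(1920, 1080, 30, 8 * 16)   # 16x slower RISC: 128 cycles/pixel
print(f"ASIP needs {asip_hz / 1e6:.1f} MHz, RISC would need {risc_hz / 1e9:.2f} GHz")
```

Under these assumptions, the ASIP needs roughly 498 MHz, while a core 16 times slower would need close to 8 GHz, which illustrates why the speedup matters at this resolution.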
ISBN: 9781612846842 (print)
Wireless interference does not necessarily result in the loss of all the information in the desired signal. In this paper, we study signature allocation in wireless networks by exploiting the signal features retained under interference. Commonly used node signatures include MAC addresses and PN sequences. However, both take a long time to identify when the network scale is large. We propose a novel signature allocation method based on a multi-hash policy, which uses a set of hash functions and a mapping table to identify signatures. The proposed method significantly decreases both the signature identification complexity and the signature conflict probability. We implement the algorithm in GNU Radio and evaluate its performance on a testbed of three USRP nodes. The results show that, under typical settings, ...
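The general idea of allocating signatures with several hash functions plus a lookup table can be sketched as follows. This is not the authors' algorithm: the salted SHA-256 hash functions, the table size, the first-free-slot rule, and the names `candidate_signature` and `MultiHashAllocator` are all hypothetical choices made for illustration.

```python
import hashlib


def candidate_signature(node_id: str, i: int, table_size: int) -> int:
    """Hypothetical i-th hash function: SHA-256 salted with i, reduced mod table_size."""
    digest = hashlib.sha256(f"{i}:{node_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % table_size


class MultiHashAllocator:
    """Hypothetical allocator: k hash functions propose signatures, a table records owners."""

    def __init__(self, table_size: int, num_hashes: int):
        self.table_size = table_size
        self.num_hashes = num_hashes
        self.table = {}  # mapping table: signature index -> owning node id

    def allocate(self, node_id: str):
        # Try each hash function in turn and take the first signature that is free
        # (or already ours); extra choices lower the conflict probability versus one hash.
        for i in range(self.num_hashes):
            sig = candidate_signature(node_id, i, self.table_size)
            owner = self.table.get(sig)
            if owner is None or owner == node_id:
                self.table[sig] = node_id
                return sig
        return None  # every candidate collides; the caller must resolve the conflict

    def identify(self, sig: int):
        # Identification is one table lookup rather than a scan over all known signatures.
        return self.table.get(sig)


alloc = MultiHashAllocator(table_size=64, num_hashes=4)
sig_a = alloc.allocate("node-a")
print(sig_a, alloc.identify(sig_a))
```

The design point being illustrated is that identification cost stays constant as the network grows, whereas matching a received signature against every MAC address or PN sequence grows with the number of nodes.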
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods has been proposed that requires only a large number of short calculations with minimal communication between computer nodes. We consider one of the more accurate variants, Accelerated Weighted Ensemble Dynamics (AWE), for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all-atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms, such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, and grids, on multiple architectures (CPU/GPU, 32/64-bit), and in a dynamic environment in which processes are regularly added to or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. For comparison, a single process typically achieves 0.1 ns/hour.
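Weighted-ensemble methods keep a fixed number of short trajectories ("walkers") per region of conformational space by splitting heavy walkers and merging light ones, conserving total probability weight. The sketch below shows a generic per-bin split/merge step of that kind; it is not the AWE implementation from the paper, and `Walker`, `resample_bin`, and the lightest-pair/heaviest-walker rules are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Walker:
    state: float   # hypothetical stand-in for a molecular conformation
    weight: float  # statistical weight carried by this trajectory segment


def resample_bin(walkers, target, rng):
    """Split/merge walkers in one bin so exactly `target` remain, conserving total weight."""
    if target < 1:
        raise ValueError("target must be at least 1")
    walkers = list(walkers)
    if not walkers:
        return walkers
    # Merge: combine the two lightest walkers, keeping one state chosen in proportion to weight.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w.weight)
        a, b = walkers[0], walkers[1]
        total = a.weight + b.weight
        keep = a if rng.random() < a.weight / total else b
        walkers = walkers[2:] + [Walker(keep.state, total)]
    # Split: replace the heaviest walker with two copies carrying half its weight each.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w.weight, reverse=True)
        heavy = walkers[0]
        half = heavy.weight / 2
        walkers = walkers[1:] + [Walker(heavy.state, half), Walker(heavy.state, half)]
    return walkers


rng = random.Random(0)
bin_walkers = [Walker(float(i), w) for i, w in enumerate([0.5, 0.1, 0.05, 0.05, 0.3])]
print(resample_bin(bin_walkers, 3, rng))
```

Each surviving walker can then be propagated as an independent short task, which is why such schemes distribute well. The reported figures also imply the scale involved: 500 ns/hour aggregate against 0.1 ns/hour per process is the throughput of roughly 5,000 single processes.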
ISBN: 9781450322102 (print)
Multi-core architectures with asymmetric core performance have recently shown great promise, because applications with different needs can benefit from either the high performance of a fast core or the high parallelism and power efficiency of a group of slow cores. This performance heterogeneity can be particularly beneficial to applications running in virtual machines (VMs) on virtualized servers, which often have different needs and exhibit different performance and power characteristics. Therefore, scheduling VMs on performance-asymmetric multicore architectures can have a great impact on a system's overall energy efficiency. Unfortunately, existing VM managers, such as Xen, do not take this heterogeneity into account and thus often yield low energy efficiency. In this paper, we propose a novel VM scheduling algorithm that exploits core performance heterogeneity to optimize overall system energy efficiency. We first introduce a metric termed the energy-efficiency factor to characterize the power and performance behaviors of the applications hosted by VMs on different cores. We then present a method to dynamically estimate the VMs' energy-efficiency factors and map the VMs to heterogeneous cores such that the energy efficiency of the entire system is maximized. We implement the proposed algorithm in Xen and evaluate it with standard benchmarks on a real testbed. The experimental results show that our solution improves system energy efficiency (i.e., performance per watt) by 13.5% on average, and by up to 55% for some benchmarks, compared to the default Xen scheduler.
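The abstract does not define the energy-efficiency factor precisely, so the following is only a simplified stand-in: it treats each VM's performance per watt on a fast versus a slow core as its "factor" and greedily hands the limited fast cores to the VMs that gain the most. The names `VMProfile`, `map_vms`, and `system_efficiency`, the one-VM-per-core model, and the unlimited supply of slow cores are all assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass


@dataclass
class VMProfile:
    """Hypothetical per-VM measurements on each core type."""
    name: str
    perf_fast: float   # throughput on a fast core (arbitrary units)
    power_fast: float  # watts drawn on a fast core
    perf_slow: float   # throughput on a slow core
    power_slow: float  # watts drawn on a slow core

    def efficiency(self, fast: bool) -> float:
        """Performance per watt on the given core type."""
        if fast:
            return self.perf_fast / self.power_fast
        return self.perf_slow / self.power_slow


def map_vms(vms, num_fast):
    """Greedy heuristic: give fast cores to VMs whose perf/watt improves most on them."""
    ranked = sorted(vms, key=lambda v: v.efficiency(True) / v.efficiency(False), reverse=True)
    placement = {}
    fast_left = num_fast
    for vm in ranked:
        gains = vm.efficiency(True) > vm.efficiency(False)
        if gains and fast_left > 0:
            placement[vm.name] = "fast"
            fast_left -= 1
        else:
            placement[vm.name] = "slow"
    return placement


def system_efficiency(vms, placement):
    """Whole-system performance per watt (total throughput / total power)."""
    perf = sum(v.perf_fast if placement[v.name] == "fast" else v.perf_slow for v in vms)
    power = sum(v.power_fast if placement[v.name] == "fast" else v.power_slow for v in vms)
    return perf / power


vms = [VMProfile("cpu-bound", 10.0, 20.0, 4.0, 10.0), VMProfile("mem-bound", 5.0, 20.0, 4.0, 8.0)]
placement = map_vms(vms, num_fast=1)
print(placement, system_efficiency(vms, placement))
```

Note that ranking VMs by a per-VM ratio is a heuristic: maximizing total throughput divided by total power is not guaranteed by sorting individual ratios, which is one reason a real scheduler would re-estimate and remap dynamically.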
In this paper, we present a survey of existing prototypes dedicated to software-defined radio. We propose a classification based on the architectural organization of the prototypes and draw some conclusions about the most promising architectures. This study should be useful for cognitive radio testbed designers who have to choose among many possible computing platforms. We also introduce a new cognitive radio testbed currently under construction and explain how this study has influenced its designers' choices.
This paper presents an approach that improves the performance of word alignment for the English-Hindi language pair. Longer sentences in the corpus create severe problems such as high computational requirements and poor...
The Fast Multipole Method (FMM) allows O(N) evaluation, to any arbitrary precision, of the N-body interactions that arise in many scientific contexts. These methods have been parallelized, with a recent set of papers attempting to parallelize them on heterogeneous CPU/GPU architectures [1]. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong scalability. Further, the algorithms were not demonstrated on the nonuniform particle distributions that arise in practice. In this paper, we develop an efficient, scalable version of the FMM that scales well on many heterogeneous nodes for nonuniform data. Key contributions of our work are data structures that allow uniform work distribution over multiple computing nodes and that minimize communication cost. These new data structures are computed using a parallel algorithm and require only a small additional computational overhead. Numerical simulations on a heterogeneous cluster empirically demonstrate the performance of our algorithm.
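A common building block for distributing nonuniform particles evenly, used in many parallel tree codes, is to sort particles along a space-filling curve and cut the sorted order into equal-size chunks. The sketch below does this with a 2D Morton (Z-order) key; it is a generic technique shown for illustration, not the specific data structures of this paper, and it balances particle counts, which only approximates balanced FMM work. Points are assumed to lie in the unit square, quantized to 16 bits per axis.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so that a zero bit sits between each original bit."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton2d(ix: int, iy: int) -> int:
    """Interleave x and y bits into a Z-order (Morton) key."""
    return part1by1(ix) | (part1by1(iy) << 1)


def partition(points, num_nodes):
    """Sort points in [0, 1]^2 along the Z-order curve and cut into equal-count chunks."""
    scale = 0xFFFF  # quantize each coordinate to 16 bits
    keyed = sorted(points, key=lambda p: morton2d(int(p[0] * scale), int(p[1] * scale)))
    n = len(keyed)
    return [keyed[i * n // num_nodes:(i + 1) * n // num_nodes] for i in range(num_nodes)]


# A clustered (nonuniform) distribution plus two outliers.
pts = [(0.01 * i, 0.02 * i) for i in range(8)] + [(0.9, 0.9), (0.95, 0.1)]
for node, chunk in enumerate(partition(pts, 3)):
    print(node, len(chunk))
```

Because nearby points share Morton-key prefixes, each chunk tends to be spatially compact, which also keeps the boundary (and hence communication) between nodes small.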
This work presents an efficient method for mapping Motion Estimation (ME) algorithms onto General-Purpose Graphics Processing Unit (GPGPU) architectures using the CUDA programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the parallelization potential of two ME algorithms: Full Search (FS) and Diamond Search (DS). Our main goal is to evaluate the feasibility of achieving real-time high-definition video encoding performance on GPUs. For comparison, multi-core parallel and distributed versions of these algorithms were developed using the OpenMP and MPI (Message Passing Interface) libraries, respectively. The CUDA-based solutions achieve the highest speed-up compared with the OpenMP and MPI versions for both algorithms, and, compared to the state of the art, our FS and DS solutions reach up to 18x and 11x speed-up, respectively.
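For readers unfamiliar with the two algorithms, here is a serial reference sketch of block matching with a sum-of-absolute-differences (SAD) cost: Full Search scores every displacement in a ±r window, while Diamond Search walks a large diamond pattern until its center is best and then refines with a small diamond. It is plain Python for clarity, not the paper's CUDA kernels; frames are lists of pixel rows, the current block's zero displacement is assumed to lie inside the reference frame, and all function names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block shifted by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def in_bounds(ref, bx, by, dx, dy, n):
    h, w = len(ref), len(ref[0])
    return 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h


def full_search(cur, ref, bx, by, n, r):
    """Score every displacement in [-r, r]^2; returns (dx, dy, cost)."""
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if in_bounds(ref, bx, by, dx, dy, n):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2), (1, 1), (1, -1), (-1, 1), (-1, -1)]
SDSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def _best_on_pattern(cur, ref, bx, by, n, r, cx, cy, pattern):
    best = None
    for ox, oy in pattern:
        dx, dy = cx + ox, cy + oy
        if abs(dx) <= r and abs(dy) <= r and in_bounds(ref, bx, by, dx, dy, n):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:  # ties keep the earlier point (the center)
                best = (dx, dy, cost)
    return best


def diamond_search(cur, ref, bx, by, n, r):
    """Move the large diamond until its center wins, then refine with the small diamond."""
    cx, cy = 0, 0
    while True:
        dx, dy, _ = _best_on_pattern(cur, ref, bx, by, n, r, cx, cy, LDSP)
        if (dx, dy) == (cx, cy):
            break
        cx, cy = dx, dy
    return _best_on_pattern(cur, ref, bx, by, n, r, cx, cy, SDSP)
```

Full Search evaluates (2r+1)² candidates per block with no data dependence between them, which maps naturally onto many GPU threads; Diamond Search evaluates far fewer candidates but its steps are sequential, which is consistent with FS gaining more from the GPU than DS.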
By amassing the 'wisdom of the crowd', social tagging systems are drawing increasing academic attention for interpreting Internet folk knowledge. To uncover their hidden semantics, several studies have atte...
Advances in microelectronic devices have dissolved the boundary between software and hardware. Since hardware circuits are generally faster and provide significantly broader parallelism, a number of recent research works are dedicated to high-performance computation in electronic systems without the direct use of processing cores, which introduce a number of constraints (e.g., predefined operand sizes, predetermined instruction sets, and limits on concurrency and parallelism). This paper suggests a systematic way of converting methods and functions from general-purpose languages into hardware implementations. As a result, conventional programming techniques such as hierarchy, recursion, argument passing, and value returning are implemented entirely in hardware modules. The resulting circuits are faster than their software alternatives, as confirmed by examples and numerous experiments from different application areas.
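Recursion, argument passing, and return values have no direct hardware counterpart, so a hardware implementation typically replaces the call stack with an explicit stack of frames driven by a state machine and a "return register". The Python model below shows that transformation for a divide-and-conquer sum; it is a generic illustration of how recursion can be flattened into states, not the specific method proposed in the paper, and the frame layout and state numbering are assumptions.

```python
def recursive_sum(a, lo, hi):
    """Reference recursive version: sum of a[lo:hi] by splitting in half."""
    if hi - lo == 1:
        return a[lo]
    mid = (lo + hi) // 2
    return recursive_sum(a, lo, mid) + recursive_sum(a, mid, hi)


def fsm_sum(a):
    """Same computation as an explicit-stack state machine (assumes a is non-empty).

    Each frame is [lo, hi, state, left_result]:
      state 0: entry (arguments just passed in)
      state 1: left call has returned; its value is in `ret`
      state 2: right call has returned; its value is in `ret`
    """
    stack = [[0, len(a), 0, 0]]
    ret = 0  # models the hardware return-value register
    while stack:
        frame = stack[-1]
        lo, hi, state, _ = frame
        if state == 0:
            if hi - lo == 1:
                ret = a[lo]  # leaf: return a value and pop the frame
                stack.pop()
                continue
            frame[2] = 1
            stack.append([lo, (lo + hi) // 2, 0, 0])  # "call" the left half
        elif state == 1:
            frame[3] = ret  # save the left result in this frame
            frame[2] = 2
            stack.append([(lo + hi) // 2, hi, 0, 0])  # "call" the right half
        else:
            ret = frame[3] + ret  # combine and "return" to the caller
            stack.pop()
    return ret


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(fsm_sum(data), recursive_sum(data, 0, len(data)))
```

In a circuit, the stack becomes a small memory indexed by a stack pointer, the `state` field selects the active state of the module's FSM, and the stack depth bounds the maximum recursion depth the hardware supports.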