This paper describes a new background calibration technique for pipeline analog-to-digital converters (ADCs). The new scheme takes an existing digital foreground calibration algorithm and extends it to work in the background. The goal is to digitally calibrate the pipeline ADC without stopping the input conversion. In this method, one additional stage connected in parallel to the stage under calibration and one cyclic ADC are used to accommodate the calibration; the extra stage and the cyclic ADC are used only during the calibration process. Sources of error in pipeline architectures and their effects on the residue plot of a 1-bit-per-stage design are identified and discussed. The digital background calibration accounts for capacitor mismatch, comparator offset, charge injection, and finite op-amp gain. Applying the proposed calibration to a 12-bit pipeline ADC improved the maximum INL from 14 to 0.6 LSB and the maximum DNL from 26 to 0.8 LSB.
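As a rough illustration of the residue behaviour of a 1-bit-per-stage design (not the paper's calibration algorithm), the following Python sketch models a single stage. The function name `residue_1bit` and its parameters `vref`, `gain`, and `comp_offset` are assumptions introduced here for illustration; capacitor mismatch and charge injection are not modelled.

```python
def residue_1bit(vin, vref=1.0, gain=2.0, comp_offset=0.0):
    """Model one 1-bit pipeline stage (hypothetical helper).

    Returns (bit, residue). Ideal values are gain=2 and comp_offset=0;
    other values mimic finite op-amp gain and comparator offset.
    """
    # The comparator decides the stage bit.
    bit = 1 if vin > comp_offset else 0
    # After amplifying the input, the stage subtracts +vref for bit 1
    # and -vref for bit 0.
    dac = vref if bit else -vref
    return bit, gain * vin - dac


if __name__ == "__main__":
    for vin in (-0.75, -0.25, 0.25, 0.75):
        print(vin, residue_1bit(vin), residue_1bit(vin, comp_offset=0.3))
```

With an ideal comparator the residue stays within ±vref. In this simplified model, a comparator offset of 0.3 with vin = 0.25 yields a residue of 1.5, which exceeds the reference range; this is the kind of error a calibration scheme has to account for.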
A Super Hi-Vision (SHV) 4kx4k@60fps fractional motion estimation (FME) engine is proposed in this ***, mode reduction and edge detection techniques are adopted to filter out unpromising modes at the algorithm ***, two improved parallel schemes, called 16-pel scale processing and MB-split assignment, are introduced at the hardware level, which reduces design effort to only ***, a sub-sampling technique is adopted during SATD (sum-of-absolute-transformed-difference) generation, which saves 75% hardware *** using TSMC 0.8um in worst-case working conditions (1.62 V, 125 °C), our FME engine can achieve SHV 4kx4k@60fps real-time processing with 547.5k gates of hardware.
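For readers unfamiliar with SATD, the following Python sketch shows the general idea on a 4x4 block: the pixel difference is passed through a Hadamard transform and the absolute values of the result are summed. This is a generic, unnormalized textbook form, not the engine's hardware datapath, and it does not attempt to reproduce the paper's sub-sampling scheme. The names `H4`, `_matmul`, and `satd4x4` are hypothetical.

```python
# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def _matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(cur, ref):
    """Sum of absolute transformed differences of two 4x4 blocks (no scaling)."""
    diff = [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(cur, ref)]
    transformed = _matmul(_matmul(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


if __name__ == "__main__":
    zeros = [[0] * 4 for _ in range(4)]
    ones = [[1] * 4 for _ in range(4)]
    print(satd4x4(ones, zeros))
```

Some encoders divide the sum by a constant factor; that normalization is omitted here.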
This paper proposes a dynamic communication-efficient pattern and introduces a new efficient data-exchange process for parallel sorting, called the DCES (Dynamic Communication-Efficient parallel Sorting) algorithm, to reduce communication time. The dynamic communication pattern is built on a "Broadcast-Checker" table, which can reduce the total number of iterations to one (in the best case), or to 2, 3, ..., up to log₂P(log₂P+1)/2 (in the worst case), whereas the (static) iteration count in a recent study is fixed at log₂P(log₂P+1)/2. To evaluate sorting performance, we implemented the DCES algorithm on the SGI Origin2000 and compared our experimental results with those of the best existing algorithm (LBM: Load Balanced Merge sort). In the experiments, the proposed DCES algorithm improved on the LBM algorithm by at least 24% on a system of size P = 4 and by at least 34% on a system of size P = 8.
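To make the iteration bound concrete, the short Python sketch below evaluates log₂P(log₂P+1)/2 for the two system sizes reported in the abstract. The function name `static_iterations` is hypothetical, and the sketch assumes P is a power of two.

```python
def static_iterations(p):
    """Return log2(p) * (log2(p) + 1) / 2 for a power-of-two processor count p."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a positive power of two")
    log_p = p.bit_length() - 1  # exact integer log2 for powers of two
    return log_p * (log_p + 1) // 2


if __name__ == "__main__":
    for p in (4, 8):
        worst = static_iterations(p)
        # Per the abstract, the dynamic pattern needs between 1 and `worst` iterations.
        print(f"P = {p}: static count = {worst}, dynamic range = 1..{worst}")
```

For P = 4 the fixed count is 3 iterations, and for P = 8 it is 6, so the dynamic pattern can save up to 2 and 5 iterations respectively in the best case.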
As more computing cores are integrated onto a single chip, the effect of network communication latency is becoming more and more significant on multi-core Network-on-Chips (NoCs). For data-parallel applications, we study the model of parallel speedup by including network communication latency in Amdahl's *** speedup analysis considers the effect of network topology, network size, traffic model and computation/communication *** also study the speedup *** our multi-core NoC platform, a real data-parallel application, *** multiplication, is used to validate the *** theoretical analysis and the application results show that the speedup improvement is nonlinear and the speedup efficiency decreases as the system size is scaled *** analysis can be used to guide architects and programmers to improve parallel processing efficiency by reducing network latency with optimized network design and increasing the computation proportion in the program.
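The abstract does not give the extended speedup formula, so the Python sketch below is only an assumed illustration of the general idea: classic Amdahl's law with an extra additive term for communication time. The functions `speedup` and `efficiency` and the parameter `comm` (communication time as a fraction of the single-core run time) are hypothetical and are not the paper's model.

```python
def speedup(parallel_fraction, n_cores, comm=0.0):
    """Amdahl-style speedup with an assumed additive communication term.

    parallel_fraction: share of single-core run time that can be parallelized.
    comm: communication time relative to single-core run time (assumption).
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores + comm)


def efficiency(parallel_fraction, n_cores, comm=0.0):
    """Speedup per core."""
    return speedup(parallel_fraction, n_cores, comm) / n_cores


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, speedup(0.9, n, comm=0.01), efficiency(0.9, n, comm=0.01))
```

Even in this simplified form, speedup grows sublinearly with the core count and efficiency falls as the system is scaled up, which is consistent with the trend the abstract reports.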
This paper addresses the design and hardware implementation of a novel coarse-grained dynamically reconfigurable computing system called DReAC-2. A complete DReAC-2 system integrates a Nios II processor, which manages the whole reconfigurable system, and a dynamically reconfigurable coprocessor, which comprises an 8x8 processing node array designed for highly regular, computation-intensive *** prototype of DReAC-2 has been implemented on the ALTERA STRATIX II EP2S180 development *** to the task's nature, the MIMD computing array can select either a parallel-pipelined pattern or an array-parallel pattern to gain the better *** experimental results show that DReAC-2 achieves a factor of 10~100x over Nios II processors, and factors of 2x~4x and higher precision compared with some other reconfigurable processors.
The Intelligent Surfer is one of the algorithms designed for ranking search engine results. It is an interesting combination of the PageRank algorithm and the content of web pages. Its main disadvantage is long computa...
ISBN: (print) 9783540695004
The appearance of multicore processors brings high-performance computing to the desktop and opens the doors of mainstream computing to parallel computing. This paradigm shift leads to the integration of parallel programming standards for high-end shared-memory machine architectures into desktop programming environments. In this paper we present a performance study of these new systems. We evaluate the performance of an OpenMP shared-memory programming model that is integrated into the Microsoft Visual Studio C++ 2005 and Intel C++ compilers on a multicore processor. We benchmarked using the NAS OpenMP high-level application benchmarks and the EPCC OpenMP low-level benchmarks. We report the basic timings, scalability, and run-time profiles of each benchmark and analyze the results.
ISBN: (print) 9783540695004
Due to their high-level nature, parallel functional languages provide some advantages for the programmer. Unfortunately, the functional programming community has not paid much attention to some important practical problems, such as debugging parallel programs. In this paper we introduce the first debugger that works with any parallel extension of the functional language Haskell, the de facto standard in the (lazy evaluation) functional programming community. The debugger is implemented as an independent library; thus, it can be used with any Haskell compiler. Moreover, the debugger can be used to analyze how much speculative work has been done in any program.
ISBN: (print) 9783540695004
In this paper we provide both a qualitative and a quantitative evaluation of a decoupled multithreaded architecture that uses non-blocking threads. Our architecture is based on simple in-order pipelines and complete decoupling of memory accesses from execution pipelines. We extend the architecture to support thread-level speculation using snooping cache coherency protocols. We evaluate the performance gains from speculation by varying the ratio of load/store instructions to computational instructions, the misspeculation rate, and the degree of thread-level speculation. Our architecture presents a viable alternative to complex superscalar and super-speculative CPUs.
ISBN: (print) 9781424423576
To solve the 'dimensional curse' problem, the cell-based filtering scheme has been proposed, but it shows a linear decrease in performance as the dimensionality increases. In this paper, we propose a parallel high-dimensional index structure for content-based information retrieval so as to cope with the linear decrease in retrieval performance. In addition, we devise data insertion, range query, and k-NN query processing algorithms that are suitable for a cluster-based parallel architecture. Finally, we show that our parallel index structure achieves good retrieval performance in proportion to the number of servers in the cluster-based architecture, and that it outperforms a parallel version of the VA-File when the dimensionality is over 10.