Search Results - Inner Mongolia University Library

Design methodology for throughput optimum architectures of hash algorithms of the MD4-class

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2008, Vol. 53, No. 1-2, pp. 89-102

Authors: Lee, Yong Ki; Chan, Herwin; Verbauwhede, Ingrid (Univ Calif Los Angeles, Los Angeles, CA 90095, USA; Katholieke Univ Leuven, Louvain, Belgium)

In this paper we propose an architecture design methodology to optimize the throughput of MD4-based hash algorithms. The proposed methodology includes an iteration bound analysis of hash algorithms, which gives the theoretical delay limit, and Data Flow Graph transformations to achieve the iteration bound. We applied the methodology to some MD4-based hash algorithms such as SHA1, MD5 and RIPEMD-160. Since SHA1 is the algorithm that requires all the techniques we show, we also synthesized the transformed SHA1 algorithm in a 0.18 μm CMOS technology in order to verify its correctness and its achievement of high throughput. To the best of our knowledge, the proposed SHA1 architecture is the first to achieve the theoretical throughput optimum, beating all previously published results. Though we demonstrate a limited number of examples, this design methodology can be applied to any other MD4-based hash algorithm.

Keywords: architecture design methodology throughput optimization MD4-based hash algorithm SHA1 MD5 RIPEMD-160 iteration bound analysis DFG (Data Flow Graph) transformation
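
The iteration bound named in this abstract is, in standard data flow graph terminology, the largest ratio of total computation delay to number of registers over all loops of the graph. The Python sketch below only illustrates that definition; the loop data and the name `iteration_bound` are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def iteration_bound(loops):
    """Return the maximum over loops of (total node delay / register count).

    `loops` is a list of (node_delays, register_count) pairs. The values
    used below are made-up illustrations, not figures from the paper.
    """
    bounds = []
    for node_delays, registers in loops:
        if registers <= 0:
            raise ValueError("every loop needs at least one register")
        bounds.append(Fraction(sum(node_delays), registers))
    return max(bounds)


# Hypothetical graph with two loops; delays are in units of one adder delay.
example_loops = [
    ([1, 1, 1], 1),  # three chained adders, one register: ratio 3
    ([1, 1], 2),     # two adders, two registers: ratio 1
]
print(iteration_bound(example_loops))  # 3
```

In this toy graph the first loop is the critical one, so no retiming or unfolding of the graph can make one iteration faster than three adder delays.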

On a novel dynamic parallel hardware architecture for lifting-based DWT

14th International Euro-Par Conference

Authors: Khanfir, Sami; Jemni, Mohamed (Ecole Super Sci & Tech Tunis, Res Unit Technol Informat & Commun, Tunis, Tunisia)

ISBN: (print) 9783540854500

A novel fast scheme for the Discrete Wavelet Transform (DWT) was introduced in recent years under the name of lifting scheme [4, 7]. This new scheme presents many advantages over the convolution-based approach [3, 7]. For instance, it is very suitable for parallelization. In this paper we present two new parallel FPGA-based implementations of the lifting-based DWT scheme. The first implementation uses pipelining, parallel processing and data reuse to increase the speedup of the algorithm. In the second architecture a controller is introduced to dynamically deploy a suitable number of clones according to the available hardware resources on a targeted environment. These two architectures are capable of processing large incoming images or multi-framed images in real time. The simulations run on a Xilinx Virtex-5 FPGA environment have proven the practical efficiency of our contribution: the first architecture achieved an operating frequency of 289 MHz, and the second demonstrated the controller's capability of deploying the maximum number of clones from the available resources on a targeted FPGA environment and processing the task in parallel.

Keywords: parallel reconfigurable DWT lifting FPGA
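
The abstract does not say which wavelet filter the FPGA designs implement. As a minimal sketch, assuming the simplest case (an integer Haar transform), one lifting level can be written in Python as a predict step followed by an update step. The function names are illustrative only.

```python
def lifting_forward(signal):
    """One level of an integer Haar transform built from lifting steps."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    # Predict: each detail is the error of predicting an odd sample
    # from its even neighbour.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: lift the even samples to the floored pairwise average.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order with the signs flipped."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


samples = [4, 6, 10, 12, 8, 6, 5, 5]
approx, detail = lifting_forward(samples)
print(approx, detail)  # [5, 11, 7, 5] [2, 2, -2, 0]
print(lifting_inverse(approx, detail) == samples)  # True
```

Every sample pair is processed independently of the others, which is the property that makes lifting convenient to pipeline or replicate in hardware.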

Fast nonlocal filtering applied to electron cryomicroscopy

5th IEEE International Symposium on Biomedical Imaging

Authors: Darbon, Jerome; Cunha, Alexandre; Chan, Tony F.; Osher, Stanley; Jensen, Grant J. (Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024, USA; CALTECH, Ctr Adv Comp Res, Pasadena, CA 91125, USA; CALTECH, Div Biol, Pasadena, CA 91125, USA)

ISBN: (print) 9781424420025

We present an efficient algorithm for nonlocal image filtering with applications in electron cryomicroscopy. Our denoising algorithm is a rewriting of the recently proposed nonlocal mean filter. It builds on the separable property of neighborhood filtering to offer a fast parallel and vectorized implementation in contemporary shared memory computer architectures while reducing the theoretical computational complexity of the original filter. In practice, our approach is much faster than a serial, non-vectorized implementation and it scales linearly with image size. We demonstrate its efficiency on data sets from Caulobacter crescentus tomograms and a cryoimage containing viruses, and provide visual evidence attesting to the remarkable quality of the nonlocal means scheme in the context of cryoimaging. With this development we provide biologists with an attractive filtering tool to facilitate their scientific discoveries.

Keywords: nonlocal mean filtering image denoising electron cryomicroscopy image vectorization SIMD parallel image processing
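
For orientation, the baseline that this paper accelerates can be sketched in a few lines. The Python below is a naive one-dimensional nonlocal means filter, not the separable, vectorized algorithm the authors propose. The parameter names (`patch_radius`, `search_radius`, `h`) and the Gaussian weighting are assumptions made for illustration.

```python
import math


def nonlocal_means_1d(signal, patch_radius=1, search_radius=3, h=1.0):
    """Naive nonlocal means: average samples whose surrounding patches look alike."""
    n = len(signal)

    def patch(i):
        # Patch around index i, with indices clamped at the borders.
        return [signal[min(max(i + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):
        p_i = patch(i)
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            # Weight decays with the squared distance between the two patches.
            dist2 = sum((a - b) ** 2 for a, b in zip(p_i, patch(j)))
            w = math.exp(-dist2 / (h * h))
            weight_sum += w
            value_sum += w * signal[j]
        out.append(value_sum / weight_sum)
    return out


noisy = [0.0, 0.2, -0.1, 10.1, 9.8, 10.0]
print(nonlocal_means_1d(noisy))
```

Each output sample costs one patch comparison per candidate in the search window, which is the redundancy a separable reformulation can exploit.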

High-Speed Parallel Architecture for Software-Based CRC

Consumer Communications and Networking Conference, CCNC IEEE

Authors: Youngju Do; Sung-Rok Yoon; Taekyu Kim; Kwang Eui Pyun; Sin-Chong Park (School of Engineering, Information and Communications University, Daejeon, South Korea)

This paper proposes a software-based parallel CRC (Cyclic Redundancy Check) algorithm called 'N-byte RCC (Repetition of Computation and Combination)'. This algorithm is an iterative process of message computation by 'slicing-by-4' and combination through 'zero block lookup tables'. It can parallelize the CRC calculation with any number of processors. In order to verify the performance of our algorithm, we employ two different communication architectures: the single bus architecture and the 1-star topology NoC (Network on Chip) architecture. With respect to those architectures, we explore our parallel algorithm by using TLM (Transaction Level Model). From the simulation results, we show that the proposed parallel CRC algorithm with bus and NoC architectures reduces the processing time by 28 percent and 38 percent, respectively, compared to 'slicing-by-8', which is the fastest among other software-based algorithms. Furthermore, the 1-star NoC architecture of the parallel CRC shows higher performance than the single bus architecture regardless of the number of processors.

Keywords: Parallel architectures Cyclic redundancy check Computer architecture Software algorithms Iterative algorithms Network-on-a-chip Concurrent computing Table lookup Network topology Parallel algorithms
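
The compute-then-combine idea rests on the linearity of the CRC register: the CRC of a concatenation equals the first chunk's CRC advanced over as many zero bytes as the second chunk is long, XORed with the second chunk's CRC. The Python sketch below shows only that property. It uses a byte-wise table instead of slicing-by-4 and feeds zero bytes directly instead of using precomputed zero block tables, and all names in it are hypothetical.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial


def _make_table():
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = _make_table()


def raw_crc(data, state=0):
    """Table-driven register update, with no preset and no final XOR."""
    for b in data:
        state = (state >> 8) ^ TABLE[(state ^ b) & 0xFF]
    return state


def advance_over_zeros(state, n_bytes):
    """Push n_bytes zero bytes through the register (a lookup table could cache this)."""
    return raw_crc(bytes(n_bytes), state)


def parallel_crc(message, n_chunks):
    """Compute per-chunk CRCs independently, then combine them in order."""
    size = max(1, -(-len(message) // n_chunks))  # ceiling division
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    partial = [raw_crc(c) for c in chunks]  # independent: one per processor
    combined = 0
    for crc, chunk in zip(partial, chunks):
        combined = advance_over_zeros(combined, len(chunk)) ^ crc
    return combined


msg = b"The quick brown fox jumps over the lazy dog"
print(parallel_crc(msg, 4) == raw_crc(msg))  # True
```

The sketch leaves out the usual CRC-32 preset and final XOR so that the register stays purely linear; a production combine step has to account for both.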

Using graphics processors for high-performance IR query processing 08

17th International Conference on World Wide Web 2008, WWW'08

Authors: Ding, Shuai; He, Jinru; Yan, Hao; Suel, Torsten (CIS Department, Polytechnic University, Brooklyn, NY 11201, United States)

ISBN: (print) 9781605580852

Web search engines are facing formidable performance challenges as they need to process thousands of queries per second over billions of documents. To deal with this heavy workload, current engines use massively parallel architectures of thousands of machines that require large hardware investments. We investigate new ways to build such high-performance IR systems based on Graphical Processing Units (GPUs). GPUs were originally designed to accelerate computer graphics applications through massive on-chip parallelism. Recently a number of researchers have studied how to use GPUs for other problem domains including databases and scientific computing [2, 3, 5], but we are not aware of previous attempts to use GPUs for large-scale web search. Our contribution here is to design a basic system architecture for GPU-based high-performance IR, and to describe how to perform highly efficient query processing within such an architecture. Preliminary experimental results based on a prototype implementation suggest that significant gains in query processing performance might be obtainable with such an approach.

Keywords: Query processing

Modeling damage evolution in friction stir welding process

JOURNAL OF ENGINEERING MATERIALS AND TECHNOLOGY-TRANSACTIONS OF THE ASME, 2008, Vol. 130, No. 2, pp. 0061-00610

Authors: He, Youliang; Dawson, Paul R.; Boyce, Donald E. (Cornell Univ, Sibley Sch Mech & Aerosp Engn, Ithaca, NY 14853, USA)

The evolution of voids (damage) in friction stir welding processes was simulated using a void growth model that incorporates viscoplastic flow and strain hardening of incompressible materials during plastic deformation. The void growth rate is expressed as a function of the void volume fraction, the effective deformation rate, and the ratio of the mean stress to the strength of the material. A steady-state Eulerian finite element formulation was employed to calculate the flow and thermal fields in three dimensions, and the evolution of the strength and damage was evaluated by integrating the evolution equations along the streamlines obtained in the Eulerian configuration. The distribution of internal voids within the material was qualitatively compared with experimental results, and good agreement was observed in terms of the spatial location of voids. The effects of pin geometry and operational parameters such as tool rotational and travel speeds on the evolution of damage were also examined.

Keywords: friction stir welding void growth finite element analysis streamlines parallel computation

Design methodology for throughput optimum architectures of hash algorithms of the MD4-class

17th IEEE International Conference on Application-Specific Systems, Architectures and Processors

Authors: Lee, Yong Ki; Chan, Herwin; Verbauwhede, Ingrid (Univ Calif Los Angeles, Los Angeles, CA 90095, USA; Katholieke Univ Leuven, Louvain, Belgium)

Keywords: architecture design methodology throughput optimization MD4-based hash algorithm SHA1 MD5 RIPEMD-160 iteration bound analysis DFG (Data Flow Graph) transformation

Adaptive microarray image acquisition system and microarray image processing using FPGA technology

12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems

Authors: Belean, Bogdan; Borda, Monica; Fazakas, Albert (Tech Univ Cluj Napoca, Fac Elect Commun & Informat Technol, Cluj Napoca 400027, Romania)

ISBN: (print) 9783540855668

The present paper proposes an adaptive hardware implementation for a microarray image acquisition system, which is mandatory for implementing hardware algorithms for processing microarray images. Processing techniques for microarray images are also described, together with a hardware implementation of a spot border detection algorithm. The hardware implementation takes advantage of the parallel computation capabilities offered by FPGA technology. Results that demonstrate time and cost efficiency are presented for both hardware implementations.

Keywords: cDNA microarray image processing parallel processing FPGA technology

Euro-Par 2008 Parallel Processing 2008

Series: Lecture Notes in Computer Science

2008

Authors: Emilio Luque; Tomas Margalef; Domingo Benítez

ISBN: (digital) 9783540854517

ISBN: (print) 9783540854500

This book constitutes the refereed proceedings of the 14th International Conference on Parallel Computing, Euro-Par 2008, held in Las Palmas de Gran Canaria, Spain, in August 2008. The 86 revised papers presented were carefully reviewed and selected from 264 submissions. The papers are organized in topical sections on support tools and environments; performance prediction and evaluation; scheduling and load balancing; high performance architectures and compilers; parallel and distributed databases; grid and cluster computing; peer-to-peer computing; distributed systems and algorithms; parallel and distributed programming; parallel numerical algorithms; distributed and high-performance multimedia; theory and algorithms for parallel computation; and high performance networks.

Keywords: Parallel processing (Electronic computers) Congresses.

Optimistic parallelization support for event stream processing systems

5th Middleware Doctoral Symposium, MDS'08, Co-located with ACM/IFIP/USENIX 9th International Middleware Conference, Middleware 2008

Authors: Brito, Andrey (Systems Engineering Group, TU Dresden, Germany)

ISBN: (print) 9781605583617

Event stream applications consist of an acyclic graph of components that are traversed by streams of events. Examples of operations in such components are filtering, aggregation, enrichment, and transformation of events and, commonly, applications include a mix of common-use library functions and user-defined functions. When the operation only depends on the current input events, the component can be trivially parallelized by replication. However, if the component keeps state that is used for the computation of the results, the trivial parallelization approach does not work. Parallel versions of common components have been designed, but complex or user-defined components are normally limited by single-thread performance. In this work, we use optimistic parallelization approaches to harness the potential of multi-core processors to scale the performance of stateful operators in event stream applications. In addition, we investigate indulgent ways to allow the user to provide application knowledge that can improve the amount of useful speculative work. The current prototype shows considerable gain in throughput even though some speculative executions must be discarded. Copyright 2008 ACM.

Keywords: event stream processing optimistic parallelization software transactional memory