Search results - Inner Mongolia University Library

8th international conference on Computational Science

Authors: Benoit, Anne; Kosch, Harald; Rehn-Sonigo, Veronika; Robert, Yves

Affiliations: Ecole Normale Super Lyon, LIP, 46 Allee Italie, F-69364 Lyon 07, France; Univ Passau, D-94032 Passau, Germany

ISBN: (print) 9783540693833

Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonistic criteria should be optimized, such as throughput/period and latency (or a combination). Typical applications include digital image processing, where images are processed in steady-state mode. In this paper, we study the bi-criteria mapping (minimizing period and latency) of the JPEG encoding on a cluster of workstations. We present an integer linear programming formulation for this NP-hard problem, and we present an in-depth performance evaluation of several polynomial heuristics.

Keywords: pipeline workflow application multi-criteria optimization JPEG encoding
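The tension between period and latency mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified cost model that is not taken from the paper: communication is ignored, each processor handles one interval of consecutive stages, the period is the time of the slowest interval, and the latency is the sum of all interval times. The stage weights and processor speeds are hypothetical.

```python
from typing import List, Tuple


def period_and_latency(work: List[float],
                       intervals: List[Tuple[int, int]],
                       speeds: List[float]) -> Tuple[float, float]:
    """Return (period, latency) of an interval mapping.

    Assumed model (communication ignored): interval (a, b) covers stages
    a..b inclusive and runs on a processor with the matching speed.
    """
    times = [sum(work[a:b + 1]) / s for (a, b), s in zip(intervals, speeds)]
    return max(times), sum(times)


# Hypothetical 4-stage pipeline (work units per stage).
work = [4.0, 2.0, 6.0, 8.0]

# Everything on one fast processor.
one_proc = period_and_latency(work, [(0, 3)], [2.0])

# Last stage moved to a second, slower processor.
two_procs = period_and_latency(work, [(0, 2), (3, 3)], [2.0, 1.0])
```

Under these assumptions, splitting the pipeline lowers the period from 10 to 8 but raises the latency from 10 to 14, which is the kind of trade-off a bi-criteria mapping has to arbitrate.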


The impact of multimedia extensions for multimedia applications on mobile computing systems


8th Asia-Pacific conference on Computer-Human Interaction

Author: Kim, Jong-Myon

Affiliation: Univ Ulsan, Sch Comp Engn & Informat Technol, Ulsan 680749, South Korea

ISBN: (print) 9783540705840

Multimedia is a key element in human-computer interaction systems. Multimedia applications, however, are among the most dominant computing workloads driving innovations in high-performance and low-power imaging systems. Parallel implementations of multimedia applications mostly focus on the use of parallel computers. Modern general-purpose processors, however, have added multimedia extensions (e.g., MMX, VIS, MAX, AltiVec) or subword parallel instructions to their instruction set architectures to improve multimedia performance. This paper quantitatively evaluates the impact of multimedia extensions on multiprocessor systems to exploit subword-level parallelism (SLP) in addition to data-level parallelism (DLP). Experimental results for a set of multimedia applications on a representative multiprocessor array show that MMX (a representative Intel multimedia extension) achieves an average speedup ranging from 3x to 5x over the same baseline multiprocessor array. MMX also outperforms the baseline in both area efficiency (a 13% increase) and energy consumption (a 73% decrease), resulting in better component utilization and sustainable battery life. These results demonstrate that MMX is a suitable candidate for mobile multimedia computing systems.

Keywords: mobile multimedia computing systems multimedia extensions multiprocessor arrays parallel processing
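Subword parallelism can be sketched in software as follows. This is not MMX code and not the paper's method; it is a plain-integer emulation that packs four unsigned 8-bit lanes into one 32-bit word and adds them with a single word-wide addition, wrapping around per lane in the same spirit as a packed-byte add. The function names are hypothetical.

```python
LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
HIGH = 0x80808080   # top bit of every 8-bit lane


def packed_add_u8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes at once, wrapping within each lane."""
    # Adding only the low 7 bits guarantees no carry crosses a lane boundary.
    partial = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is then fixed up with XOR (carry-less add).
    return (partial ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


def scalar_add_u8(a: int, b: int) -> int:
    """Reference version: one 8-bit addition per lane, four in total."""
    out = 0
    for shift in (0, 8, 16, 24):
        lane = (((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)) & 0xFF
        out |= lane << shift
    return out


x, y = 0x01FF7F10, 0x01010101
packed = packed_add_u8(x, y)
```

The packed version does the work of four lane-wise additions in a constant number of word operations, which is the effect multimedia extensions provide in hardware with wider registers.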


Parallelization of bulk operations for STL dictionaries


13th international Euro-Par conference on parallel processing

Authors: Frias, Leonor; Singler, Johannes

Affiliations: Univ Politecn Cataluna, Dep Llenguatges Sistemes Informat, E-08028 Barcelona, Spain; Univ Karlsruhe, Inst Theoret Comp Sci, Karlsruhe, Germany

ISBN: (digital) 9783540784746

ISBN: (print) 9783540784722

STL dictionaries like map and set are commonly used in C++ programs. We consider parallelizing two of their bulk operations, namely the construction from many elements and the insertion of many elements at a time. Practical algorithms are proposed for these tasks. The implementation is completely generic and engineered to provide the best performance for the variety of possible input characteristics. It features transparent integration into the STL. This lets programs profit easily from multi-core processing power. The performance measurements show the practical usefulness on real-world multi-core machines with up to eight cores.

Keywords: parallel processing systems


A Load Balancing Knapsack Algorithm for Parallel Fuzzy c-Means Cluster Analysis


8th international conference on High Performance Computing for Computational Science (VECPAR 2008)

Authors: Modenesi, Marta V.; Evsukoff, Alexandre G.; Costa, Myrian C. A.

Affiliation: Univ Fed Rio de Janeiro, COPPE, BR-21945970 Rio De Janeiro, Brazil

ISBN: (print) 9783540928584

This work proposes a load balancing algorithm for parallel processing based on a variation of the classical knapsack problem. The problem considers the distribution of a set of partitions, defined by the number of clusters, over a set of processors, attempting to achieve a minimal overall processing cost. The work is an optimization of the parallel fuzzy c-means (FCM) clustering analysis algorithm proposed in a previous work. It is composed of two distinct parts: the cluster analysis proper, which uses the FCM algorithm to calculate cluster centers and the PBM index to evaluate partitions, and the load balancing, which is modeled by the multiple knapsack problem and implemented through a heuristic that incorporates the restrictions related to cluster analysis in order to make the parallel process more efficient.

Keywords: Unsupervised Classification Fuzzy c-Means Load Balance Optimization
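The idea of spreading partitions of different cost over processors can be sketched with a simple greedy rule. Note that this swaps in the well-known longest-processing-time-first heuristic, not the multiple knapsack heuristic of the paper, and it assumes (hypothetically) that the cost of a partition is proportional to its number of clusters.

```python
import heapq
from typing import Dict, List


def lpt_assign(costs: Dict[int, float], n_procs: int) -> List[List[int]]:
    """Greedy load balancing: give the next most expensive partition
    to the processor that currently has the smallest total load."""
    heap = [(0.0, p) for p in range(n_procs)]   # (current load, processor id)
    heapq.heapify(heap)
    bins: List[List[int]] = [[] for _ in range(n_procs)]
    for item, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        bins[p].append(item)
        heapq.heappush(heap, (load + cost, p))
    return bins


# Hypothetical workload: partitions with 2..9 clusters, cost = cluster count.
costs = {c: float(c) for c in range(2, 10)}
bins = lpt_assign(costs, 3)
loads = [sum(costs[c] for c in b) for b in bins]
```

With these made-up costs the three processors end up with loads of 16, 15 and 13, compared with a total of 44, so no processor is far from the average. A knapsack formulation can additionally encode constraints that a plain greedy rule ignores.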


Performance of multicore systems on parallel data clustering with deterministic annealing


8th international conference on Computational Science

Authors: Qiu, Xiaohong; Fox, Geoffrey C.; Yuan, Huapeng; Bae, Seung-Hee; Chrysanthakopoulos, George; Nielsen, Henrik Frystyk

Affiliations: Indiana Univ, Res Comp UITS, Bloomington, IN 47405, USA; Indiana Univ Bloomington, Commun Grids Lab, Bloomington, IN, USA; Microsoft Res, Redmond, WA, USA

ISBN: (print) 9783540693833

We present a performance analysis of a scalable parallel data clustering algorithm with deterministic annealing for multicore systems that compares MPI and a new C# messaging runtime library, CCR (Concurrency and Coordination Runtime), on Windows and Linux, using both threads and processes. We investigate effects of memory bandwidth and fluctuations of run times of loosely synchronized threads. We give results on message latency and bandwidth for two-processor multicore systems based on AMD and Intel architectures with a total of four and eight cores. We compare our C# results with C using MPICH2 and Nemesis and Java with both mpiJava and MPJ Express. We show initial speedup results from Geographical Information Systems and Cheminformatics clustering problems. We abstract the key features of the algorithm and multicore systems that lead to the observed scalable parallel performance.

Keywords: data mining MPI multicore parallel computing performance threads windows


On the Implementation of Boundary Element Engineering Codes on the Cell Broadband Engine


8th international conference on High Performance Computing for Computational Science (VECPAR 2008)

Authors: Cunha, Manoel T. F.; Telles, J. C. F.; Coutinho, Alvaro L. G. A.

Affiliation: Univ Fed Rio de Janeiro, Dept Civil Engn, COPPE, BR-21941972 Rio De Janeiro, Brazil

ISBN: (print) 9783540928584

Originally developed by the Sony-Toshiba-IBM consortium for the PlayStation 3 game console, the Cell Broadband Engine processor has been increasingly used in a much wider range of applications such as HDTV sets and multimedia devices. Conforming to the new Cell Broadband Engine Architecture, which extends the PowerPC architecture, this processor can deliver high computational power by embedding nine cores in a single chip: one general-purpose PowerPC core and eight vector cores optimized for compute-intensive tasks. The processor's performance is enhanced by single-instruction-multiple-data (SIMD) instructions that allow up to four floating-point operations to be executed in one clock cycle. This multi-level parallel environment is highly suited to applications that process data streams: encryption/decryption, multimedia, image and signal processing, among others. This paper discusses the use of the Cell BE to solve engineering problems and the practical aspects of implementing numerical method codes on this new architecture. To demonstrate the Cell BE programming techniques and the efficient porting of existing scalar algorithms to a multi-level parallel processor, the authors present the techniques applied to a well-known program for the solution of two-dimensional elastostatic problems with the Boundary Element Method. The programming guidelines provided here may also be extended to other numerical methods. Numerical experiments show the effectiveness of the proposed approach.

Keywords: Cell Broadband Engine Boundary Element Method Boundary Elements parallel Programming Vectorization SIMD


Design methodology for throughput optimum architectures of hash algorithms of the MD4-class


17th IEEE international conference on Application-Specific Systems, architectures and Processors

Authors: Lee, Yong Ki; Chan, Herwin; Verbauwhede, Ingrid

Affiliations: Univ Calif Los Angeles, Los Angeles, CA 90095, USA; Katholieke Univ Leuven, Louvain, Belgium

In this paper we propose an architecture design methodology to optimize the throughput of MD4-based hash algorithms. The proposed methodology includes an iteration bound analysis of hash algorithms, which gives the theoretical delay limit, and Data Flow Graph transformations to achieve the iteration bound. We applied the methodology to some MD4-based hash algorithms such as SHA1, MD5 and RIPEMD-160. Since SHA1 is the algorithm that requires all the techniques we show, we also synthesized the transformed SHA1 algorithm in a 0.18 µm CMOS technology in order to verify its correctness and its high throughput. To the best of our knowledge, the proposed SHA1 architecture is the first to achieve the theoretical throughput optimum, beating all previously published results. Though we demonstrate a limited number of examples, this design methodology can be applied to any other MD4-based hash algorithm.

Keywords: architecture design methodology throughput optimization MD4-based hash algorithm SHA1 MD5 RIPEMD-160 iteration bound analysis DFG (Data Flow Graph) transformation
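The iteration bound named in the abstract is usually defined, for a data flow graph with feedback loops, as the maximum over all loops of the loop's total computation time divided by the number of delay elements (registers) in that loop. A minimal sketch of that textbook definition follows; the loop figures are hypothetical and are not taken from SHA1 or from the paper.

```python
from fractions import Fraction
from typing import List, Tuple


def iteration_bound(loops: List[Tuple[int, int]]) -> Fraction:
    """Return max(computation_time / delay_count) over all loops.

    Each loop is given as (total computation time, number of delays);
    every loop of a realizable graph has at least one delay.
    """
    return max(Fraction(time, delays) for time, delays in loops)


# Hypothetical graph with three feedback loops.
loops = [(4, 2), (5, 3), (3, 1)]
bound = iteration_bound(loops)   # ratios are 2, 5/3 and 3
```

Here the third loop is the critical one, so no transformation of this graph can produce more than one output every 3 time units. Enumerating the loops of a real graph is the harder part and is left out of this sketch.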


Tunable Parallel Experiments in a GridRPC Framework: Application to Linear Solvers


8th international conference on High Performance Computing for Computational Science (VECPAR 2008)

Authors: Caniou, Yves; Gay, Jean-Sebastien; Ramet, Pierre

Affiliations: Univ Lyon 1, Lip ENS Lyon, F-69622 Villeurbanne, France; Lip ENS Lyon, F-69622 Villeurbanne, France; Univ Bordeaux 1, LABRI, Bordeaux, France

ISBN: (print) 9783540928584

The use of scientific computing centers is becoming more and more difficult on modern parallel architectures. Users must face a large variety of batch systems (each with its own specific syntax) and have to set many parameters to tune their applications (e.g., processor and/or thread mapping, memory resource constraints). Moreover, finding the optimal performance is not the only criterion when a pool of jobs is submitted to the Grid (for numerical parametric analysis, for instance), and one must focus on the wall-time completion. In this work we tackle the problem by using the DIET Grid middleware, which integrates an adaptable PASTIX service to solve a set of experiments issued from the simulations of the ASTER project.

Keywords: Grid computing Sparse linear solver Performance prediction Application specific plug-in scheduling


Real-time feature-aware video abstraction


VISUAL COMPUTER, 2008, vol. 24, no. 7-9, pp. 727-734

Authors: Zhao, Hanli; Jin, Xiaogang; Shen, Jianbing; Mao, Xiaoyang; Feng, Jieqing

Affiliations: Zhejiang Univ, State Key Lab CAD & CG, Hangzhou 310027, Peoples R China; Indiana Univ Purdue Univ, Indianapolis, IN 46202, USA; Univ Yamanashi, Yamanashi, Japan

This paper presents a novel feature-aware rendering system that automatically abstracts videos and images with the goal of improving the effectiveness of imagery for visual communication tasks. We integrate the bilateral grid to simplify regions of low contrast, which is faster than the separable approximation to the bilateral filter, and use a feature flow-guided anisotropic edge detection filter to enhance regions of high contrast. The edges detected in this paper are smoother, more coherent and more stylistic than those of the isotropic difference-of-Gaussian filter. The presented algorithms are highly parallel, allowing real-time performance on modern GPUs. The implementation of our approach is straightforward. Several experimental examples are given at the end of the paper to demonstrate the effectiveness of our approach.

Keywords: non-photorealistic rendering visual communication real-time video processing image processing


Hardware BLAST algorithms with multi-seeds detection and parallel extension


4th international Workshop on Applied Reconfigurable Computing

Authors: Xia, Fei; Don, Yong; Xu, Jinbo

Affiliation: Natl Univ Def Technol, Dept Comp Sci, Changsha 410073, Peoples R China

ISBN: (print) 9783540786092

As one of the most widely used bio-sequence searching tools, BLAST adopts an index-based approach to detect the matches between two substrings by looking up a large table and processing one match per query. In this paper, we propose a systolic array approach to detect string matches without using lookup tables. The pipelined systolic array is implemented as a multi-seeds detection and parallel extension pipeline engine to accelerate the first two stages of the NCBI BLAST algorithm. Unlike the index-based approach, our implementation consumes little memory and eliminates redundant string extensions by merging multiple adjoining seeds into a valid seed. Our FPGA implementation achieves superior results in both processing element count and clock frequency over related work in the area of FPGA BLAST accelerators. The experimental results also show that the speedup can reach about 17 and 48 compared to the NCBI BLASTp and TBLASTn programs, respectively, for 3072-residue queries on an Intel P4 CPU. Furthermore, the idea of multi-seeds detection can also be adopted in other seed-based heuristic searching applications.

Keywords: Systolic arrays
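The seed-merging step can be sketched in software under one stated assumption: two seeds are treated as adjoining when they lie on the same diagonal (subject position minus query position) and their query ranges touch or overlap. This interpretation and the names below are hypothetical; the paper's hardware design is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_seeds(hits: List[Tuple[int, int]], w: int) -> List[Tuple[int, int, int]]:
    """Merge word hits of length w into (diagonal, query_start, query_end).

    hits holds (query_pos, subject_pos) pairs; query_end is exclusive.
    """
    by_diag: Dict[int, List[int]] = defaultdict(list)
    for q, s in hits:
        by_diag[s - q].append(q)

    merged = []
    for diag, starts in by_diag.items():
        starts.sort()
        begin, end = starts[0], starts[0] + w
        for q in starts[1:]:
            if q <= end:                 # touches or overlaps current seed
                end = max(end, q + w)
            else:                        # gap: close current seed, start anew
                merged.append((diag, begin, end))
                begin, end = q, q + w
        merged.append((diag, begin, end))
    return sorted(merged)


# Hypothetical hits with word length 3.
hits = [(0, 5), (1, 6), (2, 7), (10, 15), (4, 1)]
seeds = merge_seeds(hits, 3)
```

The three overlapping hits on diagonal 5 collapse into one seed, so the extension stage would run three times instead of five for this input.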
