ISBN: 9783540854500 (print)
This work is devoted to the numerical resolution of the 4D Vlasov equation using an adaptive mesh of phase space. We previously proposed a parallel algorithm designed for distributed-memory architectures. The underlying numerical scheme makes a parallelization based on block-based mesh partitioning possible. The efficiency of this algorithm relies on maintaining a good load balance at low cost during the whole simulation. In this paper, we propose a dynamic load-balancing mechanism based on a geometric partitioning algorithm. This mechanism is deeply integrated into the parallel algorithm in order to minimize overhead. Performance measurements on a PC cluster show the good quality of our load balancing and confirm the relevance of our approach.
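The abstract does not specify the geometric partitioner. As a hedged illustration only (the function name and greedy strategy are assumptions, not the paper's algorithm), one simple geometric scheme splits blocks, ordered along one axis, into contiguous groups of near-equal total work:

```python
def partition_blocks(weights, nparts):
    """Greedy contiguous partition of per-block work weights into nparts groups.
    Returns a list of lists of block indices (a sketch, not the paper's method)."""
    total = sum(weights)
    target = total / nparts
    parts, current, acc = [], [], 0.0
    for i, w in enumerate(weights):
        current.append(i)
        acc += w
        remaining_parts = nparts - len(parts) - 1
        remaining_items = len(weights) - i - 1
        # close the current group once it reaches the target, as long as
        # every remaining group can still receive at least one block
        if acc >= target and remaining_parts > 0 and remaining_items >= remaining_parts:
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts
```

Rebalancing then amounts to recomputing the cut points whenever measured per-block work drifts.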
ISBN: 9780769530994 (print)
Parallel I/O technology is one of the key technologies for high-performance computers. Firstly, this paper introduces the I/O systems of typical machines from the newest Top500 list. Secondly, a new distributed shared parallel I/O system for high-performance computers (DSPIO) is put forward, and some key technologies implemented in the system are discussed. Finally, a prototype system is built. The experimental results show that this architecture offers high I/O bandwidth and good scalability, and is well suited to high-performance computing.
ISBN: 9783540695004 (print)
Traditionally, the block-based medial axis transform (BB-MAT) and the chessboard distance transform (CDT) were viewed as two completely different image computation problems, especially in three-dimensional (3D) space. We achieve the computation of the 3D CDT problem by implementing the 3D BB-MAT algorithm first. For a 3D binary image of size N^3, our parallel algorithm runs in O(log N) time using N^3 processors on the concurrent-read exclusive-write (CREW) parallel random access machine (PRAM) model to solve both the 3D BB-MAT and 3D CDT problems. In addition, we have implemented a message passing interface (MPI) program on an AMD Opteron Model 270 cluster system to verify the proposed parallel algorithm, since the PRAM model is not available in the real world. The experimental results show that the speedup saturates when more than four processors are used, regardless of the problem size.
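The paper's algorithm is a parallel 3D PRAM/MPI formulation, which the abstract does not detail. As a minimal sequential illustration of the CDT itself (a 2D chamfer-style simplification, not the authors' method), the chessboard distance of every object pixel to the nearest background pixel can be computed exactly with two raster scans:

```python
def chessboard_dt(img):
    """img: list of rows with 1 = object, 0 = background.
    Returns the L-infinity (chessboard) distance of each pixel to the
    nearest background pixel, via a forward and a backward chamfer pass."""
    h, w = len(img), len(img[0])
    INF = h + w  # safe upper bound on any chessboard distance in the image
    d = [[0 if img[y][x] == 0 else INF for x in range(w)] for y in range(h)]
    # forward pass: propagate from already-visited (top/left) neighbors
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + 1)
    # backward pass: propagate from bottom/right neighbors
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx in ((1, 1), (1, 0), (1, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + 1)
    return d
```

Because all eight chessboard neighbors carry unit weight, the two-pass chamfer scan yields the exact L-infinity distance map.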
ISBN: 9780769530994 (print)
We consider power dissipation during simple switching in information processing. By considering a general two-level system, we show that the energy dissipated during errorless switching has a minimum of kT ln 2 and increases linearly with switching speed. We also find the optimal switching function, which minimizes heat dissipation for a given error rate. We present some estimates and compare them with results for CMOS technology.
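To put the kT ln 2 lower bound (Landauer's limit) in perspective, a quick numerical estimate at room temperature:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)
T = 300.0           # room temperature, K

# Minimum energy dissipated per errorless switch of a two-level system
e_min = k_B * T * math.log(2)
print(f"{e_min:.3e} J")  # prints 2.871e-21 J
```

This is several orders of magnitude below the switching energies of present CMOS gates, which is what leaves room for the trade-offs the abstract describes.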
ISBN: 9781424418985 (print)
The proceedings contain 52 papers. The topics discussed include: fast custom instruction identification by convex subgraph enumeration; bit matrix multiplication in commodity processors; security processor with quantum key distribution; dynamically reconfigurable regular expression matching architecture; an efficient implementation of a phase unwrapping kernel on reconfigurable hardware; a parallel hardware architecture for connected component labeling based on fast label merging; design space exploration of a cooperative MIMO receiver for reconfigurable architectures; dynamic holographic reconfiguration on a four-context ODRGA; FPGA-based hardware accelerator of the heat equation with applications on infrared; FPGA-based singular value decomposition for image processing applications; accelerating Nussinov RNA secondary structure prediction with systolic arrays on FPGAs; and reconfigurable acceleration of microphone array algorithms for speech enhancement.
ISBN: 9781424421015 (print)
A ubiquitous processor, HCgorilla, followed the Java CPU for multimedia processing and built in RNGs (random number generators) for cipher processing. HCgorilla thus had an execution stage composed of several units for this sophisticated processing. Since the units of the execution stage were physically separate, each function took a different latency. This required instruction scheduling similar to that of regular superscalar processors. In this paper, we describe an improvement of HCgorilla that solves this issue. Specifically, the execution stage composed of arithmetic units is wave-pipelined as a whole. This completely merges the parallel structure without physical separation. The waved multifunctional execution unit is effective in realizing wide-range dynamic ILP (instruction-level parallelism) at a rate higher than regular superscalar processors.
H.264/AVC is the latest video coding standard, adopting variable block size motion estimation (VBS-ME), quarter-pixel accuracy, motion vector prediction, and multi-reference frames for motion estimation. These new features result in much higher computation requirements than previous coding standards. In this paper we propose a novel most significant bit (MSB) first bit-serial architecture for full-search block-matching VBS-ME and compare it with systolic implementations. Since the nature of MSB-first processing enables early termination of the sum of absolute differences (SAD) calculation, the average hardware performance can be enhanced. Five different designs, one- and two-dimensional systolic and tree implementations along with the bit-serial one, are compared in terms of performance, pixel memory bandwidth, occupied area, and power consumption.
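The early-termination idea can be illustrated in software. This sketch is a pixel-serial approximation of the paper's bit-serial MSB-first hardware (the function names and the pixel-serial ordering are assumptions for illustration): a SAD accumulation aborts as soon as the running sum can no longer beat the best candidate found so far.

```python
def sad_early_exit(block_a, block_b, best_so_far):
    """SAD of two equal-length pixel lists, or None if the running sum
    already reaches best_so_far (early termination)."""
    total = 0
    for a, b in zip(block_a, block_b):
        total += abs(a - b)
        if total >= best_so_far:  # cannot improve on the current best: abort
            return None
    return total

def full_search(candidate_blocks, current_block):
    """Full-search block matching: return (index, SAD) of the best candidate,
    skipping hopeless candidates early."""
    best, best_idx = float("inf"), -1
    for i, cand in enumerate(candidate_blocks):
        s = sad_early_exit(cand, current_block, best)
        if s is not None:
            best, best_idx = s, i
    return best_idx, best
```

In the hardware design the same effect is obtained per bit plane from the MSB down, which bounds the partial SAD even earlier than this pixel-by-pixel version.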
ISBN: 9780769534343 (print)
This paper presents a case for exploiting the synergy of dedicated and opportunistic network resources in a distributed hosting platform for data stream processing applications. Our previous studies have demonstrated the benefits of combining dedicated reliable resources with opportunistic resources for high-throughput computing applications, where timely allocation of the processing units is the primary concern. Since distributed stream processing applications demand a large volume of data transmission between the processing sites at a consistent rate, adequate control over the network resources is important here to assure a steady flow of processing. In this paper, we propose a system model for the hybrid hosting platform, where stream processing servers installed at distributed sites are interconnected with a combination of dedicated links and the public Internet. Decentralized algorithms have been developed for allocating the two classes of network resources among the competing tasks, with the objective of higher task throughput and better utilization of expensive dedicated resources. Results from an extensive simulation study show that, with proper management, systems exploiting the synergy of dedicated and opportunistic resources yield considerably higher task throughput, and thus a higher return on investment, than systems using expensive dedicated resources alone.
ISBN: 9780769530994 (print)
We consider sorting problems based on compare-and-exchange operations on partially connected mesh networks, where n nodes are arranged in a sequence and each connects to its k nearest neighbors on both sides. Each node holds a distinct key, and these keys need to be sorted into a given order. We present a sequential algorithm with 3/8 kn^2 + O(n log n) time complexity and a parallel algorithm with 3/2 kn + O(log n) time complexity.
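For the special case k = 1 (each node connected only to its immediate neighbors), compare-and-exchange sorting on such a linear array reduces to the classic odd-even transposition sort, sketched below; the paper's algorithms for general k are not reproduced here.

```python
def odd_even_transposition_sort(keys):
    """Sort by compare-and-exchange between adjacent positions only,
    alternating even and odd neighbor pairs; n phases suffice."""
    a = list(keys)
    n = len(a)
    for phase in range(n):
        start = phase % 2  # 0: pairs (0,1),(2,3),...  1: pairs (1,2),(3,4),...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:           # compare-and-exchange with right neighbor
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

On real hardware each phase runs all pairs in parallel, giving O(n) parallel time for k = 1, consistent with the 3/2 kn + O(log n) bound above at k = 1.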
ISBN: 9783540681052 (print)
As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated, or new algorithms developed, in order to take advantage of the architectural features of these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. Compared to the standard approach, as in LAPACK, this may result in an out-of-order execution of the tasks that completely hides the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization, where parallelism can only be exploited at the level of BLAS operations.
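The dependency-driven scheduling idea can be sketched generically: tasks become runnable as soon as all of their prerequisites have completed, which permits out-of-order execution. The kernels below are placeholders, not the actual tile QR kernels, and a real runtime would dispatch ready tasks to idle cores rather than run them serially.

```python
from collections import deque

def schedule(tasks, deps):
    """tasks: dict name -> callable; deps: dict name -> set of prerequisite names.
    Runs each task once all of its prerequisites have completed; returns the
    execution order (one valid topological order of the task DAG)."""
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    dependents = {t: [] for t in tasks}
    for t, ds in remaining.items():
        for d in ds:
            dependents[d].append(t)
    ready = deque(t for t, ds in remaining.items() if not ds)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()  # a parallel runtime would hand this to any free core
        order.append(t)
        for u in dependents[t]:
            remaining[u].discard(t)
            if not remaining[u]:  # last prerequisite finished: u is runnable
                ready.append(u)
    return order
```

In the tile QR setting, the task names would be the per-tile factorization and update kernels, and the dependency sets encode which tiles each kernel reads and writes.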