ISBN (print): 9781728112466
The Matrix Chain Ordering Problem is a well-studied optimization problem that seeks the parenthesization minimizing the number of arithmetic operations required to compute a chain of matrix multiplications. Existing algorithms include the O(N^3) dynamic programming algorithm of Godbole (1973) and the faster O(N log N) algorithm of Hu and Shing (1982). We show that both may produce suboptimal parenthesizations on modern machines, as they do not take into account inter-processor communication costs, which often dominate the running time. Further, the optimal solution may change when fast matrix multiplication algorithms are used. We show that the O(N^3) dynamic programming algorithm easily adapts to provide optimal solutions for modern matrix multiplication algorithms, and we obtain an adaptation of the O(N log N) algorithm that guarantees a constant approximation.
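The O(N^3) dynamic program the abstract refers to can be sketched with a pluggable per-multiplication cost function; the same recurrence then accommodates alternative cost models (e.g. communication-aware or fast-multiplication costs) simply by swapping that function. This is a minimal textbook sketch, not the authors' adapted algorithm, and the cost-function hook is our illustrative assumption.

```python
def classical_cost(p, q, r):
    # Arithmetic cost of multiplying a (p x q) matrix by a (q x r) matrix.
    return p * q * r

def matrix_chain_order(dims, cost=classical_cost):
    """Godbole-style O(N^3) dynamic program.

    dims[i], dims[i+1] are the dimensions of matrix i. Returns the minimum
    total cost and a split table from which the parenthesization can be
    reconstructed. Any cost(p, q, r) model may be plugged in.
    """
    n = len(dims) - 1                       # number of matrices in the chain
    best = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # increasing chain lengths
        for i in range(n - length + 1):
            j = i + length - 1
            best[i][j] = float("inf")
            for k in range(i, j):           # try every split point
                c = (best[i][k] + best[k + 1][j]
                     + cost(dims[i], dims[k + 1], dims[j + 1]))
                if c < best[i][j]:
                    best[i][j] = c
                    split[i][j] = k
    return best[0][n - 1], split
```

For dims = [10, 100, 5, 50], the optimal order is (AB)C at a classical cost of 7500 operations; replacing `classical_cost` with a communication-aware model can shift the chosen split, which is exactly the effect the paper studies.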
We study technologies for testing the IOPS and data-transfer speed of disk arrays in mass storage systems. We propose a parallel testing technology for high-performance disk arrays and realize the testing wo...
Parallel efficiency is a fundamental research topic in high-performance computing. This paper focuses on parallel computing on a high-performance computing cluster with the CASTEP program, and discusses multi-core parall...
We present the results of an investigation of the dynamics of the asteroid (99942) Apophis, which will undergo a very close encounter with the Earth on April 13, 2029. The region of possible motions of the asteroid is considered on the time interval (2004, 2040). In addition, it is shown that extending the observational interval (2004, 2006) until 2008 allowed us to reduce significantly the area of possible motions. All investigations were performed by numerical methods with the help of algorithms and software developed by us in a parallel programming environment using the SKIF Cyberia multiprocessor computer of Tomsk State University.
ISBN (print): 0780320182
Parallaxis is a machine-independent language for data-parallel programming, based on sequential Modula-2. Programming in Parallaxis is done at a level of abstraction with virtual processors and virtual connections, which may be defined by the application programmer. This paper describes Parallaxis-III, the current version of the language definition, together with a number of parallel sample algorithms.
ISBN (print): 0818678836
Parallelism suffers from a lack of programming languages that are both simple to handle and able to exploit the power of present parallel computers. If the expression of parallelism is too high level, compilers have to perform complex optimizations, often leading to poor performance. On the other hand, too low a level of parallelism transfers the difficulties toward the programmer. In this paper, we propose a new programming language that integrates both a synchronous data-parallel programming model and an asynchronous execution model. The synchronous data-parallel programming model allows safe program design. The asynchronous execution model yields efficient execution on present MIMD architectures without any program transformation. Our language relies on a logical instruction ordering exploited by specific send/receive communications. It allows expressing only the effective data dependences between processors. This ability is reinforced by allowing unmatched send/receive pairs, which is useful for irregular algorithms. A sparse vector computation exemplifies the language's potential.
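The key mechanism described above, sends and receives matched through a logical instruction ordering, can be illustrated with a toy buffer. This is a hypothetical sketch of the idea, not the paper's language: each send is tagged with the sender's logical step, a receive names the exact (sender, step) dependence it needs, and a send with no matching receive simply stays buffered, mimicking the unmatched sends allowed for irregular algorithms.

```python
class LogicalChannel:
    """Toy message store keyed by (sender, logical step)."""

    def __init__(self):
        self.buffer = {}                    # (sender, step) -> value

    def send(self, sender, step, value):
        # Post a value produced by `sender` at logical step `step`.
        self.buffer[(sender, step)] = value

    def receive(self, sender, step):
        # Consume exactly the dependence identified by the logical ordering.
        return self.buffer.pop((sender, step))

ch = LogicalChannel()
ch.send("P0", step=1, value=3.5)    # P0's step-1 result
ch.send("P0", step=2, value=7.0)    # never received: allowed to stay unmatched
x = ch.receive("P0", step=1)        # another processor pulls only what it needs
```

Because messages are identified by logical position rather than arrival order, execution can proceed asynchronously while the program is still reasoned about synchronously.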
Parallel computing is notoriously challenging due to the difficulty of developing correct and efficient programs. With the arrival of multi-core processors for desktop systems, desktop applications must now be paralle...
Electromagnetic researchers are often faced with long execution times, and therefore algorithmic- and implementation-level optimization can dramatically increase the overall performance of electromagnetism simulation usi...
ISBN (print): 0818684038
Irregular particle-based applications that use trees, for example hierarchical N-body applications, are important consumers of multiprocessor cycles, and are argued to benefit greatly in programming ease from a coherent shared address space programming model. As more and more supercomputing platforms that can support different programming models become available to users, from tightly-coupled hardware-coherent machines to clusters of workstations or SMPs, to truly deliver on its ease-of-programming advantages to application users it is important that the shared address space model not only perform and scale well in the tightly-coupled case but also port well in performance across the range of platforms (as the message passing model can). For tree-based N-body applications, this is currently not true: while the actual computation of interactions ports well, the parallel tree building phase can become a severe bottleneck on coherent shared address space platforms, in particular on platforms with less aggressive, commodity-oriented communication architectures (even though it takes less than 3 percent of the time in most sequential executions). We therefore investigate the performance of five parallel tree building methods in the context of a complete galaxy simulation on four very different platforms that support this programming model: an SGI Origin2000 (an aggressive hardware cache-coherent machine with physically distributed memory), an SGI Challenge bus-based shared memory multiprocessor, an Intel Paragon running a shared virtual memory protocol in software at page granularity, and a Wisconsin Typhoon-zero in which the granularity of coherence can be varied using hardware support but the protocol runs in software (in the last case using both a page-based and a fine-grained protocol). We find that the algorithms used successfully and widely distributed so far for the first two platforms cause overall application performance to be very poor on the latter two commodit...
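To make the "tree building phase" concrete, here is a minimal sequential Barnes-Hut-style quadtree build. This is a generic textbook sketch, not any of the five parallel methods the paper compares; the point is that parallel versions must coordinate many concurrent insertions into shared cells, which is why this phase stresses the communication architecture far more than its small sequential share of the runtime suggests.

```python
class Node:
    """A square cell of the quadtree; a leaf holds at most one particle."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half   # cell center and half-width
        self.body = None        # leaf payload: an (x, y) particle
        self.children = None    # four sub-cells once the leaf splits

    def quadrant(self, x, y):
        return (x >= self.cx) * 2 + (y >= self.cy)

    def child_for(self, x, y):
        if self.children is None:           # lazily create the four sub-cells
            h = self.half / 2
            self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
        return self.children[self.quadrant(x, y)]

def insert(node, x, y):
    # Note: coincident particles would recurse forever; real codes cap depth.
    if node.children is None and node.body is None:
        node.body = (x, y)                  # empty leaf: store the particle
        return
    if node.body is not None:               # occupied leaf: split and push down
        bx, by = node.body
        node.body = None
        insert(node.child_for(bx, by), bx, by)
    insert(node.child_for(x, y), x, y)
```

In a shared address space parallelization, many processors perform `insert` concurrently, so interior cells become contended shared data, the source of the bottleneck studied above.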
ISBN (print): 9781479961238
In this paper, we share our experiences in using two important yet different High Performance Computing (HPC) architectures to evaluate two HPC algorithms. The first architecture is an Intel x64 ISA based homogeneous multicore with a Uniform Memory Access (UMA) shared-memory Symmetric Multi-Processing system. The second architecture is an IBM Power ISA based heterogeneous multicore with a Non-Uniform Memory Access (NUMA) distributed-memory Asymmetric Multi-Processing system. The two HPC algorithms predict biological molecular structures, specifically RNA secondary structures. The first algorithm is a parallelized version of a popular serial RNA secondary structure prediction algorithm called PKNOTS. The second is a new parallel-by-design algorithm that we have developed called MARSs. Using real Ribo-Nucleic Acid (RNA) sequences, we conducted large-scale experiments involving hundreds of sequences with the two algorithms. Based on the thousands of data points collected in these experiments, we report the observed performance metrics for both algorithms on the two architectures. Through our experiments, we infer that architectures with specialized co-processors for number crunching, along with a high-speed memory bus and dedicated bus controllers, generally perform better than general-purpose multi-processor architectures. In addition, we observed that algorithms that are intrinsically parallel by design scale and perform better by taking advantage of the underlying parallel architecture. We further share best practices for handling scalability with regard to workload size. We believe our results are applicable to other HPC applications on similar HPC architectures.
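The dynamic-programming structure underlying RNA secondary structure prediction can be illustrated with the simple Nussinov base-pair-maximization recurrence. This is emphatically not PKNOTS or MARSs (PKNOTS handles pseudoknots and thermodynamic energy models); it only shows the O(N^3) diagonal-by-diagonal fill pattern that makes such predictors natural parallelization targets, since every cell on one diagonal can be computed independently.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq):
    """Maximum number of nested base pairs in an RNA sequence."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):               # each diagonal is independent
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j],       # i left unpaired
                       dp[i][j - 1])       # j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)   # i pairs with j
            for k in range(i + 1, j):      # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]
```

A parallel-by-design predictor in the spirit of the paper would distribute the cells of each anti-diagonal across cores and synchronize between diagonals, which is why memory bandwidth and bus architecture matter so much for its scaling.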