A crashworthiness simulation system is one of the key computer-aided engineering (CAE) tools for the automobile industry and carries two potentially conflicting requirements: accuracy and efficiency. A parallel crashworthiness simulation system based on the graphics processing unit (GPU) architecture and the explicit finite element (FE) method is developed in this work. Implementation details with the compute unified device architecture (CUDA) are considered. The entire parallel simulation system involves a parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty contact force calculation algorithm. Three basic GPU-based parallel strategies are suggested to exploit the natural parallelism of the explicit FE algorithm. Two free GPU-based numerical calculation libraries, cuBLAS and Thrust, are introduced to reduce the difficulty of programming. Furthermore, a mixed array and a thread-map-to-element strategy are proposed to improve the performance of test-pair searching. The outer loop of the nested loop through the mixed array is unrolled to realize parallel searching. An efficient storage strategy based on data sorting is presented to realize data transfer between different hierarchies with coalesced access during contact-pair searching. A thread-map-to-element pattern is implemented to calculate the penetrations and the penetration forces; a double-precision floating-point atomic operation is used to scatter contact forces. The simulation results of three different models based on the Intel Core i7-930 and the NVIDIA GeForce GTX 580 demonstrate the precision and efficiency of the developed parallel crashworthiness simulation system. (C) 2015 Elsevier Ltd. All rights reserved.
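As a rough illustration of the contact-force scatter step, the sketch below mimics the conflict-safe accumulation of penalty forces onto shared nodes in NumPy; the array names and the penalty-force formula are assumptions for illustration, and the paper's system performs this step with double-precision atomic adds inside a CUDA kernel.

```python
# Minimal NumPy sketch of scatter-accumulating penalty contact forces.
# node_ids, penetrations, normals, k_penalty are illustrative, not from the paper.
import numpy as np

def scatter_contact_forces(n_nodes, node_ids, penetrations, normals, k_penalty):
    """Accumulate per-pair penalty forces f = k * d * n onto shared nodes.

    np.add.at is an unbuffered scatter-add: the serial analogue of the
    double-precision atomic adds a GPU kernel needs when many contact
    pairs write forces to the same node concurrently.
    """
    forces = np.zeros((n_nodes, 3))
    pair_forces = k_penalty * penetrations[:, None] * normals  # one force per pair
    np.add.at(forces, node_ids, pair_forces)                   # conflict-safe scatter
    return forces
```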
This article provides an overview of AMD's vision for exascale computing. The authors envision exascale computing nodes that combine integrated CPUs and GPUs, along with the hardware and software support to enable scientists to effectively run their scientific experiments on an exascale system. The authors discuss the challenges in building a heterogeneous exascale system and describe ongoing research efforts to realize AMD's exascale vision.
ISBN (Print): 9781509035403
Peterson's solution is a classical algorithm for the mutual exclusion problem, but rigorous work on analyzing its safety and liveness properties remains rare. Using the theorem prover Isabelle/HOL, we formally modelled Peterson's solution for two processes and proved that it satisfies the mutual exclusion property. Following Paulson's inductive approach, the algorithm is inductively defined as the set of all possible event lists of two concurrent processes, where an event is defined as an atomic action of a concurrent process. All of the reasoning has been checked by Isabelle/HOL. Compared with work based on model checking, ours can be easily generalized to the analysis of Peterson's solution for n (n > 2) processes, and the model we defined could be extended to analyze the liveness property of Peterson's solution. The proof process also yields useful advice on how to implement Peterson's solution.
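For readers who want to see the algorithm itself, a minimal Python sketch of two-process Peterson's solution follows. The paper's artifact is an Isabelle/HOL model, not executable code, and this sketch leans on CPython's interpreter-level memory ordering; on real hardware the algorithm additionally needs memory fences.

```python
# Two-process Peterson's solution, sketched with Python threads.
import threading

flag = [False, False]  # flag[i]: process i wants to enter its critical section
turn = 0               # index of the process that yields on contention
counter = 0            # shared state the algorithm protects

def process(i):
    global turn, counter
    other = 1 - i
    for _ in range(10000):
        flag[i] = True
        turn = other                          # entry protocol: defer to the other
        while flag[other] and turn == other:
            pass                              # busy-wait until it is safe to enter
        counter += 1                          # critical section
        flag[i] = False                       # exit protocol

threads = [threading.Thread(target=process, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000 exactly if mutual exclusion held
```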
There is a growing need for accurate and efficient real-time state estimation as complexity, interconnection, and the insertion of new devices in power systems increase. In this paper, a massively parallel dynamic state estimator is developed on a graphics processing unit (GPU), which is especially well suited to processing large data sets. Within the massively parallel framework, a lateral two-level dynamic state estimator is proposed based on the extended Kalman filter method, utilizing both supervisory control and data acquisition (SCADA) and phasor measurement unit (PMU) measurements. The measurements at buses without PMU installations are predicted using previous data. The results of the GPU-based dynamic state estimator are compared with a multithreaded CPU-based code. Moreover, the effects of direct and iterative linear solvers on the state estimation algorithm are investigated. The simulation results show a total speed-up of up to 15 times for a 4992-bus system.
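As a sketch of the estimator's core building block, the function below runs one extended-Kalman-filter predict/update cycle in NumPy; the names and the dense direct solve are assumptions for illustration (the paper compares direct and iterative solvers on the GPU), not the authors' code.

```python
# One EKF predict/update cycle; f/h are the process and measurement models,
# F/H their Jacobians, Q/R the noise covariances. All names are illustrative.
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    # Predict
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update: a dense direct solve stands in for the direct/iterative
    # linear solvers whose effects the paper investigates
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = np.linalg.solve(S.T, H @ P_pred.T).T  # Kalman gain, K = P H^T S^-1
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```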
In this paper, the performance of parallel computing is discussed in the domain of image matching. Image matching, which compares two images for similarity, is widely used in security, medical, and computer vision applications. Depending on the size of the images, however, the computation may not be tractable on a single processor running a sequential algorithm. To overcome this limitation, parallel computing is introduced through the Message Passing Interface (MPI) library. In this project, the two images are first converted to grayscale and then compared using the Sum of Squared Differences (SSD) algorithm. A parallel network of 12 processors was implemented for image matching and for measuring the performance of the SSD algorithm. The performance gain with 12, 8, 4, and 2 processors was compared against that of a single processor. The results show a roughly linear relationship between performance gain and the number of processors used, demonstrating significant benefits of parallelism for SSD applications.
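A minimal mpi4py sketch of this scheme follows: the root rank splits both grayscale images into row blocks, each rank computes its local sum of squared differences, and a reduction yields the global SSD. The file names are assumptions; the project's actual code is not given in the abstract.

```python
# Parallel SSD between two grayscale images with MPI; run with e.g.
#   mpiexec -n 4 python ssd_mpi.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    a = np.load("img_a_gray.npy").astype(np.float64)  # hypothetical inputs,
    b = np.load("img_b_gray.npy").astype(np.float64)  # same shape, grayscale
    a_parts = np.array_split(a, size)                 # split by rows
    b_parts = np.array_split(b, size)
else:
    a_parts = b_parts = None

a_loc = comm.scatter(a_parts, root=0)
b_loc = comm.scatter(b_parts, root=0)

local_ssd = float(np.sum((a_loc - b_loc) ** 2))       # per-rank partial sum
total_ssd = comm.reduce(local_ssd, op=MPI.SUM, root=0)

if rank == 0:
    print("SSD:", total_ssd)  # 0.0 means the images are identical
```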
Erasure-code-based object storage systems are becoming popular choices for archive storage due to cost-effective space-saving schemes and high fault resilience. Both erasure code encoding and decoding involve compute-intensive array, matrix, and table-lookup operations. With today's advanced CPU technologies such as multi-core, many-core, and streaming SIMD instruction sets, erasure code technology can be adapted effectively and efficiently in cloud storage systems and applied to very large data sets. Current erasure coding solutions are based on a single-process approach, which cannot process very large data sets efficiently and effectively. To remove the bottleneck of a single-process erasure encoding pipeline, we exploit the task parallelism of a multicore computing system and give the erasure coding process parallel processing capability. We have leveraged open-source erasure coding software and implemented concurrent and parallel erasure coding software, called parEC. The parEC process is realized through the MPI runtime parallel I/O environment, after which a data placement process distributes encoded data blocks to their destination storage devices. In this paper, we present the software architecture of parEC, conduct various performance tests on its software components, report our early experience of using parEC, and discuss its current status and future development work.
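As a rough sketch of the task-parallel idea behind parEC (not its actual MPI-based implementation), the fragment below encodes stripes concurrently, substituting a single XOR parity block for a real erasure code:

```python
# Task-parallel "encoding" of stripes; XOR parity stands in for a real
# erasure code, and a process pool stands in for parEC's MPI runtime.
import numpy as np
from multiprocessing import Pool

K = 4  # data blocks per stripe

def encode_stripe(stripe):
    """Split one stripe into K data blocks plus one XOR parity block."""
    blocks = np.array_split(np.frombuffer(stripe, dtype=np.uint8), K)
    width = blocks[0].size
    parity = np.bitwise_xor.reduce([np.resize(b, width) for b in blocks])
    return blocks, parity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stripes = [rng.integers(0, 256, 1 << 20, dtype=np.uint8).tobytes()
               for _ in range(8)]                 # eight 1 MiB stripes
    with Pool() as pool:                          # encode stripes in parallel
        encoded = pool.map(encode_stripe, stripes)
    print(len(encoded), "stripes encoded")
```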
Recent multi-core designs have migrated from Symmetric Multi-Processing to cache-coherent Non-Uniform Memory Access (ccNUMA) architectures. In this paper we discuss performance issues that arise when designing parallel Finite Element programs for a 64-core ccNUMA computer and explore solutions to these issues. We first present an overview of the computer architecture and show that highly parallel code that does not take the system's memory organization into account scales poorly, achieving only a 2.8x speedup when running with 64 threads. Then, we discuss how we identified the sources of overhead and evaluate three possible solutions to the problem. The first solution does not require the application's code to be modified, but the speedup achieved is only 10.6x. The second solution enables the performance to scale up to 30.9x; however, it requires the programmer to manually schedule threads, allocate related data on local CPUs and memory banks, and rely on ccNUMA-aware libraries that are not portable across operating systems. We also propose and evaluate "copy-on-thread", an alternative solution that enables the performance to scale up to 25.5x without relying on specialized libraries or requiring specific data allocation and thread scheduling. Finally, we argue that the reported issues only arise for large data sets, and we conclude the paper with recommendations that help programmers design algorithms and programs that perform well on this kind of machine. (C) 2014 Civil-Comp Ltd. and Elsevier Ltd. All rights reserved.
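To make the "copy-on-thread" idea concrete, here is a toy Python sketch with processes standing in for native threads: each worker copies the read-shared data when it starts, so first-touch allocation places that copy in the worker's local memory. This is a conceptual sketch under those substitutions, not the paper's FE code.

```python
# "Copy-on-thread" in miniature: each worker privatizes read-shared data so
# later accesses are NUMA-local. A fork-based start method is assumed so the
# global array is visible in the workers.
import numpy as np
from multiprocessing import Pool

stiffness = np.random.default_rng(0).random((2048, 2048))  # read-mostly data

def worker(rows):
    local = stiffness.copy()          # the copy: allocated by this worker,
    return float(local[rows].sum())   # so subsequent reads stay local

if __name__ == "__main__":
    chunks = np.array_split(np.arange(2048), 8)
    with Pool(8) as pool:
        print(sum(pool.map(worker, chunks)))
```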
To make use of ever-improving microprocessor performance, applications must be modified to take advantage of the parallelism of today's microprocessors. One such application that needs to be modernized is the Weather Research and Forecasting (WRF) model, which is designed for numerical weather prediction and atmospheric research. The WRF software infrastructure consists of several components, such as dynamic solvers and physics schemes. Numerical models are used to resolve the large-scale flow, whereas subgrid-scale parameterizations estimate small-scale properties (e.g., boundary layer turbulence and convection, clouds, radiation), which significantly influence the resolved scale due to the complex nonlinear nature of the atmosphere. For the cloudy planetary boundary layer (PBL), it is fundamental to parameterize vertical turbulent fluxes and subgrid-scale condensation in a realistic manner. A parameterization based on the total energy-mass flux (TEMF) that unifies the turbulence and moist convection components produces better results than other PBL schemes. Thus, we present our optimization results for the TEMF PBL scheme. These optimizations include vectorization of the code to utilize the multiple vector units inside each processor core. The optimizations improved the performance of the original TEMF code on the Xeon Phi 7120P by a factor of 25.9x. Furthermore, the same optimizations improved the performance of TEMF on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 8.3x compared to the original TEMF code.
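To illustrate the kind of vectorization involved, the sketch below contrasts a level-by-level scalar loop with an equivalent whole-array expression that the vector units can execute; the flux formula is an invented placeholder, not the TEMF scheme itself.

```python
# Scalar per-level loop versus a SIMD-friendly whole-array expression.
import numpy as np

def fluxes_scalar(w, q):
    f = np.empty_like(w)
    for k in range(w.size):        # one vertical column, level by level
        f[k] = 0.5 * w[k] * q[k]
    return f

def fluxes_vectorized(w, q):
    return 0.5 * w * q             # same arithmetic in one vectorizable expression

w = np.random.rand(128)
q = np.random.rand(128)
assert np.allclose(fluxes_scalar(w, q), fluxes_vectorized(w, q))
```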
ISBN (Print): 9781509037117
While numerous applications, such as social networks, protein-protein interaction networks, and bibliographic networks, mainly consist of graph-structured data, massive graphs ranging from millions to billions of nodes are commonplace, and searching within them needs to be efficient. Unfortunately, since the subgraph isomorphism problem is NP-complete, querying large graphs remains challenging. Most existing approaches employ various pruning rules to facilitate the matching process on a single machine; when a data graph is large and dense, the auxiliary information (the index) and the intermediate results can easily exhaust computational resources. Recently, inspired by the popularity of parallel programming models such as MapReduce and Pregel, there has been a trend toward solving the subgraph matching problem on top of them. However, due to the incompleteness of the graph data on each cluster machine, parallel solutions for subgraph matching are often proposed in a brute-force way. In this paper, we propose a parallel subgraph matching framework that uses a k-hop-replication-based partitioning approach to distribute the graph data across cluster machines. The proposed framework ensures the completeness of local searching; hence, previous studies on indexing graph data become usable and valuable for the parallel graph querying problem. For efficiency, taking a lightweight neighborhood-based index as an example, we also propose two potential optimizations for reducing intermediate results. We implement the proposed framework on Hadoop/MapReduce. Our experimental results on real-world data sets demonstrate its effectiveness on very large graphs.
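A minimal sketch of the k-hop replication idea follows, assuming a partition is a set of vertex IDs and the graph is an adjacency dictionary (both illustrative): a partition additionally stores every vertex within k hops of it, so matching a query of radius at most k never needs a remote lookup.

```python
# Compute a partition's k-hop closure: its own vertices plus the replicas.
from collections import deque

def k_hop_closure(adj, part, k):
    seen = set(part)
    frontier = deque((v, 0) for v in part)
    while frontier:
        v, d = frontier.popleft()
        if d == k:
            continue                      # do not expand beyond k hops
        for u in adj.get(v, ()):
            if u not in seen:
                seen.add(u)
                frontier.append((u, d + 1))
    return seen

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_hop_closure(adj, {1}, 2))  # {1, 2, 3}: vertex 3 is replicated
```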