We show that a careful parallelization of statistical multiresolution estimation (SMRE) improves the phase reconstruction in X-ray near-field holography. The central step in, and the computationally most expensive part of, SMRE methods is Dykstra's algorithm. It projects a given vector onto the intersection of convex sets. We discuss its implementation on NVIDIA's compute unified device architecture (CUDA). Compared to a CPU implementation parallelized with OpenMP, our CUDA implementation is up to one order of magnitude faster. Our results show that a careful parallelization of Dykstra's algorithm enables its use in large-scale statistical multiresolution analyses.
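Dykstra's algorithm is named directly above, so a small sketch may help make the cost structure concrete. The following is a minimal NumPy version, not the paper's CUDA implementation; the two example sets and all function names are illustrative assumptions:

```python
import numpy as np

def dykstra(x0, projections, n_iter=100):
    """Project x0 onto the intersection of convex sets.

    `projections` is a list of functions, each projecting a vector onto
    one convex set. The correction terms (one per set) are what
    distinguish Dykstra's algorithm from plain alternating projections:
    they make the iteration converge to the projection of x0 itself,
    not merely to some point in the intersection.
    """
    x = x0.astype(float).copy()
    corrections = [np.zeros_like(x) for _ in projections]
    for _ in range(n_iter):
        for i, project in enumerate(projections):
            y = project(x + corrections[i])          # project corrected point
            corrections[i] = x + corrections[i] - y  # update correction term
            x = y
    return x

# Toy example: unit ball intersected with the nonnegative orthant.
proj_ball = lambda v: v / max(1.0, np.linalg.norm(v))
proj_orthant = lambda v: np.maximum(v, 0.0)
print(dykstra(np.array([2.0, -1.0]), [proj_ball, proj_orthant]))  # ~[1, 0]
```

The sweep over the sets is inherently sequential, but each projection acts element-wise or block-wise on very large vectors, which is the kind of work that maps naturally onto CUDA threads.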
The demand for fast solution of nonlinear optimization problems, coupled with the emergence of new concurrent computing architectures, drives the need for parallel algorithms to solve challenging nonlinear programming (NLP) problems. In this paper, we propose an augmented Lagrangian interior-point approach for general NLP problems that runs in parallel on a graphics processing unit (GPU). The algorithm is iterative at three levels. The first level replaces the original problem by a sequence of bound-constrained optimization problems using an augmented Lagrangian method. Each of these bound-constrained problems is solved using a nonlinear interior-point method. Inside the interior-point method, the barrier sub-problems are solved using a variation of Newton's method, where the linear system is solved using a preconditioned conjugate gradient (PCG) method, which is implemented efficiently on a GPU in parallel. This algorithm shows an order of magnitude speedup on several test problems from the COPS test set. (C) 2015 Elsevier Ltd. All rights reserved.
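The innermost level, the PCG solve, is where the abstract locates the GPU parallelism: its cost is dominated by matrix-vector products, which parallelize well. A generic Python sketch of PCG under the usual assumptions (A symmetric positive definite; the diagonal preconditioner here is a simple illustrative choice, not necessarily the paper's):

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients for A x = b.

    M_inv applies the inverse preconditioner to a vector. Each
    iteration is dominated by one product A @ p, the operation a GPU
    implementation would accelerate.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
jacobi_inv = lambda r: r / np.diag(A)   # diagonal (Jacobi) preconditioner
print(pcg(A, b, jacobi_inv))            # ~[0.0909, 0.6364]
```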
Due to severe weather events, there is a growing need for more accurate weather predictions; climate change has increased both the frequency and severity of such events. Optimizing weather model source code would result in reduced run times or more accurate weather predictions. One such weather model is the weather research and forecasting (WRF) model, which is designed for both numerical weather prediction (NWP) and atmospheric research. The WRF software infrastructure consists of several components such as dynamic solvers and physics schemes. The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the WRF model. The scheme includes six classes of hydrometeors: 1) water vapor; 2) cloud water; 3) rain; 4) cloud ice; 5) snow; and 6) graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Thus, we present our optimization results for the Purdue-Lin microphysics scheme. These optimizations included improved vectorization of the code to better utilize the multiple vector units inside each processor core. The optimizations improved the performance of the original unmodified Purdue-Lin microphysics code running natively on a Xeon Phi 7120P by a factor of 4.7x. Similarly, the same optimizations improved the performance of the Purdue-Lin microphysics scheme on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 1.3x compared to the original code.
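The claim that the scheme vectorizes well because horizontal grid points do not interact can be illustrated with a toy update. This is not the Purdue-Lin code; the fields and the update rule below are made up purely to show the scalar-loop versus vector-expression contrast:

```python
import numpy as np

ni, nj, dt = 128, 128, 30.0                 # grid size and timestep [s]
rng = np.random.default_rng(0)
qr = rng.random((ni, nj)) * 1e-3            # toy rain mixing ratio [kg/kg]
evap = rng.random((ni, nj)) * 1e-6          # toy evaporation rate [kg/kg/s]

# Scalar form, which the compiler must auto-vectorize:
#   for i in range(ni):
#       for j in range(nj):
#           qr[i, j] = max(qr[i, j] - evap[i, j] * dt, 0.0)
#
# Vectorized form: one SIMD-friendly expression, possible only because
# the update at (i, j) never reads a neighboring horizontal grid point.
qr = np.maximum(qr - evap * dt, 0.0)
print(qr.shape, qr.min() >= 0.0)
```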
Parallel tree skeletons are basic computational patterns that can be used to develop parallel programs for manipulating trees. In this paper, we propose an efficient implementation of parallel tree skeletons on distri...
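Only the truncated preview of this abstract is available above, but parallel tree skeletons themselves are a standard concept, so a small sketch may still help. The `Node` type and `tree_map` below are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def tree_map(f: Callable[[int], int], t: Optional[Node]) -> Optional[Node]:
    """One basic tree skeleton: apply f to every node.

    The skeleton fixes the traversal pattern once; a parallel
    implementation can run the two recursive calls below concurrently
    (or distribute subtrees across machines) without changing user code.
    """
    if t is None:
        return None
    return Node(f(t.value), tree_map(f, t.left), tree_map(f, t.right))

t = Node(1, Node(2), Node(3))
print(tree_map(lambda v: v * 10, t).value)  # 10
```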
The shift towards multicore processing has led to a much wider population of developers being faced with the challenge of exploiting parallel cores to improve software performance. Debugging and optimizing parallel programs is a complex and demanding task. Tools which support the development of parallel programs should provide salient information that allows programmers of multicore systems to diagnose and distinguish performance problems. Appropriate design of such tools requires a systematic analysis of the problems which might be identified, and the information used to diagnose them. Building on the literature, we put forward a potential taxonomy of parallel performance problems, and an observational model which links measurable performance data to these problems. We present a validation of this model carried out with parallel programming experts, identifying areas of agreement and disagreement. This is accompanied by a survey of the prevalence of these problems in software development. From this we can identify contentious areas worthy of further exploration, as well as those with high prevalence and strong agreement, which are natural candidates for initial moves towards better tool support.
With data increasing at an incredible rate, the development of cloud computing technologies is of critical importance to the advancement of research. MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. Traditional parallel XML parsing and indexing approaches are inadequate for processing large-scale XML datasets on clusters; therefore, we propose an approach to exploit data parallelism in XML processing using MapReduce in Hadoop. Our solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process a massive amount of XML data. Specifically, we introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, we design an advanced two-phase MapReduce solution that is able to efficiently address the issues of labeling, indexing, and query processing on big XML data. The first MapReduce phase applies filtering, labeling, and index-building techniques, in which each DataNode performs element labeling using a map function and merges and builds indexes using a reduce function. In the second phase, local XML queries in multiple partitions are performed in parallel using index-table-enabled B-SLCA. Our experimental results show the efficiency and effectiveness of our proposed parallel XML processing approach using the MapReduce framework.
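As a very rough sketch of the first phase's map/reduce split (the paper's SDN labeling and DHT-based hierarchical index are more elaborate; the functions and data layout below are simplified assumptions):

```python
from collections import defaultdict

def map_elements(partition_id, elements):
    """Map step run on each DataNode: emit (tag, location) pairs for the
    XML elements parsed from one partition."""
    for tag, offset in elements:
        yield tag, (partition_id, offset)

def reduce_index(mapped_pairs):
    """Reduce step: merge emitted pairs into a per-tag index."""
    index = defaultdict(list)
    for tag, location in mapped_pairs:
        index[tag].append(location)
    return dict(index)

pairs = list(map_elements(0, [("book", 12), ("title", 40), ("book", 88)]))
print(reduce_index(pairs))
# {'book': [(0, 12), (0, 88)], 'title': [(0, 40)]}
```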
Near-duplicate document detection attracts much attention from researchers since document production is growing very rapidly. The main problem confronted in duplicate or near-duplicate document detection is the very high dimensionality of the data, which increases the time and space requirements for processing it. As new documents continue to be produced, a system to detect similarity among documents becomes almost impracticable. We propose a new approach for solving this problem which consists of reducing the dimensionality of the data and also using parallel programming efficiently to fully exploit the available capacity of the hardware. The intuition behind using parallel programming is that more processors/cores will perform better than a single processor if they are managed well. We have implemented our method and tested it empirically, and the experimental results demonstrate that our algorithm performs better than other methods used for All Pairs Similarity Search (APSS) which employ multi-core parallel programming to detect document similarity. The results show that our method can reduce the number of terms used in similarity computation by up to 65%, and its execution time is better than that of the Partition-based Similarity Search method, which uses parallel processing for document similarity.
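A hedged sketch of the two ideas, dimensionality reduction followed by parallel pair scoring, is below. The pruning rule, the cosine measure, and all names are generic choices for illustration, not the paper's exact method:

```python
import math
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations

def prune(vec, mass=0.8):
    """Keep only the heaviest terms covering `mass` of the total weight,
    shrinking the vector before any pair is scored."""
    total, kept, acc = sum(vec.values()), {}, 0.0
    for term, w in sorted(vec.items(), key=lambda kv: -kv[1]):
        kept[term] = w
        acc += w
        if acc >= mass * total:
            break
    return kept

def cosine(pair):
    a, b = pair
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

if __name__ == "__main__":
    docs = [{"data": 3.0, "xml": 1.0}, {"data": 2.0, "index": 2.0}]
    pruned = [prune(d) for d in docs]
    with ProcessPoolExecutor() as pool:             # score pairs across cores
        scores = list(pool.map(cosine, combinations(pruned, 2)))
    print(scores)
```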
HTS (Hash Type System) is a type system designed for component-based high performance computing (CBHPC) platforms, aimed at reconciling portability, modularity by separation of concerns, a high level of abstraction and high performance. Portability and modularity are properties of component-based systems that have been extensively validated. For improving the performance of HPC applications, HTS introduces an automated approach for dynamically discovering, loading and binding parallel components tuned for the characteristics of the parallel computing platforms where the application will execute. To do so, it is based on contextual abstraction, where the performance of components that encapsulate parallel computations, communication patterns and data structures may be tuned according to the features of parallel computing platforms and the application requirements. In turn, for providing a higher level of abstraction in parallel programming, HTS supports an expressive approach for skeleton-based programming. A study of the safety properties of HTS using a calculus of component composition has provided solid foundations for the design of configuration languages for the safe specification and deployment of parallel components. The features of HTS are validated with three case studies that exercise the programming techniques behind contextual abstraction, including skeletons and performance tuning. (C) 2016 Elsevier B.V. All rights reserved.
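Contextual abstraction, resolving an abstract component to a concrete implementation tuned for the current platform, can be caricatured in a few lines. The registry below is an illustrative stand-in, not HTS's Hash component model or its configuration languages:

```python
registry = {}

def register(name, context, impl):
    """Associate a concrete implementation with (component, context)."""
    registry[(name, context)] = impl

def resolve(name, context):
    """Pick the variant tuned for this platform context, falling back to
    a portable default when no tuned variant was registered."""
    return registry.get((name, context)) or registry[(name, "default")]

register("reduce_sum", "default", lambda xs: sum(xs))
register("reduce_sum", "gpu", lambda xs: sum(xs))  # stand-in for a tuned kernel

impl = resolve("reduce_sum", "cluster")  # no tuned variant: default is chosen
print(impl([1, 2, 3]))                   # 6
```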
Massive amounts of data generated in large-scale grids pose a formidable challenge for real-time monitoring of power systems. Dynamic state estimation, which is a prerequisite for the normal operation of power systems, involves the time-constrained solution of a large set of equations, which requires significant computational resources. In this study, an efficient and accurate relaxation-based parallel processing technique is proposed in the presence of phasor measurement units. A combination of different types of parallelism is used on both single and multiple graphics processing units to accelerate large-scale joint dynamic state estimation simulation. The estimation results for both generator and network states verify that proper massive-thread parallel programming makes the entire implementation scalable and efficient with high accuracy.
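The abstract does not spell out its relaxation scheme, but the reason relaxation methods suit massive-thread hardware can be shown with the classic Jacobi iteration, in which every unknown is updated independently from the previous iterate. A generic sketch, not the paper's estimator:

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iter=500):
    """Jacobi relaxation for A x = b (A diagonally dominant). All
    component updates within a sweep are independent, so each one can
    be assigned to its own GPU thread."""
    D = np.diag(A)
    R = A - np.diagflat(D)
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D        # every entry updates in parallel
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0], [2.0, 5.0]])
print(jacobi(A, np.array([1.0, 2.0])))  # ~[0.1667, 0.3333]
```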
The need for accelerating power grid simulation through high performance computing (HPC) has long been recognized, and prior efforts have been devoted to developing one-off parallel computing applications for particular power grid functions. Non-transferable software codes and duplicated implementations in these prior efforts are a major barrier to more widespread HPC adoption in power grid applications. Modern HPC hardware and architecture require significant computing expertise for application development. The GridPACK (TM) software framework described in this paper provides an HPC-compatible software structure to access modern parallel solvers and HPC-ready modules for common components in power grid simulation applications. GridPACK hides the HPC details and enables power system developers to focus on applications instead of computational details. Several example applications of GridPACK are presented to demonstrate the capabilities of GridPACK and the performance of HPC simulations with large power grid networks. Examples discussed include: a dynamic simulation application capable of running a 17,156-bus Western Electricity Coordinating Council (WECC) system at a computational speed faster than real time (e.g., under 30 s for a 30-s simulation), a static contingency analysis application using a task manager, and a dynamic contingency analysis application utilizing two levels of parallelism. These example applications illustrate GridPACK's capabilities to support different types of simulations within a unified framework and to support reuse of transferable software codes across power grid applications. The computational results indicate strong performance improvements for power grid simulations with GridPACK. (C) 2016 Elsevier B.V. All rights reserved.
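The task-manager pattern used by the static contingency analysis application, independent cases handed out to workers as they become free, is easy to picture generically. This is plain Python, not GridPACK's actual C++ API, and `simulate_contingency` is a placeholder:

```python
from concurrent.futures import ProcessPoolExecutor

def simulate_contingency(case_id):
    # Placeholder for a power-flow solve with one element outaged.
    return case_id, "converged"

if __name__ == "__main__":
    cases = range(8)   # e.g., one case per candidate line outage
    with ProcessPoolExecutor(max_workers=4) as pool:
        for case_id, status in pool.map(simulate_contingency, cases):
            print(f"contingency {case_id}: {status}")
```

A second level of parallelism, as in the dynamic contingency analysis example, would run each individual case on its own group of processors rather than on a single worker.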