Search results - Inner Mongolia University Library

Parameter tuning for a cooperative parallel implementation of process-network synthesis algorithms

CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH, 2019, vol. 27, no. 2, pp. 551-572

Authors: Bartos, Aniko; Bertok, Botond. Affiliation: Univ Pannonia, Egyet Str 10, Veszprem, Hungary

Process-network synthesis is the determination of the optimal network structure of a process system together with optimal configurations and capacities of the operating units incorporated into the system. The aim of developing more and more sophisticated solver algorithms is to find the optimum as fast as possible and to widen the range of practically solvable process synthesis problems. The P-graph framework can effectively reduce the number of structures to be examined and accelerate the search for the optimum by exploiting the combinatorial characteristics of candidate solution structures. A cooperative parallel implementation of P-graph algorithms has been published recently to exploit the capabilities of multi-core and multiprocessor systems (Bartos and Bertok in De Gruyter Ser Logic Appl 1:303-313, 2015). The parallel implementation has increased performance significantly, but this can be further improved by fine-tuning the parameters of the parallel algorithm. The outcomes of experiments on parameter optimization are presented herein.

Keywords: Graph and tree search; Parallel programming; Process network synthesis; P-graph; Parameter tuning

FastMFDs: a fast, efficient algorithm for mining minimal functional dependencies from large-scale distributed data with Spark

JOURNAL OF SUPERCOMPUTING, 2019, vol. 75, no. 5, pp. 2497-2517

Authors: Cheng, Feng; Yang, Zhe. Affiliation: Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Jiangsu, Peoples R China

Minimal functional dependencies are an important kind of relationship in relational databases. They can describe special relationships between complex and irregular attributes. Extracting minimal functional dependencies (MFDs) from relational databases is an important database analysis technique. However, as the data grows larger and larger in size, even the most efficient stand-alone algorithms are exponential in the number of attributes of the relations. Discovering MFDs on a single computer is hard and slow, and it can only be applied to small centralized datasets. It is challenging to discover MFDs from big data, especially large-scale distributed data. Apache Spark is a unified analytics engine for big data processing; we present a new algorithm, FastMFDs, based on Spark for discovering all MFDs from large-scale distributed data in parallel. FastMFDs uses both the RDD framework and the DataFrame framework to store and process distributed data. FastMFDs deletes equivalent attributes. FastMFDs also provides a two-way search algorithm for searching and pruning. We evaluated our algorithm on real-life datasets, and it is more efficient and faster than existing discovery methods.

Keywords: Minimal functional dependency; Big data; Parallel programming; Spark
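To make the notion used in the abstract above concrete: a functional dependency X -> A holds in a relation when any two rows that agree on the attributes in X also agree on A, and it is minimal when no proper subset of X also determines A. The following is a minimal single-machine sketch in plain Python. It is not the FastMFDs algorithm and does not use Spark; the function names `holds` and `minimal_fds` and the toy table are illustrative assumptions. It enumerates candidates by brute force, which is exponential in the number of attributes, as the abstract notes for stand-alone approaches.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}  # maps each lhs value combination to the rhs value first seen
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False  # two rows agree on lhs but differ on rhs
    return True


def minimal_fds(rows, attributes):
    """Brute-force search for all minimal, non-trivial FDs as (lhs, rhs) pairs."""
    result = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        found = []  # minimal left-hand sides already found for this rhs
        # Try smaller left-hand sides first so that supersets can be pruned.
        for size in range(len(others) + 1):
            for lhs in combinations(others, size):
                if any(set(f) <= set(lhs) for f in found):
                    continue  # contains a smaller valid lhs, so not minimal
                if holds(rows, lhs, rhs):
                    found.append(lhs)
        result.extend((lhs, rhs) for lhs in found)
    return result


# Hypothetical toy relation: each employee has one department,
# and each department has one manager.
rows = [
    {"emp": "a", "dept": "x", "mgr": "m1"},
    {"emp": "b", "dept": "x", "mgr": "m1"},
    {"emp": "c", "dept": "y", "mgr": "m2"},
    {"emp": "d", "dept": "z", "mgr": "m2"},
]

for lhs, rhs in minimal_fds(rows, ["emp", "dept", "mgr"]):
    print(", ".join(lhs), "->", rhs)
```

On this toy table the search reports emp -> dept, emp -> mgr and dept -> mgr; the pair {emp, mgr} -> dept is skipped because emp alone already determines dept.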

Enhanced global optimization methods applied to complex fisheries stock assessment models

APPLIED SOFT COMPUTING, 2019, vol. 77, pp. 50-66

Authors: Penas, David R.; Gomez, Andres; Fraguela, Basilio B.; Martin, Maria J.; Cervino, Santiago. Affiliations: Univ Santiago de Compostela, MODESTYA Res Grp, Dept Stat Math Anal & Optimizat, Santiago De Compostela, Spain; Univ Santiago de Compostela, Inst Math IMAT, Santiago De Compostela, Spain; Univ Santiago de Compostela, Galician Supercomp Ctr CESGA, Santiago De Compostela, Spain; Univ A Coruna, Grp Arquitectura Comp, Fac Informat, Campus Elvina S-N, La Coruna 15071, Spain; Ctr Oceanog Vigo, Inst Espanol Oceanog, POB 1552, Vigo 36200, Spain

Statistical fisheries models are frequently used by researchers and agencies to understand the behavior of marine ecosystems or to estimate the maximum acceptable catch of different species of commercial interest. The parameters of these models are usually adjusted through the use of optimization algorithms. Unfortunately, the choice of the best optimization method is far from trivial. This work proposes the use of population-based algorithms to improve the optimization process of the Globally applicable Area Disaggregated General Ecosystem Toolbox (Gadget), a flexible framework that allows the development of complex statistical marine ecosystem models. Specifically, parallel versions of the Differential Evolution (DE) and the Particle Swarm Optimization (PSO) methods are proposed. The proposals include an automatic selection of the internal parameters to reduce the complexity of their usage, and a restart mechanism to avoid local minima. The resulting optimization algorithms were called PMA (parallel Multirestart Adaptive) DE and PMA PSO, respectively. Experimental results prove that the new algorithms are faster and produce more accurate solutions than the other parallel optimization methods already included in Gadget. Although the new proposals have been evaluated on fisheries models, there is nothing specific to the tested models in them, and thus they can also be applied to other optimization problems. Moreover, the proposed PMA scheme can be seen as a template that can be easily applied to other population-based heuristics. (C) 2019 Elsevier B.V. All rights reserved.

Keywords: Global optimization; Parallel programming; Marine ecosystem models; Particle Swarm Optimization; Differential evolution
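The abstract above combines a population-based optimizer with a restart mechanism. As a rough illustration of that combination only, the sketch below implements a serial, textbook DE/rand/1/bin loop that re-seeds the population when the best value stops improving. It is not the PMA DE algorithm from the paper: it is not parallel, it does not select its parameters automatically, and the name `de_with_restarts`, the default parameter values, and the `sphere` test function are all assumptions made for this example.

```python
import random


def de_with_restarts(f, bounds, pop_size=20, generations=300,
                     F=0.5, CR=0.9, patience=30, seed=0):
    """Minimise f over box bounds with DE/rand/1/bin plus stagnation restarts."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    pop = [random_point() for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    best_i = min(range(pop_size), key=fit.__getitem__)
    best_x, best_f = pop[best_i][:], fit[best_i]
    stale = 0  # generations since the best value last improved

    for _ in range(generations):
        improved = False
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            ft = f(trial)
            if ft <= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, ft
                if ft < best_f:
                    best_x, best_f = trial[:], ft
                    improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:
            # Restart: new random population, keeping only the best point found.
            pop = [random_point() for _ in range(pop_size)]
            pop[0] = best_x[:]
            fit = [f(x) for x in pop]
            stale = 0
    return best_x, best_f


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = de_with_restarts(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_f)
```

Keeping the best point across restarts means a restart can never make the reported result worse; it only gives the search a fresh population with which to escape a local minimum.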

Automatic Cost Analysis for Imperative BSP Programs

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2019, vol. 47, no. 2, pp. 184-212

Author: Jakobsson, Arvid. Affiliations: Univ Orleans, INSA Ctr Val Loire, LIFO EA 4022, Orleans, France; Huawei Technol France Res Ctr, Boulogne, France

Bulk Synchronous Parallel (BSP) is a model for parallel computing with predictable scalability. BSP has a cost model: programs can be assigned a cost which describes their resource usage on any parallel machine. However, the programmer has to manually derive this cost. This paper describes an automatic method for the derivation of BSP program costs, based on classic cost analysis and approximation of polyhedral integer volumes. Our method requires and analyzes programs with textually aligned synchronization and textually aligned, polyhedral communication. We have implemented the analysis, and our prototype obtains cost formulas that are parametric in the input parameters of the program and the parameters of the BSP computer and thus bound the cost of running the program with any input on any number of cores. We evaluate the cost formulas and find that they are indeed upper bounds, and tight for data-oblivious programs. Additionally, we evaluate their capacity to predict concrete run times in two parallel settings: a multi-core computer and a cluster. We find that when exact upper bounds can be found, they accurately predict run times. In networks with full bisection bandwidth, as the BSP model supposes, results are promising, with errors below 50%.

Keywords: Parallel programming; Bulk Synchronous parallelism; Static analysis; Cost analysis
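For readers unfamiliar with the cost model the abstract above refers to: in the standard BSP formulation, one superstep costs w + h*g + l, where w is the largest amount of local work on any process, h is the largest number of words any process sends or receives, g is the per-word communication cost, and l is the barrier synchronization cost; a program costs the sum over its supersteps. The sketch below evaluates that formula by hand for given numbers. It is not the automatic analysis described in the paper, and the function names and example figures are assumptions chosen for illustration.

```python
def superstep_cost(work, sent, received, g, l):
    """Standard BSP cost of one superstep: max work + h * g + l.

    work, sent and received are per-process lists of equal length;
    g is the cost per communicated word and l the barrier cost.
    """
    w = max(work)
    # h-relation: the largest number of words any one process sends or receives.
    h = max(max(s, r) for s, r in zip(sent, received))
    return w + h * g + l


def program_cost(supersteps, g, l):
    """Total cost of a BSP program given as a list of (work, sent, received)."""
    return sum(superstep_cost(work, sent, received, g, l)
               for work, sent, received in supersteps)


# Hypothetical two-process program with two supersteps.
supersteps = [
    ([100, 80], [10, 0], [0, 10]),  # 100 + 10 * 4 + 50 = 190
    ([30, 60], [5, 5], [5, 5]),     # 60 + 5 * 4 + 50 = 130
]
total = program_cost(supersteps, g=4, l=50)
print(total)  # 320
```

Because g and l are machine parameters and the work and communication volumes depend on the program and its input, a symbolic version of this sum is what makes a cost formula parametric in both.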

Some useful optimisations for unstructured computational fluid dynamics codes on multicore and manycore architectures

COMPUTER PHYSICS COMMUNICATIONS, 2019, vol. 235, pp. 305-323

Authors: Hadade, Ioan; Wang, Feng; Carnevale, Mauro; di Mare, Luca. Affiliations: Imperial Coll London, Rolls Royce Vibrat UTC, London SW7 2AZ, England; Univ Oxford, Oxford Thermofluids Inst, Oxford OX2 0ES, England

This paper presents a number of optimisations for improving the performance of unstructured computational fluid dynamics codes on multicore and manycore architectures such as the Intel Sandy Bridge, Broadwell and Skylake CPUs and the Intel Xeon Phi Knights Corner and Knights Landing manycore processors. We discuss and demonstrate their implementation in two distinct classes of computational kernels: face-based loops represented by the computation of fluxes and cell-based loops representing updates to state vectors. We present the importance of making efficient use of the underlying vector units in both classes of computational kernels, with special emphasis on the changes required for vectorising face-based loops and their intrinsic indirect and irregular access patterns. We demonstrate the advantage of different data layouts for cell-centred as well as face data structures and architecture-specific optimisations for improving the performance of gather and scatter operations, which are prevalent in unstructured mesh applications. The implementation of a software prefetching strategy based on auto-tuning is also shown, along with an empirical evaluation of the importance of multithreading for in-order architectures such as Knights Corner. We explore the various memory modes available on the Intel Xeon Phi Knights Landing architecture and present an approach whereby both traditional DRAM and MCDRAM interfaces are exploited for maximum performance. We obtain significant full application speed-ups of between 2.8X and 3X across the multicore CPUs in two-socket node configurations, 8.6X on the Intel Xeon Phi Knights Corner coprocessor and 5.6X on the Intel Xeon Phi Knights Landing processor in an unstructured finite volume CFD code representative in size and complexity of an industrial application.

Program summary
Program Title: some_opt_for_unstructured_cfd
Program Files doi: http://***/10.17632/zyh2zkf3jw.1
Licensing provisions: GNU General Public License 3 (GPL)

Keywords: Unstructured grids; Computational fluid dynamics; Code optimisation; High performance computing; Parallel programming

A Heterogeneous Multi-Core Based Biomedical Application Processing System and Programming Toolkit

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2019, vol. 91, no. 8, pp. 963-978

Authors: Hussain, Tassadaq; Haider, Amna; Taleb-Ahmed, Abdelmalik. Affiliations: Riphah Int Univ, Islamabad, Pakistan; UCERD, Islamabad, Pakistan; Lab Ind & Human Automat Mech & Comp Sci, Famars, France; Univ Valenciennes & Hainaut Cambresis, Bat Malvache, Famars, France

Due to the growth of biological databases and biomedical instruments, high-performance active (real-time) signal processing has become a challenge for medical scientists and engineers. Medical applications require a high-performance signal processor that can run scientific and engineering biomedical applications and is easy to program. In this article, we propose a biomedical application processing system (BAPS), based on a biomedical sensor interface and a heterogeneous multi-core processing architecture, together with a biomedical applications toolkit. The biomedical sensor interface supports multiple regular and complex medical signals and provides digital data to the processing system. The BAPS uses a heterogeneous multi-core architecture that processes biomedical applications with a performance of up to 10 billion operations per second and an accuracy of 1 μs. The biomedical application toolkit provides programmability by supporting hardware-level, scientific and artificial intelligence programming. The BAPS provides a single embedded platform solution to process a wide range of biomedical signal and image processing applications. To demonstrate the value of the proposed system, we developed the BAPS hardware architecture and tested it with different biomedical applications. Compared with the baseline system, BAPS improves the performance of active (real-time) applications by up to 12.8 times, processes passive (non-real-time) applications 7.4 times faster, and improves the performance of the artificial intelligence application by 4.84 times. In terms of power and energy, the BAPS draws 1.56 times less dynamic power and consumes 21.85 times less energy.

Keywords: FPGA; Multi-core; Embedded system; HPC; Parallel programming; Biomedical

Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, vol. 30, no. 8, pp. 1768-1785

Authors: Shudler, Sergei; Berens, Yannick; Calotoiu, Alexandru; Hoefler, Torsten; Strube, Alexandre; Wolf, Felix. Affiliations: Argonne Natl Lab, Lemont, IL 60439, USA; Tech Univ Darmstadt, D-64289 Darmstadt, Germany; Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland; Julich Supercomp Ctr, D-52425 Julich, Germany

Many libraries in the HPC field use sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm and performance engineers have already been advocating the systematic combination of analytical performance models with practical measurements for a very long time, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer from the burden of having to specify and maintain very fine-grained and potentially non-portable expectations. In this way, scalability validation can be continuously applied throughout the whole development cycle with very little effort. Using MPI and parallel sorting algorithms as examples, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.

Keywords: Software engineering; High performance computing; Parallel programming; Performance analysis; Performance modeling
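The idea in the abstract above of checking an asymptotic scaling trend, rather than an exact analytical expression, inside an automated test can be illustrated with a deliberately simplified sketch. It assumes the runtime follows a pure power law t = c * p^k in the number of processes p, estimates k by a least-squares fit in log-log space, and asserts that k does not exceed the expected exponent by more than a tolerance. This is not the authors' method or tooling; the function names, the tolerance, and the synthetic measurements are assumptions for this example.

```python
import math


def fitted_exponent(process_counts, runtimes):
    """Least-squares slope of log(runtime) against log(process count)."""
    xs = [math.log(p) for p in process_counts]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def meets_expectation(process_counts, runtimes, expected_exponent,
                      tolerance=0.1):
    """True if the measured growth is no worse than the expected power law."""
    measured = fitted_exponent(process_counts, runtimes)
    return measured <= expected_exponent + tolerance


# Synthetic measurements (assumed, not real data).
procs = [2, 4, 8, 16, 32]
linear_times = [0.5 * p for p in procs]          # grows like p
quadratic_times = [0.01 * p * p for p in procs]  # grows like p^2

print(meets_expectation(procs, linear_times, expected_exponent=1.0))     # True
print(meets_expectation(procs, quadratic_times, expected_exponent=1.0))  # False
```

A check of this kind can run in a regression suite: the developer states only the expected growth rate, and a measured trend that grows faster than expected fails the test.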

On the maturity of parallel applications for asymmetric multi-core processors

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, vol. 127, pp. 105-115

Authors: Chronaki, Kallia; Moreto, Miguel; Casas, Marc; Rico, Alejandro; Badia, Rosa M.; Ayguade, Eduard; Valero, Mateo. Affiliations: Barcelona Supercomp Ctr, Barcelona, Spain; ARM, Richardson, TX, USA; CSIC, Artificial Intelligence Res Inst IIIA, Madrid, Spain

Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. By maintaining two types of cores (fast and slow), AMCs are able to provide high performance under the facility power budget. This paper performs the first extensive evaluation of how portable current HPC applications are for such supercomputing systems. Specifically, we evaluate several execution models on an ARM *** AMC using the PARSEC benchmark suite, which includes representative highly parallel applications. We compare schedulers at the user, OS and runtime levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is more effective when it takes place at the runtime system level, where it improves the baseline by 23%, while the heterogeneous-aware OS scheduling solution improves the baseline by 10%. (C) 2019 Published by Elsevier Inc.

Keywords: Parallel programming; Scheduling; Runtime systems; Asymmetric multi-cores; HPC

A view of programming scalable data analysis: from clouds to exascale

JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2019, vol. 8, no. 1, pp. 1-16

Author: Talia, Domenico. Affiliation: Univ Calabria, DIMES, Arcavacata Di Rende, Italy

Scalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the Web. Scalable big data analysis today can be achieved by parallel implementations that are able to exploit the computing and storage facilities of high performance computing (HPC) systems and clouds, whereas in the near future Exascale systems will be used to implement extreme-scale data analysis. Here we discuss how clouds currently support the development of scalable data mining solutions, and we outline and examine the main challenges to be addressed and solved for implementing innovative data analysis applications on Exascale systems.

Keywords: Big data analysis; Cloud computing; Exascale computing; Data mining; Parallel programming; Scalability

Novel circuit designs of memristor synapse and neuron

NEUROCOMPUTING, 2019, vol. 330, pp. 11-16

Authors: Hong, Qinghui; Zhao, Liang; Wang, Xiaoping. Affiliation: Huazhong Univ Sci & Technol, Sch Automat, Wuhan 430074, Hubei, Peoples R China

In this work, novel circuits based on memristors for implementing electronic synapses and artificial neurons are designed. First, two simple synaptic circuits for implementing weighting calculations of voltage and current modes using twin memristors are proposed. A synaptic weighting operation is defined as a difference function between the twin memristors, which can be adjusted in reverse by applying programmed signals and conducting positive, zero, and negative synaptic weights. Second, two neuron circuits using the proposed memristor synapses, in which parallel computing and programming can be achieved, are designed. Finally, performances of the proposed memristor synapses and neuron circuits, such as weight programming, neuron computing, and parallel operation, are analyzed through PSpice simulations. (C) 2018 Elsevier B.V. All rights reserved.

Keywords: Memristor; Synaptic circuit; Neuron circuit; Parallel programming
