Search Results - Inner Mongolia University Library

Revisiting the Bag-of-Visual-Words model: A hierarchical localization architecture for mobile systems

ROBOTICS AND AUTONOMOUS SYSTEMS, 2019, vol. 113, pp. 104-119

Authors: Bampis, Loukas; Gasteratos, Antonios. Affiliation: Democritus Univ Thrace Dept Prod & Management Engn 12 Vas Sophias GR-67132 Xanthi Greece

In this paper, an enhanced visual place recognition system is proposed aiming to improve the localization performance of a mobile platform. Our technique takes full advantage of the continuous input image stream in order to provide additional knowledge to the matching functionality. The well-established Bag-of-Visual-Words model is adapted into a hierarchical design that brings the visual information from the whole of a natural scene into the description, while additionally preserving the geometric structure of the explored world. Our approach is evaluated as part of a state-of-the-art Simultaneous Localization and Mapping algorithm, and parallelization techniques are exploited utilizing every available hardware module in a low-power device. The implemented algorithm has been tested on several publicly available datasets, offering consistently accurate localization results and preventing the majority of redundant computations that the additional geometrical verifications can induce. (C) 2019 Elsevier B.V. All rights reserved.

Keywords: Localization; Visual place recognition; Mobile systems; Parallel programming

Parameter tuning for a cooperative parallel implementation of process-network synthesis algorithms

CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH, 2019, vol. 27, no. 2, pp. 551-572

Authors: Bartos, Aniko; Bertok, Botond. Affiliation: Univ Pannonia Egyet Str 10 Veszprem Hungary

Process-network synthesis is the determination of the optimal network structure of a process system together with optimal configurations and capacities of the operating units incorporated into the system. The aim of developing more and more sophisticated solver algorithms is to find the optimum as fast as possible and to widen the range of practically solvable process synthesis problems. The P-graph framework can effectively reduce the number of structures to be examined and accelerate the search for the optimum by exploiting combinatorial characteristics of candidate solution structures. A cooperative parallel implementation of P-graph algorithms has been published recently to exploit the capabilities of multi-core and multiprocessor systems (Bartos and Bertok in De Gruyter Ser Logic Appl 1:303-313, 2015). The parallel implementation has increased performance significantly, but this can be further improved by fine-tuning the parameters of the parallel algorithm. Results of experiments on parameter optimization are presented herein.

Keywords: Graph and tree search; Parallel programming; Process network synthesis; P-graph; Parameter tuning

Performance Evaluation of a Hybrid Computer Cluster Built on IBM POWER8 Microprocessors

PROGRAMMING AND COMPUTER SOFTWARE, 2019, vol. 45, no. 6, pp. 324-332

Authors: Mal'kovskii, S. I.; Sorokin, A. A.; Korolev, S. P.; Zatsarinnyi, A. A.; Tsoi, G. I. Affiliations: Russian Acad Sci Comp Ctr Far Eastern Branch Ul Kim Yu Chena 65 Khabarovsk 680000 Russia; Russian Acad Sci Fed Res Ctr Comp Sci & Control Ul Vavilova 44-2 Moscow 119333 Russia

This paper is devoted to the performance evaluation of a hybrid computer cluster built on IBM POWER8 CPUs and NVIDIA Tesla P100 GPUs. The architecture of the computing system and software used are described. Results of experiments carried out using the STREAM, NPB, Crossroads/NERSC-9 DGEMM, and HPL packages are discussed. The efficiency of the simultaneous multithreading (SMT) technology supported by POWER8 processors, as well as the performance of some compilers, parallel programming and mathematical libraries, on this architecture is analyzed.

Keywords: Parallel programming

Enhanced global optimization methods applied to complex fisheries stock assessment models

APPLIED SOFT COMPUTING, 2019, vol. 77, pp. 50-66

Authors: Penas, David R.; Gomez, Andres; Fraguela, Basilio B.; Martin, Maria J.; Cervino, Santiago. Affiliations: Univ Santiago de Compostela MODESTYA Res Grp Dept Stat Math Anal & Optimizat Santiago De Compostela Spain; Univ Santiago de Compostela Inst Math IMAT Santiago De Compostela Spain; Univ Santiago de Compostela Galician Supercomp Ctr CESGA Santiago De Compostela Spain; Univ A Coruna Grp Arquitectura Comp Fac Informat Campus Elvina S-N La Coruna 15071 Spain; Ctr Oceanog Vigo Inst Espanol Oceanog POB 1552 Vigo 36200 Spain

Statistical fisheries models are frequently used by researchers and agencies to understand the behavior of marine ecosystems or to estimate the maximum acceptable catch of different species of commercial interest. The parameters of these models are usually adjusted through the use of optimization algorithms. Unfortunately, the choice of the best optimization method is far from trivial. This work proposes the use of population-based algorithms to improve the optimization process of the Globally applicable Area Disaggregated General Ecosystem Toolbox (Gadget), a flexible framework that allows the development of complex statistical marine ecosystem models. Specifically, parallel versions of the Differential Evolution (DE) and the Particle Swarm Optimization (PSO) methods are proposed. The proposals include an automatic selection of the internal parameters to reduce the complexity of their usage, and a restart mechanism to avoid local minima. The resulting optimization algorithms are called PMA (Parallel Multirestart Adaptive) DE and PMA PSO, respectively. Experimental results show that the new algorithms are faster and produce more accurate solutions than the other parallel optimization methods already included in Gadget. Although the new proposals have been evaluated on fisheries models, nothing in them is specific to the tested models, and thus they can also be applied to other optimization problems. Moreover, the proposed PMA scheme can be seen as a template that can be easily applied to other population-based heuristics. (C) 2019 Elsevier B.V. All rights reserved.

Keywords: Global optimization; Parallel programming; Marine ecosystem models; Particle Swarm Optimization; Differential evolution

Automatic Cost Analysis for Imperative BSP Programs

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2019, vol. 47, no. 2, pp. 184-212

Authors: Jakobsson, Arvid. Affiliations: Univ Orleans INSA Ctr Val Loire LIFO EA 4022 Orleans France; Huawei Technol France Res Ctr Boulogne France

Bulk Synchronous Parallel (BSP) is a model for parallel computing with predictable scalability. BSP has a cost model: programs can be assigned a cost which describes their resource usage on any parallel machine. However, the programmer has to manually derive this cost. This paper describes an automatic method for the derivation of BSP program costs, based on classic cost analysis and approximation of polyhedral integer volumes. Our method requires and analyzes programs with textually aligned synchronization and textually aligned, polyhedral communication. We have implemented the analysis, and our prototype obtains cost formulas that are parametric in the input parameters of the program and the parameters of the BSP computer and thus bound the cost of running the program with any input on any number of cores. We evaluate the cost formulas and find that they are indeed upper bounds, and tight for data-oblivious programs. Additionally, we evaluate their capacity to predict concrete run times in two parallel settings: a multi-core computer and a cluster. We find that when exact upper bounds can be found, they accurately predict run times. In networks with full bisection bandwidth, as the BSP model supposes, results are promising with errors <50%.

Keywords: Parallel programming; Bulk Synchronous Parallelism; Static analysis; Cost analysis
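
Illustrative note on the BSP cost model mentioned in the abstract above: in its standard textbook form, one superstep costs w + h·g + l, where w is the largest local computation on any processor, h is the largest number of words any processor sends or receives, g is the communication gap per word and l is the barrier latency. The sketch below evaluates that formula for concrete supersteps. It is not the paper's static analysis, which derives such costs from program text; the names `Superstep`, `superstep_cost` and `program_cost` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Superstep:
    """Per-processor figures for one superstep (hypothetical container)."""
    work: Sequence[float]    # local computation units on each processor
    sent: Sequence[int]      # words sent by each processor
    received: Sequence[int]  # words received by each processor


def superstep_cost(step: Superstep, g: float, l: float) -> float:
    """Standard BSP superstep cost: w + h*g + l."""
    w = max(step.work)
    # h-relation: the largest fan-in or fan-out over all processors.
    h = max(max(s, r) for s, r in zip(step.sent, step.received))
    return w + h * g + l


def program_cost(steps: Sequence[Superstep], g: float, l: float) -> float:
    """Cost of a BSP program: the sum of its superstep costs."""
    return sum(superstep_cost(step, g, l) for step in steps)
```

Under these assumptions, a program is just a list of `Superstep` values, and `g` and `l` are the machine parameters that a cost formula would leave symbolic.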

Some useful optimisations for unstructured computational fluid dynamics codes on multicore and manycore architectures

COMPUTER PHYSICS COMMUNICATIONS, 2019, vol. 235, pp. 305-323

Authors: Hadade, Ioan; Wang, Feng; Carnevale, Mauro; di Mare, Luca. Affiliations: Imperial Coll London Rolls Royce Vibrat UTC London SW7 2AZ England; Univ Oxford Oxford Thermofluids Inst Oxford OX2 0ES England

This paper presents a number of optimisations for improving the performance of unstructured computational fluid dynamics codes on multicore and manycore architectures such as the Intel Sandy Bridge, Broadwell and Skylake CPUs and the Intel Xeon Phi Knights Corner and Knights Landing manycore processors. We discuss and demonstrate their implementation in two distinct classes of computational kernels: face-based loops, represented by the computation of fluxes, and cell-based loops, representing updates to state vectors. We show the importance of making efficient use of the underlying vector units in both classes of computational kernels, with special emphasis on the changes required for vectorising face-based loops and their intrinsic indirect and irregular access patterns. We demonstrate the advantage of different data layouts for cell-centred as well as face data structures, and architecture-specific optimisations for improving the performance of gather and scatter operations, which are prevalent in unstructured mesh applications. The implementation of a software prefetching strategy based on auto-tuning is also shown, along with an empirical evaluation of the importance of multithreading for in-order architectures such as Knights Corner. We explore the various memory modes available on the Intel Xeon Phi Knights Landing architecture and present an approach whereby both traditional DRAM and MCDRAM interfaces are exploited for maximum performance. We obtain significant full-application speed-ups of between 2.8X and 3X across the multicore CPUs in two-socket node configurations, 8.6X on the Intel Xeon Phi Knights Corner coprocessor and 5.6X on the Intel Xeon Phi Knights Landing processor in an unstructured finite volume CFD code representative in size and complexity of an industrial application.

Program summary. Program Title: some_opt_for_unstructured_cfd. Program Files doi: http://***/10.17632/zyh2zkf3jw.1. Licensing provisions: GNU General Public License 3 (GPL)

Keywords: Unstructured grids; Computational fluid dynamics; Code optimisation; High performance computing; Parallel programming

A Heterogeneous Multi-Core Based Biomedical Application Processing System and Programming Toolkit

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2019, vol. 91, no. 8, pp. 963-978

Authors: Hussain, Tassadaq; Haider, Amna; Taleb-Ahmed, Abdelmalik. Affiliations: Riphah Int Univ Islamabad Pakistan; UCERD Islamabad Pakistan; Lab Ind & Human Automat Mech & Comp Sci Famars France; Univ Valenciennes & Hainaut Cambresis Bat Malvache Famars France

Due to the growth of biological databases and biomedical instruments, high-performance active (real-time) signal processing has become a challenge for medical scientists and engineers. Medical applications require a high-performance signal processor that can handle scientific and engineering biomedical applications and is easy to program. In this article, we propose a biomedical application processing system (BAPS), based on a biomedical sensor interface and a heterogeneous multi-core processing architecture, together with a biomedical applications toolkit. The biomedical sensor interface supports multiple regular and complex medical signals and provides digital data to the processing system. The BAPS uses a heterogeneous multi-core architecture that processes biomedical applications with a performance of up to 10 billion operations per second and an accuracy of 1 μs. The biomedical application toolkit provides programmability by supporting hardware-level, scientific and artificial intelligence programming. The BAPS provides a single embedded platform solution for processing a wide range of biomedical signal and image processing applications. To demonstrate the value of the proposed system, we developed the BAPS hardware architecture and tested it with different biomedical applications. Compared with the baseline system, BAPS improves the performance of active (real-time) applications by up to 12.8 times, processes passive (non-real-time) applications 7.4 times faster, and improves the performance of the artificial intelligence application by 4.84 times. In terms of power and energy, BAPS draws 1.56 times less dynamic power and consumes 21.85 times less energy.

Keywords: FPGA; Multi-core; Embedded system; HPC; Parallel programming; Biomedical

Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, vol. 30, no. 8, pp. 1768-1785

Authors: Shudler, Sergei; Berens, Yannick; Calotoiu, Alexandru; Hoefler, Torsten; Strube, Alexandre; Wolf, Felix. Affiliations: Argonne Natl Lab Lemont IL 60439 USA; Tech Univ Darmstadt D-64289 Darmstadt Germany; Swiss Fed Inst Technol CH-8092 Zurich Switzerland; Julich Supercomp Ctr D-52425 Julich Germany

Many libraries in the HPC field use sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm and performance engineers have already been advocating the systematic combination of analytical performance models with practical measurements for a very long time, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer from the burden of having to specify and maintain very fine-grained and potentially non-portable expectations. In this way, scalability validation can be continuously applied throughout the whole development cycle with very little effort. Using MPI and parallel sorting algorithms as examples, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.

Keywords: Software engineering; high performance computing; parallel programming; performance analysis; performance modeling
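
Illustrative note on the idea of checking asymptotic scaling trends in automated tests, as described in the abstract above: the sketch below is a deliberately simplified stand-in, not the authors' method or tooling. It assumes the expectation is a pure power law, fits the slope of log(time) against log(size) by least squares, and passes when the measured exponent does not exceed the expected one by more than a tolerance. The function names and the default tolerance are hypothetical, and logarithmic terms are not modelled.

```python
import math
from typing import Sequence


def loglog_slope(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def meets_expectation(
    sizes: Sequence[float],
    times: Sequence[float],
    expected_exponent: float,
    tolerance: float = 0.15,  # hypothetical slack for measurement noise
) -> bool:
    """True if the measured growth exponent stays within the expectation."""
    return loglog_slope(sizes, times) <= expected_exponent + tolerance
```

A regression test could call `meets_expectation` with measurements taken at several problem sizes or process counts and fail the build when the trend degrades.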

MPGP-QOC: Multi-programming and graph-partition-based QOC for QNN inference

Future Generation Computer Systems, 2026, vol. 174

Authors: Yiding Liu. Affiliation: China North Vehicle Research Institute No. 4 Courtyard Huishuling Beijing 100072 China

Quantum neural networks (QNNs) in quantum computing hold promise for transforming machine learning, potentially offering advantages over classical computers. Their unique properties and quantum parallelism open avenues for exploring enhanced computational capabilities in the quantum domain, presenting intriguing opportunities for advancing machine learning applications. However, building efficient QNN inference remains a challenge due to the high computational complexity, the difficulty in optimizing the quantum circuits, and the underutilization of current quantum hardware. To address these challenges, we introduce a new QNN inference framework named Multi-programming and Graph-Partition-based Quantum Optimal Control (MPGP-QOC) that combines parallelization, quantum optimal control and graph partitioning with parameterized quantum circuits. Our framework is designed to be compatible with existing quantum software and hardware platforms, making it easy to implement and experimentally validate. We demonstrate the effectiveness of our framework by applying it to several quantum machine learning classification tasks with different QNN configurations. Our experimental results show that MPGP-QOC achieves a significant speedup of 10.1× (up to 10.4×) over state-of-the-art QNN inference, with a notable compilation reduction (up to 6.5×), while maintaining a comparable level of accuracy (up to 6% improvement).

Keywords: Quantum computing; Quantum neural networks; Quantum optimal control; Parallel programming; Graph partitioning

On the maturity of parallel applications for asymmetric multi-core processors

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, vol. 127, pp. 105-115

Authors: Chronaki, Kallia; Moreto, Miguel; Casas, Marc; Rico, Alejandro; Badia, Rosa M.; Ayguade, Eduard; Valero, Mateo. Affiliations: Barcelona Supercomp Ctr Barcelona Spain; ARM Richardson TX USA; CSIC Artificial Intelligence Res Inst IIIA Madrid Spain

Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. By maintaining two types of cores (fast and slow), AMCs are able to provide high performance under the facility power budget. This paper performs the first extensive evaluation of how portable current HPC applications are to such supercomputing systems. Specifically, we evaluate several execution models on an ARM *** AMC using the PARSEC benchmark suite, which includes representative highly parallel applications. We compare schedulers at the user, OS and runtime levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is more effective when it takes place at the runtime system level, where it improves the baseline by 23%, while the heterogeneous-aware OS scheduling solution improves the baseline by 10%. (C) 2019 Published by Elsevier Inc.

Keywords: Parallel programming; Scheduling; Runtime systems; Asymmetric multi-cores; HPC
