Search results - Inner Mongolia University Library

Stream parallelism with ordered data constraints on multi-core systems

JOURNAL OF SUPERCOMPUTING, 2019, vol. 75, no. 8, pp. 4042-4061

Authors: Griebler, Dalvan; Hoffmann, Renato B.; Danelutto, Marco; Fernandes, Luiz G. Affiliations: Pontificia Univ Catolica Rio Grande do Sul Fac Informat Porto Alegre RS Brazil; Univ Pisa Dept Comp Sci Pisa Italy

It is often a challenge to keep input/output tasks/results in order for parallel computations over data streams, particularly when stateless task operators are replicated to increase parallelism when there are irregular tasks. Maintaining input/output order requires additional coding effort and may significantly impact the application's actual throughput. Thus, we propose a new implementation technique designed to be easily integrated with any of the existing C++ parallel programming frameworks that support stream parallelism. In this paper, it is first implemented and studied using SPar, our high-level domain-specific language for stream parallelism. We discuss the results of a set of experiments with real-world applications revealing how significant performance improvements may be achieved when our proposed solution is integrated within SPar, especially for data compression applications. Also, we show the results of experiments performed after integrating our solution within FastFlow and TBB, revealing no significant overheads.

Keywords: parallel programming; parallel stream processing; parallel data compression; parallel video streaming
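The ordering problem described in the abstract above can be illustrated with a small sketch. It is not the authors' technique and does not use SPar, FastFlow, or TBB; it only shows the general idea of tagging stream items with sequence numbers and holding early results in a reorder buffer. The function name `ordered_parallel_map` and its parameters are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed


def ordered_parallel_map(operator, stream, workers=4):
    """Apply a stateless operator to stream items in parallel and
    return the results in input order (hypothetical helper)."""
    pending = []      # min-heap of (sequence number, result)
    next_seq = 0      # sequence number the output is waiting for
    ordered = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tag every item with its position in the input stream.
        futures = {pool.submit(operator, item): seq
                   for seq, item in enumerate(stream)}
        # Results arrive in completion order, which may differ from input order.
        for future in as_completed(futures):
            heapq.heappush(pending, (futures[future], future.result()))
            # Release every result that is now contiguous with the output.
            while pending and pending[0][0] == next_seq:
                ordered.append(heapq.heappop(pending)[1])
                next_seq += 1
    return ordered
```

For simplicity the sketch consumes the whole input before collecting results; a real streaming implementation would bound the number of in-flight items. The buffer also shows where the throughput cost mentioned in the abstract can come from: one slow item holds back every result behind it.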


A Parallel-Computing Algorithm for High-Energy Physics Particle Tracking and Decoding Using GPU Architectures

IEEE ACCESS, 2019, vol. 7, pp. 91612-91626

Authors: Fernandez Declara, Placido; Campora Perez, Daniel Hugo; Garcia-Blas, Javier; Vom Bruch, Dorothea; Daniel Garcia, J.; Neufeld, Niko. Affiliations: CERN EP LBC CH-1211 Geneva Switzerland; Univ Carlos III Madrid Dept Comp Sci & Engn Madrid 28911 Spain; Univ Seville ETSI Informat E-41012 Seville Spain; Sorbonne Univ Paris Diderot Sorbonne Paris Cite LPNHE CNRSIN2P3 F-75005 Paris France

Real-time data processing is one of the central processes of particle physics experiments which require large computing resources. The LHCb (Large Hadron Collider beauty) experiment will be upgraded to cope with a particle bunch collision rate of 30 million times per second, producing 10^9 particles/s. 40 Tbits/s need to be processed in real time to make filtering decisions to store data. This poses a computing challenge that requires exploration of modern hardware and software solutions. We present Compass, a particle tracking algorithm and a parallel raw input decoding optimized for GPUs. It is designed for highly parallel architectures, data-oriented, and optimized for fast and localized data access. Our algorithm is configurable, and we explore the trade-off in computing and physics performance of various configurations. A CPU implementation that delivers the same physics performance as our GPU implementation is presented. We discuss the achieved physics performance and validate it with Monte Carlo simulated data. We show a computing performance analysis comparing consumer and server-grade GPUs, and a CPU. We show the feasibility of using a full GPU decoding and particle tracking algorithm for high-throughput particle trajectories reconstruction, where our algorithm improves the throughput up to 7.4x compared to the LHCb baseline.

Keywords: CUDA; GPGPU; track reconstruction; particle tracking; parallel programming


Formal specification and implementation of an automated pattern-based parallel-code generation framework

INTERNATIONAL JOURNAL ON SOFTWARE TOOLS FOR TECHNOLOGY TRANSFER, 2019, vol. 21, no. 2, pp. 183-202

Authors: Perez, Gervasio; Yovine, Sergio. Affiliations: Consejo Nacl Invest Cient & Tecn ICC Buenos Aires DF Argentina; Univ Buenos Aires Buenos Aires DF Argentina

Programming correct parallel software in a cost-effective way is a challenging task requiring a high degree of expertise. In an attempt to overcome the pitfalls undermining parallel programming, this paper proposes a pattern-based, formally grounded tool that eases writing parallel code by automatically generating platform-dependent programs from high-level, platform-independent specifications. The tool builds on three pillars: (1) a platform-agnostic parallel programming pattern, called PCR, (2) a formal translation of PCRs into a parallel execution model, namely Concurrent Collections (CnC), and (3) a program rewriting engine that generates code for a concrete runtime implementing CnC. The experimental evaluation carried out gives evidence that code produced from PCRs can deliver performance metrics which are comparable with handwritten code but with assured correctness. The technical contribution of this paper is threefold. First, it discusses a parallel programming pattern, called PCR, consisting of producers, consumers, and reducers which operate concurrently on data sets. To favor correctness, the semantics of PCRs is mathematically defined in terms of the formalism FXML. PCRs are shown to be composable and to seamlessly subsume other well-known parallel programming patterns, thus providing a framework for heterogeneous designs. Second, it formally shows how the PCR pattern can be correctly implemented in terms of a more concrete parallel execution model. Third, it proposes a platform-agnostic C++ template library to express PCRs. It presents a prototype source-to-source compilation tool, based on C++ template rewriting, which automatically generates parallel implementations relying on the Intel CnC C++ library.

Keywords: Formal methods; Software design patterns; parallel programming; Automated code generation


Effective Parallel Computing via a Free Stale Synchronous Parallel Strategy

IEEE ACCESS, 2019, vol. 7, pp. 118764-118775

Authors: Shi, Hang; Zhao, Yue; Zhang, Bofeng; Yoshigoe, Kenji; Chang, Furong. Affiliations: Shanghai Univ Sch Comp Engn & Sci Shanghai 200444 Peoples R China; Toyo Univ Fac Informat Networking Innovat & Design INIAD Tokyo 1128606 Japan; Kashi Univ Sch Comp Sci & Technol Xinjiang 844008 Peoples R China

As the data becomes bigger and more complex, people tend to process it in a distributed system implemented on clusters. Due to the power consumption, cost, and differentiated price-performance, the clusters are evolving into the system with heterogeneous hardware leading to the performance difference among the nodes. Even in a homogeneous cluster, the performance of the nodes is different due to the resource competition and the communication cost. Some nodes with poor performance will drag down the efficiency of the whole system. Existing parallel computing strategies such as bulk synchronous parallel strategy and stale synchronous parallel strategy are not well suited to this problem. To address it, we proposed a free stale synchronous parallel (FSSP) strategy to free the system from the negative impact of those nodes. FSSP is improved from stale synchronous parallel (SSP) strategy, which can effectively and accurately figure out the slow nodes and eliminate the negative effects of those nodes. We validated the performance of the FSSP strategy by using some classical machine learning algorithms and datasets. Our experimental results demonstrated that FSSP was 1.5-12x faster than the bulk synchronous parallel strategy and stale synchronous parallel strategy, and it used 4x fewer iterations than the asynchronous parallel strategy to converge.

Keywords: Straggler; parallel strategy; parallel programming
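The abstract above builds on the stale synchronous parallel (SSP) strategy. The sketch below illustrates only the plain SSP staleness rule, not FSSP, whose slow-node detection is not described in this record. The rule as coded is one common formulation (the exact off-by-one convention varies between descriptions), and the names `may_proceed` and `simulate` as well as the tick-based worker model are assumptions made for illustration.

```python
def may_proceed(clocks, worker, staleness):
    """SSP-style rule (one formulation): a worker may start a new iteration
    only if it has completed at most `staleness` more iterations than the
    slowest worker."""
    return clocks[worker] - min(clocks) <= staleness


def simulate(speeds, staleness, ticks):
    """Toy model: a worker with speed k needs k ticks per iteration.
    Returns (completed iterations per worker, blocked ticks per worker)."""
    clocks = [0] * len(speeds)
    progress = [0] * len(speeds)
    blocked = [0] * len(speeds)
    for _ in range(ticks):
        snapshot = list(clocks)   # all decisions in a tick see the same clocks
        for w, speed in enumerate(speeds):
            if not may_proceed(snapshot, w, staleness):
                blocked[w] += 1   # the fast worker waits for the straggler
                continue
            progress[w] += 1
            if progress[w] == speed:
                clocks[w] += 1
                progress[w] = 0
    return clocks, blocked
```

With `staleness` set to 0 the rule degenerates to a barrier in the style of bulk synchronous parallel execution, and a very large value approximates fully asynchronous execution; in both extremes the toy model shows how a single slow node either stalls the fast ones or lets them run far ahead.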


Adaptation of an Iterative PCA to a Manycore Architecture for Hyperspectral Image Processing

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2019, vol. 91, no. 7, pp. 759-771

Authors: Lazcano, R.; Madronal, D.; Fabelo, H.; Ortega, S.; Salvador, R.; Callico, G. M.; Juarez, E.; Sanz, C. Affiliations: UPM Ctr Software Technol & Multimedia Syst CITSEM Madrid Spain; ULPGC Res Inst Appl Microelect IUMA Las Palmas Gran Canaria Spain

This paper presents a study of the adaptation of a Non-Linear Iterative Partial Least Squares (NIPALS) algorithm applied to Hyperspectral Imaging to a Massively Parallel Processor Array manycore architecture, which assembles 256 cores distributed over 16 clusters. This work aims at optimizing the internal communications of the platform to achieve real-time processing of large data volumes with limited computational resources and memory bandwidth. As hyperspectral images are composed of extensive volumes of spectral information, real-time requirements, which are upper-bounded by the image capture rate of the hyperspectral sensor, are a challenging objective. To address this issue, the image size is usually reduced prior to the processing phase, which is itself a computationally intensive task. Consequently, this paper proposes an analysis of the intrinsic parallelism and the data dependency within the NIPALS algorithm and its subsequent implementation on a manycore architecture. Furthermore, this implementation has been validated against three hyperspectral images extracted from both remote sensing and medical datasets. As a result, an average speedup of 17x has been achieved when compared to the sequential version. Finally, this approach has been compared with other state-of-the-art implementations, outperforming them in terms of performance.

Keywords: NIPALS-PCA; Hyperspectral imaging; Massively parallel processing; Real-time processing; parallel programming
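For readers unfamiliar with NIPALS, the iteration named in the abstract above can be sketched sequentially as follows. This is the textbook NIPALS-PCA loop in plain Python, not the authors' manycore implementation; the function names `nipals_component` and `deflate`, the starting-column choice, and the convergence test are assumptions made for illustration. The input is assumed to be column-centred and not all zeros.

```python
import math


def nipals_component(X, max_iter=500, tol=1e-10):
    """Extract the first principal component of a column-centred matrix X
    (list of rows) with the NIPALS iteration. Returns (scores, loadings)."""
    rows, cols = len(X), len(X[0])
    # Start from the column with the largest sum of squares.
    start = max(range(cols), key=lambda j: sum(X[i][j] ** 2 for i in range(rows)))
    t = [X[i][start] for i in range(rows)]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: p = X^T t / (t^T t), then normalise to unit length.
        p = [sum(X[i][j] * t[i] for i in range(rows)) / tt for j in range(cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: t = X p.
        t_new = [sum(X[i][j] * p[j] for j in range(cols)) for i in range(rows)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol * sum(v * v for v in t):
            break
    return t, p


def deflate(X, t, p):
    """Subtract the extracted component so the next one can be computed."""
    return [[X[i][j] - t[i] * p[j] for j in range(len(p))] for i in range(len(t))]
```

The two matrix-vector products inside the loop are independent across rows and columns, which is the kind of intrinsic parallelism the abstract refers to, while the dependency between successive iterations and between successive components (through deflation) stays sequential.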


Performance Evaluation of a Hybrid Computer Cluster Built on IBM POWER8 Microprocessors

PROGRAMMING AND COMPUTER SOFTWARE, 2019, vol. 45, no. 6, pp. 324-332

Authors: Mal'kovskii, S. I.; Sorokin, A. A.; Korolev, S. P.; Zatsarinnyi, A. A.; Tsoi, G. I. Affiliations: Russian Acad Sci Comp Ctr Far Eastern Branch Ul Kim Yu Chena 65 Khabarovsk 680000 Russia; Russian Acad Sci Fed Res Ctr Comp Sci & Control Ul Vavilova 44-2 Moscow 119333 Russia

This paper is devoted to the performance evaluation of a hybrid computer cluster built on IBM POWER8 CPUs and NVIDIA Tesla P100 GPUs. The architecture of the computing system and software used are described. Results of experiments carried out using the STREAM, NPB, Crossroads/NERSC-9 DGEMM, and HPL packages are discussed. The efficiency of the simultaneous multithreading (SMT) technology supported by POWER8 processors, as well as the performance of some compilers, parallel programming and mathematical libraries, on this architecture is analyzed.

Keywords: parallel programming


Revisiting the Bag-of-Visual-Words model: A hierarchical localization architecture for mobile systems

ROBOTICS AND AUTONOMOUS SYSTEMS, 2019, vol. 113, pp. 104-119

Authors: Bampis, Loukas; Gasteratos, Antonios. Affiliation: Democritus Univ Thrace Dept Prod & Management Engn 12 Vas Sophias GR-67132 Xanthi Greece

In this paper, an enhanced visual place recognition system is proposed aiming to improve the localization performance of a mobile platform. Our technique takes full advantage of the continuous input image stream in order to provide additional knowledge to the matching functionality. The well-established Bag-of-Visual-Words model is adapted into a hierarchical design that derives the visual information from the full entity of a natural scene into the description, while it additionally preserves the geometric structure of the explored world. Our approach is evaluated as part of a state-of-the-art Simultaneous-Localization-and-Mapping algorithm, and parallelization techniques are exploited utilizing every available hardware module in a low-power device. The implemented algorithm has been tested on several publicly available datasets offering consistently accurate localization results and preventing the majority of redundant computations that the additional geometrical verifications can induce. (C) 2019 Elsevier B.V. All rights reserved.

Keywords: Localization; Visual place recognition; Mobile systems; parallel programming


Parameter tuning for a cooperative parallel implementation of process-network synthesis algorithms

CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH, 2019, vol. 27, no. 2, pp. 551-572

Authors: Bartos, Aniko; Bertok, Botond. Affiliation: Univ Pannonia Egyet Str 10 Veszprem Hungary

Process-network synthesis is the determination of the optimal network structure of a process system together with optimal configurations and capacities of the operating units incorporated into the system. The aim of developing more and more sophisticated solver algorithms is to find the optimum as fast as possible and increase the circle of practically solvable process synthesis problems. The P-graph framework can effectively reduce the number of structures to be examined and accelerate the computation searching for the optimum due to the exploitation of combinatorial characteristics of candidate solution structures. A cooperative parallel implementation of P-graph algorithms has been published recently to exploit the capabilities of multi-core and multiprocessor systems (Bartos and Bertok in De Gruyter Ser Logic Appl 1:303-313, 2015). The parallel implementation has increased performance significantly but this can be further improved by fine tuning the parameters of the parallel algorithm. Outcomes of experiments on parameter optimization are to be presented herein.

Keywords: Graph and tree search; parallel programming; Process network synthesis; P-graph; Parameter tuning


FastMFDs: a fast, efficient algorithm for mining minimal functional dependencies from large-scale distributed data with Spark

JOURNAL OF SUPERCOMPUTING, 2019, vol. 75, no. 5, pp. 2497-2517

Authors: Cheng, Feng; Yang, Zhe. Affiliation: Soochow Univ Sch Comp Sci & Technol Suzhou 215006 Jiangsu Peoples R China

Minimal functional dependency is an important relationship in the relational database. It can describe some special relationships between complex and irregular attributes in the relational database. Extracting minimal functional dependencies (MFDs) from relational databases is an important database analysis technique. However, as the data grows larger and larger in size, even the most efficient stand-alone algorithms are exponential in the number of attributes of the relations. Discovering MFDs on a single computer is hard and slow, and it can only be applied to small centralized datasets. It is challenging to discover MFDs from big data, especially large-scale distributed data. Apache Spark is a unified analytics engine for big data processing; we present a new algorithm, FastMFDs, based on Spark for discovering all MFDs from large-scale distributed data in parallel. FastMFDs uses both the RDD framework and the DataFrame framework to store and process distributed data. FastMFDs deletes equivalent attributes. FastMFDs also provides a two-way search algorithm for searching and pruning. We evaluated our algorithm on real-life datasets, and our algorithm is more efficient and faster than existing discovery methods.

Keywords: Minimal functional dependency; Big data; parallel programming; Spark
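To make the notion of a minimal functional dependency in the abstract above concrete, here is a brute-force single-machine sketch. It is not FastMFDs and does not use Spark; it enumerates candidate left-hand sides from small to large, which is exactly the exponential behaviour the abstract says distributed algorithms try to tame. The names `holds` and `minimal_fds` are hypothetical, and dependencies with an empty left-hand side (constant columns) are deliberately ignored.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.
    rows is a list of dicts; lhs is a tuple of attribute names."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attributes):
    """Brute-force search for all minimal, non-trivial dependencies X -> A."""
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        minimal_lhs = []
        # Visit candidate left-hand sides from small to large.
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Prune supersets of an already found left-hand side:
                # they hold too, but are not minimal.
                if any(set(m) <= set(lhs) for m in minimal_lhs):
                    continue
                if holds(rows, lhs, rhs):
                    minimal_lhs.append(lhs)
                    found.append((lhs, rhs))
    return found
```

For example, on a three-column relation where `c` is the exclusive-or of `a` and `b`, no single attribute determines another, so every minimal dependency has a two-attribute left-hand side.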


Enhanced global optimization methods applied to complex fisheries stock assessment models

APPLIED SOFT COMPUTING, 2019, vol. 77, pp. 50-66

Authors: Penas, David R.; Gomez, Andres; Fraguela, Basilio B.; Martin, Maria J.; Cervino, Santiago. Affiliations: Univ Santiago de Compostela MODESTYA Res Grp Dept Stat Math Anal & Optimizat Santiago De Compostela Spain; Univ Santiago de Compostela Inst Math IMAT Santiago De Compostela Spain; Univ Santiago de Compostela Galician Supercomp Ctr CESGA Santiago De Compostela Spain; Univ A Coruna Grp Arquitectura Comp Fac Informat Campus Elvina S-N La Coruna 15071 Spain; Ctr Oceanog Vigo Inst Espanol Oceanog POB 1552 Vigo 36200 Spain

Statistical fisheries models are frequently used by researchers and agencies to understand the behavior of marine ecosystems or to estimate the maximum acceptable catch of different species of commercial interest. The parameters of these models are usually adjusted through the use of optimization algorithms. Unfortunately, the choice of the best optimization method is far from trivial. This work proposes the use of population-based algorithms to improve the optimization process of the Globally applicable Area Disaggregated General Ecosystem Toolbox (Gadget), a flexible framework that allows the development of complex statistical marine ecosystem models. Specifically, parallel versions of the Differential Evolution (DE) and the Particle Swarm Optimization (PSO) methods are proposed. The proposals include an automatic selection of the internal parameters to reduce the complexity of their usage, and a restart mechanism to avoid local minima. The resulting optimization algorithms were called PMA (Parallel Multirestart Adaptive) DE and PMA PSO respectively. Experimental results prove that the new algorithms are faster and produce more accurate solutions than the other parallel optimization methods already included in Gadget. Although the new proposals have been evaluated on fisheries models, there is nothing specific to the tested models in them, and thus they can be also applied to other optimization problems. Moreover, the PMA scheme proposed can be seen as a template that can be easily applied to other population-based heuristics. (C) 2019 Elsevier B.V. All rights reserved.

Keywords: Global optimization; parallel programming; Marine ecosystem models; Particle Swarm Optimization; Differential evolution
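As background for the Differential Evolution method named in the abstract above, the following is a minimal sequential sketch of the classic DE/rand/1/bin scheme. It is not the PMA DE algorithm: it has no parallelism, no automatic parameter selection, and no restart mechanism, and it is unrelated to Gadget. The function name, the default values of `F` and `CR`, the clipping to the search box, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def differential_evolution(f, bounds, pop_size=20, generations=100,
                           F=0.8, CR=0.9, seed=0):
    """Minimise f over a box with a basic DE/rand/1/bin loop.
    Returns (best point, best cost)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)   # at least one coordinate is mutated
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < CR:
                    value = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)   # clip to the box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = f(trial)
            # Greedy selection: the trial replaces its parent only if not worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

The parameters `F`, `CR`, and `pop_size` are the kind of internal settings the abstract says the proposed algorithms select automatically, and a restart scheme would wrap a loop like this one, re-initialising the population when progress stalls.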

