ISBN (print): 9781728167206
With the spread of multi-core systems, parallel programming has grown in popularity. However, parallelizing an algorithm can in some cases yield negative results due to overhead, and implementing parallel algorithms is not always an easy or even achievable task. Therefore, finding out to what extent a multi-core architecture can enhance an algorithm's speedup is extremely valuable. This paper measures the execution time and speedup of three of the most popular divide-and-conquer algorithms (merge sort, quicksort, and matrix multiplication), with experiments conducted over various array sizes. The experiments run on three different multi-core machines, ranging from a dual-core to a hexa-core CPU. The results show that speedup is directly proportional to the number of CPU cores: using a hexa-core CPU in lieu of a dual-core CPU can achieve up to twice the speedup. Thus, utilizing a powerful multi-core CPU can rival the use of parallelism on a standard CPU.
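A minimal sketch of the kind of kernel benchmarked above: a merge sort parallelized with OpenMP tasks. The paper does not publish its code, so the function names and the cutoff constant below are illustrative:

#include <stdlib.h>
#include <string.h>
#include <omp.h>

/* Merge two sorted halves a[lo..mid) and a[mid..hi) using scratch space. */
static void merge(int *a, int *tmp, int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

/* Sort a[lo..hi); spawn tasks only for ranges big enough to amortize overhead. */
static void msort(int *a, int *tmp, int lo, int hi) {
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    if (hi - lo > 4096) {                /* illustrative cutoff against task overhead */
        #pragma omp task shared(a, tmp)
        msort(a, tmp, lo, mid);
        msort(a, tmp, mid, hi);
        #pragma omp taskwait              /* both halves must finish before merging */
    } else {                              /* sequential below the cutoff */
        msort(a, tmp, lo, mid);
        msort(a, tmp, mid, hi);
    }
    merge(a, tmp, lo, mid, hi);
}

void parallel_merge_sort(int *a, int n) {
    int *tmp = malloc((size_t)n * sizeof(int));
    #pragma omp parallel
    #pragma omp single nowait             /* one thread seeds the task tree */
    msort(a, tmp, 0, n);
    free(tmp);
}

The sequential cutoff reflects exactly the overhead trade-off the abstract mentions: spawning tasks for small subarrays costs more than it saves, which is why parallelization can yield negative results at small sizes.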
The PVS search function, as a mainstream and efficient algorithm, has been widely used in various kinds of chess games. We applied a parallel search function based on PVS and improved the running speed of the program. At the same time, we also did some research and experiments on the evaluation function of Amazon chess, which provide a set of usable Amazon evaluation functions and parameter-adjustment results for reference.
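For context, a compact sketch of Principal Variation Search (PVS) in negamax form follows; the game interface (Position, gen_moves, make_move, unmake_move, evaluate) is a placeholder, not the paper's Amazons engine:

#define MAX_MOVES 256
typedef struct Position Position;                 /* opaque game state */
typedef int Move;
extern int  gen_moves(Position *p, Move *out);    /* returns move count */
extern void make_move(Position *p, Move m);
extern void unmake_move(Position *p, Move m);
extern int  evaluate(const Position *p);          /* static evaluation */

int pvs(Position *pos, int depth, int alpha, int beta) {
    if (depth == 0) return evaluate(pos);         /* leaf: static evaluation */
    Move moves[MAX_MOVES];
    int n = gen_moves(pos, moves);
    for (int i = 0; i < n; i++) {
        make_move(pos, moves[i]);
        int score;
        if (i == 0) {
            /* first move: full-window search, assumed to be the principal variation */
            score = -pvs(pos, depth - 1, -beta, -alpha);
        } else {
            /* later moves: cheap null-window probe first */
            score = -pvs(pos, depth - 1, -alpha - 1, -alpha);
            if (score > alpha && score < beta)    /* probe failed high: re-search */
                score = -pvs(pos, depth - 1, -beta, -alpha);
        }
        unmake_move(pos, moves[i]);
        if (score >= beta) return beta;           /* beta cutoff */
        if (score > alpha) alpha = score;
    }
    return alpha;
}

The null-window probes are what make PVS cheap when move ordering is good, and the independent subtree searches are the natural units for the parallelization the abstract describes.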
ISBN (print): 9781450388368
This paper introduces the principles of three classical and widely applied threshold segmentation methods: the Otsu method, the maximum entropy method, and the iterative method. The three methods are implemented, compared, and analyzed on the VS2010 (Microsoft Visual Studio 2010) platform, and the Otsu method, which gives relatively good results, is then ported to standard C on the CCS (Code Composer Studio) platform. After a multi-core DSP (Digital Signal Processor) environment is established on the TMS320C6678, the OpenMP framework is used to parallelize the Otsu method in fork-join mode. Two, four, and eight cores are used to accelerate the Otsu method, and the resulting speedup behavior is summarized. The results show that the parallel implementation of the digital image processing algorithm on a multi-core DSP can effectively improve running speed while preserving the accuracy of the Otsu method.
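As a sketch of the fork-join parallelization described above, the following C function computes the Otsu threshold with an OpenMP-parallel histogram (OpenMP 4.5 array reduction); the names are ours, not the paper's CCS code:

#include <stdint.h>
#include <omp.h>

int otsu_threshold(const uint8_t *img, long n) {
    long hist[256] = {0};
    #pragma omp parallel for reduction(+:hist[:256])
    for (long i = 0; i < n; i++)          /* fork: each core histograms a chunk */
        hist[img[i]]++;
                                          /* join: the serial 256-bin scan is cheap */
    double total = (double)n, sum = 0.0;
    for (int t = 0; t < 256; t++) sum += (double)t * hist[t];

    double sumB = 0.0, wB = 0.0, best = -1.0;
    int thresh = 0;
    for (int t = 0; t < 256; t++) {
        wB += hist[t];                    /* background weight */
        if (wB == 0) continue;
        double wF = total - wB;           /* foreground weight */
        if (wF == 0) break;
        sumB += (double)t * hist[t];
        double mB = sumB / wB, mF = (sum - sumB) / wF;
        double between = wB * wF * (mB - mF) * (mB - mF);
        if (between > best) { best = between; thresh = t; }
    }
    return thresh;                        /* threshold maximizing between-class variance */
}

Only the per-pixel histogram pass scales with image size, which is why it is the part worth forking across cores.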
This paper explains the programming aspects of a promising Java-based programming and execution framework called JavaSymphony. JavaSymphony provides unified high-level programming constructs for applications targeting shared-, distributed-, and hybrid-memory parallel computers, as well as co-processor accelerators. JavaSymphony applications can be executed on multi-/many-core conventional and data-parallel architectures. JavaSymphony is based on the concept of dynamic virtual architectures, which allows programmers to define a hierarchical structure of the underlying computing resources and to control load balancing and task locality. In addition to GPU support, JavaSymphony provides a multi-core-aware scheduling mechanism capable of mapping parallel applications onto large multi-core machines and heterogeneous clusters. Several real applications and benchmarks (on modern multi-core computers, heterogeneous clusters, and machines combining different multi-core CPU and GPU devices) have been used to evaluate the performance. The results demonstrate that JavaSymphony outperforms alternative Java implementations, as well as other modern alternative solutions.
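JavaSymphony's actual API is Java and is not reproduced here; the following generic C sketch (all names invented) only illustrates the dynamic-virtual-architecture idea of a resource hierarchy onto which tasks are mapped for locality:

/* Hypothetical illustration, NOT JavaSymphony's API: a "virtual architecture"
 * is a tree of compute resources; tasks are placed near their data. */
typedef enum { CLUSTER, NODE, SOCKET, CORE, GPU } Level;

typedef struct VA {
    Level level;
    int id;
    int nchildren;
    struct VA **children;   /* e.g. a NODE contains sockets, a SOCKET cores */
} VA;

/* Locality-aware placement: descend toward the child that already holds
 * the task's data; otherwise fall back to the first child (load balancing
 * policies would go here). */
VA *map_task(VA *root, int data_home_id) {
    if (root->nchildren == 0) return root;            /* leaf resource */
    for (int i = 0; i < root->nchildren; i++)
        if (root->children[i]->id == data_home_id)
            return map_task(root->children[i], data_home_id);
    return map_task(root->children[0], data_home_id);
}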
A very common problem in GPU programming is that some combinations of thread block dimensions and other code optimization parameters, such as tiling or unrolling factors, result in dramatically better performance than other kernel configurations. To obtain highly efficient kernels, it is often necessary to search vast and discontinuous spaces consisting of all possible combinations of values for all tunable parameters. This paper presents Kernel Tuner, an easy-to-use tool for testing and auto-tuning OpenCL, CUDA, and C kernels, with support for many search optimization algorithms that accelerate the tuning process. This paper introduces the application of many new solvers and global optimization algorithms to auto-tuning GPU applications. We demonstrate that Kernel Tuner can be used in a wide range of application scenarios and drastically decreases the time spent tuning, e.g., tuning a GEMM kernel on an AMD Vega Frontier Edition 71.2x faster than brute-force search.
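Kernel Tuner itself is a Python tool whose API is not shown here; the self-contained C sketch below only illustrates the brute-force baseline such a tuner replaces, with a synthetic cost model standing in for real kernel compilation and timing:

#include <stdio.h>
#include <float.h>

/* Stub cost model standing in for an actual kernel launch + timing run.
 * Synthetic: pretends 128-wide blocks with tiling factor 4 are optimal. */
static double benchmark_kernel(int block_size, int tile) {
    return 1.0 + 0.001 * (block_size - 128) * (block_size - 128) / 128.0
               + 0.05 * (tile - 4) * (tile - 4);
}

int main(void) {
    const int block_sizes[] = {32, 64, 128, 256, 512};
    const int tiles[]       = {1, 2, 4, 8};
    double best = DBL_MAX;
    int best_b = 0, best_t = 0;
    /* Brute force: every combination is measured. Real spaces have many more
     * parameters, so the cost grows multiplicatively, which is why the guided
     * optimizers the paper introduces pay off. */
    for (int b = 0; b < 5; b++)
        for (int t = 0; t < 4; t++) {
            double ms = benchmark_kernel(block_sizes[b], tiles[t]);
            if (ms < best) { best = ms; best_b = block_sizes[b]; best_t = tiles[t]; }
        }
    printf("best: block=%d tile=%d (%.3f ms)\n", best_b, best_t, best);
    return 0;
}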
Support Vector Machines (SVMs) are widely used as supervised learning models to solve classification problems in machine learning. Training SVMs on large datasets is an extremely challenging task due to excessive storage and computational requirements. To tackle such big-data problems, one needs to design scalable distributed algorithms that parallelize model training and to develop efficient implementations of these algorithms. In this paper, we propose a distributed algorithm for SVM training that is scalable and communication-efficient. The algorithm uses a compact representation of the kernel matrix, based on the QR decomposition of low-rank approximations, to reduce both the computation and the storage requirements of the training stage. This is accompanied by a considerable reduction in the communication required for a distributed implementation of the algorithm. Experiments on benchmark data sets with up to five million samples demonstrate negligible communication overhead and scalability on up to 64 cores. Execution times are vast improvements over those of other widely used packages. Furthermore, the proposed algorithm has linear time complexity with respect to the number of samples, making it ideal for SVM training in decentralized environments such as smart embedded systems and edge-based Internet of Things (IoT) devices.
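The abstract does not spell out the exact factorization; as a sketch, one standard way a QR decomposition of a low-rank (e.g. Nyström-style) approximation compresses the kernel matrix is, in our own notation:

% Low-rank kernel compression (standard construction; the paper's exact
% variant is not specified in the abstract).
K \approx G G^{\top}, \qquad G \in \mathbb{R}^{n \times r},\; r \ll n
% QR-factor the thin matrix G:
G = Q R, \qquad Q \in \mathbb{R}^{n \times r},\; Q^{\top} Q = I_r
% Kernel-vector products then reduce to thin-matrix operations:
K v \approx Q \,(R R^{\top})\, (Q^{\top} v)

Storage drops from O(n^2) for K to O(nr + r^2) for Q and R, which is also what shrinks the data that distributed workers must exchange.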
ISBN (print): 9781665408790
Checksums are used to detect errors that might occur while storing or communicating data. Checking the integrity of data is well established, but only for smaller data sets. In contrast, supercomputers have to deal with huge amounts of data, where failures may remain undetected. Therefore, additional protection becomes a necessity at large scale. However, checking the integrity of larger data sets, especially distributed data, clearly requires parallel approaches. We show how popular checksums, such as CRC-32 or Adler-32, can be parallelized efficiently. This also disproves a widespread belief that parallelizing the aforementioned checksums, especially in a scalable way, is not possible. The mathematical properties behind these checksums enable a method to combine partial checksums such that the result corresponds to the checksum of the concatenated partial data. Our parallel checksum algorithm uses this combination idea in a scalable hierarchical reduction scheme to combine the partial checksums from an arbitrary number of processing elements. Although this reduction scheme can be implemented manually using most parallel programming interfaces, we use the Message Passing Interface, which supports such functionality directly via non-commutative user-defined reduction operations. In conjunction with the efficient checksum capabilities of the zlib library, our algorithm can be implemented not only conveniently and portably, but also very efficiently. Additional shared-memory parallelization within compute nodes completes our hybrid parallel checksum solutions, which show a high scalability of up to 524,288 threads. At this scale, computing the checksums of 240 TiB of data took only 3.4 seconds for CRC-32 and 2.6 seconds for Adler-32. Finally, we discuss the APES application as a representative of dynamic supercomputer applications. Thanks to our scalable checksum algorithm, even such applications are now able to detect many errors within …
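A minimal sketch of the approach the abstract describes, combining zlib's crc32_combine() with a non-commutative user-defined MPI reduction (error handling omitted; the byte-wise struct datatype assumes a homogeneous system):

#include <mpi.h>
#include <zlib.h>
#include <stdint.h>

/* Partial checksum of one rank's block: the CRC value plus the block length,
 * which crc32_combine() needs to merge two partial results. */
typedef struct { uint32_t crc; uint64_t len; } PartialCrc;

/* Non-commutative combine. MPI applies user ops in rank order when
 * commute = 0, with invec holding the left (earlier-rank) operand:
 * inout = in  followed by  inout. */
static void combine_crc(void *invec, void *inoutvec, int *len, MPI_Datatype *dt) {
    PartialCrc *in = invec, *io = inoutvec;
    for (int i = 0; i < *len; i++) {
        io[i].crc = (uint32_t)crc32_combine(in[i].crc, io[i].crc, (z_off_t)io[i].len);
        io[i].len += in[i].len;
    }
    (void)dt;
}

/* Reduce each rank's local CRC-32 into the checksum of the concatenated data. */
uint32_t parallel_crc32(const unsigned char *buf, uint64_t n, MPI_Comm comm) {
    PartialCrc local = { (uint32_t)crc32(0L, buf, (uInt)n), n }, global;

    MPI_Datatype ptype;
    MPI_Type_contiguous(sizeof(PartialCrc), MPI_BYTE, &ptype);
    MPI_Type_commit(&ptype);

    MPI_Op op;
    MPI_Op_create(combine_crc, /*commute=*/0, &op);   /* order must be preserved */

    MPI_Reduce(&local, &global, 1, ptype, op, 0, comm);

    MPI_Op_free(&op);
    MPI_Type_free(&ptype);
    return global.crc;                                 /* valid on rank 0 */
}

MPI's reduction tree supplies exactly the scalable hierarchical combination scheme the abstract mentions, with no hand-written communication.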
Many task models have been proposed to express and analyze the behavior of real-time applications at different levels of precision. Most of them target sequential applications, with no support for parallelism. The digraph task model is one of the most general, as it allows modeling arbitrary directed graphs (digraphs) of sequential job releases. In this paper, we extend the digraph task model to support intra-task parallelism. For the proposed parallel multi-mode digraph model, we derive sufficient schedulability tests, together with a dichotomic search that reduces their pessimism, for a set of n tasks on a heterogeneous single-ISA multi-core platform. To reduce the computational complexity of the schedulability test, we also propose heuristics for (i) partitioning parallel digraph tasks onto the heterogeneous cores and (ii) assigning core operating frequencies to reduce overall energy consumption while meeting real-time constraints. The effectiveness of the proposed approach is validated with an exhaustive set of simulations.
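As a rough illustration of the base model (our own encoding, not the paper's notation, and without the parallel multi-mode extension), a digraph task stores vertices labeled with worst-case execution times and deadlines, plus edges labeled with minimum inter-release separations:

/* Minimal digraph-task encoding (hypothetical field names). */
#define MAX_V 16

typedef struct {
    int nv;                    /* number of vertices (job types)                 */
    int wcet[MAX_V];           /* worst-case execution time of each job type     */
    int deadline[MAX_V];       /* relative deadline of each job type             */
    int edge[MAX_V][MAX_V];    /* edge[u][v] = minimum inter-release time from a
                                  job of type u to one of type v; 0 if the
                                  digraph has no edge u -> v                     */
} DigraphTask;

Any walk through the digraph is a legal release sequence, which is what makes the model general and its schedulability analysis expensive.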
An important challenge in parallel computing is the mapping of parallel algorithms to parallel computing platforms. This requires several activities, such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, and the implementation and deployment of the algorithm on the computing platform. However, current parallel computing approaches very often rely only on conceptual and idiosyncratic models, which fall short of supporting the communication and analysis of design decisions. In this article, we present ParDSL, a domain-specific language framework that provides explicit models to support the activities for mapping parallel algorithms to parallel computing platforms. The language framework includes a coherent set of four domain-specific languages, each of which focuses on one activity of the mapping process. We use the domain-specific languages for modeling the design as well as for generating the required platform-specific models and the code of the selected parallel algorithm. In addition to the languages, a library is defined to support systematic reuse. We discuss the overall architecture of the language framework, the separate DSLs, the corresponding model transformations, and the toolset. The framework is illustrated on four different parallel computing algorithms.
Background: Protein structure comparative analysis and similarity searches play essential roles in structural bioinformatics. Several algorithms for protein structure alignment have been developed in recent years. However, facing the rapid growth of protein structure data, improving overall comparison performance and running efficiency on massive numbers of structures is still challenging. Results: Here, we propose MADOKA, an ultra-fast approach for massive structural-neighbor searching using a novel two-phase algorithm. First, we apply a fast alignment between pairs of structures. Then, we employ a score to select the more similar pairs, on which a more accurate fragment-based, residue-level alignment is carried out. MADOKA performs about 6-100 times faster than existing methods, including TM-align and SAL, in massive alignments. Moreover, the structural alignments produced by MADOKA are of better quality than those of the existing algorithms in terms of TM-score and number of aligned residues. We also develop a web server to search for structural neighbors in the PDB database (about 360,000 protein chains in total), with additional features such as 3D structure alignment visualization. The MADOKA web server is freely available at: http://***/ Conclusions: MADOKA is an efficient approach to search for protein structure similarity. In addition, we provide a parallel implementation of MADOKA that exploits the power of multi-core CPUs.
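For reference, the TM-score used above as the quality metric is the standard definition (not specific to MADOKA):

\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_{\mathrm{target}}} \sum_{i=1}^{L_{\mathrm{ali}}} \frac{1}{1 + \left(d_i / d_0(L_{\mathrm{target}})\right)^2}\right], \qquad d_0(L) = 1.24\,\sqrt[3]{L - 15} - 1.8

where L_target is the target length, L_ali the number of aligned residue pairs, d_i the distance between the i-th aligned pair, and the maximum runs over superpositions; scores range over (0, 1], with 1 a perfect match.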