Search Results - Inner Mongolia University Library

14th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Authors: Coetsier, Jean-Charles; Jiamthapthaksin, Rachsuda. Affiliation: Assumption Univ, Comp Sci Dept, Bangkok, Thailand

ISBN: (print) 9781509048342

Support Vector Machine (SVM) is one of the most popular machine learning algorithms for classification tasks and helps organizations improve their efficiency in different ways. Many studies have sought to improve SVM's speed, accuracy, and/or scalability. The algorithm has parameters that need precise tuning to perform well. This work proposes a novel parallelized parameter selection using the Flower Pollination Algorithm (FPA) to quickly find the optimal parameters of SVM. In particular, the MapReduce algorithm introduced in big data frameworks is applied to both FPA and SVM, which forms a fully distributed algorithm to support large datasets. The experimental results of parallelized FPA-SVM on real datasets show its outstanding speed in generating optimal models while maintaining high accuracy.

Keywords: support vector machine; flower pollination algorithm; map reduce; machine learning; parameter selections; parallel algorithms


A Unified Optimization Approach for Sparse Tensor Operations on GPUs


IEEE International Conference on Cluster Computing (CLUSTER)

Authors: Liu, Bangtian; Wen, Chengyao; Sarwate, Anand D.; Dehnavi, Maryam Mehri. Affiliation: Rutgers State Univ, Newark, NJ 07102, USA

ISBN: (print) 9781538623268

Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures as well as the large memory footprints of sparse tensor operations make such implementations challenging. We leverage the fact that sparse tensor operations share similar computation patterns to propose a unified tensor representation called F-COO. Combined with GPU-specific optimizations, F-COO provides highly-optimized implementations of sparse tensor computations on GPUs. The performance of the proposed unified approach is demonstrated for tensor-based kernels such as the Sparse Matricized Tensor-Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor-Times-Matrix Multiply (SpTTM) that are used in tensor decomposition algorithms. Compared to state-of-the-art work we improve the performance of SpTTM and SpMTTKRP up to 3.7 and 30.6 times respectively on NVIDIA Titan-X GPUs. We implement the CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup using the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.

Keywords: Tensile stress; Sparse matrices; Optimization; Matrix decomposition; parallel algorithms; Acceleration; Graphics processing units
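
The abstract above names SpMTTKRP as one of the kernels but does not describe the F-COO layout itself. As a rough illustration of the computation pattern only, the following sketch runs a sequential MTTKRP over a 3-way tensor stored in plain coordinate (COO) form. It is not the F-COO format or the GPU implementation from the paper; the function name `mttkrp_mode0` and the sample data are hypothetical.

```python
# Sketch only: plain coordinate (COO) storage, not the paper's F-COO format.

def mttkrp_mode0(coords, values, B, C, num_rows):
    """Mode-0 MTTKRP for a 3-way sparse tensor stored in COO form.

    coords: list of (i, j, k) index triples of the nonzeros
    values: list of the matching nonzero values
    B, C:   dense factor matrices (lists of rows) sharing the same rank R
    Returns the num_rows x R matrix M with
    M[i][r] = sum over nonzeros X[i,j,k] * B[j][r] * C[k][r].
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    # Each nonzero contributes independently to one output row.
    for (i, j, k), x in zip(coords, values):
        for r in range(rank):
            M[i][r] += x * B[j][r] * C[k][r]
    return M


# Hypothetical 2 x 2 x 2 tensor with three nonzeros and rank-2 factors.
coords = [(0, 0, 0), (0, 1, 1), (1, 1, 0)]
values = [2.0, 3.0, 4.0]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
result = mttkrp_mode0(coords, values, B, C, num_rows=2)
```

Because every nonzero updates its output row independently, the loop over nonzeros is the natural place to parallelize, provided that concurrent updates to the same row are combined safely.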


Parallelization Strategies for Fast Factorized Backprojection SAR on Embedded Multi-Core Architectures


IEEE International Conference on Microwaves, Antennas, Communications and Electronic Systems (COMCAS)

Authors: Wielage, M.; Cholewa, F.; Riggers, C.; Pirsch, P.; Blume, H. Affiliation: Leibniz Univ Hannover, Inst Microelect Syst, D-30167 Hannover, Germany

ISBN: (print) 9781538631690

This paper presents parallelization strategies for the implementation of imaging algorithms for synthetic aperture radar (SAR). Great emphasis is placed on time-domain based algorithms, namely the Global Backprojection Algorithm (GBP) and its accelerated version, the Fast Factorized Backprojection Algorithm (FFBP). Multi-core platforms are selected for implementation as some combine good performance results with moderate power consumption. The implemented algorithms support several types of parallelization, as the stages of the algorithms can be handled sequentially or interleaved. For the GBP algorithm, three different data distribution schemes are investigated. For the FFBP algorithm, a successive stage calculation method is compared with a combined calculation method. The performance is evaluated, by way of example, on the low-cost, low-energy, yet powerful multi-core platform Odroid-XU4. All parallelization strategies show an almost linear speed-up with the number of cores used. Even though a specific multi-core platform is investigated, the design decisions are applicable to general multi-core architectures.

Keywords: Projection algorithm; Radar imaging; parallel algorithms; Multicore processing; Low-power electronics


Parallel Minimum Norm Solution of Sparse Block Diagonal Column Overlapped Underdetermined Systems


ACM Transactions on Mathematical Software, 2017, Vol. 43, No. 4, pp. 31-31

Authors: Torun, F. Sukru; Manguoglu, Murat; Aykanat, Cevdet. Affiliations: Bilkent Univ, Bilkent, Turkey; Middle East Tech Univ, Ankara, Turkey; Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey; Middle East Tech Univ, Dept Comp Engn, TR-06800 Ankara, Turkey

Underdetermined systems of equations in which the minimum norm solution needs to be computed arise in many applications, such as geophysics, signal processing, and biomedical engineering. In this article, we introduce a new parallel algorithm for obtaining the minimum 2-norm solution of an underdetermined system of equations. The proposed algorithm is based on the Balance scheme, which was originally developed for the parallel solution of banded linear systems. The proposed scheme assumes a generalized banded form where the coefficient matrix has column overlapped block structure in which the blocks could be dense or sparse. In this article, we implement the more general sparse case. The blocks can be handled independently by any existing sequential or parallel QR factorization library. A smaller reduced system is formed and solved before obtaining the minimum norm solution of the original system in parallel. We experimentally compare and confirm the error bound of the proposed method against the QR factorization based techniques by using true single-precision arithmetic. We implement the proposed algorithm by using the message passing paradigm. We demonstrate numerical effectiveness as well as parallel scalability of the proposed algorithm on both shared and distributed memory architectures for solving various types of problems.

Keywords: Minimum norm solution; underdetermined least square problems; parallel algorithms; balance method
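
To make the term "minimum 2-norm solution" in the abstract above concrete: for an underdetermined system A x = b where A has full row rank, the minimum-norm solution is x = Aᵀ (A Aᵀ)⁻¹ b. The sketch below computes it through that textbook formula with exact fractions. It is neither the Balance scheme nor the QR-based approach from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def solve_square(M, rhs):
    """Solve M y = rhs by Gauss-Jordan elimination with exact fractions.

    Assumes M is square and nonsingular.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def min_norm_solution(A, b):
    """Minimum 2-norm solution of A x = b, assuming A has full row rank."""
    m, n = len(A), len(A[0])
    # Gram matrix A * A^T (m x m).
    gram = [[sum(Fraction(A[i][k]) * A[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]
    y = solve_square(gram, b)
    # x = A^T * y
    return [sum(A[i][k] * y[i] for i in range(m)) for k in range(n)]


# Two equations, three unknowns: x0 + x1 = 2 and x1 + x2 = 2.
A = [[1, 1, 0], [0, 1, 1]]
b = [2, 2]
x = min_norm_solution(A, b)
```

For this system, x = (2, 0, 2) also satisfies both equations, but its squared norm is 8, whereas the vector returned here has squared norm 8/3.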


Fast Power-of-Two RNS Scaling Algorithm for Large Dynamic Ranges


4th International Conference on Engineering and Telecommunication (EnT)

Authors: Isupov, Konstantin; Knyazkov, Vladimir; Kuvaev, Alexander. Affiliation: Vyatka State Univ, Dept Elect Comp Machines, Kirov 610000, Russia

ISBN: (print) 9781538645475

This paper presents a new efficient algorithm for scaling by a power of two in the residue number system (RNS). It focuses on arbitrary moduli sets with large dynamic ranges. In this algorithm, an interval estimation of the RNS representation is used to determine the remainder when dividing the number to be scaled by the scaling factor. The proposed algorithm requires only machine-precision integer and floating-point operations, and parallelizes well. The algorithm is implemented for CPU, as well as for GPU using the CUDA C language.

Keywords: residue number system; scaling; interval estimation; parallel algorithms; high performance; CUDA
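
For readers unfamiliar with the residue number system, the sketch below shows what an RNS representation is and what "scaling by a power of two" computes. It is a reference illustration only: it scales by converting back to an ordinary integer through the Chinese Remainder Theorem, which is not the interval-estimation algorithm from the paper. The moduli set and all function names are hypothetical.

```python
from math import prod

# Hypothetical pairwise coprime moduli; dynamic range is 7 * 11 * 13 = 1001.
MODULI = (7, 11, 13)


def to_rns(x, moduli=MODULI):
    """Represent a non-negative integer by its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS numbers; each residue channel is independent."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer in [0, product of moduli) via the CRT."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def scale_pow2_reference(residues, k, moduli=MODULI):
    """Reference result of scaling: the RNS form of floor(X / 2**k)."""
    x = from_rns(residues, moduli)
    return to_rns(x >> k, moduli)


product = rns_mul(to_rns(25), to_rns(36))    # represents 25 * 36 = 900
scaled = scale_pow2_reference(product, 2)    # represents 900 // 4 = 225
```

Addition and multiplication stay within independent residue channels, while scaling needs information about the magnitude of the whole number, which is why it is treated as a separate problem in RNS arithmetic.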


Improvements in Approximation Performance and Parallelization of Nonnegative Matrix Factorization with Newton Iteration


15th International Conference on High Performance Computing & Simulation (HPCS)

Authors: Kutil, Rade; Flatz, Markus; Vajtersic, Marian. Affiliations: Univ Salzburg, Dept Comp Sci, Salzburg, Austria; Slovak Acad Sci, Math Inst, Bratislava, Slovakia

ISBN: (print) 9781538632505

The goal of Nonnegative Matrix Factorization (NMF) is to represent a large nonnegative matrix in an approximate way as a product of two significantly smaller nonnegative matrices. In comparison to other algorithms to calculate the NMF, Newton-type methods can be parallelized very well because Newton iterations can be performed in parallel without exchanging data between processes. However, these methods can show problematic convergence behavior, limiting their efficiency. We present a modified algorithm that achieves stable convergence by using Karush-Kuhn-Tucker (KKT) conditions and a reflective technique for constraint handling, backtracking line search for global convergence, and a modified target function to avoid explicit inequality handling. Our method allows for an inexact approach, where only a few Newton iterations are performed per outer iteration. Experiments show that this leads to faster convergence in the sequential as well as in the parallel case. Although shorter outer iterations increase communication overhead, speedups are still satisfactory.

Keywords: Nonnegative Matrix Factorization (NMF); Newton iteration; Karush-Kuhn-Tucker conditions (KKT); computational linear algebra; parallel algorithms


Parallel Pseudo Arc-Length Moving Mesh Schemes for Multidimensional Detonation


Scientific Programming, 2017, Vol. 2017, No. 1, pp. 1-17

Authors: Ning, Jianguo; Yuan, Xinpeng; Ma, Tianbao; Li, Jian. Affiliation: Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China

We discuss multidimensional parallel computation for pseudo arc-length moving mesh schemes, which can be used to capture the strong discontinuity in multidimensional detonations. Unlike traditional Euler numerical schemes, parallelizing pseudo arc-length moving mesh schemes involves diagonal processor communications and mesh point communications, which are illustrated by a schematic diagram and key pseudocode. Finally, numerical examples show that the pseudo arc-length moving mesh schemes are second-order convergent and can successfully capture the strong discontinuity of the detonation wave. In addition, our parallel methods prove effective, and the computational time is clearly reduced.

Keywords: ARC length; GEOMETRY; MATHEMATICAL models; parallel algorithms; COMPUTATIONAL geometry; PSEUDOCODE (Computer program language)


Accelerating High-Dimensional Integration using Lattice Rules on GPUs


International Conference on Computational Science and Computational Intelligence (CSCI)

Authors: Almulihi, Ahmed; de Doncker, Elise. Affiliation: Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008, USA

ISBN: (print) 9781538626528

Lattice rules for multiple integration yield a powerful method to approximate high-dimensional integrals for various function classes. Using generator vectors obtained from the fast component-by-component (CBC) construction of lattice rules, we incorporated rank-1 lattices for numerical integration on GPU accelerators. We show accuracy and efficiency results for a number of multivariate integrals, and compare with results obtained by Monte Carlo integration for the same functions also on GPU. The lattice rules achieve high accuracy and excellent speedups.

Keywords: parallel algorithms; Lattice rules; Monte Carlo; Multivariate integration


GPU-based Bio-inspired Model for Solving Association Rules Mining Problem


25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Authors: Djenouri, Youcef; Bendjoudi, Ahcene; Djenouri, Djamel; Comuzzi, Marco. Affiliations: UNIST, 50 UNIST Gil, Ulsan 44949, South Korea; CERIST, DTISI, Algiers, Algeria

ISBN: (print) 9781509060580

In this paper, we explore the application of bio-inspired approaches to the association rules mining (ARM) problem for the purpose of accelerating the process of extracting the correlations between items in sizeable data instances. A new bio-inspired GPU-based model is proposed, which benefits from massive GPU threading by evaluating multiple rules in parallel on the GPU. To validate the proposed model, the most used bio-inspired approaches (GA, PSO, and BSO) have been executed on GPU to solve well-known large ARM instances. Real experiments have been carried out on an Intel Xeon 64-bit quad-core processor E5520 coupled to an Nvidia Tesla C2075 GPU device. The results show that the genetic algorithm outperforms PSO and BSO. Moreover, it outperforms the state-of-the-art GPU-based ARM approaches when dealing with the challenging Webdocs instance.

Keywords: Bio-Inspired Approaches; Association Rule Mining; parallel algorithms; GPU Computing
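
The abstract above describes evaluating many candidate rules in parallel. As a minimal sketch of what evaluating a single rule usually involves, the code below computes the standard support and confidence measures for a rule over a small transaction list. That the paper's fitness evaluation uses exactly these measures is an assumption; the function names and sample transactions are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Scoring one rule only reads the transaction data, so different rules can be scored concurrently without coordination.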


Template Skycube Algorithms for Heterogeneous Parallelism on Multicore and GPU Architectures


ACM International Conference on Management of Data

Authors: Bogh, Kenneth S.; Chester, Sean; Sidlauskas, Darius; Assent, Ira. Affiliations: Aarhus Univ, Aarhus, Denmark; NTNU, Trondheim, Norway; Ecole Polytech Fed Lausanne, Lausanne, Switzerland

ISBN: (print) 9781450341974

Multicore CPUs and cheap co-processors such as GPUs create opportunities for vastly accelerating database queries. However, given the differences in their threading models, expected granularities of parallelism, and memory subsystems, effectively utilising all cores with all co-processors for an intensive query is very difficult. This paper introduces a novel templating methodology to create portable, yet architecture-aware, algorithms. We apply this methodology on the very compute-intensive task of calculating the skycube, a materialisation of exponentially many skyline query results, which finds applications in data exploration and multi-criteria decision making. We define three parallel templates, two that leverage insights from previous skycube research and a third that exploits a novel point-based paradigm to expose more data parallelism. An experimental study shows that, relative to the state-of-the-art that does not parallelise well due to its memory and cache requirements, our algorithms provide an order of magnitude improvement on either architecture and proportionately improve as more GPUs are added.

Keywords: manycore; skycube; template algorithms; multicore; cross-device parallelism; skyline; gpu; numa; heterogeneous architectures; parallel algorithms
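
To illustrate why the abstract above calls the skycube "a materialisation of exponentially many skyline query results", the brute-force sketch below computes the skyline of a point set in every non-empty subset of its dimensions. It assumes smaller values are preferred and is only a definition-level reference, not one of the paper's three parallel templates; the function names and sample points are hypothetical.

```python
from itertools import combinations


def dominates(p, q, dims):
    """True if p is no worse than q on every dimension in dims and
    strictly better on at least one (smaller values are better)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points, dims):
    """Points not dominated by any other point within the subspace dims."""
    return [q for q in points if not any(dominates(p, q, dims) for p in points)]


def skycube(points):
    """Map each non-empty subspace (tuple of dimension indices) to its skyline."""
    num_dims = len(points[0])
    cube = {}
    for size in range(1, num_dims + 1):
        for dims in combinations(range(num_dims), size):
            cube[dims] = skyline(points, dims)
    return cube


# Hypothetical 2-dimensional data set.
points = [(1, 4), (2, 2), (4, 1), (3, 3)]
cube = skycube(points)
```

With d dimensions there are 2^d - 1 subspaces, so this naive version repeats a quadratic skyline computation exponentially many times.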
