A data parallelization algorithm for the direct simulation Monte Carlo method for rarefied gas flows is considered. The performance scaling of the main procedures of the algorithm is analyzed. Satisfactory performance scaling of the parallel particle indexing procedure is shown, and an algorithm for speeding up this procedure is proposed. Using the examples of a free flow and the flow around a cone on a 28-core shared-memory node, an acceptable speedup of the entire algorithm is obtained. The efficiency of the data parallelization algorithm is compared with that of a computational domain decomposition algorithm for the free-flow case. Using the developed parallel code, the supersonic rarefied flow around a cone is studied.
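The data-parallel step highlighted in this abstract is the particle indexing procedure, which groups particles by the grid cell they occupy before collision sampling. The following is a minimal, hedged sketch of such an indexing step in Python/NumPy; the counting-sort formulation, array names, and cell layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def index_particles(cell_of_particle: np.ndarray, n_cells: int):
    """Group particle indices by cell with a counting-sort style pass.

    cell_of_particle[i] is the cell containing particle i.
    Returns (order, cell_start) such that the particles of cell c are
    order[cell_start[c]:cell_start[c + 1]].
    """
    counts = np.bincount(cell_of_particle, minlength=n_cells)
    cell_start = np.concatenate(([0], np.cumsum(counts)))
    # Stable sort by cell index; each cell's slice can then be processed
    # independently, which is the source of the data parallelism over cells.
    order = np.argsort(cell_of_particle, kind="stable")
    return order, cell_start

# Illustrative usage: 10 particles scattered over 4 cells.
cells = np.array([2, 0, 1, 2, 3, 0, 0, 1, 3, 2])
order, start = index_particles(cells, n_cells=4)
for c in range(4):
    print("cell", c, "->", order[start[c]:start[c + 1]])
```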
ISBN (print): 9781450355520
Relevance ranking models based on additive ensembles of regression trees have shown quite good effectiveness in web search engines. In the era of big data, tree ensemble models grow large in both tree depth and ensemble size to provide even better search relevance and user experience. However, the computational cost of their scoring process is high, which makes it challenging to apply big tree ensemble models in a search engine that needs to answer thousands of queries per second. Although several works have been proposed to improve the scoring process, the challenge remains great, especially when the model size grows large. In this paper, we present RAPIDSCORER, a novel framework for speeding up the scoring process of industry-scale tree ensemble models without hurting the quality of the scoring results. RAPIDSCORER introduces a modified run-length encoding, called epitome, to the bitvector representation of the tree nodes. Epitome can greatly reduce the computational cost of traversing the tree ensemble and works with several other proposed strategies to maximize the compactness of data units in memory. The achieved compactness makes it possible to fully utilize data parallelization to improve model scalability. Experiments on two web search benchmarks show that RAPIDSCORER achieves significant speed-ups over the state-of-the-art methods: V-QUICKSCORER, ranging from 1.3x to 3.5x; QUICKSCORER, ranging from 2.1x to 25.0x; VPRED, ranging from 2.3x to 18.3x; and XGBOOST, ranging from 2.6x to 42.5x.
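The bitvector representation mentioned here is the one popularized by the QUICKSCORER family: instead of a root-to-leaf descent, each internal node whose test fails masks out the leaves of its left subtree, and the exit leaf is the leftmost surviving bit. The sketch below illustrates that idea on a tiny hand-built tree; the tree, masks, and per-node loop are illustrative assumptions and omit the feature-ordered processing and the epitome run-length encoding that RAPIDSCORER adds on top.

```python
import numpy as np

# One depth-2 regression tree with 4 leaves. Each internal node carries a
# test (feature, threshold) and a bitmask that zeroes out the leaves of its
# left subtree, i.e. the leaves that become unreachable when the test fails.
NODES = [
    # (feature, threshold, mask over leaves [l0, l1, l2, l3])
    (0, 0.5, np.array([0, 0, 1, 1], dtype=np.uint8)),  # root: left leaves l0, l1
    (1, 0.3, np.array([0, 1, 1, 1], dtype=np.uint8)),  # left child: left leaf l0
    (1, 0.7, np.array([1, 1, 0, 1], dtype=np.uint8)),  # right child: left leaf l2
]
LEAF_VALUES = np.array([0.1, 0.4, 0.7, 0.9])

def score(x):
    """Exit-leaf search via bitvector masking instead of root-to-leaf descent."""
    reachable = np.ones(4, dtype=np.uint8)
    for feature, threshold, mask in NODES:
        if not (x[feature] <= threshold):     # only "false" nodes apply their mask
            reachable &= mask
    exit_leaf = int(np.argmax(reachable))     # leftmost surviving leaf
    return LEAF_VALUES[exit_leaf]

print(score(np.array([0.8, 0.2])))  # root is false -> right subtree -> leaf l2 -> 0.7
```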
ISBN (print): 9781509037360
An efficient algorithm for recurrent neural network training is presented. The approach increases training speed for tasks where the length of the input sequence may vary significantly. The proposed approach is based on optimal batch bucketing by input sequence length and on data parallelization across multiple graphics processing units. The baseline training performance without sequence bucketing is compared with the proposed solution for different numbers of buckets. An example is given for the online handwriting recognition task using an LSTM recurrent neural network. The evaluation is performed in terms of wall-clock time, number of epochs, and validation loss.
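Sequence bucketing as described above groups variable-length sequences so that each batch only pays for the padding of its own bucket rather than for the longest sequence in the dataset. Below is a hedged Python sketch of that bucketing and batching step; the bucket boundaries, batch policy, and sequence lengths are illustrative assumptions, not the paper's configuration.

```python
import random

def make_buckets(lengths, boundaries):
    """Group sequence indices into buckets by length.

    boundaries are sorted upper bounds, e.g. [32, 64, 128]; sequences longer
    than the last boundary form their own overflow bucket.
    """
    buckets = {b: [] for b in boundaries + [float("inf")]}
    for i, n in enumerate(lengths):
        b = next(b for b in buckets if n <= b)
        buckets[b].append(i)
    return {b: idxs for b, idxs in buckets.items() if idxs}

def batches(buckets, lengths, batch_size):
    """Yield batches drawn from a single bucket; each batch is padded only to
    its own longest sequence instead of the global maximum."""
    for idxs in buckets.values():
        random.shuffle(idxs)
        for k in range(0, len(idxs), batch_size):
            batch = idxs[k:k + batch_size]
            yield max(lengths[i] for i in batch), batch

lengths = [12, 90, 33, 64, 7, 120, 45, 200]   # hypothetical sequence lengths
for pad_to, batch in batches(make_buckets(lengths, [32, 64, 128]), lengths, 2):
    print(f"pad to {pad_to}:", batch)
```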
Nowadays, rapid progress in next generation sequencing (NGS) technologies has drastically decreased the cost and time required to obtain genome sequences. A series of powerful computing accelerators, such as GPUs and Xeon Phi MIC, are becoming a common platform for reducing the computational cost of the most demanding processes when genomic data is analyzed. GPUs have received more attention in the literature so far. However, the Xeon Phi constitutes a very attractive approach to improving performance because applications do not need to be rewritten in a different programming language specifically oriented to the accelerator. Sequence alignment is a fundamental step in any variant analysis study, and there are many tools that address this problem. We have selected BWA, one of the most popular sequence aligners, and studied different data management strategies to improve its execution time on hybrid systems made of multicore CPUs and Xeon Phi accelerators. Our main contributions focus on designing new strategies that combine data splitting and index replication in order to achieve a better balance in the use of system memory and to reduce latency penalties. Our experimental results show significant speed-up improvements when such strategies are executed on our hybrid platform, taking advantage of the combined computing power of a standard multicore CPU and a Xeon Phi accelerator.
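The data-splitting side of the strategy described above (each device keeps a replica of the alignment index and aligns only its share of the reads) can be sketched as a simple proportional partition of the read set. The snippet below is a hedged illustration only; the device names, throughput weights, and chunking scheme are assumptions, not BWA's or the authors' actual implementation.

```python
def split_reads(reads, device_weights):
    """Partition a list of reads across devices proportionally to their
    relative throughput, so a slower accelerator gets a smaller share.

    device_weights: e.g. {"cpu": 1.0, "xeon_phi": 0.6}  (hypothetical weights)
    Returns {device: list_of_reads}.
    """
    total = sum(device_weights.values())
    devices = list(device_weights)
    shares, start = {}, 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            end = len(reads)                   # last device takes the remainder
        else:
            end = start + round(len(reads) * device_weights[dev] / total)
        shares[dev] = reads[start:end]
        start = end
    return shares

reads = [f"read_{i}" for i in range(10)]
for dev, chunk in split_reads(reads, {"cpu": 1.0, "xeon_phi": 0.6}).items():
    print(dev, len(chunk))
```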
Recently, Deep Neural Networks (DNNs) have recorded significant success in handling medical and other complex classification tasks. However, as the sizes of DNN models and of the available datasets increase, the training process becomes more complex and computationally intensive, usually taking longer to complete. In this work, we propose a generic, full end-to-end hybrid parallelization approach that combines model and data parallelism for efficient, distributed, and scalable training of DNN models. We also propose a Genetic Algorithm Based Heuristic Resources Allocation (GABRA) mechanism for optimally distributing partitions across the available GPUs to optimize computing performance. We apply the proposed approach to a real use case based on the 3D Residual Attention Deep Neural Network (3D-ResAttNet) for efficient Alzheimer's Disease (AD) diagnosis on multiple GPUs and compare it with existing state-of-the-art parallel methods. The experimental evaluation shows that the proposed approach is on average 20% better than existing parallel methods in terms of training time, and achieves almost linear speedup with little or no difference in accuracy compared with existing non-parallel DNN models.
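The allocation problem GABRA addresses (place model/data partitions on GPUs so that no single device becomes the bottleneck) can be illustrated with a compact genetic algorithm. The sketch below is a hedged, generic GA with a simple makespan objective; the cost model, operators, and parameters are illustrative assumptions and do not reproduce the paper's GABRA mechanism.

```python
import random

def ga_assign(costs, n_gpus, pop=30, gens=200, mut=0.1, seed=0):
    """Assign partitions (with given relative costs) to GPUs so the most
    loaded GPU is as light as possible (simple makespan objective)."""
    rng = random.Random(seed)
    n = len(costs)

    def load(assign):
        per_gpu = [0.0] * n_gpus
        for part, gpu in enumerate(assign):
            per_gpu[gpu] += costs[part]
        return max(per_gpu)

    population = [[rng.randrange(n_gpus) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=load)                  # lower max load is better
        survivors = population[: pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]              # one-point crossover
            if rng.random() < mut:                 # random reassignment mutation
                child[rng.randrange(n)] = rng.randrange(n_gpus)
            children.append(child)
        population = survivors + children
    best = min(population, key=load)
    return best, load(best)

# Hypothetical partition costs, distributed over 4 GPUs.
assignment, makespan = ga_assign([3.0, 1.0, 2.5, 2.0, 1.5, 0.5, 2.0, 1.0], n_gpus=4)
print(assignment, makespan)
```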
Background: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences between a target dataset and a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" are a commonly cited recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore the ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations run counter to the goal of offering a standard workflow and hamper reproducibility over time. Results: A workflow for the automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in the sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes the performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half. Conclusions: The demand for variant calling, efficient computational processing, and standardized workflows is growing. The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. By reducing the usage of computational resources, the workflow removes prior existing entry barriers to the variant calling process.
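The Java garbage-collection and heap tuning mentioned above relies on the fact that GATK4 tools accept JVM options through the --java-options flag. The snippet below is a hedged sketch of invoking one such tool from Python; the heap size, GC thread count, and file paths are illustrative placeholders, not the values tuned in OVarFlow.

```python
import subprocess

# GATK4 tools accept JVM flags through --java-options; capping the heap
# (-Xmx) and the number of parallel GC threads keeps many concurrently
# running GATK processes from competing for memory and CPU cores.
java_opts = "-Xmx4g -XX:ParallelGCThreads=2"   # illustrative values only

cmd = [
    "gatk", "--java-options", java_opts,
    "HaplotypeCaller",
    "-R", "reference.fasta",      # placeholder paths
    "-I", "sample.dedup.bam",
    "-O", "sample.g.vcf.gz",
    "-ERC", "GVCF",
]
subprocess.run(cmd, check=True)
```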
ISBN (print): 9781728173863
The article explores the possibility of parallel computation for data compression using cubic splines. As an example, ways to parallelize the digital processing of seismic signals are considered, and the main performance indicators of the parallel algorithms are compared with those of the sequential algorithms. Spline methods are a versatile signal processing tool: they are more accurate than other mathematical methods, recover information faster, and have much lower maintenance costs. On the other hand, the equipment used in such systems must also meet high performance requirements. To achieve high speeds, parallel algorithms were developed using OpenMP and MPI technologies and implemented on multi-core processor architectures. A mathematical method for the parallel calculation of the coefficients of a cubic spline has been developed, and a parallel signal processing algorithm has been built on its basis; the computation performed during seismic signal processing serves as the example for parallelization. The main efficiency and speedup indicators of the parallel algorithm were compared with those of the sequential algorithm. The article explains the relevance of parallel numerical systems, describes the main approaches to the distribution of processes and methods of data processing, describes the principles of parallel programming technology, and studies the basic parameters of parallel algorithms for computing the numerical values of a cubic spline. The considered parallel algorithm for constructing a defect-1 cubic spline as p -> n leads to the construction of a local cubic spline on each grid interval ω.
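The closing remark about a local cubic spline on each grid interval is what makes the coefficient computation data parallel: each interval's coefficients depend only on a few neighbouring points. Below is a hedged Python/NumPy sketch of one such local construction (a cubic Hermite spline with finite-difference slopes, vectorized over intervals); the slope formula and coefficient layout are illustrative assumptions, not the exact defect-1 spline of the article.

```python
import numpy as np

def local_cubic_coeffs(x, y):
    """Per-interval cubic coefficients c0..c3 of a local (Hermite) spline.

    Slopes come from finite differences of neighbouring points only, so the
    coefficients of every interval [x[i], x[i+1]] depend on local data and
    all intervals can be processed in parallel (here: vectorized with NumPy).
    p_i(s) = c0 + c1*s + c2*s**2 + c3*s**3,  with s = t - x[i].
    """
    h = np.diff(x)
    m = np.empty_like(y)
    m[1:-1] = (y[2:] - y[:-2]) / (x[2:] - x[:-2])   # central differences
    m[0] = (y[1] - y[0]) / h[0]                     # one-sided at the ends
    m[-1] = (y[-1] - y[-2]) / h[-1]
    dy = np.diff(y)
    c0 = y[:-1]
    c1 = m[:-1]
    c2 = 3 * dy / h**2 - (2 * m[:-1] + m[1:]) / h
    c3 = -2 * dy / h**3 + (m[:-1] + m[1:]) / h**2
    return np.stack([c0, c1, c2, c3], axis=1)

x = np.linspace(0.0, 1.0, 6)
coeffs = local_cubic_coeffs(x, np.sin(2 * np.pi * x))
print(coeffs.shape)   # (5 intervals, 4 coefficients each)
```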
ISBN (print): 9781479981311
Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option for accelerating these demanding computations, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the frameworks' implementation, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, in which work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16) and show competitive training throughput compared with state-of-the-art frameworks; we also demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency.
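One simple way to picture workload-aware distribution is to split a global batch across GPUs in proportion to their relative throughput and to leave out devices whose share would be too small to use efficiently. The sketch below is only a hedged illustration of that idea; the speed weights, minimum-batch threshold, and assignment policy are assumptions and not WAP's actual algorithm.

```python
def assign_batches(global_batch, gpu_speeds, min_per_gpu=8):
    """Split a global batch across GPUs in proportion to their relative speed,
    dropping GPUs whose share would be too small to use efficiently.

    gpu_speeds: {gpu_id: relative throughput}; returns {gpu_id: batch size}.
    """
    gpus = dict(gpu_speeds)
    while gpus:
        total = sum(gpus.values())
        shares = {g: int(round(global_batch * s / total)) for g, s in gpus.items()}
        slowest = min(gpus, key=gpus.get)
        if shares[slowest] >= min_per_gpu or len(gpus) == 1:
            # fix rounding drift so the shares sum to the global batch
            drift = global_batch - sum(shares.values())
            shares[max(gpus, key=gpus.get)] += drift
            return shares
        del gpus[slowest]       # share too small for this GPU: redistribute

# Hypothetical device speeds; not measured values.
print(assign_batches(64, {"gpu0": 1.0, "gpu1": 1.0, "gpu2": 0.1}))
```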
ISBN (print): 9781450347204
Distributed Complex Event Processing (DCEP) is a paradigm for inferring the occurrence of complex situations in the surrounding world from basic events such as sensor readings. To do so, DCEP operators detect event patterns on their incoming event streams. To yield high operator throughput, data parallelization frameworks divide the incoming event streams of an operator into overlapping windows that are processed in parallel by a number of operator instances. The basic assumption is that the different windows can be processed independently of each other. However, consumption policies enforce that an event can only be part of one pattern instance; after that, it is consumed, i.e., removed from further pattern detection. This implies that the constituent events of a pattern instance detected in one window are excluded from all other windows as well, which breaks the data parallelism between different windows. In this paper, we tackle this problem by means of speculation: based on the likelihood of an event's consumption in a window, subsequent windows may speculatively suppress that event. We propose the SPECTRE framework for the speculative processing of multiple dependent windows in parallel. Our evaluations show up to linear scalability of SPECTRE with the number of CPU cores.
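To make the window dependency concrete, the sketch below runs overlapping count-based windows serially with a consumption policy: once an event participates in a detected pattern in an earlier window, later windows may not reuse it. The pattern (an "A" followed by a "B"), the window parameters, and the event stream are illustrative assumptions; SPECTRE's contribution is to process such dependent windows in parallel by speculating on these consumptions, which this serial sketch does not implement.

```python
def detect_a_then_b(events, window, consumed):
    """Return (index of 'A', index of 'B') for the first A-then-B pair in the
    window that uses no already-consumed event, or None."""
    pending_a = None
    for i in window:
        if i in consumed:
            continue
        if events[i] == "A" and pending_a is None:
            pending_a = i
        elif events[i] == "B" and pending_a is not None:
            return pending_a, i
    return None

events = ["A", "C", "B", "A", "B", "C", "B"]   # illustrative event stream
WINDOW, SLIDE = 4, 2                            # count-based overlapping windows
consumed = set()

for start in range(0, len(events) - WINDOW + 1, SLIDE):
    window = range(start, start + WINDOW)
    match = detect_a_then_b(events, window, consumed)
    if match:
        consumed.update(match)                  # consumption policy: each event used once
    print(f"window {list(window)} -> match {match}")
```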