Search Results - Inner Mongolia University Library

Parallel and Distributed Collaborative Filtering: A Survey

ACM COMPUTING SURVEYS, 2016, Vol. 49, No. 2, pp. 37-37

Authors: Karydi, Efthalia; Margaritis, Konstantinos. Univ Macedonia Dept Appl Informat Parallel & Distributed Proc Lab 156 Egnatia St GR-54636 Thessaloniki Greece

Collaborative filtering is among the most preferred techniques when implementing recommender systems. Recently, great interest has turned toward parallel and distributed implementations of collaborative filtering algorithms. This work is a survey of parallel and distributed collaborative filtering implementations, aiming to not only provide a comprehensive presentation of the field's development but also offer future research directions by highlighting the issues that need to be developed further.

Keywords: Algorithms Documentation Collaborative filtering recommender systems

Efficient Distributed Parallel Aligning Reads and Reference Genome with Many Repetitive Subsequences Using Compact de Bruijn Graph

12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)

Authors: Li, Yao; Zhong, Cheng; Chen, Danyang; Zhang, Jinxiong; Yin, Mengxiao. Guangxi Univ Sch Comp Elect & Informat Nanning Peoples R China; Guangxi Univ Key Lab Parallel Distributed Comp Technol Nanning Peoples R China

ISBN: (print) 9781665496391

A large number of reads generated by next-generation sequencing platforms contain many repetitive subsequences. Effectively locating and identifying genomic regions that contain repetitive subsequences contributes to subsequent genomic data analysis. To accelerate the alignment between large-scale short reads and a reference genome with many repetitive subsequences, this paper develops a compact de Bruijn graph based short-read alignment algorithm on a distributed parallel computing platform. The algorithm uses resilient distributed datasets (RDDs) to perform calculations in memory, executes the broadcast method to distribute short reads and the reference genome to the computing nodes to reduce data communication time on the cluster system, and sets the number of RDD partitions to optimize the performance of the parallel alignment algorithm. Experimental results on real datasets show that, compared with the compact de Bruijn graph based sequential short-read alignment algorithm, the implemented distributed parallel alignment algorithm achieves good acceleration while obtaining the same overall correct alignment percentage, and that, compared with existing distributed parallel alignment algorithms, it completes the alignment between large-scale short reads and a reference genome with highly repetitive subsequences more quickly.

Keywords: read alignment highly repetitive subsequences compact de Bruijn graph Hash indexing distributed parallel computing
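
As a rough illustration of the data structure named in the abstract above, the following sketch builds a plain (uncompacted) de Bruijn graph from short reads. It is not the authors' algorithm: the compaction step, the hash indexing, and the distribution over RDDs are all omitted, and the function name `de_bruijn_graph` and the parameter `k` are assumptions introduced here for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a plain de Bruijn graph from reads (hypothetical helper).

    Nodes are (k-1)-mers; every k-mer found in a read adds an edge from
    its (k-1)-length prefix to its (k-1)-length suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    # Return sorted successor lists for deterministic output.
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    # A read with a repeated "AT" unit produces a cycle in the graph.
    print(de_bruijn_graph(["ATAT"], 3))
```

Repetitive subsequences show up as cycles or high-degree nodes in such a graph, which is one reason graph-based indexes are used for repeat-rich references.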

Automatic parallelism strategy generation with minimal memory redundancy

Frontiers of Information Technology & Electronic Engineering, 2025, Vol. 26, No. 1, pp. 109-118

Authors: Yanqi SHI; Peng LIANG; Hao ZHENG; Linbo QIAO; Dongsheng LI. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410000, China

Large-scale deep learning models are trained distributedly due to memory and computing resource *** existing strategy generation approaches take optimal memory minimization as the *** fill in this gap, we propose a novel algorithm that generates optimal parallelism strategies with the constraint of minimal memory *** propose a novel redundant memory cost model to calculate the memory overhead of each operator in a given parallel *** generate the optimal parallelism strategy, we formulate the parallelism strategy search problem into an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism ***, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework and is characterized by high throughput and minimal memory *** results demonstrate that our approach achieves memory savings of up to 67% compared to the latest Megatron-LM strategies; in contrast, the gap between the throughput of our approach and its counterparts is not large.

Keywords: Deep learning Automatic parallelism Minimal memory redundancy

VoiceStyle: Voice-Based Face Generation via Cross-Modal Prototype Contrastive Learning

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, Vol. 20, No. 9, pp. 1-23

Authors: Chen, Wuyang; Zhu, Boqing; Xu, Kele; Dou, Yong; Feng, Dawei. Natl Key Lab Parallel & Distributed Proc Changsha Peoples R China

Can we predict a person's appearance solely based on their voice? This article explores this question by focusing on generating a face from an unheard voice segment. Our proposed method, VoiceStyle, combines cross-modal representation learning with generation modeling, enabling us to incorporate voice semantic cues into the generated face. In the first stage, we introduce cross-modal prototype contrastive (CMPC) learning to establish the association between voice and face. Recognizing the presence of false negative and deviate positive instances in real-world unlabeled data, we not only use voice-face pairs in the same video but also construct additional semantic positive pairs through unsupervised clustering, enhancing the learning process. Moreover, we recalibrate instances based on their similarity to cluster centers in the other modality. In the second stage, we harness the powerful generative capabilities of StyleGAN to produce faces. We optimize the latent code in StyleGAN's latent space, guided by the learned voice-face alignment. To address the importance of selecting an appropriate starting point for optimization, we aim to automatically find an optimal starting point by utilizing the face prototype derived from the voice input. The entire pipeline can be implemented in a self-supervised manner, eliminating the need for manually labeled annotations. Through extensive experiments, we demonstrate the effectiveness and performance of our VoiceStyle method in both cross-modal representation learning and voice-based face generation.

Keywords: Cross-modal representation learning contrastive learning StyleGAN face generation

EFFICIENT PARTITION-OF-UNITY RADIAL-BASIS-FUNCTION INTERPOLATION FOR COUPLED PROBLEMS

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2025, Vol. 47, No. 2, pp. B558-B582

Authors: Schneider, David; Uekermann, Benjamin. Univ Stuttgart Inst Parallel & Distributed Syst D-70569 Stuttgart Germany

Mapping of data between nonmatching meshes is a key ingredient of multiphysics simulations. Black-box data mapping, which only operates on clouds of mesh vertices without connectivity, enables modular software environments. In this paper, we develop such a black-box approach that is capable of handling very large data sets on parallel systems. More precisely, we implement partition-of-unity radial-basis-function interpolation into the coupling library preCICE. The method tackles the data mapping problem by decomposing it into smaller, independent subproblems, which makes it well-suited for parallel computing. To this end, we develop a tailor-made clustering algorithm and study numerical details to ensure robustness and accuracy. We, moreover, deduce user-friendly mapping parameters for which we determine robust default values. Tests on real-world geometries show that the method is scalable and orders of magnitude more efficient than previous data mapping in preCICE. Consequently, the implementation greatly extends the applicability of preCICE, benefiting the library's large user community.

Keywords: multiphysics coupling data mapping partition-of-unity radial-basis functions high-performance computing math software

Distributed and Parallel Sparse Computing for Very Large Graph Neural Networks

2022 IEEE International Conference on Big Data, Big Data 2022

Authors: Petit, Quentin R.; Li, Chong; Emad, Nahid. Huawei Paris Research Center Université Paris-Saclay Boulogne-Billancourt France; Huawei Paris Research Center Distributed and Parallel Software Lab Boulogne-Billancourt France; Université Versailles Saint-Quentin Maison de la Simulation LI-PaRAD Saclay France

ISBN: (print) 9781665480451

Deep learning (DL) requires high-performance processing on big data. Graph Neural Networks, a challenging topic in DL using linear algebra methods, need algorithmic solutions to efficiently assign and process graph data on modern distributed and parallel machines, which are considered with mixed arithmetic and various types of tensor/matrix accelerators. Determining compression techniques for the graph's sparse data structures is one of the key *** first objective is to design and implement a reusable parallel numerical library to resolve large neural network graphs. Our design strategy is drawn on a component-based approach and targets maximum code reuse in various parallel contexts while allowing for performance optimization. The solution could be later integrated into a DL framework like MindSpore. © 2022 IEEE.

Keywords: Matrix algebra

Efficient two-stage modeling of heat plume interactions of geothermal heat pumps in shallow aquifers using convolutional neural networks

GEOENERGY SCIENCE AND ENGINEERING, 2024, Vol. 237

Authors: Pelzer, Julia; Schulte, Miriam. Univ Stuttgart Inst Parallel & Distributed Syst D-70569 Stuttgart Baden Wurttembe Germany

This paper presents an innovative approach to model the impact of geothermal heat pumps on groundwater temperature and the interaction of multiple heat plumes within the aquifer. The significance of this research lies in its applicability to real-time urban planning through a web application, where traditional simulation methods prove too time-consuming. Our methodology involves a two-stage neural network approach, leveraging convolutional neural networks (CNNs), trained on a simulated dataset constructed with realistic subsurface flow parameters extracted from borehole measurements in the Munich region. The research process is systematically structured into stages: the first stage rapidly predicts the general shape of a single heat pump's plume in the absence of other influences. Building upon this, the second stage refines predictions by incorporating the interactions with neighboring heat plumes. Both stages employ CNNs, allowing for efficient training and evaluation. Experiments are conducted on dataset sizes of 100 and 1000 data points and on input parameters, focusing on a reduced dataset with data points of a spatial size of 256x16 pixels. The dataset is split into 70% training, 20% validation and 10% test data. Our results showcase the effectiveness of our approach, achieving a Root Mean Square Error (RMSE) of approximately 0.1 degrees C in both stages. This level of accuracy demonstrates the viability of our two-stage model in capturing the complex dynamics of GHWP interactions in shallow aquifers. The proposed methodology not only outperforms known analytical approximations but also significantly reduces computational costs of simulations, making it a promising tool for practical applications in city planning and beyond.

Keywords: Heat flow Ground water Geothermal energy Heat pump Convolutional neural network Interaction
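
The evaluation setup mentioned in the abstract above (a 70%/20%/10% split and an RMSE metric) can be sketched as follows. This is a minimal illustration, not the authors' code: `split_indices` and `rmse` are hypothetical helper names, and the random shuffle with a fixed seed is an assumption, since the abstract does not say how the split was drawn.

```python
import math
import random


def split_indices(n, train=0.7, val=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test.

    The test set receives whatever remains after train and validation.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an assumption
    n_train = round(n * train)
    n_val = round(n * val)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def rmse(predictions, targets):
    """Root mean square error between two equally long sequences."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    train_idx, val_idx, test_idx = split_indices(1000)
    print(len(train_idx), len(val_idx), len(test_idx))
    print(rmse([10.0, 12.0], [10.1, 11.9]))
```

With the dataset sizes named in the abstract, the split yields 70/20/10 samples for 100 data points and 700/200/100 for 1000 data points.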

Numerical analysis of small-strain elasto-plastic deformation using local Radial Basis Function approximation with Picard iteration

APPLIED MATHEMATICAL MODELLING, 2025, Vol. 137

Authors: Strnisa, Filip; Jancic, Mitja; Kosec, Gregor. Jozef Stefan Inst Parallel & Distributed Syst Lab Jamova cesta 39 Ljubljana 1000 Slovenia

In this paper, we discuss a von Mises plasticity model with nonlinear isotropic hardening assuming small strains in a plane strain example of an internally pressurised thick-walled cylinder subjected to different loading conditions. The elastic deformation is modelled using the Navier-Cauchy equation. In regions where the von Mises stress exceeds the yield stress, corrections are made locally through a return mapping algorithm. We present a novel method that uses a Radial Basis Function-Finite Difference (RBF-FD) approach with Picard iteration to solve the system of nonlinear equations arising from plastic deformation. This technique eliminates the need to stabilise the divergence operator and avoids special positioning of the boundary nodes, while preserving the elegance of the meshless discretisation and avoiding the introduction of new parameters that would require tuning. The results of the proposed method are compared with analytical and Finite Element Method (FEM) solutions. The results show that the proposed method achieves comparable accuracy to FEM while offering significant advantages in the treatment of complex geometries without the need for conventional meshing or special treatment of boundary nodes or differential operators.

Keywords: Meshless Plasticity Isotropic hardening von Mises model Picard iteration RBF-FD Plane strain

Training large-scale language models with limited GPU memory: a survey

Frontiers of Information Technology & Electronic Engineering, 2025, Vol. 26, No. 3, pp. 309-331

Authors: Yu TANG; Linbo QIAO; Lujia YIN; Peng LIANG; Ao SHEN; Zhilin YANG; Lizhi ZHANG; Dongsheng LI. National Key Laboratory of Parallel and Distributed Computing, College of Computer, National University of Defense Technology, Changsha 410073, China

Large-scale models have gained significant attention in a wide range of fields, such as computer vision and natural language processing, due to their effectiveness across various ***, a notable hurdle in training these large-scale models is the limited memory capacity of graphics processing units (GPUs). In this paper, we present a comprehensive survey focused on training large-scale models with limited GPU *** exploration commences by scrutinizing the factors that contribute to the consumption of GPU memory during the training process, namely model parameters, model states, and model *** this analysis, we present an in-depth overview of the relevant research work that addresses these aspects ***, the paper concludes by presenting an outlook on the future of memory optimization in training large-scale language models, emphasizing the necessity for continued research and innovation in this *** survey serves as a valuable resource for researchers and practitioners keen on comprehending the challenges and advancements in training large-scale language models with limited GPU memory.

Keywords: Training techniques Memory optimization Model parameters Model states Model activations

Leaders and Collaborators: Addressing Sparse Reward Challenges in Multi-Agent Reinforcement Learning

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025, Vol. 9, No. 2, pp. 1976-1989

Authors: Sun, Shaoqi; Liu, Hui; Xu, Kele; Ding, Bo. Natl Univ Def Technol Natl Key Lab Parallel & Distributed Proc Changsha 410003 Peoples R China

Cooperative multi-agent reinforcement learning (MARL) has emerged as an effective tool for addressing complex control tasks. However, sparse team rewards present significant challenges for MARL, leading to low exploration efficiency, slow learning speed, and homogenized behaviors among agents. To address these issues, we propose a novel Leader-Collaborator (LC) MARL framework inspired by human social collaboration. The LC framework introduces parallel online knowledge distillation for policy networks (KDPN). KDPN extracts knowledge from two policy networks with different training objectives: one aims to maximize individual rewards, while the other aims to maximize team rewards. The extracted knowledge is utilized to construct team leaders and collaborators. By effectively balancing individual and team rewards, our approach enhances exploration efficiency and promotes behavioral diversity among agents. This addresses the issue of low learning efficiency caused by the lack of objectives early in the agent's learning process and facilitates the development of more effective and differentiated team interaction policies. Additionally, we present the Self-Repairing Strategy (SRS) and Self-Augmenting Strategy (SAS) to facilitate team policies learning while preserving the initial team goal. We evaluate the effectiveness of the LC framework by conducting extensive experiments on the Multi-Agent Particle Environment (MPE), the Google Research Football (GRF), and StarCraft Multi-Agent Challenge (SMAC) with varying levels of difficulty. Our experimental results demonstrate that LC significantly improves the efficiency of the agent's exploration, achieves state-of-the-art performance, and accelerates the learning of the optimal policy. Specifically, in the SMAC scenarios, our method increases the winning rate by 21.9%, increases the average cumulative reward by 12%, and reduces the training time by 57% to achieve optimal performance.

Keywords: Computational modeling Training Reinforcement learning Sports Knowledge engineering Computational intelligence Teamwork Sun Standards Predictive models Multi-agent reinforcement learning sparse rewards online knowledge distillation exploration efficiency
