Search Results - Inner Mongolia University Library

Document type

774 conference papers
242 journal articles
3 books

Holdings

1,019 electronic documents
0 print holdings

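Subject classification

645 records: Engineering
- 506 records: Computer Science and Technology...
- 363 records: Software Engineering
- 112 records: Information and Communication Engineering
- 66 records: Electronic Science and Technology (...
- 63 records: Control Science and Engineering
- 45 records: Bioengineering
- 43 records: Mechanical Engineering
- 32 records: Electrical Engineering
- 23 records: Instrument Science and Technology
- 21 records: Power Engineering and Engineering Thermo...
- 15 records: Materials Science and Engineering (...
- 15 records: Chemical Engineering and Technology
- 14 records: Cyberspace Security
- 13 records: Civil Engineering
- 11 records: Mechanics (...
- 11 records: Agricultural Engineering
- 11 records: Environmental Science and Engineering (...
- 10 records: Architecture
- 10 records: Transportation Engineering
249 records: Science
- 161 records: Mathematics
- 48 records: Biology
- 37 records: Physics
- 36 records: Systems Science
- 35 records: Statistics (...
- 17 records: Chemistry
158 records: Management
- 106 records: Management Science and Engineering (...
- 55 records: Library, Information and Archives Manage...
- 29 records: Business Administration
18 records: Law
- 15 records: Sociology
12 records: Agronomy
11 records: Economics
- 11 records: Applied Economics
8 records: Education
4 records: Medicine
3 records: Literature
3 records: Military Science
2 records: Art Studies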

Topics

42 records: distributed proc...
41 records: laboratories
37 records: computational mo...
29 records: kernel
28 records: concurrent compu...
28 records: benchmark testin...
28 records: algorithm design...
26 records: fault tolerance
25 records: computer archite...
25 records: graphics process...
24 records: hardware
22 records: feature extracti...
22 records: cloud computing
22 records: training
21 records: parallel process...
21 records: throughput
21 records: servers
21 records: protocols
20 records: semantics
19 records: optimization

Institutions

169 records: national laborat...
134 records: science and tech...
93 records: college of compu...
88 records: national laborat...
81 records: national laborat...
38 records: national laborat...
35 records: school of comput...
29 records: national key lab...
22 records: science and tech...
22 records: national key lab...
18 records: national laborat...
16 records: national laborat...
14 records: science and tech...
14 records: national laborat...
13 records: national key lab...
13 records: school of comput...
12 records: national laborat...
12 records: national key lab...
10 records: national laborat...
10 records: national key lab...

Authors

40 records: wang huaimin
37 records: wang ji
37 records: yong dou
35 records: liu jie
35 records: ji wang
31 records: jie liu
29 records: huaimin wang
28 records: dongsheng li
28 records: dou yong
27 records: xiaodong wang
26 records: peng yuxing
26 records: yin gang
25 records: yuxing peng
24 records: li dongsheng
24 records: yijie wang
23 records: wang yijie
21 records: xicheng lu
21 records: xingming zhou
20 records: gang yin
20 records: zhigang luo

Language

952 records: English
62 records: Chinese
5 records: Other

Search query: "Institution=National Laboratory for Parallel and Distributed Processing PDL"

1019 records in total; records 11-20 are shown below.


A Novel Multi-objective Neural Architecture Search Algorithm via Gaussian Progress Sampling

5th IEEE International Conference on Artificial Intelligence and Big Data, ICAIBD 2022

Authors: Chen, Xuehui; Jiang, Jingfei; Niu, Xin; Pan, Hengyue; Dong, Peijie; Wei, Zimian (National University of Defense Technology, National Laboratory for Parallel and Distributed Processing, Changsha, China)

ISBN: (print) 9781665499132

Multi-objective neural architecture search (NAS) algorithms aim to automatically search for neural architectures suitable for different computing power platforms by using multi-objective optimization methods. The LEMONADE algorithm, a representative multi-objective NAS algorithm, maintains a population of networks on an approximation of the Pareto front of the multiple objectives, such as predictive performance, number of parameters or FLOPs. To address the irrationality and repeatability of sampling based only on cheap objectives in LEMONADE, we propose a novel multi-objective neural architecture search algorithm via Gaussian Process sampling, dubbed GP-LEMONADE. Meanwhile, to make the sampling process more efficient, we design an online predictor based on Gaussian Process to predict expensive objectives, and sample candidate networks by combining cheap objectives and expensive objectives, so as to ensure the rationality and efficiency of sampling. Experiments show that the GP-LEMONADE algorithm evolves 100 generations and obtains the SOTA model with 3.98% test error. This process only takes 7.38 GPU days, which is 26.75 GPU days shorter than that of LEMONADE. Our methods have improved the performance of the LEMONADE algorithm and ensured the rationality and efficiency of sampling during the evolution, effectively improving the search efficiency of multi-objective NAS algorithms. © 2022 IEEE.

Keywords: Multiobjective optimization
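The abstract above refers to keeping a population on an approximation of the Pareto front. As a minimal sketch of that general notion only (not the LEMONADE or GP-LEMONADE procedure), the snippet below filters non-dominated candidates when every objective is minimized. The objective pairs (test error in percent, parameter count in millions) are made-up illustrative values, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical candidates: (test error %, parameters in millions).
candidates = [(5.0, 3.0), (4.0, 4.0), (6.0, 2.0), (5.5, 3.5), (4.0, 5.0)]
front = pareto_front(candidates)
print(front)
```

Here (5.5, 3.5) is dropped because (5.0, 3.0) is better on both objectives, and (4.0, 5.0) is dropped because (4.0, 4.0) ties on error with fewer parameters.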

EdgeAnchor: A Rapid and Balanced File Storage Strategy at the Network Edge

29th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2023

Authors: Liu, Han; Xie, Xingrui; Zhang, Zhuopu; Cheng, Geyao; Luo, Lailong; Guo, Deke (National University of Defense Technology, Science and Technology on Information Systems Engineering Laboratory, China; National University of Defense Technology, National Laboratory for Parallel and Distributed Processing, China)

ISBN: (print) 9798350330717

Storing files at the network edge has become a new paradigm of storage systems, which is promising to mitigate network congestion and reduce file retrieval latency. However, the traditional file storage scheme cannot effectively meet the requirements of rapid indexing and load balance when applied directly to the edge. Moreover, due to the dynamic nature of the edge environment where edge servers can join or leave at will, it is necessary for the storage scheme to adjust with minimal disruption. In this paper, we propose EdgeAnchor, a novel edge storage strategy that is composed of two-layer hash mappings. The first layer, file-to-bucket mapping, adopts the pseudo-deletion algorithm to deal with the variations in file size, while the second layer utilizes the multiple bucket-to-server mapping to adapt to the heterogeneity in the servers' storage capacities. Furthermore, EdgeAnchor constructs a list of deleted or added working sets for each bucket and creates a dictionary for the mappings between buckets and edge servers. In this manner, EdgeAnchor ensures a rapid file index and balances server load at the dynamic network edge. We also attach mathematical analyses to EdgeAnchor, which theoretically prove its logarithmic complexity of hash operations and memory accesses. The experiments conducted on real-world datasets demonstrate that EdgeAnchor achieves a file index throughput twice as high as that of Consistent Hashing, under the constraints of load balance. Additionally, it ensures a low and stable data migration volume when adding or removing edge servers consecutively. © 2023 IEEE.

Keywords: dynamic network edge; file storage; load balance; rapid indexing
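For context on the baseline named in the abstract above, here is a minimal sketch of textbook Consistent Hashing with virtual nodes. It is not EdgeAnchor's two-layer mapping; the class name, the server names, and the choice of 64 virtual nodes per server are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Textbook consistent hashing; each server owns several virtual nodes."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted list of (ring position, server)
        self._hashes = []   # ring positions only, kept parallel for bisect
        for server in servers:
            self.add_server(server)

    def _rebuild(self):
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def add_server(self, server):
        self._points.extend((_hash(f"{server}#{i}"), server) for i in range(self.vnodes))
        self._rebuild()

    def remove_server(self, server):
        self._points = [p for p in self._points if p[1] != server]
        self._rebuild()

    def lookup(self, file_name):
        """Return the server owning the first ring position clockwise of the file."""
        idx = bisect.bisect_right(self._hashes, _hash(file_name)) % len(self._points)
        return self._points[idx][1]


files = [f"file-{i}" for i in range(1000)]
ring = ConsistentHashRing(["edge-1", "edge-2", "edge-3"])
before = {f: ring.lookup(f) for f in files}
ring.add_server("edge-4")
after = {f: ring.lookup(f) for f in files}
moved = [f for f in files if before[f] != after[f]]
print(f"{len(moved)} of {len(files)} files moved after adding edge-4")
```

Adding a server only inserts new ring positions, so any file that changes owner moves to the new server. This is the low-migration property that makes Consistent Hashing a natural point of comparison for dynamic edge storage.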

DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies

20th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2024

Authors: Guo, Mingfeng; Deng, Liang; Dai, Zhe; Li, Ruitian; Lin, Gaofeng; Liu, Jie (Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China)

ISBN: (print) 9789819628292

Sparse triangular solve (SpTRSV) is a vital component in various scientific applications, and numerous GPU-based SpTRSV algorithms have been proposed. Synchronization-free SpTRSV is currently the mainstream algorithm on GPU due to its short preprocessing time and outstanding performance. However, we observed that this algorithm still has two performance bottlenecks. Firstly, the thread-level parallel mode can introduce thread divergence issues within GPU warps during the writing phase. Secondly, the thread-level and warp-level fusion mode may struggle to fully exploit GPU resources due to suboptimal mapping relationships between rows and threads. To address these issues, this paper proposes DaCPSpTRSV, a new synchronization-free algorithm with GPU-friendly data communication and parallelism strategies. Specifically, we first develop a fast-forward thread-level approach, incorporating an efficient global memory access pattern and a light-weight dependency control mechanism, to optimize data communication and alleviate thread divergence. A fine-grained fusion strategy is then proposed to maximize GPU parallelism by adaptively selecting the suitable thread-level or warp-level modes. Moreover, the commonly-used compressed sparse row (CSR) format is employed in our DaCPSpTRSV, enhancing the versatility of our algorithm. We evaluate our approach using 245 matrices from the SuiteSparse Matrix Collection on two NVIDIA GPUs, demonstrating speedup ratios of up to 4.77×, 4.94×, 1.67×, and 1.62× compared to cuSPARSE, Sync-Free, CapelliniSpTRSV, and YuenyeungSpTRSV, respectively. The project is open-sourced at https://***/gmfff12334/DaCP. © IFIP International Federation for Information Processing 2025.

Keywords: Graphics processing unit
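To make the operation in the abstract above concrete, the following is a sequential reference sketch of a lower-triangular solve Lx = b with L stored in CSR. It is plain forward substitution, not the synchronization-free GPU algorithm or DaCPSpTRSV; the function name and the 3×3 example matrix are assumptions for illustration.

```python
def sptrsv_csr_lower(row_ptr, col_idx, values, b):
    """Solve L x = b by forward substitution, with lower-triangular L in CSR form.

    Row i needs every x[j] with j < i that appears in the row. On a GPU, this
    row-to-row dependency is what a parallel SpTRSV has to resolve.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]            # diagonal entry of row i
            elif j < i:
                acc -= values[k] * x[j]     # subtract already-solved components
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0.0:
            raise ZeroDivisionError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x


# L = [[2, 0, 0],
#      [1, 1, 0],
#      [0, 3, 4]]   and   b = [2, 3, 11]
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
values = [2.0, 1.0, 1.0, 3.0, 4.0]
x = sptrsv_csr_lower(row_ptr, col_idx, values, [2.0, 3.0, 11.0])
print(x)
```

For this example the solution is x = [1.0, 2.0, 1.25]: row 0 gives 2/2, row 1 gives 3 - 1·1, and row 2 gives (11 - 3·2)/4.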

Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models

IEEE International Conference on Cluster Computing

Authors: Wei Wang; Zhiquan Lai; Shengwei Li; Weijie Liu; Keshi Ge; Yujie Liu; Ao Shen; Dongsheng Li (National Laboratory for Parallel and Distributed Processing (PDL), College Of Computer, National University Of Defense Technology, Changsha, China)

Mixture of Experts (MoE) has received increasing attention for scaling DNN models to extra-large size with negligible increases in computation. The MoE model has achieved the highest accuracy in several domains. However, a significant load imbalance occurs across devices during the training of an MoE model, resulting in significantly reduced throughput. Previous works on load balancing either harm model convergence or suffer from high execution overhead. To address these issues, we present Prophet: a fine-grained load balancing method for parallel training of large-scale MoE models, which consists of a planner and a scheduler. Prophet planner first employs a fine-grained resource allocation method to determine the possible scenarios for the expert placement in a fine-grained manner, and then efficiently searches for a well-balanced expert placement to balance the load without introducing additional overhead. Prophet scheduler exploits the locality of the token distribution to schedule the resource allocation operations using a layer-wise fine-grained schedule strategy to hide their overhead. We conduct extensive experiments in four clusters and five representative models. The results indicate that Prophet gains up to 2.3x speedup compared to the state-of-the-art MoE frameworks including Deepspeed-MoE and FasterMoE. Additionally, Prophet achieves a load balancing enhancement of up to 12.06x when compared to FasterMoE.


CD-Sched: An Automated Scheduling Framework for Accelerating Neural Network Training on Shared Memory CPU-DSP Platforms

Proceedings of the 2023 International Conference on Power, Communication, Computing and Networking Technologies

Authors: Yuanyuan Xiao; Zhiquan Lai; Dongsheng Li (National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, China; National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, China)

ISBN: (print) 9781450399951

DSP holds significant potential for important applications in Deep Neural Networks. However, there is currently a lack of research focused on shared-memory CPU-DSP heterogeneous chips. This paper proposes CD-Sched, an automated scheduling framework that aims to address this gap. By predicting the latency of operators on both CPU and DSP, CD-Sched automatically schedules the computation of operators to the appropriate computing device. This scheduling optimization accelerates the computation of individual operators and ultimately improves the overall training time of neural networks. In end-to-end training tasks, CD-Sched can significantly reduce the overall training time, with an average reduction of approximately 10.77%.

Keywords: computation offloading

Factorization Machine-based Unsupervised Model Selection Method

2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022

Authors: Zhang, Ruyi; Wang, Yijie; Xu, Hongzuo; Zhou, Haifang (National University of Defense Technology, Science and Technology on Parallel and Distributed Processing Laboratory, China)

ISBN: (digital) 9781665452588

ISBN: (print) 9781665452588

Machine learning is broadly used in many intelligent cybernetic systems. With the burgeoning of the communities of AI, the number of machine learning-based models is rapidly increasing, but picking a suitable and optimal (or relatively good) model from overwhelming options has become a conundrum when deploying a new system. Therefore, we are motivated by an intriguing question: Can we automatically select a proper model for new data? However, unsupervised model selection poses two main challenges: (i) Evaluation and comparison of candidate models on the new data are infeasible due to the lack of labels; and (ii) It is non-trivial to build relationships between model performance and data characteristics when the interaction between these characteristics should be considered. In light of these limitations, this paper proposes a factorization machine-based unsupervised model selection method. Following mainstream model selection protocols, we also leverage model performance on prior known datasets. Differently, we learn higher-order complex relationships between model performance and dataset characteristics. Specifically, our method transfers the historical performance into a second-order function of meta-features and embedding weights by harnessing the power of factorization machine. This function can be subsequently used to select a proper model when given a new dataset. Extensive experiments show that our method obtains superior model selection performance compared with five state-of-the-art approaches, and our method executes faster than its competitors by approximately three orders of magnitude. © 2022 IEEE.

Keywords: Factorization
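The abstract above mentions a second-order function built with a factorization machine. As a hedged illustration, the sketch below evaluates the standard second-order factorization machine, y = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, both directly and with the usual linear-time rewrite of the pairwise term. How the paper maps meta-features and embedding weights onto x and V is not stated in the abstract, so all values and names here are assumptions.

```python
def fm_predict_naive(w0, w, V, x):
    """Second-order FM evaluated directly over all feature pairs i < j."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))  # <v_i, v_j>
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(w0, w, V, x):
    """Same value via the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Illustrative values only: 3 features, 2 latent factors per feature.
w0 = 1.0
w = [0.5, -1.0, 2.0]
V = [[1.0, 0.0], [0.5, 1.0], [2.0, -1.0]]
x = [1.0, 2.0, 3.0]
y_naive = fm_predict_naive(w0, w, V, x)
y_fast = fm_predict_fast(w0, w, V, x)
print(y_naive, y_fast)
```

Both functions return 12.5 for these inputs: the linear part contributes 5.5 and the pairwise part 7.0.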

A Data-Centric Approach for Efficient and Scalable CFD Implementation on Multi-GPUs Clusters

24th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2023

Authors: Li, Ruitian; Deng, Liang; Dai, Zhe; Zhang, Jian; Liu, Jie; Liu, Gang (China Aerodynamic Research and Development Center, Computational Aerodynamic Institute, Mianyang, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China)

ISBN: (print) 9789819982103

Scalability is a crucial factor determining the performance of massive heterogeneous parallel CFD applications on multi-GPU platforms, particularly after the single-GPU implementations have achieved optimal performance through numerous optimizations. A novel Data-Centric hybrid MPI-CUDA CFD model is proposed in this paper to enable efficient scalability of CFD applications on large-scale heterogeneous platforms. Based on the Data-Centric approach, a Minimum-cost MPI transfer strategy and a code refactoring technique are realized for a better balance between data transfer and floating-point computation performance, which could significantly improve the scalability and reduce the time-to-solution. Subsequently, those approaches are integrated into the industrial unstructured CFD software, FlowStar, to evaluate their effectiveness. Numerical results demonstrate that the Minimum-cost MPI strategy achieves more than 2.0 times performance improvement compared to the traditional Model-Centric implementation, and the code refactoring technique boosts performance by 40% to 50% over the minimum-cost MPI version. Moreover, the Data-Centric implementation on a platform with 64 A100 GPUs shows a speedup ratio of over 120 when compared to the original MPI implementation with 64 ranks. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Keywords: Scalability

AFMA-Track: Adaptive Fusion of Motion and Appearance for Robust Multi-object Tracking

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Liao, Wei; Luo, Lei; Zhang, Chunyuan (College of Computer Science and Technology, National University of Defence Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer Science and Technology, National University of Defense Technology, Changsha, China)

ISBN: (print) 9783031784439

Motion and appearance cues play a crucial role in Multi-object Tracking (MOT) algorithms for associating objects across consecutive frames. While most MOT methods prioritize accurate motion modeling and distinctive appearance representations, the use of appearance and motion cues is often confined to simplistic association techniques. For instance, fixed weights are commonly employed to combine the intersection-over-union (IoU) matrix and appearance similarity matrix, yielding an association cost matrix. To harness the full potential of motion and appearance cues across diverse scenarios, we propose an innovative approach that dynamically balances motion and appearance cues based on scene and object information during the association process. Furthermore, we introduce a new mechanism for updating appearance representations, effectively mitigating noise introduced by occlusion. Our method demonstrates state-of-the-art performance on the MOT17 and MOT20 test sets. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Object detection
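The abstract above describes the common baseline of combining an IoU matrix and an appearance similarity matrix with fixed weights. The following sketch shows that fixed-weight baseline only, not AFMA-Track's adaptive fusion. The weight of 0.5, the box and feature values, and the brute-force assignment (usable only for tiny square problems) are assumptions for illustration.

```python
import math
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two appearance feature vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def cost_matrix(track_boxes, track_feats, det_boxes, det_feats, w_motion=0.5):
    """cost[i][j] = w * (1 - IoU) + (1 - w) * (1 - cosine), with a fixed weight w."""
    return [
        [w_motion * (1.0 - iou(tb, db)) + (1.0 - w_motion) * (1.0 - cosine(tf, df))
         for db, df in zip(det_boxes, det_feats)]
        for tb, tf in zip(track_boxes, track_feats)
    ]


def best_assignment(cost):
    """Brute-force minimum-cost matching for a small square cost matrix."""
    n = len(cost)
    return min(permutations(range(n)), key=lambda p: sum(cost[i][p[i]] for i in range(n)))


track_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
track_feats = [(1.0, 0.0), (0.0, 1.0)]
det_boxes = [(21, 20, 31, 30), (1, 0, 11, 10)]   # detections arrive in swapped order
det_feats = [(0.0, 1.0), (1.0, 0.0)]

cost = cost_matrix(track_boxes, track_feats, det_boxes, det_feats)
assignment = best_assignment(cost)
print(assignment)
```

In practice a polynomial-time solver such as the Hungarian algorithm would replace the brute-force search; the point here is only how the fixed weight enters the cost.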

Parallel Implementation of SHA256 on Multizone Heterogeneous Systems

21st IEEE International Symposium on Parallel and Distributed Processing with Applications, 13th IEEE International Conference on Big Data and Cloud Computing, 16th IEEE International Conference on Social Computing and Networking and 13th International Conference on Sustainable Computing and Communications, ISPA/BDCloud/SocialCom/SustainCom 2023

Authors: Luo, Yongtao; Liu, Jie; Xiao, Tiaojie; Gong, Chunye (National University of Defense Technology, Science and Technology on Parallel and Distributed Processing Laboratory, Laboratory of Digitizing Program for Frontier Equipment, Changsha, China; National Supercomputer Center in Tianjin, Tianjin, China)

ISBN: (print) 9798350329223

SHA-256 plays an important role in widely used applications, such as data security, data integrity, digital signatures, and cryptocurrencies. However, most of the current optimized implementations of SHA-256 are based on CPUs or dedicated hardware, such as ASICs and FPGAs. Consequently, there is a need to explore whether a new heterogeneous parallel framework can improve the computational performance of the hash function. To address this issue, we conducted a study on the MT-3000 platform, which is a special architecture processor for the next-generation exascale prototype supercomputer. We proposed MT-SHA256, a heterogeneous multistage parallel implementation for hashing multiple messages on the MT-3000. Combining the architectural features of this processor, we developed an effective solution that significantly improved the computational performance of SHA-256. As a result, MT-SHA256 achieved a maximum throughput of 1045.68 MB/s on a single acceleration core of MT-3000. This is 9.84x higher than the C code implementation on one CPU core of MT-3000. We also performed a scalability test and found that MT-SHA256 achieved a throughput of 98.04 GB/s on a computing node, and extended to 512 nodes (2048 acceleration clusters) on this system with good scalability. © 2023 IEEE.

Keywords: Hash functions
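The abstract above concerns hashing multiple independent messages in parallel. As a small CPU-side sketch of that workload shape (unrelated to the MT-3000 implementation), the snippet below hashes a batch of messages with Python's standard `hashlib` and a thread pool. The worker count and message contents are arbitrary assumptions, and whether threads give a real speedup depends on the interpreter and the message sizes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of one message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def hash_messages(messages, workers=4):
    """Hash independent messages concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, messages))


messages = [b"abc", b"", b"x" * 4096]
digests = hash_messages(messages)
for msg, digest in zip(messages, digests):
    print(len(msg), digest)
```

Because each message is hashed independently, the batch can be split across workers without any coordination beyond collecting the results in order.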

ROGC: Role-Oriented Graph Convolution Based Multi-Agent Reinforcement Learning

2022 IEEE International Conference on Multimedia and Expo, ICME 2022

Authors: Liu, Yuntao; Li, Yuan; Xu, Xinhai; Liu, Donghong; Dou, Yong (National University of Defense Technology, National Laboratory for Parallel and Distributed Processing, China; Academy of Military Sciences, China)

ISBN: (digital) 9781665485630

ISBN: (print) 9781665485630

The role-oriented learning approach could improve the performance of multi-agent reinforcement learning by decomposing complex multi-agent tasks into different roles. However, due to the dynamic environment and interactions among agents, the role undertaken by an agent changes rapidly over time. Therefore, the roles of agents should be adapted to the varying situation during the learning process. In this paper, we propose a role-oriented graph convolution based multi-agent reinforcement learning framework (ROGC). Firstly, we design a role assigner based on samples generated from the environment to learn roles for classifying agents into different groups. To further enhance cooperation among agents in the same group for higher performance, we design a graph convolutional module to achieve intra-role communications based on discovered roles. With roles and extracted role features, we design a role-oriented policy learning module that embeds the role information into the algorithm and generates effective policies for individuals. Further, we introduce an auto-encoder to learn the intra-role cooperation knowledge in the graph convolutional module, which ensures our framework executes in a decentralized way. Extensive experiments show that our framework can learn dynamic roles and make full use of learned roles, which makes it outperform popular MARL methods. © 2022 IEEE.

Keywords: Reinforcement learning


No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
