In service-oriented high-performance computing (HPC) clusters, end users have various Quality of Service (QoS) requirements. Most of the existing research work focuses on quantitative QoS requirements, such as deadlin...
ISBN: (Print) 9798350387117; 9798350387124
Benefiting from the cutting-edge supercomputers that support extremely large-scale scientific simulations, climate research has advanced significantly over the past decades. However, new critical challenges have arisen regarding efficiently storing and transferring large-scale climate data among distributed repositories and databases for post hoc analysis. In this paper, we develop CliZ, an efficient online error-controlled lossy compression method with optimized data prediction and encoding methods for climate datasets across various climate models. On the one hand, we explore how to take advantage of particular properties of climate datasets (such as mask-map information, dimension permutation/fusion, and data periodicity patterns) to improve data prediction accuracy. On the other hand, CliZ features a novel multi-Huffman encoding method, which significantly improves encoding efficiency and thus compression ratios. We evaluated CliZ against many other state-of-the-art error-controlled lossy compressors (including SZ3, ZFP, SPERR, and QoZ) on multiple real-world climate datasets from different models. Experiments show that CliZ outperforms the second-best compressor (SZ3, SPERR, or QoZ1.1) on climate datasets by 20%-200% in compression ratio. CliZ can significantly reduce the data transfer cost between two remote Globus endpoints, by 32%-38%.
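The intuition behind a multi-Huffman scheme can be illustrated with a toy experiment: when different regions of a dataset have different symbol statistics, one Huffman table per region yields shorter codes than a single global table. The sketch below is a simplification under assumed inputs (two hand-picked regions, textbook Huffman coding); it is not CliZ's actual encoder.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} for a sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreaker, partial code table).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encoded_bits(symbols, table):
    """Total encoded length in bits for a sequence under a code table."""
    return sum(len(table[s]) for s in symbols)

# Two "regions" with very different symbol statistics, as one might see
# across climate-variable slices.
region_a = [0] * 90 + [1] * 10
region_b = [7] * 90 + [8] * 10
stream = region_a + region_b

global_cost = encoded_bits(stream, huffman_code(stream))
multi_cost = (encoded_bits(region_a, huffman_code(region_a))
              + encoded_bits(region_b, huffman_code(region_b)))
# Per-region tables track local statistics, so here they beat the global table.
```

With these inputs, each per-region table needs only one bit per symbol, while the global table must spread four symbols over longer codes.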
ISBN: (Print) 9798350337662
Scheduling task graphs with communication delays is a widely studied NP-hard problem. Many heuristics have been proposed, but there is no constant-factor approximation algorithm for this classic model. In this paper, we focus on the scheduling of the important class of fork-join task graphs (describing many types of common computations) on homogeneous processors. For this sub-case, we propose a guaranteed algorithm with a (1 + m/(m-1)) approximation factor, where m is the number of processors. The algorithm is not only the first constant approximation for an important sub-domain of the classic scheduling problem, it is also a practical algorithm that can obtain shorter makespans than known heuristics. To demonstrate this, we propose adaptations of known scheduling heuristics for the specific fork-join structure. In an extensive evaluation, we implemented these algorithms and scheduled many fork-join graphs with up to thousands of tasks and various computation time distributions on up to hundreds of processors. Comparing the obtained results demonstrates the competitive nature of the proposed approximation algorithm.
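The middle layer of a fork-join graph is a set of independent tasks, so classic list scheduling gives a feel for the setting when communication delays are ignored. Below is a standard LPT (longest-processing-time-first) heuristic together with the trivial makespan lower bound; this is background illustration only, not the approximation algorithm proposed in the paper.

```python
import heapq

def lpt_schedule(times, m):
    """Schedule independent tasks (the middle layer of a fork-join graph)
    on m identical processors, assigning the longest remaining task to the
    least-loaded processor. Returns the resulting makespan."""
    loads = [0.0] * m                 # current finish time of each processor
    heapq.heapify(loads)
    for t in sorted(times, reverse=True):
        load = heapq.heappop(loads)   # least-loaded processor
        heapq.heappush(loads, load + t)
    return max(loads)

times = [7, 7, 6, 6, 5, 4, 4, 3]      # example task computation times
m = 3
makespan = lpt_schedule(times, m)
# No schedule can beat the longest task or the perfectly balanced average.
lower_bound = max(max(times), sum(times) / m)
```

For this instance LPT reaches a makespan of 15 against a lower bound of 14, showing how close simple list scheduling can get even before communication delays enter the model.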
ISBN: (Print) 9798350337662
Fully Sharded Data Parallel (FSDP) technology achieves higher performance by scaling out data-parallel training of Deep Learning (DL) models. It shards the model parameters, gradients, and optimizer states of the model among multiple GPUs. Consequently, this requires data-intensive Allgather and Reduce-Scatter communication to share the model parameters, which becomes a bottleneck. Existing schemes that use GPU-aware MPI libraries are highly prone to saturating the interconnect bandwidth. Therefore, integrating GPU-based compression into MPI libraries has proven efficient for achieving faster training times. In this paper, we propose an optimized Ring algorithm for the Allgather and Reduce-Scatter collectives that encompasses an efficient collective-level online compression scheme. At the microbenchmark level, Allgather achieves benefits of up to 83.6% and 30.3% compared to the baseline and existing point-to-point-based compression, respectively, in a state-of-the-art MPI library on modern GPU clusters. Reduce-Scatter achieves 88.1% and 40.6% compared to the baseline and point-to-point compression, respectively. For distributed DL training with PyTorch-FSDP, our approach yields 31.7% faster training than the baseline, and up to 12.5% faster than the existing point-to-point-based compression, while maintaining similar accuracy.
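The structure of a compression-aware ring Allgather can be sketched in plain Python: each of n simulated ranks forwards one chunk per step to its ring neighbor, compressing on send and decompressing on receive, so after n-1 steps every rank holds every shard. Here zlib/pickle stand in for GPU-based compression; this is an assumed simulation of the collective's data movement, not the MPI library's implementation.

```python
import pickle
import zlib

def ring_allgather(shards):
    """Simulated ring Allgather over n 'ranks'. Each step, rank r forwards
    chunk (r - s) mod n to rank r + 1, compressed on the wire."""
    n = len(shards)
    result = [[None] * n for _ in range(n)]
    for r in range(n):
        result[r][r] = shards[r]          # each rank starts with its own shard
    for s in range(n - 1):
        # All ranks compress and "send" simultaneously in this step.
        wire = [zlib.compress(pickle.dumps(result[r][(r - s) % n]))
                for r in range(n)]
        for r in range(n):
            src = (r - 1) % n             # left neighbor on the ring
            idx = (src - s) % n           # which chunk the neighbor sent
            result[r][idx] = pickle.loads(zlib.decompress(wire[src]))
    return result

shards = [[i] * 4 for i in range(4)]      # one shard per simulated rank
gathered = ring_allgather(shards)         # every rank ends with all shards
```

Compressing per forwarded chunk is what makes the scheme "collective-level": the compression happens inside the ring steps rather than once per point-to-point message pair.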
ISBN: (Print) 9781665497473
Coarse-Grained Reconfigurable Architectures (CGRAs) emerged about 30 years ago. The very first CGRAs were programmed manually; fortunately, compilation approaches soon appeared to automate the mapping process. Numerous surveys on these architectures exist, and other surveys also gather the tools and methods, but none of them focuses on the mapping process alone. This paper focuses solely on automated methods and techniques for mapping applications onto CGRAs and covers the last two decades of research. It aims to provide the terminology, the problem formulation, and a classification of existing methods, and it ends with research challenges and trends for the future.
The intricate properties and relevance of graph data make it difficult to collect graph statistics privately via differential privacy (DP). Traditional centralized or local DP on graph data face challenges like third...
Emerging technologies, such as cloud computing and artificial intelligence, have raised significant concerns about data security and privacy. Homomorphic encryption (HE) is a promising invention, which enables computation...
ISBN: (Print) 9798350364613; 9798350364606
Data stores utilized in modern data-intensive applications are expected to demonstrate rapid read and write capabilities and robust fault tolerance. A Byzantine fault-tolerant database (BFT database) can execute transactions concurrently and tolerate arbitrary (Byzantine) faults. We consider cryptographic and communication processing to be performance bottlenecks in the transaction processing of BFT databases. This paper presents a transaction reconstruction method that reconstructs a single transaction from multiple transactions to streamline cryptographic and communication processes. We evaluated the proposed method against Basil (a state-of-the-art BFT database) in experiments. In an environment where nodes are geographically centralized, the proposed method demonstrates up to approximately 2.5 times higher throughput and reduces latency by up to about 30% compared to vanilla Basil. In an environment where nodes are geographically distributed, the proposed method demonstrates up to approximately 50 times higher throughput and reduces latency by up to about 75% compared to vanilla Basil.
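A back-of-the-envelope model shows why reconstructing many small transactions into one helps: per-transaction overheads (signature verification, consensus round trips) are paid once instead of k times, while the per-operation work stays the same. Every constant below is a hypothetical placeholder, not a measurement of Basil or of the proposed method.

```python
def txn_cost(n_txns, ops_per_txn, sig_cost=1.0, rtt_cost=2.0, op_cost=0.01):
    """Toy cost model: each committed transaction pays a fixed cryptographic
    cost and a fixed communication cost, plus a small per-operation cost.
    All constants are hypothetical, chosen only to illustrate the trade-off."""
    return n_txns * (sig_cost + rtt_cost + ops_per_txn * op_cost)

separate = txn_cost(100, 2)        # 100 small transactions, 2 ops each
reconstructed = txn_cost(1, 200)   # one reconstructed transaction, same 200 ops
# The fixed crypto/communication overhead is amortized across all operations.
```

Under this model the reconstructed transaction costs a fraction of the separate ones; the real system must additionally preserve transactional semantics when merging, which the paper's method addresses.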
ISBN: (Print) 9798350395679; 9798350395662
Streaming applications are expected to process an ever-increasing amount of data with high throughput and stringent latency requirements. Flooding these applications with incoming data may overload the stream processing engine, leading to a system with unstable queues and infinitely growing latencies. Existing stream processing systems deal with such overload scenarios reactively, either through back-pressure or load-shedding mechanisms. These mechanisms, however, have considerable drawbacks: they consume additional system resources, incur non-negligible performance overheads, and may compromise the quality of application-level results. To address this gap, we propose a strategy based on reinforcement learning to throttle the input rate of data sources in streaming applications. The proposed strategy mitigates overload scenarios by addressing the source of the problem, allowing resources to be better utilized by application and system components and avoiding the performance overhead of system-level reactive mechanisms. Through experiments with two different applications, we demonstrate that our approach reduces end-to-end latencies by up to 82% and increases throughput by up to 10% compared to the back-pressure mechanisms implemented in state-of-the-art stream processing engines.
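The control problem can be illustrated with a deterministic stand-in for the learning agent: score each candidate input rate over a fixed window using a throughput reward penalized by queue length (a latency proxy), then keep the best, much as a bandit-style agent would during its exploration phase. Service capacity, window size, candidate rates, and the reward weighting are all assumed values, not the paper's setup.

```python
def evaluate_rate(rate, capacity=100.0, window=50):
    """Score a candidate source input rate against a queue drained at
    `capacity` items per step. Reward per step = delivered throughput minus
    a queue-length penalty (standing in for latency); returns the average."""
    queue, total = 0.0, 0.0
    for _ in range(window):
        queue = max(0.0, queue + rate - capacity)   # queue dynamics
        total += min(rate, capacity) - 0.5 * queue  # throughput - latency proxy
    return total / window

candidate_rates = [60.0, 90.0, 120.0, 150.0]
best = max(candidate_rates, key=evaluate_rate)      # highest average reward
```

Rates above capacity build an ever-growing queue, so their reward collapses, while rates below capacity leave throughput on the table; the scoring therefore settles on the highest sustainable rate, which is exactly the trade-off the reinforcement-learning throttler navigates online.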