In this article we present PARSIR (PARallel SImulation Runner), a package that enables the effective exploitation of shared-memory multi-processor machines for running discrete event simulation models. PARSIR is a com...
ISBN:
(Print) 9781665497473
The emerging trend of the convergence of high performance computing (HPC), machine learning/deep learning (ML/DL), and big data analytics presents a host of challenges for large-scale computing campaigns that seek best practices to interleave traditional scientific simulation-based workloads with ML/DL models. A portfolio of systematic approaches to incorporating deep learning into modeling and simulation serves a vital need when supporting AI for science at a computing facility. In this paper, we evaluate several strategies for deploying deep learning surrogate models in a representative physics application on supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). We discuss a set of recommended deployment architectures and implementation approaches. We analyze and evaluate these alternatives and show their performance and scalability up to 1000 GPUs on two mainstream platforms equipped with different deep learning hardware and software stacks.
ISBN:
(Print) 9781665497473
We develop a family of parallel algorithms for the SpKAdd operation that adds a collection of k sparse matrices. SpKAdd is a much-needed operation in many applications, including distributed-memory sparse matrix-matrix multiplication (SpGEMM), streaming accumulations of graphs, and algorithmic sparsification of the gradient updates in deep learning. While adding two sparse matrices is a common operation in Matlab, Python, Intel MKL, and various GraphBLAS libraries, these implementations do not perform well when adding a large collection of sparse matrices. We develop a series of algorithms using tree merging, heap, sparse accumulator, hash table, and sliding hash table data structures. Among them, hash-based algorithms attain the theoretical lower bounds on both the computational and I/O complexities and perform the best in practice. The newly developed hash-based SpKAdd makes the computation of a distributed-memory SpGEMM algorithm at least 2x faster than previous state-of-the-art algorithms.
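The hash-table accumulator idea behind SpKAdd can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: it represents each sparse matrix as a `{(row, col): value}` dict and accumulates all k matrices in a single hash table, so each nonzero is touched exactly once.

```python
from collections import defaultdict

def spkadd(matrices):
    """Add a collection of sparse matrices, each given as a
    {(row, col): value} dict, using a hash table as the accumulator.
    Illustrative sketch of the hash-based approach, not the paper's code."""
    acc = defaultdict(float)
    for m in matrices:
        for idx, v in m.items():
            acc[idx] += v
    # Drop entries that cancelled out to keep the result sparse.
    return {idx: v for idx, v in acc.items() if v != 0.0}

A = {(0, 0): 1.0, (1, 2): 2.0}
B = {(0, 0): 3.0, (2, 1): 4.0}
C = {(1, 2): -2.0}
print(spkadd([A, B, C]))  # {(0, 0): 4.0, (2, 1): 4.0} — the (1, 2) entry cancels
```

Each nonzero is read once and written once into the accumulator, which is the intuition behind the I/O lower bound the hash-based algorithms attain.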
ISBN:
(Print) 9781665497473
In this paper we present a performance study of multidimensional Fast Fourier Transforms (FFT) with GPU accelerators on modern hybrid architectures, such as those expected for upcoming exascale systems. We assess and leverage features from traditional implementations of parallel FFTs and provide an algorithm that encompasses a wide range of their parameters and adds novel developments such as FFT grid shrinking and batched transforms. Next, we create a bandwidth model to quantify the computational costs and analyze the well-known communication bottleneck for All-to-All and Point-to-Point MPI exchanges. Then, using a tuning methodology, we are able to accelerate the FFT computation and reduce the communication cost, achieving linear scalability on a large-scale system with GPU accelerators. Finally, our performance analysis is extended to show that carefully tuning the algorithm can further accelerate applications that rely heavily on FFTs, as is the case for molecular dynamics software. Our experiments were performed on the Summit and Spock supercomputers with IBM Power9 cores, over 3000 NVIDIA V100 GPUs, and AMD MI100 GPUs.
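The batched multidimensional transforms mentioned above can be illustrated with NumPy on the CPU. This is a minimal sketch of the concept only (the paper targets GPU FFT libraries): a whole batch of independent 3-D grids is transformed in one call, which is what lets a library amortize plan/setup cost across transforms.

```python
import numpy as np

# A batch of 8 independent 16x16x16 grids; one call transforms them all.
batch = np.random.rand(8, 16, 16, 16)

# axes=(1, 2, 3) applies a 3-D FFT to every grid in the batch at once.
freq = np.fft.fftn(batch, axes=(1, 2, 3))

# Round trip: the inverse transform recovers the original real grids.
back = np.fft.ifftn(freq, axes=(1, 2, 3)).real
assert np.allclose(back, batch)
```

On a distributed system the grids would additionally be decomposed across ranks, with the All-to-All or Point-to-Point exchanges discussed above redistributing pencils between the per-axis transform stages.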
ISBN:
(Print) 9781665481069
Processing-in-memory (PIM) is promising for solving the well-known data movement challenge by performing in-situ computations near the data. Leveraging PIM features can substantially boost the energy efficiency of applications. Early studies mainly focus on improving the programmability of computation offloading on PIM architectures. They lack a comprehensive analysis of computation locality and hence fail to accelerate a wide variety of applications. In this paper, we present a general-purpose instruction-level offloading technique for near-DRAM PIM architectures, named IOTPIM, that exploits PIM features comprehensively. IOTPIM is novel with two technical advances: 1) a new instruction offloading policy that fully considers the locality of the whole on-chip cache hierarchy, and 2) an offloading performance benefit prediction model that directly predicts the offloading performance benefit of an instruction based on the input dataset characteristics, preserving low analysis overheads. The evaluation demonstrates that IOTPIM can be applied to accelerate a wide variety of applications, including graph processing, machine learning, and image processing. IOTPIM outperforms state-of-the-art PIM offloading techniques by 1.28x-1.51x while ensuring offloading accuracy as high as 91.89% on average.
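The flavor of a locality-aware offloading policy can be conveyed with a toy decision rule. This is a hypothetical model, not IOTPIM's actual predictor: it offloads an instruction to near-memory compute only when its expected host-side memory latency, given cache locality, exceeds the near-memory execution latency. All the names and the latency constant are illustrative assumptions.

```python
def should_offload(cache_hit_rate, host_hit_latency_ns, pim_latency_ns,
                   dram_latency_ns=100.0):
    """Toy locality-aware offloading rule (hypothetical, not IOTPIM's model):
    offload when the expected host memory latency, weighted by cache
    locality, is worse than executing near the DRAM."""
    expected_host = (cache_hit_rate * host_hit_latency_ns
                     + (1.0 - cache_hit_rate) * dram_latency_ns)
    return expected_host > pim_latency_ns

# Poor cache locality (e.g. random graph accesses): offloading pays off.
print(should_offload(0.1, 10.0, 60.0))   # True
# Cache-friendly access stream: keep the instruction on the host cores.
print(should_offload(0.9, 10.0, 60.0))   # False
```

The real policy described above additionally reasons about the whole on-chip cache hierarchy rather than a single hit rate.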
ISBN:
(Print) 9781665497473
Genomic data leaks are irreversible. Leaked DNA cannot be changed, stays disclosed indefinitely, and affects the owner's family members as well. The recent large-scale genomic data collections [1], [2] render traditional privacy protection mechanisms, like the Health Insurance Portability and Accountability Act (HIPAA), inadequate for protection against novel security attacks [3]. On the other hand, data access restrictions hinder important clinical research that requires large datasets to operate [4]. These concerns can be naturally addressed by the employment of privacy-enhancing technologies, such as secure multiparty computation (MPC) [5]–[10]. Secure MPC enables computation on data without disclosing the data itself by dividing the data and computation between multiple computing parties in a distributed manner, preventing individual computing parties from accessing raw data. MPC systems are being increasingly adopted in fields that operate on sensitive datasets [11]–[13], such as computational genomics and biomedical research [14]–[22].
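The MPC principle described above — computing on divided data so that no single party sees the raw values — can be sketched with additive secret sharing, one of the standard MPC building blocks (a minimal didactic example, not any specific cited system).

```python
import random

P = 2**61 - 1  # public prime modulus; all arithmetic is done mod P

def share(secret, n_parties=3):
    """Split a value into n additive shares: any n-1 shares look uniformly
    random and reveal nothing about the secret."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Each party holds one share of each input and adds its shares locally;
# only the combined local sums reveal the (aggregate) result.
a_shares = share(42)
b_shares = share(100)
local_sums = [(x + y) % P for x, y in zip(a_shares, b_shares)]
assert reconstruct(local_sums) == 142
```

Additions cost no communication in this scheme; multiplications and comparisons require interactive protocols, which is where the engineering effort of practical MPC systems goes.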
ISBN:
(Print) 9781665497473
Tree-based search algorithms applied to combinatorial optimization problems are highly irregular and time-consuming when solving instances of NP-hard problems. Due to their parallel nature, algorithms for this class of complexity have been revisited for different architectures over the years. However, parallelization efforts have always been guided by the performance objective, setting aside productivity. Using Chapel's high productivity for the design and implementation of distributed tree search algorithms keeps the programmer away from lower-level details, such as communication and load balancing. However, the parameterization of such parallel applications is complex, consisting of several parameters, even if a high-productivity language is used in their conception. This work presents a local search-based heuristic for the automatic parameterization of ChapelBB, a distributed tree search application for solving combinatorial optimization problems written in Chapel. The main objective of the proposed heuristic is to overcome the limitation of manual parameterization, which covers only a limited portion of the feasible space. The reported results show that heuristic-based parameterization increases the performance of ChapelBB by up to 30% on 2048 cores (4096 threads) solving the N-Queens problem and by up to 31% solving instances of the Flow-shop scheduling problem to optimality.
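The shape of such a local search-based parameter tuner can be sketched as coordinate hill climbing over a small parameter space. Everything here is a hypothetical stand-in: the `throughput` objective substitutes for actually running ChapelBB and measuring performance, and the two parameters (a chunk size and a depth cutoff) are invented for illustration.

```python
import random

def throughput(params):
    """Stand-in objective (hypothetical): a real tuner would launch the
    application with these parameters and measure its performance."""
    chunk, depth = params
    return -(chunk - 64) ** 2 - (depth - 5) ** 2

def local_search(start, steps=200, seed=0):
    """Hill climbing: perturb one parameter at a time, keep improvements."""
    rng = random.Random(seed)
    best, best_val = start, throughput(start)
    for _ in range(steps):
        cand = list(best)
        i = rng.randrange(2)
        cand[i] += rng.choice([-8, 8]) if i == 0 else rng.choice([-1, 1])
        cand = tuple(cand)
        val = throughput(cand)
        if val > best_val:
            best, best_val = cand, val
    return best

tuned = local_search((8, 1))
assert throughput(tuned) > throughput((8, 1))
```

The appeal over manual parameterization is exactly the one stated above: the search explores the feasible space systematically instead of relying on a handful of hand-picked configurations, at the cost of extra evaluation runs.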
ISBN:
(Print) 9781665481069
Disaggregated architecture brings new opportunities to memory-consuming applications like graph processing. It allows one to spread memory access pressure from local to far memory, providing an attractive alternative to disk-based processing. Although existing works on general-purpose far memory platforms show great potential for application expansion, it is unclear how graph processing applications could benefit from disaggregated architecture, and how different optimization methods influence overall performance. In this paper, we take the first step toward analyzing the impact of graph processing workloads on disaggregated architecture by extending the GridGraph framework on top of an RDMA-based far memory system. We design Fargraph, a far memory coordination strategy for enhancing graph processing workloads. Specifically, Fargraph reduces overall data movement through a well-crafted, graph-aware data segment offloading mechanism. In addition, we use optimal data segment splitting and asynchronous data buffering to achieve graph iteration-friendly far memory access. We show that Fargraph achieves near-oracle performance for typical in-local-memory graph processing systems. Fargraph shows up to 8.3x speedup compared to Fastswap, the state-of-the-art general-purpose far memory platform.
ISBN:
(Print) 9781665497473
To amortize the cost of MPI communications, distributed parallel HPC applications can overlap network communications with computations in the hope of improving global application performance. When using this technique, both computations and communications run at the same time. But computation usually also performs data movement. Since data for computations and for communications use the same memory system, memory contention may occur when computations are memory-bound and large messages are transmitted through the network at the same time. In this paper we propose a model to predict memory bandwidth for computations and for communications when they are executed side by side, according to data locality and taking contention into account. Elaborating the model allowed us to better understand the locations of bottlenecks in the memory system and the strategies the memory system applies in case of contention. The model was evaluated on many platforms with different characteristics, and showed an average prediction error lower than 4%.
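A much-simplified picture of bandwidth contention can be given with a proportional-sharing toy model. This is not the paper's model, which accounts for data locality and the memory system's actual arbitration strategies; it only illustrates the basic phenomenon that co-running streams degrade each other once their combined demand exceeds a shared link's capacity.

```python
def shared_bandwidth(demands, capacity):
    """Toy contention model (illustrative only, not the paper's):
    if the summed demands of concurrent memory streams fit within the
    shared capacity, each stream gets what it asks for; otherwise each
    gets a proportional share."""
    total = sum(demands)
    if total <= capacity:
        return list(demands)
    return [d * capacity / total for d in demands]

# A memory-bound compute stream wants 30 GB/s, a concurrent MPI transfer
# wants 20 GB/s, but the shared memory link sustains only 40 GB/s.
print(shared_bandwidth([30.0, 20.0], 40.0))  # [24.0, 16.0]
```

Real memory systems are not this fair — which is precisely why a measured, locality-aware model like the one above is needed to predict the split accurately.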
ISBN:
(Print) 9781665497473
Hardware memory disaggregation is an emerging trend in datacenters that provides access to remote memory as part of a shared pool or as unused memory on machines across the network. Memory disaggregation aims to improve memory utilization and scale memory-intensive applications. Current state-of-the-art prototypes have shown that hardware-disaggregated memory is a reality at rack scale. However, the memory utilization benefits of memory disaggregation can only be fully realized at larger scales enabled by a datacenter-wide network. Introduction of a datacenter network results in new performance and reliability failures that may manifest as higher network latency. Additionally, sharing the network introduces new points of contention between multiple applications. In this work, we characterize the impact of variable network latency and contention in an open-source hardware-disaggregated memory prototype, ThymesisFlow. To support our characterization, we have developed a delay injection framework that introduces delays into remote memory accesses to emulate network latency. Based on the characterization results, we develop insights into how reliability and resource allocation mechanisms should evolve to support hardware memory disaggregation beyond rack scale in datacenters.
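The delay-injection idea can be sketched in a few lines: wrap every remote-memory access so it pays an extra configurable latency, emulating a slower datacenter network. This is a conceptual sketch only — the framework described above injects delays at the hardware prototype level, not by wrapping Python functions.

```python
import time

def with_injected_delay(access_fn, delay_s):
    """Wrap an access function so every call pays an extra latency,
    emulating added network delay on the path to remote memory (sketch)."""
    def delayed(*args, **kwargs):
        time.sleep(delay_s)           # the injected network delay
        return access_fn(*args, **kwargs)
    return delayed

# Hypothetical "remote memory": a dict mapping page ids to page contents.
remote = {0: b"page"}
slow_read = with_injected_delay(remote.get, 0.001)  # inject 1 ms per access

t0 = time.perf_counter()
assert slow_read(0) == b"page"
assert time.perf_counter() - t0 >= 0.001  # the access paid the delay
```

Sweeping the injected delay and replaying an application's access pattern is what lets a characterization study separate the cost of latency itself from the cost of contention.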