ISBN: 9781665476522 (print)
Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work on prefetching, decoupling, or pipelining can mitigate memory latency and improve core utilization, memory bottlenecks persist due to limited off-chip bandwidth. Approaches that do processing in-memory (PIM) with Hybrid Memory Cube (HMC) overcome bandwidth limitations but fail to achieve high core utilization due to poor task scheduling and synchronization overheads. Moreover, the high memory-per-core ratio available with HMC limits strong scaling. We introduce Dalorex, a hardware-software co-design that achieves high parallelism and energy efficiency, demonstrating strong scaling with >16,000 cores when processing graph and sparse linear algebra workloads. Compared with prior work in PIM, with both designs using 256 cores, Dalorex improves both performance and energy consumption by two orders of magnitude through (1) a tile-based distributed-memory architecture where each processing tile holds an equal amount of data and all memory operations are local; (2) a task-based parallel programming model where tasks are executed by the processing unit that is co-located with the target data; (3) a network design optimized for irregular traffic, where all communication is one-way and messages do not contain routing metadata; (4) novel traffic-aware task scheduling hardware that maintains high core utilization; and (5) a data-placement strategy that improves work balance. This work proposes architectural and software innovations to provide the greatest scalability to date for running graph algorithms while still being programmable for other domains.
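The data-local task model described in the abstract can be illustrated with a small simulation. The sketch below is not Dalorex's hardware: owner(), relax(), and the round-robin delivery loop are hypothetical stand-ins for the paper's data-placement strategy, task programming model, and on-chip network. It only shows the core invariant, namely that a task always runs on the tile that owns its target datum, so every memory access is local.
```python
# Minimal sketch (not Dalorex's actual design) of data-local task execution:
# each datum lives on exactly one tile, and a task always runs on the tile
# that owns its target datum, so all memory operations are local.
from collections import deque

NUM_TILES = 4

class Tile:
    def __init__(self, tile_id):
        self.tile_id = tile_id
        self.local_data = {}       # vertex id -> value (this tile's shard)
        self.task_queue = deque()  # incoming one-way messages

    def push(self, task_fn, vertex, arg):
        self.task_queue.append((task_fn, vertex, arg))

    def run(self, tiles):
        while self.task_queue:
            task_fn, vertex, arg = self.task_queue.popleft()
            task_fn(self, tiles, vertex, arg)

def owner(vertex):
    # Hypothetical placement function; the paper's data-placement
    # strategy is more elaborate, to balance work across tiles.
    return vertex % NUM_TILES

def relax(tile, tiles, vertex, new_dist):
    # SSSP-style relaxation that only touches tile-local memory.
    if new_dist < tile.local_data.get(vertex, float("inf")):
        tile.local_data[vertex] = new_dist
        for nbr, w in GRAPH.get(vertex, []):
            # Spawn a follow-up task on the tile that owns the neighbor.
            tiles[owner(nbr)].push(relax, nbr, new_dist + w)

GRAPH = {0: [(1, 3), (2, 1)], 1: [], 2: [(1, 1)]}
tiles = [Tile(t) for t in range(NUM_TILES)]
tiles[owner(0)].push(relax, 0, 0)
for _ in range(8):                 # naive round-robin "network" loop
    for t in tiles:
        t.run(tiles)
print({v: tiles[owner(v)].local_data[v] for v in GRAPH})  # distances 0:0, 1:2, 2:1
```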
ISBN: 9781665497473 (print)
We present a research-based course module to teach computer science students, software developers, and scientists the effects of non-determinism on high-performance applications. The course module uses the ANACIN-X software package, a suite of software modules developed by the authors; ANACIN-X provides test cases, analytic tools to run different scenarios (e.g., using different numbers of processes and different communication patterns), and visualization tools for beginner, intermediate, and advanced levels of understanding of non-determinism. Through our course module, students in computer science, software developers, and scientists gain an understanding of non-determinism, how to measure its occurrence in an execution, and how to identify its root causes within an application's code.
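To make the teaching target concrete, here is a toy example of the kind of message-race non-determinism such a module studies; it is not part of ANACIN-X and assumes mpi4py is installed. With MPI.ANY_SOURCE, the order in which rank 0 receives messages can vary between otherwise identical runs.
```python
# Toy illustration of message-race non-determinism (not ANACIN-X code).
# Run with e.g.: mpiexec -n 4 python race.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    order = []
    for _ in range(comm.Get_size() - 1):
        status = MPI.Status()
        comm.recv(source=MPI.ANY_SOURCE, status=status)
        order.append(status.Get_source())
    print("receive order:", order)  # may differ from run to run
else:
    comm.send(rank, dest=0)
```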
ISBN: 9781665481069 (print)
The GPU programming model is primarily aimed at the development of applications that run on one GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale GPU applications further, a great engineering effort is typically required: work and data must be divided over multiple GPUs by hand, possibly in multiple nodes, and data must be manually spilled from GPU memory to higher-level memories. We present Lightning: a framework that follows the common GPU programming paradigm but enables scaling to large problems with ease. Lightning supports multi-GPU execution of GPU kernels, even across multiple nodes, and seamlessly spills data to higher-level memories (main memory and disk). Existing CUDA kernels can easily be adapted for use in Lightning, with data access annotations on these kernels allowing Lightning to infer their data requirements and the dependencies between subsequent kernel launches. Lightning efficiently distributes the work/data across GPUs and maximizes efficiency by overlapping scheduling, data movement, and kernel execution when possible. We present the design and implementation of Lightning, as well as experimental results on up to 32 GPUs for eight benchmarks and one real-world application. The evaluation shows excellent performance and scalability, such as a speedup of 57.2x over the CPU when using Lightning with 16 GPUs over 4 nodes and 80 GB of data, far beyond the memory capacity of one GPU.
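The access-annotation idea can be sketched as follows. The decorator syntax and Runtime class below are hypothetical, not Lightning's actual API; the sketch only shows how per-argument read/write declarations let a runtime infer dependencies between kernel launches.
```python
# Hypothetical sketch (not Lightning's API): kernels declare how each
# argument is accessed, so a runtime can infer per-launch data
# requirements and the dependencies between subsequent launches.

def kernel(**access):                    # arg name -> "read" or "write"
    def wrap(fn):
        fn.access = access
        return fn
    return wrap

@kernel(a="read", b="read", out="write")
def vector_add(a, b, out, i):
    out[i] = a[i] + b[i]

class Runtime:
    """Orders launches by tracking which launch last wrote each buffer."""
    def __init__(self):
        self.launches = []               # (kernel name, dependency indices)
        self.last_writer = {}            # buffer id -> launch index

    def launch(self, fn, n, **bufs):
        deps = sorted({self.last_writer[id(b)] for b in bufs.values()
                       if id(b) in self.last_writer})
        self.launches.append((fn.__name__, deps))
        this = len(self.launches) - 1
        for name, mode in fn.access.items():
            if mode == "write":
                self.last_writer[id(bufs[name])] = this
        for i in range(n):               # executed serially here; a multi-GPU
            fn(i=i, **bufs)              # runtime would split i's range over GPUs

rt = Runtime()
a, b, c, d = [1, 2], [3, 4], [0, 0], [0, 0]
rt.launch(vector_add, 2, a=a, b=b, out=c)
rt.launch(vector_add, 2, a=c, b=b, out=d)  # reads c -> depends on launch 0
print(rt.launches)                         # [('vector_add', []), ('vector_add', [0])]
print(d)                                   # [7, 10]
```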
ISBN: 9781665481069 (print)
Processing-in-memory (PIM) is a promising way to solve the well-known data-movement challenge by performing in-situ computations near the data. Leveraging PIM features can substantially boost the energy efficiency of applications. Early studies mainly focus on improving the programmability of computation offloading on PIM architectures. They lack a comprehensive analysis of computation locality and hence fail to accelerate a wide variety of applications. In this paper, we present a general-purpose instruction-level offloading technique for near-DRAM PIM architectures, namely IOTPIM, to exploit PIM features comprehensively. IOTPIM is novel in two technical advances: 1) a new instruction offloading policy that fully considers the locality of the whole on-chip cache hierarchy, and 2) an offloading performance-benefit prediction model that directly predicts the offloading benefit of an instruction from the characteristics of the input dataset, keeping analysis overheads low. The evaluation demonstrates that IOTPIM can be applied to accelerate a wide variety of applications, including graph processing, machine learning, and image processing. IOTPIM outperforms state-of-the-art PIM offloading techniques by 1.28x-1.51x while ensuring offloading accuracy as high as 91.89% on average.
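A toy version of the offloading decision is sketched below. The latency numbers and hit probabilities are made up, and IOTPIM's actual predictor is more elaborate, but the sketch captures the idea: estimate an instruction's expected memory latency across the whole cache hierarchy and offload only when near-DRAM execution is predicted to be cheaper.
```python
# Hypothetical cost model (not IOTPIM's actual predictor) for the
# instruction-level offloading decision on a near-DRAM PIM architecture.

# Assumed latencies in cycles; real values are machine-specific.
LAT = {"L1": 4, "L2": 12, "LLC": 40, "DRAM": 200, "PIM": 60}

def host_latency(p_l1, p_l2, p_llc):
    """Expected latency on the host given per-level hit probabilities."""
    p_dram = 1 - p_l1 - p_l2 - p_llc
    return (p_l1 * LAT["L1"] + p_l2 * LAT["L2"]
            + p_llc * LAT["LLC"] + p_dram * LAT["DRAM"])

def should_offload(p_l1, p_l2, p_llc):
    # Offload when near-memory execution beats the expected host latency.
    return LAT["PIM"] < host_latency(p_l1, p_l2, p_llc)

# A pointer-chasing load that misses caches 90% of the time: offload it.
print(should_offload(0.05, 0.03, 0.02))   # True
# A hot loop with 95% L1 hits: keep it on the host.
print(should_offload(0.95, 0.03, 0.01))   # False
```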
ISBN: 9781665481069 (print)
Distributed shared memory (DSM) systems can handle data-intensive applications and have recently been receiving more attention. A majority of existing DSM implementations are based on write-invalidation (WI) protocols, which achieve sub-optimal performance when the cache size is small. Specifically, the vast majority of invalidation messages become useless when evictions are frequent. The problem is troublesome given the scarce memory resources in data centers. To this end, we propose Falcon, a self-invalidation protocol that eliminates invalidation messages. It relies on per-operation timestamps to achieve the global memory order required by sequential consistency (SC). Furthermore, we conduct a comprehensive discussion of the two protocols with an emphasis on the impact of cache size. We also implement both protocols atop a recent DSM system, Grappa. The evaluation shows that the optimal protocol can improve the performance of a KV database by 27% and a graph processing application by 71.4% over the vanilla cache-free scheme.
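Self-invalidation can be illustrated with a minimal lease-based sketch. This is in the spirit of Falcon rather than its actual protocol (Falcon's per-operation timestamping that enforces SC is more involved): cached copies expire on their own, so a write at the home node never sends invalidation messages.
```python
# Lease-based self-invalidation sketch (illustrative, not Falcon itself):
# readers drop stale cached copies on their own, so writes send no
# invalidation messages to sharers.
import itertools

clock = itertools.count(1)        # global logical clock (one tick per operation)
LEASE = 5                         # validity window, in operations

home = {}                         # home-node memory: addr -> (value, version ts)
cache = {}                        # local cache: addr -> (value, fetch ts)

def write(addr, value):
    ts = next(clock)
    home[addr] = (value, ts)      # no invalidations sent to sharers

def read(addr):
    ts = next(clock)
    if addr in cache:
        value, fetched = cache[addr]
        if ts - fetched < LEASE:  # still within the lease: local hit
            return value
        del cache[addr]           # lease expired: self-invalidate
    value, _ = home[addr]
    cache[addr] = (value, ts)
    return value

write("x", 1)
print(read("x"))                  # 1, fetched into the cache
write("x", 2)                     # no invalidation message is sent
print(read("x"))                  # may still return 1 (within the lease)
for _ in range(LEASE):
    next(clock)
print(read("x"))                  # 2, after self-invalidation
```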
ISBN: 9781665497473 (print)
The complexity of memory systems has increased considerably over the past decade. Supercomputers may now include several levels of heterogeneous and non-uniform memory, with significantly different properties in terms of performance, capacity, persistence, etc. Developers of scientific applications face a huge challenge: efficiently exploiting the memory system to improve performance while keeping productivity high by using portable solutions. In this work, we present a new API and a method to manage the complexity of modern memory systems. Our portable and abstracted API is designed to identify memory kinds and describe hardware characteristics using metrics, for example bandwidth, latency, and capacity. It allows runtime systems, parallel libraries, and scientific applications to select the appropriate memory by expressing their needs for each allocation without having to modify the code for each platform. Furthermore, we present a survey of existing ways to determine the sensitivity of application buffers using static code analysis, profiling, and benchmarking. We show in a use case that combining these approaches with our API indeed enables a portable and productive method to match application requirements and hardware memory characteristics.
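A hypothetical sketch of such an API is shown below; the names, memory kinds, and metric values are illustrative, not the paper's actual interface. Allocations state their needs (bandwidth, latency, or capacity), and the library matches them against the memory kinds discovered on the platform, so the application code stays the same across machines.
```python
# Illustrative sketch of metric-driven memory selection (not the paper's API).
from dataclasses import dataclass

@dataclass
class MemoryKind:
    name: str
    bandwidth_gbs: float
    latency_ns: float
    capacity_gb: float

# Discovered at runtime on the actual platform; these values are made up.
KINDS = [
    MemoryKind("DDR", 180.0, 90.0, 512.0),
    MemoryKind("HBM", 800.0, 120.0, 64.0),
    MemoryKind("NVM", 40.0, 300.0, 2048.0),
]

def alloc(size_gb, prefer="bandwidth"):
    """Pick the best-fitting memory kind for a stated preference."""
    fits = [k for k in KINDS if k.capacity_gb >= size_gb]
    if prefer == "bandwidth":
        best = max(fits, key=lambda k: k.bandwidth_gbs)
    elif prefer == "latency":
        best = min(fits, key=lambda k: k.latency_ns)
    else:                              # "capacity"
        best = max(fits, key=lambda k: k.capacity_gb)
    return best.name                   # a real API would return a buffer

print(alloc(32, prefer="bandwidth"))   # HBM
print(alloc(32, prefer="latency"))     # DDR
print(alloc(1024, prefer="capacity"))  # NVM
```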
ISBN: 9798350364613; 9798350364606 (print)
The GraphBLAS C API is mature, with an updated specification (version 2.1) and a compliant implementation (SuiteSparse GraphBLAS). We are now focused on GraphBLAS 3.0, the next major GraphBLAS revision. Potential changes include: (1) a separate math spec to make new language bindings easier to write, (2) better support for user-defined types, rank promotion, and enhanced non-blocking execution, (3) an expanded scope of GraphBLAS to address a wider range of applications, and (4) support for complex heterogeneous and distributed systems.
ISBN: 9798350329223 (print)
The proceedings contain 153 papers. The topics discussed include: transaction data management optimization based on multi-partitioning in blockchain systems; semi-asynchronous federated learning optimized for non-IID data communication based on tensor decomposition; HKTGNN: hierarchical knowledge transferable graph neural network-based supply chain risk assessment; DQR-TTS: semi-supervised text-to-speech synthesis with dynamic quantized representation; deep reinforcement learning-based network moving target defense in DPDK; iNUMAlloc: towards intelligent memory allocation for AI accelerators with NUMA; and predictive queue-based low-latency congestion detection in data center networks.
ISBN: 9781665497473 (print)
We develop a family of parallel algorithms for the SpKAdd operation, which adds a collection of k sparse matrices. SpKAdd is a much-needed operation in many applications, including distributed-memory sparse matrix-matrix multiplication (SpGEMM), streaming accumulation of graphs, and algorithmic sparsification of the gradient updates in deep learning. While adding two sparse matrices is a common operation in MATLAB, Python, Intel MKL, and various GraphBLAS libraries, these implementations do not perform well when adding a large collection of sparse matrices. We develop a series of algorithms using tree merging, heap, sparse accumulator, hash table, and sliding hash table data structures. Among them, the hash-based algorithms attain the theoretical lower bounds on both the computational and I/O complexities and perform the best in practice. The newly developed hash SpKAdd makes the computation of a distributed-memory SpGEMM algorithm at least 2x faster than previous state-of-the-art algorithms.
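The hash-based approach can be sketched as a single-pass sparse accumulator. This toy version ignores the paper's per-row parallelization and cache-aware tuning, but it shows why hashing matches the computational lower bound: each stored nonzero is touched exactly once.
```python
# Hash-accumulator sketch of k-way sparse matrix addition (SpKAdd's core
# idea in miniature): one pass over all inputs, one hash update per nonzero.
from collections import defaultdict

def spkadd(matrices):
    """matrices: list of dicts mapping (row, col) -> value (COO-like)."""
    acc = defaultdict(float)           # hash table as sparse accumulator
    for m in matrices:
        for (i, j), v in m.items():
            acc[(i, j)] += v
    # Drop entries that cancelled out exactly (optional in practice).
    return {k: v for k, v in acc.items() if v != 0.0}

A = {(0, 0): 1.0, (0, 2): 2.0}
B = {(0, 0): -1.0, (1, 1): 3.0}
C = {(0, 2): 4.0}
print(spkadd([A, B, C]))               # {(0, 2): 6.0, (1, 1): 3.0}
```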
ISBN: 9781665481069 (print)
Disaggregated architecture brings new opportunities to memory-consuming applications like graph processing. It allows one to spread memory-access pressure from local to far memory, providing an attractive alternative to disk-based processing. Although existing work on general-purpose far-memory platforms shows great potential for application expansion, it is unclear how graph processing applications could benefit from disaggregated architecture, and how different optimization methods influence overall performance. In this paper, we take the first step toward analyzing the impact of graph processing workloads on disaggregated architecture by extending the GridGraph framework on top of an RDMA-based far-memory system. We design Fargraph, a far-memory coordination strategy for enhancing graph processing workloads. Specifically, Fargraph reduces overall data movement through a well-crafted, graph-aware data segment offloading mechanism. In addition, we use optimal data segment splitting and asynchronous data buffering to achieve graph-iteration-friendly far-memory access. We show that Fargraph achieves near-oracle performance relative to typical in-local-memory graph processing systems. Fargraph shows up to 8.3x speedup over Fastswap, the state-of-the-art general-purpose far-memory platform.
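The overlap of far-memory fetches with computation can be sketched as below. This is a conceptual illustration, not Fargraph's implementation: fetch_from_far_memory and process are hypothetical stand-ins for RDMA reads and one iteration's work over a segment, and a real system would also evict segments to respect the local-memory budget.
```python
# Conceptual sketch of segment pinning plus asynchronous prefetching
# over far memory (illustrative only, not Fargraph's code).
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_from_far_memory(segment_id):
    time.sleep(0.01)                   # stand-in for an RDMA read
    return f"segment-{segment_id}"

def process(segment):
    time.sleep(0.01)                   # stand-in for one iteration's compute

def iterate(segment_ids, local_budget=2):
    # Graph-aware placement: the most reused segments stay resident locally.
    local = {s: fetch_from_far_memory(s) for s in segment_ids[:local_budget]}
    with ThreadPoolExecutor(max_workers=1) as pool:
        future, nxt = None, None
        for idx, sid in enumerate(segment_ids):
            # Kick off the next segment's fetch before it is needed...
            if idx + 1 < len(segment_ids) and segment_ids[idx + 1] not in local:
                nxt = segment_ids[idx + 1]
                future = pool.submit(fetch_from_far_memory, nxt)
            seg = local[sid] if sid in local else fetch_from_far_memory(sid)
            process(seg)               # ...so compute overlaps the fetch.
            if future is not None:
                local[nxt] = future.result()  # buffered for the next step
                future = None

iterate([0, 1, 2, 3, 4])
```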