Search Results - Inner Mongolia University Library

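Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016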
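
The field codes and operator rules in the help text above can be assembled programmatically. The following is a minimal sketch, assuming the query is a plain string typed into the advanced search box. The helper names (`term`, `combine`, `group`) are hypothetical and not part of the library's site; the sketch only checks the field codes and the uppercase, space-separated operators that the help text describes.

```python
# Hypothetical helpers for composing advanced-search strings.
# Field codes and operators are taken from the help text; nothing here
# calls the library's website.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join terms with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expr: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expr})"


# Rebuild the first search example from the help text.
example_1 = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_1)
```

Rejecting lowercase operators up front mirrors the rule that operators must be uppercase; whether the site itself reports such errors is not stated in the help text.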



Search query: "Any field = 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019"

7 records in total; showing 1-7


2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019

9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019

ISBN: (print) 9781728159874

The proceedings contain 11 papers. The topics discussed include: conveyors for streaming many-to-many communication; extending a work-stealing framework with priorities and weights; RDMA vs. RPC for implementing distributed data structures; mixed-precision tomographic reconstructor computations on hardware accelerators; iPregel: strategies to deal with an extreme form of irregularity in vertex-centric graph processing; stretching Jacobi: two-stage pivoting in block-based factorization; a hardware prefetching mechanism for vector gather instructions; and performance impact of memory channels on sparse and irregular algorithms.


Cascaded DMA Controller for Speedup of Indirect Memory Access in Irregular Applications

9th IEEE/ACM Workshop on Irregular Applications - Architectures and Algorithms (IA3)

Authors: Kashimata, Tomoya; Kitamura, Toshiaki; Kimura, Keiji; Kasahara, Hironori (Waseda Univ, Tokyo, Japan)

ISBN: (print) 9781728159874

Indirect memory accesses caused by sparse linear algebra calculations are widely used in important real applications. However, they also cause seriously inefficient memory accesses and pipeline stalls, resulting in low execution efficiency even with high memory bandwidth and ample computational resources. One of the important issues with indirect memory accesses, such as accessing A[B[i]], is that they require two successive memory accesses: the index loads (B[i]) and the following data element accesses (A[B[i]]). To overcome this situation, we propose the Cascaded-DMAC (CDMAC). The CDMAC is intended to be attached in each core of a multicore chip in addition to a CPU core, a vector accelerator, and a local data memory. It performs data transfers between an off-chip main memory and an in-core local data memory, which provides data to the accelerator. The key idea of the CDMAC is cascading two DMACs so that the first one loads indices, then the second one accesses data elements by using these indices. Thus, this organization realizes autonomous indirect memory accesses given an index array and an element array, and obtains efficient SIMD computations by lining up the sparse data in the local data memory. We implemented a multicore processor with the proposed CDMAC on an FPGA board. The evaluation result of sparse matrix-vector multiplications on the FPGA shows that the CDMAC achieves a maximum speedup of 17x compared with the CPU data transfer.
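
The two-step access pattern A[B[i]] described in the abstract can be illustrated in software. The sketch below is an assumption-laden analogy, not the paper's hardware design: it assumes a CSR (compressed sparse row) matrix layout, which the abstract does not specify, and the function names are hypothetical. The first function performs the dependent index load and data load inside the compute loop; the second separates a gather stage that packs the indexed elements contiguously, loosely mirroring the idea of lining up sparse data before computation.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A @ x with A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            j = col_idx[k]               # first access: load the index
            y[row] += values[k] * x[j]   # second access: dependent data load
    return y


def gather(indices, data):
    """Stage 1 reads each index, stage 2 reads data[index]; results are packed."""
    return [data[i] for i in indices]


def spmv_csr_gathered(row_ptr, col_idx, values, x):
    """Same product, but with the indirect loads done up front by gather()."""
    packed_x = gather(col_idx, x)  # contiguous buffer aligned with values
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # The compute loop now touches only contiguous data.
        y.append(sum(values[k] * packed_x[k] for k in range(start, end)))
    return y


# Illustrative 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

print(spmv_csr(row_ptr, col_idx, values, x))
print(spmv_csr_gathered(row_ptr, col_idx, values, x))
```

Both functions compute the same result; the difference is only where the indirect loads happen, which is the part the abstract proposes to offload.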

Keywords: Cascaded DMA Controller; CDMAC; DMAC; DMA; indirect memory access; sparse matrix vector multiplication; SpMV; SMVM

Performance Impact of Memory Channels on Sparse and Irregular Algorithms

9th IEEE/ACM Workshop on Irregular Applications - Architectures and Algorithms (IA3)

Authors: Green, Oded; Fox, James; Young, Jeffrey; Shirako, Jun; Bader, David
Affiliations: NVIDIA Corp, Santa Clara, CA 95051, USA; Georgia Inst Technol, Atlanta, GA 30332, USA; New Jersey Inst Technol, Newark, NJ 07102, USA

ISBN: (print) 9781728159874

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we demonstrate that the key factor in the utilization of the memory system for graph algorithms is not necessarily the raw bandwidth or even the latency of memory requests. Instead, we show that performance is proportional to the number of memory channels available to handle small data transfers with limited spatial locality. Using several widely used graph frameworks, including Gunrock (on the GPU) and GAPBS & Ligra (for CPUs), we evaluate key graph analytics kernels using two unique memory hierarchies, DDR-based and HBM/MCDRAM. Our results show that the differences in the peak bandwidths of several Pascal-generation GPU memory subsystems aren't reflected in the performance of various analytics. Furthermore, our experiments on CPU and Xeon Phi systems (see extended version [11]) demonstrate that the number of memory channels utilized can be a decisive factor in performance across several different applications. For CPU systems with smaller thread counts, the memory channels can be underutilized while systems with high thread counts can oversaturate the memory subsystem, which leads to limited performance. Finally, we model the potential performance improvements of adding more memory channels with narrower access widths than are found in current platforms (see [11]). We analyze performance trade-offs for the two most prominent types of memory accesses found in graph algorithms, streaming and random accesses.

Keywords: Graphics processing unit

Message from the Workshop Co-chairs

Workshop on Irregular Applications: Architecture and Algorithms (IA3)

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.


Message from the Workshop Co-chairs

2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019, 2019, p. IV

Authors: Tumeo, Antonino; Castellana, Vito Giovanni; Feo, John (Pacific Northwest National Laboratory, United States)


Cascaded DMA Controller for Speedup of Indirect Memory Access in Irregular Applications

Workshop on Irregular Applications: Architecture and Algorithms (IA3)

Authors: Tomoya Kashimata; Toshiaki Kitamura; Keiji Kimura; Hironori Kasahara (Waseda University, Tokyo, Japan)

The following topics are dealt with: storage management; parallel programming; graph theory; data structures; graphics processing units; multiprocessing systems; resource allocation; application program interfaces; pa...

ISBN: (print) 9781728159881

Keywords: Indexes; Data transfer; Sparse matrices; Field programmable gate arrays; Multicore processing; Bridges; Bandwidth


Performance Impact of Memory Channels on Sparse and Irregular Algorithms

Workshop on Irregular Applications: Architecture and Algorithms (IA3)

Authors: Oded Green; James Fox; Jeff Young; Jun Shirako; David Bader
Affiliations: Nvidia Corporation, USA; Georgia Institute of Technology, USA; Georgia Institute of Technology; New Jersey Institute of Technology, USA

ISBN: (print) 9781728159881

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we show that this is not necessarily the case. We demonstrate that the key factor in the utilization of the memory system for graph algorithms is not the raw bandwidth, or even latency of memory requests, but instead is the number of memory channels available to handle small data transfers with low locality. Using several widely used graph frameworks, including Gunrock (on the GPU) and GAPBS & Ligra (for CPUs), we characterize two very distinct memory hierarchies with respect to key graph analytics kernels. Our results show that the differences in peak bandwidths of several of the latest Pascal-generation GPU memory subsystems aren't reflected in the performance of various analytics. Furthermore, our experiments on CPU and Xeon Phi systems show that the number of memory channels utilized can be a decisive factor in performance across several different applications. For CPU systems with smaller thread counts, the memory channels can be underutilized while systems with high thread counts can oversaturate the memory subsystem, which leads to limited performance. Lastly, we model the performance of including more channels with narrower access widths than those found in existing memory subsystems, and we analyze the trade-offs in terms of the two most prominent types of memory accesses found in graph algorithms, streaming and random accesses.

Keywords: Bandwidth; Graphics processing units; Instruction sets; Random access memory; Memory management; Benchmark testing


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
