Search Results - Inner Mongolia University Library

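Help

Field codes:

T = title (book title, article title); A = author (responsible party); K = subject term; P = publication name; PU = publisher name; O = institution (author affiliation, degree-granting institution, patent applicant); L = Chinese Library Classification number; C = discipline classification number; U = all fields; Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
(subject term "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016
(publication "Computer Applications and Software", C++ or Basic in any field, subject term not "Visual", years 2011-2016)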
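
As a rough illustration of the rules above, the following Python sketch assembles a query string in this syntax and reproduces the first example. The helper names `term`, `group`, and `combine` are hypothetical and are not part of the library's system. The sketch only builds the query text; it does not contact the catalog.

```python
# Field codes and operators as listed in the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field, value):
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def group(expression):
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def combine(operator, *parts):
    """Join parts with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


# Rebuild the first example from the help text.
query = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Validating field codes and operators up front catches the two mistakes the rules warn about: lowercase operators and missing spaces around them.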

Search query: "Any field=22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2017"

16 records in total; showing records 11-20

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal; Kulkarni, Milind

Affiliations: Coll William & Mary, Williamsburg, VA 23187, USA; Pacific Northwest Natl Labs, Richland, WA, USA; Washington Univ St Louis, St Louis, MO, USA; Purdue Univ, W Lafayette, IN 47907, USA

ISBN: (print) 9781450344937

Modern hardware contains parallel execution resources that are well-suited for data parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task block based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14x-108x speedup over sequential baselines.

Keywords: Task parallelism; Data parallelism; General Scheduler

Eunomia: Scaling Concurrent Search Trees under Contention Using HTM

22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Wang, Xin; Zhang, Weihua; Wang, Zhaoguo; Wei, Ziyun; Chen, Haibo; Zhao, Wenyun

Affiliations: Fudan Univ Software Sch, Shanghai, Peoples R China; Fudan Univ Shanghai Key Lab Data Sci, Shanghai, Peoples R China; Fudan Univ Sch Comp Sci, Shanghai, Peoples R China; Shanghai Jiao Tong Univ Inst Parallel & Distributed Syst, Shanghai, Peoples R China; NYU Comp Sci Dept, New York, NY 10003, USA

ISBN: (print) 9781450344937

While hardware transactional memory (HTM) has recently been adopted to construct efficient concurrent search tree structures, such designs fail to deliver scalable performance under contention. In this paper, we first conduct a detailed analysis on an HTM-based concurrent B+Tree, which uncovers several reasons for excessive HTM aborts induced by both false and true conflicts under contention. Based on the analysis, we advocate Eunomia, a design pattern for search trees which contains several principles to reduce HTM aborts, including splitting HTM regions with version based concurrency control to reduce HTM working sets, partitioned data layout to reduce false conflicts, proactively detecting and avoiding true conflicts, and adaptive concurrency control. To validate their effectiveness, we apply such designs to construct a scalable concurrent B+Tree using HTM. Evaluation using key-value store benchmarks on a 20-core HTM-capable multi-core machine shows that Eunomia leads to 5X-11X speedup under high contention, while incurring small overhead under low contention.

Keywords: Hardware Transactional Memory; Concurrent Search Tree; Opportunistic Consistency

Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL

22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Tang, Xiongchao; Zhai, Jidong; Yu, Bowen; Chen, Wenguang; Zheng, Weimin

Affiliations: Tsinghua Univ Dept Comp Sci & Technol, Beijing, Peoples R China

ISBN: (print) 9781450344937

Fault tolerance is increasingly important in high performance computing due to the substantial growth of system scale and decreasing system reliability. In-memory/diskless checkpoint has gained extensive attention as a solution to avoid the IO bottleneck of traditional disk-based checkpoint methods. However, applications using previous in-memory checkpoint suffer from little available memory space. To provide high reliability, previous in-memory checkpoint methods either need to keep two copies of checkpoints to tolerate failures while updating old checkpoints or trade performance for space by flushing in-memory checkpoints into disk. In this paper, we propose a novel in-memory checkpoint method, called self-checkpoint, which can not only achieve the same reliability of previous in-memory checkpoint methods, but also increase the available memory space for applications by almost 50%. To validate our method, we apply the self-checkpoint to an important problem, fault tolerant HPL. We implement a scalable and fault tolerant HPL based on this new method, called SKT-HPL, and validate it on two large-scale systems. Experimental results with 24,576 processes show that SKT-HPL achieves over 95% of the performance of the original HPL. Compared to the state-of-the-art in-memory checkpoint method, it improves the available memory size by 47% and the performance by 5%.

Keywords: Fault Tolerance; In-Memory Checkpoint; Fault-Tolerant HPL; Memory Consumption

Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis

22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Rajbhandari, Samyam; Rastello, Fabrice; Kowalski, Karol; Krishnamoorthy, Sriram; Sadayappan, P.

Affiliations: Ohio State Univ, Columbus, OH 43210, USA; INRIA Rocquencourt, France; Pacific Northwest Natl Lab, Richland, WA 99352, USA

ISBN: (print) 9781450344937

The four-index integral transform is a fundamental and computationally demanding calculation used in many computational chemistry suites such as NWChem. It transforms a four-dimensional tensor from one basis to another. This transformation is most efficiently implemented as a sequence of four tensor contractions that each contract a four-dimensional tensor with a two-dimensional transformation matrix. Differing degrees of permutation symmetry in the intermediate and final tensors in the sequence of contractions cause intermediate tensors to be much larger than the final tensor and limit the number of electronic states in the modeled systems. Loop fusion, in conjunction with tiling, can be very effective in reducing the total space requirement, as well as data movement. However, the large number of possible choices for loop fusion and tiling, and data/computation distribution across a parallel system, make it challenging to develop an optimized parallel implementation for the four-index integral transform. We develop a novel approach to address this problem, using lower bounds modeling of data movement complexity. We establish relationships between available aggregate physical memory in a parallel computer system and ineffective fusion configurations, enabling their pruning and consequent identification of effective choices and a characterization of optimality criteria. This work has resulted in the development of a significantly improved implementation of the four-index transform that enables higher performance and the ability to model larger electronic systems than the current implementation in the NWChem quantum chemistry software suite.

Keywords: four-index; distributed algorithm; tensors; lower bounds; parallel algorithm; fusion; 4-index; processor mapping; optimal schedule; communication optimization; scheduling; tensor contraction; optimizing 4-index transform

POSTER: An Architecture and Programming Model for Accelerating Parallel Commutative Computations via Privatization

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Vignesh Balaji; Dhruva Tirumala; Brandon Lucia

Affiliations: Carnegie Mellon University, Pittsburgh, PA, USA

ISBN: (print) 9781450344937

Synchronization and data movement are the key impediments to an efficient parallel execution. To ensure that data shared by multiple threads remain consistent, the programmer must use synchronization (e.g., mutex locks) to serialize threads' accesses to data. This limits parallelism because it forces threads to sequentially access shared resources. Additionally, systems use cache coherence to ensure that processors always operate on the most up-to-date version of a value even in the presence of private caches. Coherence protocol implementations cause processors to serialize their accesses to shared data, further limiting parallelism and performance.

Keywords: cache-coherence; shared memory parallel programming; commutativity

POSTER: HythTM: Extending the Applicability of Intel TSX Hardware Transactional Support

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Arnamoy Bhattacharyya; Mike Dai Wang; Mihai Burcea; Yi Ding; Allen Deng; Sai Varikooty; Shafaaf Hossain; Cristiana Amza

Affiliations: University of Toronto, Toronto, ON, Canada

ISBN: (print) 9781450344937

In this work, we introduce and experimentally evaluate a new hybrid software-hardware Transactional Memory prototype based on Intel's Haswell TSX architecture. Our prototype extends the applicability of the existing hardware support for TM by interposing a hybrid fall-back layer before the sequential, big-lock fall-back path, used by standard TSX-supported solutions in order to guarantee progress. In our experimental evaluation we use SynQuake, a realistic game benchmark modeled after Quake. Our results show that our hybrid transactional system, which we call HythTM, is able to reduce the number of transactions that go to the sequential software layer, hence avoiding hardware transaction aborts and loss of parallelism. HythTM optimizes application throughput and scalability up to 5.05x, when compared to the hardware TM with sequential fall-back path.

Keywords: commutativity; cache-coherence; shared memory parallel programming

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
