ISBN: (print) 9781605587080
As multicore chips become the main building blocks for high performance computers, many numerical applications face a performance impediment due to the limited hardware capacity to move data between the CPU and the off-chip memory. This is especially true for large computing problems solved by iterative algorithms because of the large data sets typically used. Loop tiling, also known as loop blocking, was shown previously to be an effective way to enhance data locality, and hence to reduce the memory bandwidth pressure, for a class of iterative algorithms executed on a single processor. Unfortunately, tiled programs suffer from reduced parallelism because only the loop iterations within a single tile can be easily parallelized. In this work, we propose to use the asynchronous model to enable effective loop tiling such that both parallelism and locality can be attained simultaneously. Asynchronous algorithms were previously proposed to reduce the communication cost and synchronization overhead between processors. Our new discovery is that carefully controlled asynchrony and loop tiling can significantly improve the performance of parallel iterative algorithms on multicore processors due to simultaneously attained data locality and loop-level parallelism. We present supporting evidence from experiments with three well-known numerical kernels.
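As a concrete illustration of the idea, here is a minimal sketch of asynchronous tiled relaxation on a 1-D stencil, assuming OpenMP; the kernel, tile layout, and sweep count are illustrative choices, not the paper's benchmarks. Each thread sweeps only its own tile (locality) while tiles proceed without barriers between sweeps (parallelism), reading possibly stale boundary values, which the asynchronous model tolerates; a production version would use relaxed atomics for the racy boundary accesses.

```c
/* Minimal sketch: asynchronous tiled relaxation (chaotic Jacobi-style).
 * Each thread repeatedly sweeps its own tile, keeping the tile in cache,
 * and reads boundary values of neighbor tiles without any barrier. */
#include <omp.h>

#define N      (1 << 20)   /* grid points (illustrative size)            */
#define SWEEPS 100         /* local sweeps per tile, no barrier between  */

void async_tiled_relax(double *u)  /* u[0], u[N-1] are fixed boundaries */
{
    #pragma omp parallel
    {
        int t  = omp_get_thread_num();
        int nt = omp_get_num_threads();
        int lo = 1 + (int)((long)(N - 2) * t / nt);        /* tile start */
        int hi = 1 + (int)((long)(N - 2) * (t + 1) / nt);  /* tile end   */

        for (int s = 0; s < SWEEPS; s++)          /* iterate inside tile */
            for (int i = lo; i < hi; i++)         /* tile-edge reads may */
                u[i] = 0.5 * (u[i - 1] + u[i + 1]); /* be stale: allowed */
    }   /* single implicit barrier; a real solver also tests convergence */
}
```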
ISBN: (print) 9781605587080
For designers of large-scale parallel computers, it is greatly desired that the performance of parallel applications can be predicted at the design phase. However, this is difficult because the execution time of parallel applications is determined by several factors, including the sequential computation time in each process, the communication time, and their convolution. Despite previous efforts, it remains an open problem to estimate the sequential computation time in each process accurately and efficiently for large-scale parallel applications on target machines that do not yet exist. This paper proposes a novel approach to predict the sequential computation time accurately and efficiently. We assume that at least one node of the target platform is available, but the whole target system need not be. We make two main technical contributions. First, we employ deterministic replay techniques to execute any process of a parallel application on a single node at real speed. As a result, we can simply measure the real sequential computation time on a target node for each process, one by one. Second, we observe that the computation behavior of the processes in a parallel application can be clustered into a few groups, with the processes in each group exhibiting similar computation behavior. This observation helps us reduce measurement time significantly because we only need to execute representative parallel processes instead of all of them. We have implemented a performance prediction framework, called PHANTOM, which integrates the above computation-time acquisition approach with a trace-driven network simulator. We validate our approach on several platforms. For ASCI Sweep3D, the error of our approach is less than 5% on 1024 processor cores. Compared to a recent regression-based prediction approach, PHANTOM achieves better prediction accuracy across different platforms.
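The clustering step can be pictured with a small sketch; everything below (the 4-dimensional signatures, the threshold, the greedy grouping) is a made-up illustration of the idea, not PHANTOM's actual method. Processes whose computation signatures fall within a threshold share one representative, and only representatives would be replayed and timed.

```c
/* Hypothetical sketch: group processes by similar computation signatures
 * and keep one representative per group for replay-based timing. */
#include <math.h>
#include <stdio.h>

#define NPROC 8
#define DIM   4
#define EPS   0.1   /* similarity threshold (illustrative) */

static double dist(const double *a, const double *b)
{
    double s = 0.0;
    for (int k = 0; k < DIM; k++)
        s += (a[k] - b[k]) * (a[k] - b[k]);
    return sqrt(s);
}

int main(void)
{
    double sig[NPROC][DIM];
    for (int p = 0; p < NPROC; p++)          /* fake signatures forming  */
        for (int k = 0; k < DIM; k++)        /* two behavior groups      */
            sig[p][k] = (p % 2) + 0.01 * k;

    int rep[NPROC], nrep = 0;        /* indices of cluster representatives */
    for (int p = 0; p < NPROC; p++) {
        int found = 0;
        for (int r = 0; r < nrep && !found; r++)
            if (dist(sig[p], sig[rep[r]]) < EPS)
                found = 1;           /* p joins an existing cluster */
        if (!found)
            rep[nrep++] = p;         /* p starts a new cluster      */
    }
    printf("replay and time %d of %d processes\n", nrep, NPROC);
    return 0;
}
```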
ISBN: (print) 9781605587080
Most modern Chip Multiprocessors (CMPs) feature a shared cache on chip. For multithreaded applications, the sharing reduces communication latency among co-running threads, but also results in cache contention. A number of studies have examined the influence of cache sharing on multithreaded applications, but most of them have concentrated on the design or management of the shared cache, rather than a systematic measurement of the influence. Consequently, prior measurements have been constrained by the reliance on simulators, the use of out-of-date benchmarks, and the limited coverage of deciding factors. The influence of CMP cache sharing on contemporary multithreaded applications remains only preliminarily understood. In this work, we conduct a systematic measurement of the influence on two kinds of commodity CMP machines, using a recently released CMP benchmark suite, PARSEC, and considering a number of potentially important factors at the program, OS, and architecture levels. The measurement shows some surprising results. Contrary to the commonly perceived importance of cache sharing, neither positive nor negative effects from cache sharing are significant for most of the program executions, regardless of the types of parallelism, input datasets, architectures, numbers of threads, and assignments of threads to cores. After a detailed analysis, we find that the main reason is the mismatch between the current development and compilation of multithreaded applications and CMP architectures. By transforming the programs in a cache-sharing-aware manner, we observe up to a 36% performance increase when the threads are placed on cores appropriately.
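The "placing threads on cores appropriately" part can be sketched with standard Linux affinity calls; the core numbers below are machine-specific assumptions (which cores share a last-level cache varies by platform), and this is only a sketch of placement, not the paper's program transformation.

```c
/* Minimal sketch of cache-sharing-aware placement on Linux: pin two
 * threads that share data onto cores assumed to share a cache level. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void pin_to_core(pthread_t th, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);               /* restrict thread to one core */
    pthread_setaffinity_np(th, sizeof(set), &set);
}

int main(void)
{
    /* Assumption: cores 0 and 1 share an L2/L3; verify via
     * /sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list. */
    pin_to_core(pthread_self(), 0);    /* e.g., pin the calling thread */
    return 0;
}
```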
ISBN: (print) 9781605587080
In processors with several levels of hardware resource sharing, like CMPs in which each core is an SMT, the scheduling process becomes more complex than in processors with a single level of resource sharing, such as pure-SMT or pure-CMP processors. Once the operating system selects the set of applications to simultaneously schedule on the processor (the workload), each application/thread must be assigned to one of the hardware contexts (strands). We call this last scheduling step the Thread to Strand Binding, or TSB. In this paper, we show that the impact of the TSB on the performance of processors with several levels of shared resources is high. We measure a variation of up to 59% between different TSBs of real multithreaded network applications running on the UltraSPARC T2 processor, which has three levels of resource sharing. In our view, this problem is going to be more acute in future multithreaded architectures comprising more cores, more contexts per core, and more levels of resource sharing. We propose a resource-sharing-aware TSB algorithm (TSBSched) that significantly simplifies the problem of thread-to-strand binding for software-pipelined applications, which are representative of multithreaded network applications. Our systematic approach encapsulates both the characteristics of the multithreaded processors under study and the structure of the software-pipelined applications. Once calibrated for a given processor architecture, our proposal requires neither hardware knowledge on the programmer's side nor extensive profiling of the application. We validate our algorithm on the UltraSPARC T2 processor running a set of real multithreaded network applications, on which we report improvements of up to 46% compared to the current state-of-the-art dynamic schedulers.
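To make the TSB problem concrete, here is a deliberately naive brute-force illustration: enumerate bindings of pipeline stages to strands and prefer bindings that co-locate communicating stages on the same core. The stage count, strand-to-core map, and scoring rule are all invented for illustration; TSBSched itself is a calibrated, model-based algorithm, not this exhaustive search.

```c
/* Hypothetical brute force over thread-to-strand bindings: reward
 * placing adjacent pipeline stages on strands of the same core. */
#include <stdio.h>

#define NSTAGE 4                           /* pipeline stages = threads */
static int core_of[NSTAGE] = {0, 0, 1, 1}; /* strand -> core (assumed)  */

static int best_score = -1, best[NSTAGE];

static int score(const int *bind)          /* bind[stage] = strand */
{
    int s = 0;
    for (int i = 0; i + 1 < NSTAGE; i++)   /* stage i feeds stage i+1 */
        if (core_of[bind[i]] == core_of[bind[i + 1]])
            s++;                           /* shared core = cheap comm */
    return s;
}

static void permute(int *bind, int used, int depth)
{
    if (depth == NSTAGE) {
        int s = score(bind);
        if (s > best_score) {
            best_score = s;
            for (int i = 0; i < NSTAGE; i++) best[i] = bind[i];
        }
        return;
    }
    for (int st = 0; st < NSTAGE; st++)
        if (!(used & (1 << st))) {
            bind[depth] = st;
            permute(bind, used | (1 << st), depth + 1);
        }
}

int main(void)
{
    int bind[NSTAGE];
    permute(bind, 0, 0);
    printf("best binding (score %d):", best_score);
    for (int i = 0; i < NSTAGE; i++)
        printf(" stage%d->strand%d", i, best[i]);
    printf("\n");
    return 0;
}
```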
ISBN: (print) 9781605585680
Live sequence charts (LSCs) have been proposed as an inter-object, scenario-based specification and visual programming language for reactive systems. In this paper, we introduce a logic-based framework to check the consistency of an LSC specification. An LSC simulator has been implemented in logic programming, utilizing a memoized depth-first search strategy, to show how a reactive system specified in LSCs would respond to a set of external event sequences. A formal notation is defined to specify external event sequences, extending regular expressions with a parallel operator and a testing control. The parallel operator allows interleaved parallel external events to be tested in LSCs simultaneously, while the testing control provides users with a new approach to specify and test certain temporal properties (e.g., CTL formulas) in the form of an LSC. Our framework further provides either a state transition graph or a failure trace to justify the consistency checking results.
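The parallel operator has shuffle semantics: it denotes all interleavings of its operand sequences. The following small sketch enumerates those interleavings for two event sequences; the event names are made up, and a simulator along the paper's lines would feed each interleaving to the LSC specification.

```c
/* Sketch of the parallel (shuffle) operator: enumerate every
 * interleaving of two external event sequences, here "ab" || "cd". */
#include <stdio.h>

static void shuffle(const char *a, const char *b, char *out, int pos)
{
    if (!*a && !*b) {                 /* both sequences consumed */
        out[pos] = '\0';
        puts(out);                    /* one complete interleaving */
        return;
    }
    if (*a) { out[pos] = *a; shuffle(a + 1, b, out, pos + 1); }
    if (*b) { out[pos] = *b; shuffle(a, b + 1, out, pos + 1); }
}

int main(void)
{
    char out[16];
    shuffle("ab", "cd", out, 0);      /* prints all 6 interleavings */
    return 0;
}
```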
ISBN: (print) 9781605583976
The advent of new parallel architectures has increased the need for parallel optimizing compilers to assist developers in creating efficient code. OpenUH is a state-of-the-art optimizing compiler, but it performs only a limited set of optimizations for OpenMP programs due to its conservative assumptions about shared memory programming. These limitations may prevent some OpenMP applications from being optimized to the full extent of their sequential counterparts. This paper describes our design and implementation of a parallel data flow framework, consisting of a Parallel Control Flow Graph (PCFG) and a Parallel SSA (PSSA) representation in OpenUH, to model data flow for OpenMP programs. This framework enables the OpenUH compiler to perform all classical scalar optimizations for OpenMP programs, in addition to conducting OpenMP-specific optimizations.
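An illustrative OpenMP fragment (mine, not from the paper) shows the kind of classical scalar optimization at stake: a parallel-aware data-flow representation can prove that a variable is unmodified inside a parallel region and propagate its constant value, whereas a compiler with conservative shared-memory assumptions must leave the load in place.

```c
/* Illustration: `scale` is provably constant across the parallel region,
 * so a PCFG/PSSA-style analysis could safely constant-propagate it into
 * the loop body; conservative treatment of shared memory blocks this. */
void scaled_axpy(int n, const double *x, double *y)
{
    double scale = 2.0;               /* never written after this point */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] += scale * x[i];         /* candidate: y[i] += 2.0 * x[i] */
}
```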
ISBN: (print) 9781605583976
Boosted transactions offer an attractive method that enables programmers to create larger transactions that scale well and offer deadlock-freedom guarantees. However, as boosted transactions get larger, they become more susceptible to conflicts and aborts. We describe a linear-time algorithm to detect transactions that cannot make progress, which transactions need to be aborted, and when. The algorithm guarantees zero false positives with minimal aborts. Our proposals, as implemented in DSTM2, increase the transactional throughput of the system, often by more than 30%.
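The abstract does not spell out the paper's detection algorithm, so as a generic stand-in, here is the standard linear-time DFS check for a cycle in a waits-for graph: transactions on such a cycle can never make progress, and some member must be aborted. The adjacency matrix keeps the sketch short; adjacency lists would make the traversal linear in the size of the graph.

```c
/* Generic stand-in (not the paper's algorithm): DFS cycle detection in
 * a waits-for graph among transactions. */
#include <stdio.h>

#define NTX 4
static int waits[NTX][NTX];   /* waits[i][j] = 1: tx i waits on tx j */
static int color[NTX];        /* 0 = unvisited, 1 = on stack, 2 = done */

static int dfs(int v)         /* returns 1 if a cycle is reachable */
{
    color[v] = 1;
    for (int w = 0; w < NTX; w++)
        if (waits[v][w]) {
            if (color[w] == 1) return 1;            /* back edge: cycle */
            if (color[w] == 0 && dfs(w)) return 1;
        }
    color[v] = 2;
    return 0;
}

int main(void)
{
    waits[0][1] = waits[1][2] = waits[2][0] = 1;    /* 0 -> 1 -> 2 -> 0 */
    for (int v = 0; v < NTX; v++)
        if (color[v] == 0 && dfs(v)) {
            puts("stalled cycle found: abort a member");
            break;
        }
    return 0;
}
```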
ISBN: (print) 9781605583976
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallel programming on many-core architectures. We show that the NB-FEB primitive is universal, scalable, and feasible. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into a single memory request, which consequently mitigates performance degradation due to synchronization "hot spots" (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting on a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility).
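A speculative sketch of the non-blocking flavor can be written with C11 atomics: the word packs a value with a full/empty flag in bit 0, and every operation returns the prior contents instead of spinning, so callers can retry or combine. The packing, operation names, and semantics chosen here are my assumptions for illustration, not the paper's exact primitive.

```c
/* Assumed layout: bit 0 = full flag, bits 1..63 = value. Both operations
 * always return the prior word and never wait. */
#include <stdatomic.h>
#include <stdint.h>

typedef _Atomic uint64_t nbfeb_t;

/* Write v and mark full only if the cell is empty; returns the prior
 * word so the caller sees whether it was already full and what it held. */
static uint64_t nbfeb_store_if_empty(nbfeb_t *cell, uint64_t v)
{
    uint64_t old = atomic_load(cell);
    while (!(old & 1)) {                          /* empty: try to fill */
        if (atomic_compare_exchange_weak(cell, &old, (v << 1) | 1))
            return old;                           /* we filled it       */
    }                                             /* CAS failure reloads old */
    return old;                                   /* was full: no wait  */
}

/* Read and mark empty; returns the prior word (value + flag). */
static uint64_t nbfeb_load_and_empty(nbfeb_t *cell)
{
    return atomic_fetch_and(cell, ~(uint64_t)1);  /* clear the full bit */
}
```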
ISBN: (print) 9781605583976
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all streams accessed must be communicated through the SRF (Stream Register File), a non-bypassing, software-managed on-chip memory. Therefore, optimizing utilization of the SRF is crucial for good performance. The key insight is that the interference graphs (IGs) formed by the streams in stream applications tend to be comparability graphs, or to be decomposable into a set of multiple comparability graphs. We present a compiler algorithm that can find optimal or near-optimal colorings of stream IGs, thereby improving SRF utilization over the First-Fit bin-packing algorithm, previously the best in the literature.
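For contrast, here is a sketch of the First-Fit baseline the paper improves on: color a stream interference graph by assigning each stream the lowest color not used by any earlier stream it interferes with. The graph and stream order are invented; the paper's contribution is exploiting comparability structure to beat exactly this kind of greedy allocation.

```c
/* Baseline sketch: first-fit coloring of a stream interference graph.
 * Streams that are live at the same time must get different colors
 * (i.e., disjoint SRF space). */
#include <stdio.h>

#define NS 5
static int interf[NS][NS] = {     /* 1 = streams' lifetimes overlap */
    {0,1,1,0,0},
    {1,0,1,1,0},
    {1,1,0,1,0},
    {0,1,1,0,1},
    {0,0,0,1,0},
};

int main(void)
{
    int color[NS];
    for (int s = 0; s < NS; s++) {       /* streams in program order   */
        int c = 0, clash;
        do {                             /* lowest color not used by a */
            clash = 0;                   /* conflicting earlier stream */
            for (int t = 0; t < s; t++)
                if (interf[s][t] && color[t] == c) { clash = 1; c++; break; }
        } while (clash);
        color[s] = c;
        printf("stream %d -> color %d\n", s, c);
    }
    return 0;
}
```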