Search results - Inner Mongolia University Library


Document type

510 conference papers
49 journal articles
1 book

Holdings scope

560 electronic documents
0 print holdings

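Subject classification

464 Engineering
- 413 Software Engineering
- 359 Computer Science and Technology...
- 21 Electronic Science and Technology (may...
- 21 Control Science and Engineering
- 9 Information and Communication Engineering
- 5 Mechanical Engineering
- 4 Electrical Engineering
- 4 Bioengineering
- 3 Power Engineering and Engineering Thermo...
- 3 Biomedical Engineering (may confer...
- 2 Mechanics (may confer engineering, sci...
- 2 Architecture
- 2 Civil Engineering
- 2 Agricultural Engineering
- 1 Metallurgical Engineering
87 Science
- 78 Mathematics
- 12 Systems Science
- 7 Statistics (may confer science, ...
- 4 Biology
- 2 Physics
- 2 Chemistry
- 1 Atmospheric Science
- 1 Geology
26 Management
- 19 Management Science and Engineering (may...
- 14 Business Administration
- 7 Library, Information and Archives Manag...
3 Economics
- 3 Applied Economics
3 Law
- 3 Sociology
2 Education
- 2 Education
2 Agriculture
- 2 Crop Science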

Topics

74 performance
72 parallel process...
62 parallel program...
44 algorithms
42 languages
35 design
26 parallel algorit...
25 gpu
14 computer program...
13 parallel computi...
13 parallel
12 experimentation
12 measurement
10 mpi
10 transactional me...
9 graphics process...
9 theory
9 concurrency
8 synchronization
7 multithreading

Institutions

13 carnegie mellon ...
7 indiana univ blo...
4 univ wisconsin d...
4 univ chinese aca...
4 univ illinois ur...
4 swiss fed inst t...
4 mit csail united...
4 shanghai jiao to...
4 mit comp sci & a...
4 rice university
4 univ rochester r...
4 purdue univ w la...
3 univ of tokyo
3 tsinghua univ de...
3 massachusetts in...
3 ohio state univ ...
3 carnegie mellon ...
3 inria rocquencou...
3 itmo univ st pet...
3 tsinghua univ pe...

Authors

9 chen haibo
8 hoefler torsten
8 blelloch guy e.
8 agrawal kunal
7 garland michael
7 leiserson charle...
6 sun yihan
6 zhai jidong
6 shun julian
6 mellor-crummey j...
5 rainey mike
5 miller barton p.
5 krishnamoorthy s...
5 tsigas philippas
5 padua david
5 nikolopoulos dim...
5 lam monica s.
5 valero mateo
5 scott michael l.
4 taura kenjiro

Language

521 English
39 Other

Search query: Any field = "2003 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

560 records in total; showing records 161-170.

Sorted by relevance.

Adding Approximate Counters (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Steele, Guy L., Jr.; Tristan, Jean-Baptiste. Affiliation: Oracle Labs, Burlington, MA 01803, USA

ISBN (print): 9781450340922

We describe a general framework for adding the values of two approximate counters to produce a new approximate counter value whose expected estimated value is equal to the sum of the expected estimated values of the given approximate counters. (To the best of our knowledge, this is the first published description of any algorithm for adding two approximate counters.) We then work out implementation details for five different kinds of approximate counter and provide optimized pseudocode. For three of them, we present proofs that the variance of a counter value produced by adding two counter values in this way is bounded, and in fact is no worse, or not much worse, than the variance of the value of a single counter to which the same total number of increment operations have been applied. Addition of approximate counters is useful in massively parallel divide-and-conquer algorithms that use a distributed representation for large arrays of counters. We describe two machine-learning algorithms for topic modeling that use millions of integer counters, and confirm that replacing the integer counters with approximate counters is effective, speeding up a GPU-based implementation by over 65% and a CPU-based one by nearly 50%, as well as reducing memory requirements, without degrading their statistical effectiveness.

Keywords: approximate counters; distributed computing; divide and conquer; multithreading; parallel computing; statistical counters
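The abstract does not spell out the counter representations or the addition rule. The sketch below assumes the classic base-2 Morris counter (a stored value c represents an estimated count of 2^c - 1) and uses one simple expectation-preserving addition rule; it is an illustration, not the paper's framework or its optimized pseudocode. Each level j of the smaller counter stands for 2^j increments in expectation, so the combined counter is advanced with probability 2^j / 2^c, which adds exactly 2^j to the expected estimate.

```python
import random


def estimate(c: int) -> int:
    """Estimated count for a base-2 Morris counter holding value c."""
    return (1 << c) - 1


def increment(c: int, rng: random.Random) -> int:
    """Morris increment: advance c with probability 2**-c.

    Advancing from c raises the estimate by 2**c, so the expected
    estimate grows by exactly 1 per call.
    """
    return c + 1 if rng.random() < 2.0 ** -c else c


def add_counters(a: int, b: int, rng: random.Random) -> int:
    """Combine two Morris counters so that
    E[estimate(result)] == estimate(a) + estimate(b).

    Illustrative rule only (an assumption of this sketch), not the
    algorithm from the paper.
    """
    if a < b:
        a, b = b, a  # start from the larger counter
    c = a
    for j in range(b):
        # Level j of the smaller counter accounts for 2**j increments.
        # Advancing c adds 2**c to the estimate, so advance with
        # probability 2**j / 2**c (always <= 1 because c >= a >= b > j).
        if rng.random() < 2.0 ** (j - c):
            c += 1
    return c
```

For example, averaging `estimate(add_counters(5, 3, rng))` over many trials should approach 31 + 7 = 38. Variance bounds, which the paper proves for three counter kinds, are not addressed by this sketch.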


Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: De Matteis, Tiziano; Mencagli, Gabriele. Affiliation: Univ Pisa, Dept Comp Sci, Largo B. Pontecorvo 3, I-56127 Pisa, Italy

ISBN (print): 9781450340922

This paper addresses the problem of designing scaling strategies for elastic data stream processing. Elasticity allows applications to rapidly change their configuration on-the-fly (e.g., the amount of used resources) in response to dynamic workload fluctuations. In this work we address this problem by adopting the Model Predictive Control technique, a control-theoretic method aimed at finding the optimal application configuration along a limited prediction horizon in the future by solving an online optimization problem. Our control strategies are designed to address latency constraints, using Queueing Theory models, and energy consumption, by changing the number of used cores and the CPU frequency through the Dynamic Voltage and Frequency Scaling (DVFS) support available in modern multicore CPUs. The proactive capabilities, in addition to the latency- and energy-awareness, represent the novel features of our approach. To validate our methodology, we develop a thorough set of experiments on a high-frequency trading application. The results demonstrate the high degree of flexibility and configurability of our approach, and show the effectiveness of our elastic scaling strategies compared with existing state-of-the-art techniques used in similar scenarios.

Keywords: Data Stream Processing; Elasticity; Multicore programming; Model Predictive Control; DVFS
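The abstract describes the controller only at a high level. The sketch below assumes a toy model: hypothetical service-rate, power, and switching-cost constants, with an M/M/1 latency formula standing in for the paper's queueing models. It shows the receding-horizon idea: enumerate configuration plans (cores, GHz) over a short horizon of predicted arrival rates, discard plans that break the latency bound, and apply only the first move of the cheapest plan.

```python
import itertools
import math

# Hypothetical model parameters (not taken from the paper).
MU_PER_GHZ = 100.0     # tuples/s one core sustains per GHz
P_STATIC = 1.0         # W per active core
K_DYN = 2.0            # W per core per GHz^3 (dynamic power assumed ~ f^3)
SWITCH_PENALTY = 0.5   # cost charged for each reconfiguration

CONFIGS = [(n, f) for n in range(1, 5) for f in (1.0, 1.5, 2.0)]


def latency(cfg, arrival_rate):
    """Mean response time of an M/M/1 queue whose service rate scales with cores * GHz."""
    n, f = cfg
    mu = n * f * MU_PER_GHZ
    return math.inf if mu <= arrival_rate else 1.0 / (mu - arrival_rate)


def power(cfg):
    n, f = cfg
    return n * (P_STATIC + K_DYN * f ** 3)


def mpc_step(current, predicted_rates, max_latency):
    """Return the first configuration of the cheapest feasible plan over the horizon."""
    best_cost, best_first = math.inf, None
    for plan in itertools.product(CONFIGS, repeat=len(predicted_rates)):
        cost, prev = 0.0, current
        for cfg, lam in zip(plan, predicted_rates):
            if latency(cfg, lam) > max_latency:
                cost = math.inf  # plan violates the latency constraint
                break
            cost += power(cfg) + (SWITCH_PENALTY if cfg != prev else 0.0)
            prev = cfg
        if cost < best_cost:
            best_cost, best_first = cost, plan[0]
    if best_first is None:
        # No feasible plan: fall back to the highest-capacity configuration.
        return max(CONFIGS, key=lambda c: c[0] * c[1])
    return best_first
```

Exhaustive enumeration grows as |CONFIGS|^horizon, so it only suits tiny horizons; a real controller would use a proper optimizer and measured models.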


AUTOGEN: Automatic Discovery of Cache-Oblivious Parallel Recursive Algorithms for Solving Dynamic Programs (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Chowdhury, Rezaul; Ganapathi, Pramod; Tithi, Jesmin Jahan; Bachmeier, Charles; Kuszmaul, Bradley C.; Leiserson, Charles E.; Solar-Lezama, Armando; Tang, Yuan. Affiliations: SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794, USA; MIT Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139, USA; Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Software, Shanghai, Peoples R China

ISBN (print): 9781450340922

We present AUTOGEN, an algorithm that, for a wide class of dynamic programming (DP) problems, automatically discovers highly efficient cache-oblivious parallel recursive divide-and-conquer algorithms from inefficient iterative descriptions of DP recurrences. AUTOGEN analyzes the set of DP table locations accessed by the iterative algorithm when run on a DP table of small size, and automatically identifies a recursive access pattern and a corresponding provably correct recursive algorithm for solving the DP recurrence. We use AUTOGEN to autodiscover efficient algorithms for several well-known problems. Our experimental results show that several autodiscovered algorithms significantly outperform parallel looping and tiled loop-based algorithms. Also, these algorithms are less sensitive to fluctuations of memory and bandwidth compared with their looping counterparts, and their running times and energy profiles remain relatively more stable. To the best of our knowledge, AUTOGEN is the first algorithm that can automatically discover new nontrivial divide-and-conquer algorithms.

Keywords: AutoGen; automatic discovery; dynamic programming; recursive divide-and-conquer; cache-efficient; parallel; cache-oblivious; energy-efficient; cache-adaptive
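AUTOGEN's output is not shown in the abstract. As a hand-written illustration of the kind of recursive divide-and-conquer algorithm it targets, the sketch below solves the longest-common-subsequence recurrence (a standard DP chosen here as an example, not one named in the abstract) by splitting the table into quadrants: top-left first, then top-right and bottom-left, which do not depend on each other and could run in parallel, and finally bottom-right. It runs sequentially and is checked against the plain iterative loop.

```python
def lcs_iterative(x: str, y: str) -> int:
    """Row-major loop over the DP table: the iterative description."""
    n, m = len(x), len(y)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t[i][j] = t[i-1][j-1] + 1 if x[i-1] == y[j-1] else max(t[i-1][j], t[i][j-1])
    return t[n][m]


def lcs_recursive(x: str, y: str, base: int = 4) -> int:
    """Recursive quadrant decomposition of the same recurrence."""
    n, m = len(x), len(y)
    t = [[0] * (m + 1) for _ in range(n + 1)]

    def cell(i, j):
        t[i][j] = t[i-1][j-1] + 1 if x[i-1] == y[j-1] else max(t[i-1][j], t[i][j-1])

    def solve(i0, i1, j0, j1):
        # Fill rows i0..i1-1 and columns j0..j1-1; every cell above and to
        # the left of this block is already final when we get here.
        if i1 - i0 <= base or j1 - j0 <= base:
            for i in range(i0, i1):
                for j in range(j0, j1):
                    cell(i, j)
            return
        im, jm = (i0 + i1) // 2, (j0 + j1) // 2
        solve(i0, im, j0, jm)  # top-left
        solve(i0, im, jm, j1)  # top-right: independent of bottom-left,
        solve(im, i1, j0, jm)  # bottom-left: so a parallel runtime could spawn both
        solve(im, i1, jm, j1)  # bottom-right, after both neighbours finish

    solve(1, n + 1, 1, m + 1)
    return t[n][m]
```

The recursion keeps shrinking blocks until they fit in cache without knowing the cache size, which is what makes this style of algorithm cache-oblivious; the `base` cutoff only limits Python call overhead.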


NUMA-aware Scheduling and Memory Allocation for data-flow task-parallel Applications (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Drebes, Andi; Pop, Antoniu; Heydemann, Karine; Drach, Nathalie; Cohen, Albert. Affiliations: Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England; UPMC Paris 06, Sorbonne Univ, CNRS, LIP6 UMR 7606, Paris, France; Inria, Ecole Normale Super, Rocquencourt, France

ISBN (print): 9781450340922

Dynamic task parallelism is a popular programming model on shared-memory systems. Compared to data parallel loop-based concurrency, it promises enhanced scalability, load balancing and locality. These promises, however, are undermined by non-uniform memory access (NUMA) systems. We show that it is possible to preserve the uniform hardware abstraction of contemporary task-parallel programming models, for both computing and memory resources, while achieving near-optimal data locality. Our run-time algorithms for NUMA-aware task and data placement are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences and reuse. This information is readily available in the run-time systems of modern task-parallel programming frameworks, and from the operating system regarding the placement of previously allocated memory. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability through the elimination of false dependences and enables fine-grained dynamic control over the placement of application data. We demonstrate that the benefits of dynamically managing data placement outweigh the privatization cost, even when comparing with target-specific optimizations through static, NUMA-aware data interleaving. Our implementation and the experimental evaluation on a set of high-performance benchmarks executing on a 192-core system with 24 NUMA nodes show that the fraction of local memory accesses can be increased to more than 99%, resulting in a speedup of up to 5x compared to a NUMA-aware hierarchical work-stealing baseline.

Keywords: Scalability
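The abstract says placement decisions use inter-task data dependences plus the operating system's knowledge of where earlier allocations live. A minimal sketch of that idea, under assumptions of this sketch only (the buffer-to-node map, byte weights, and lowest-node tie-break are hypothetical): run each task on the NUMA node that already holds most of its input bytes, and allocate its outputs on that node so consumers are likely to find them local.

```python
from collections import Counter
from typing import Dict, List, Tuple


def choose_node(inputs: List[Tuple[str, int]], page_node: Dict[str, int],
                default_node: int = 0) -> int:
    """Pick the NUMA node that holds the most bytes of the task's inputs."""
    weight: Counter = Counter()
    for buf, nbytes in inputs:
        node = page_node.get(buf)
        if node is not None:
            weight[node] += nbytes
    if not weight:
        return default_node  # no placement information: use a default node
    # Most input bytes wins; ties go to the lowest node id.
    return min(weight, key=lambda n: (-weight[n], n))


def place_task(inputs: List[Tuple[str, int]], outputs: List[str],
               page_node: Dict[str, int], default_node: int = 0) -> int:
    """Choose a node for the task and record its outputs as allocated there."""
    node = choose_node(inputs, page_node, default_node)
    for buf in outputs:
        # Outputs are private to the task in a data-flow model, so the
        # runtime is free to allocate them on the chosen node.
        page_node[buf] = node
    return node


def local_fraction(inputs: List[Tuple[str, int]], page_node: Dict[str, int],
                   node: int) -> float:
    """Fraction of the task's input bytes that live on `node`."""
    total = sum(n for _, n in inputs)
    local = sum(n for b, n in inputs if page_node.get(b) == node)
    return local / total if total else 1.0
```

Because each task's outputs land where the task ran, a consumer that reads mostly one producer's output tends to be scheduled on that producer's node, which is the locality effect the abstract measures as the fraction of local accesses.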


High Performance Model Based Image Reconstruction (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Wang, Xiao; Sabne, Amit; Kisner, Sherman; Raghunathan, Anand; Bouman, Charles; Midkiff, Samuel. Affiliations: Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907, USA; High Performance Imaging LLC, W Lafayette, IN, USA

ISBN (print): 9781450340922

Computed Tomography (CT) image reconstruction is an important technique used in a wide range of applications, from explosive detection and medical imaging to scientific imaging. Among available reconstruction methods, Model Based Iterative Reconstruction (MBIR) produces higher-quality images and allows for the use of more general CT scanner geometries than is possible with more commonly used methods. The high computational cost of MBIR, however, often makes it impractical in applications for which it would otherwise be ideal. This paper describes a new MBIR implementation that significantly reduces the computational cost of MBIR while retaining its benefits. It describes a novel organization of the scanner data into super-voxels (SV) that, combined with a super-voxel buffer (SVB), dramatically increases locality and prefetching, enables parallelism across SVs, and leads to an average speedup of 187x on 20 cores.

Keywords: Applications; Algorithms; Multicore; parallel algorithm; CT image reconstruction; MBIR
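A rough sketch of the super-voxel idea, assuming a plain 2D voxel grid stored as nested lists and an arbitrary per-voxel update function (both hypothetical): the image is cut into square tiles (super-voxels), each tile is copied into a contiguous buffer so its updates touch consecutive memory, and the disjoint tiles are handed to a thread pool. The paper's super-voxel buffer layout and the MBIR update itself are not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor


def super_voxels(width, height, side):
    """Yield (x0, y0, x1, y1) tiles of at most side x side voxels covering the grid."""
    for y0 in range(0, height, side):
        for x0 in range(0, width, side):
            yield x0, y0, min(x0 + side, width), min(y0 + side, height)


def pack(image, tile):
    """Copy one tile into a contiguous, row-major buffer."""
    x0, y0, x1, y1 = tile
    return [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]


def unpack(image, tile, buf):
    """Write a processed buffer back into its tile of the image."""
    x0, y0, x1, y1 = tile
    values = iter(buf)
    for y in range(y0, y1):
        for x in range(x0, x1):
            image[y][x] = next(values)


def process(image, side, update, workers=4):
    """Apply `update` to every voxel, one super-voxel at a time.

    Tiles are disjoint, so they can be processed concurrently.
    """
    height, width = len(image), len(image[0])
    tiles = list(super_voxels(width, height, side))

    def work(tile):
        buf = [update(v) for v in pack(image, tile)]  # operate on the packed copy
        unpack(image, tile, buf)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, tiles))  # consume results so worker exceptions surface
    return image
```

In CPython the global interpreter lock keeps these threads from speeding up pure-Python arithmetic, so this shows only the data organization and the independence of tiles, not the performance gain.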


Multi-Core On-The-Fly SCC Decomposition (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Bloemen, Vincent; Laarman, Alfons; van de Pol, Jaco. Affiliations: Univ Twente, Formal Methods & Tools, POB 217, NL-7500 AE Enschede, Netherlands; Vienna Univ Technol, FORSYTE, Vienna, Austria

ISBN (print): 9781450340922

The main advantages of Tarjan's strongly connected component (SCC) algorithm are its linear time complexity and ability to return SCCs on-the-fly, while traversing or even generating the graph. Until now, most parallel SCC algorithms sacrifice both: they run in quadratic worst-case time and/or require the full graph in advance. The current paper presents a novel parallel, on-the-fly SCC algorithm. It preserves the linear-time property by letting workers explore the graph randomly while carefully communicating partially completed SCCs. We prove that this strategy is correct. For efficiently communicating partial SCCs, we develop a concurrent, iterable disjoint set structure (combining the union-find data structure with a cyclic list). We demonstrate scalability on a 64-core machine using 75 real-world graphs (from model checking and explicit data graphs), synthetic graphs (combinations of trees, cycles and linear graphs), and random graphs. Previous work did not show speedups for graphs containing a large SCC. We observe that our parallel algorithm is typically 10-30x faster compared to Tarjan's algorithm for graphs containing a large SCC. Comparable performance (with respect to the current state-of-the-art) is obtained for graphs containing many small SCCs.

Keywords: strongly connected components; SCC; algorithm; graph; digraph; parallel; multi-core; union-find; depth-first search
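The abstract names the key data structure: a disjoint-set (union-find) structure combined with a cyclic list, so that the members of a partially merged SCC can be enumerated. Below is a sequential sketch of that combination; the paper's structure is concurrent, and that part is not attempted here. Each element keeps a `next` pointer forming one circular list per set, and union splices two circles into one by swapping the successors of the two roots.

```python
class IterableUnionFind:
    """Union-find whose sets can also be iterated via a circular successor list."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.next = list(range(n))  # every element starts in a one-element cycle

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> int:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Swapping the successors of two nodes on different cycles
        # merges the two cycles into a single one.
        self.next[ra], self.next[rb] = self.next[rb], self.next[ra]
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra  # union by rank
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra

    def members(self, x: int):
        """Yield every element in the same set as x by walking the cycle."""
        yield x
        y = self.next[x]
        while y != x:
            yield y
            y = self.next[y]
```

Splicing costs O(1) per union and iteration costs time proportional to the set size.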


A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun. Affiliations: Georgia Tech, Atlanta, GA 30332, USA; Sandia Natl Labs, Livermore, CA 94550, USA

ISBN (print): 9781450340922

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low-rank factors W and H, for a given input matrix A, such that A ≈ WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementations, our algorithm is also flexible: (1) it performs well for both dense and sparse matrices, and (2) it allows the user to choose any one of the multiple algorithms for solving the updates to low-rank factors W and H within the alternating iterations. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements.

Keywords: Non-negative matrix factorization
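The alternating structure in the abstract (fix W and update H, then fix H and update W) can be sketched serially in pure Python. For brevity this sketch swaps in the Lee-Seung multiplicative update as the inner step instead of an exact non-negative least-squares solve, and it leaves out the MPI distribution and the communication-minimizing data layout entirely; the rank, iteration count, and seed are illustrative.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frob_err(A, W, H):
    """Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf(A, k, iters=200, seed=0, eps=1e-12):
    """Alternating updates with multiplicative rules; entries stay non-negative."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H-step with W fixed: H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W-step with H fixed: W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H
```

Multiplicative updates never increase the Frobenius error but converge slowly; the exact NLS solvers the abstract refers to usually need far fewer outer iterations.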


POSTER: HythTM: Extending the Applicability of Intel TSX Hardware Transactional Support (2017)

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Arnamoy Bhattacharyya; Mike Dai Wang; Mihai Burcea; Yi Ding; Allen Deng; Sai Varikooty; Shafaaf Hossain; Cristiana Amza. Affiliation: University of Toronto, Toronto, ON, Canada

ISBN (print): 9781450344937

In this work, we introduce and experimentally evaluate a new hybrid software-hardware Transactional Memory prototype based on Intel's Haswell TSX architecture. Our prototype extends the applicability of the existing hardware support for TM by interposing a hybrid fall-back layer before the sequential, big-lock fall-back path used by standard TSX-supported solutions to guarantee progress. In our experimental evaluation we use SynQuake, a realistic game benchmark modeled after Quake. Our results show that our hybrid transactional system, which we call HythTM, is able to reduce the number of transactions that go to the sequential software layer, hence avoiding hardware transaction aborts and loss of parallelism. HythTM improves application throughput and scalability by up to 5.05x compared to hardware TM with a sequential fall-back path.

Keywords: commutativity; cache-coherence; shared memory; parallel programming
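The abstract describes a layered fallback: hardware transactions first, then a hybrid software layer, and only then the sequential big-lock path that standard TSX code uses to guarantee progress. Python has no access to TSX, so the sketch below models only that control flow; `htm_try`, `hybrid_try`, the retry budgets, and the statistics counter are hypothetical stand-ins, not HythTM's interface.

```python
import threading
from collections import Counter
from typing import Callable

BIG_LOCK = threading.Lock()  # the sequential fall-back path


def run_atomic(body: Callable[[], None],
               htm_try: Callable[[Callable[[], None]], bool],
               hybrid_try: Callable[[Callable[[], None]], bool],
               stats: Counter,
               htm_retries: int = 3,
               hybrid_retries: int = 2) -> str:
    """Run `body` atomically, escalating through the fall-back layers.

    `htm_try` and `hybrid_try` return True if the attempt committed and
    False if it aborted. Returns the name of the layer that committed.
    """
    for _ in range(htm_retries):
        if htm_try(body):
            stats["htm"] += 1
            return "htm"
    for _ in range(hybrid_retries):  # extra layer placed before the big lock
        if hybrid_try(body):
            stats["hybrid"] += 1
            return "hybrid"
    with BIG_LOCK:  # serialize: always makes progress, but loses parallelism
        body()
    stats["lock"] += 1
    return "lock"
```

Counting how often each layer commits mirrors the metric the abstract reports: every transaction absorbed by the hybrid layer is one that no longer serializes on the big lock.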


Scalable adaptive NUMA-aware Lock combining local locking and remote locking for efficient concurrency (2016)

21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016

Authors: Zhang, Mingzhe; Lau, Francis C.M.; Wang, Cho-Li; Cheng, Luwei; Chen, Haibo. Affiliations: Dept. Computer Science, University of Hong Kong, Hong Kong; Facebook, United States; Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China

ISBN (print): 9781450340922

Scalable locking is a key building block for scalable multi-threaded software. Its performance is especially critical in multi-socket, multi-core machines with non-uniform memory access (NUMA). Previous schemes such as local locking and remote locking perform well only under certain levels of contention, and often require non-trivial tuning for a particular configuration. Moreover, on large NUMA systems, current distance-first NUMA policies perform unsatisfactorily because the nomination of lock servers is unmanaged. In this work, we propose SANL, a locking scheme that can deliver high performance under various contention levels by adaptively switching between the local and the remote lock scheme. Furthermore, we introduce a new NUMA policy for the remote lock that jointly considers node distances and server utilization when choosing lock servers. A comparison with seven representative locking schemes shows that SANL outperforms the others in most contention situations. In one group test, SANL is 3.7 times faster than RCL lock and 17 times faster than POSIX mutex. © 2016 ACM.

Keywords: Locks (fasteners)
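Two mechanisms in the abstract lend themselves to a small sketch: switching between local and remote locking as contention changes, and choosing a lock server by weighing node distance together with server utilization. The thresholds, smoothing factor, weight, and distance values are hypothetical, and the hysteresis (separate up and down thresholds) is a design choice of this sketch, not necessarily SANL's.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveLockPolicy:
    high: float = 4.0    # switch to remote locking above this many waiters (hypothetical)
    low: float = 1.0     # switch back to local locking below this (hypothetical)
    alpha: float = 0.25  # smoothing factor of the moving average
    mode: str = "local"
    avg_waiters: float = 0.0

    def observe(self, waiters: int) -> str:
        """Fold in one contention sample and return the lock mode to use next."""
        self.avg_waiters = (1 - self.alpha) * self.avg_waiters + self.alpha * waiters
        if self.mode == "local" and self.avg_waiters > self.high:
            self.mode = "remote"
        elif self.mode == "remote" and self.avg_waiters < self.low:
            self.mode = "local"
        return self.mode


def pick_lock_server(candidates, distance, utilization, weight=15.0):
    """Lowest (NUMA distance + weight * utilization) wins; ties go to the lowest id.

    With weight=0 this reduces to a purely distance-first policy.
    """
    return min(candidates, key=lambda c: (distance[c] + weight * utilization[c], c))
```

For example, with distances {0: 10, 1: 21} and utilizations {0: 0.9, 1: 0.1}, a distance-first choice (weight 0) keeps the busy node 0, while the joint score picks node 1.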


PPoPP 2013 - Proceedings of the 2013 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2013

ISBN (print): 9781450319225

The proceedings contain 45 papers. The topics discussed include: a peta-scalable CPU-GPU algorithm for global atmospheric simulations; adoption protocols for fanout-optimal fault-tolerant termination detection; betweenness centrality: algorithms and implementations; complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU; fast concurrent queues for x86 processors; FASTLANE: improving performance of software transactional memory for low thread counts; Ligra: a lightweight graph processing framework for shared memory; ownership passing: efficient distributed memory programming on multi-core systems; parallel suffix array and least common prefix for the GPU; StreamScan: fast scan algorithms for GPUs without global barrier synchronization; using hardware transactional memory to correct and simplify a readers-writer lock algorithm; and exploring different automata representations for efficient regular expression matching on GPUs.


56 pages in total.


No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
