ISBN:
(Print) 9781467368322
In this paper, we propose and investigate a task mapping algorithm for big data applications. As a critical resource, data are produced faster than ever before, and parallel programs that process these data on massively parallel systems are widely adopted. The task mapping algorithm, however, has not been well optimized for these applications. We explore the characteristics of big data applications on a shared cache/memory multicore processor, and analyse the latencies of the cache and memory sub-systems. The proposed algorithm is designed to optimize cache/memory latency as well as intra-application latency. We introduce an efficient greedy algorithm that computes the mapping based on the congregate degree of nodes. Different numbers of search spaces are discussed and evaluated. Experiments are conducted using synthetic simulation and by running real applications in a full-system simulation environment. The results confirm the effectiveness of the proposed algorithm: the average execution time of five selected big data applications is reduced by 8% compared with the first-fit algorithm.
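The abstract does not define "congregate degree" or the exact mapping procedure, so the following is a minimal sketch under assumptions: "congregate degree" is taken to be a task's total communication weight toward already-mapped neighbours, cores form a mesh, and latency grows with Manhattan distance. All names here are illustrative, not the paper's.

```python
# Hypothetical sketch of a congregate-degree-driven greedy mapper.
# Assumed (not from the paper): congregate degree = a task's total
# communication weight with already-placed tasks; cores form a mesh
# where cost grows with Manhattan distance.
from itertools import product

def greedy_map(edges, mesh_w, mesh_h):
    """edges: {(task_a, task_b): comm_weight}; returns {task: (x, y) core}."""
    tasks = sorted({t for e in edges for t in e})
    free = set(product(range(mesh_w), range(mesh_h)))
    # Seed with the task having the largest total communication volume.
    vol = {t: sum(w for e, w in edges.items() if t in e) for t in tasks}
    seed = max(tasks, key=lambda t: vol[t])
    placed = {seed: (0, 0)}
    free.discard((0, 0))
    while len(placed) < len(tasks):
        # Congregate degree: communication weight toward placed tasks.
        def cdeg(t):
            return sum(w for (a, b), w in edges.items()
                       if (a == t and b in placed) or (b == t and a in placed))
        nxt = max((t for t in tasks if t not in placed), key=cdeg)
        # Place on the free core minimising weighted Manhattan distance
        # to the already-placed communication partners.
        def cost(core):
            return sum(w * (abs(core[0] - placed[o][0]) +
                            abs(core[1] - placed[o][1]))
                       for (a, b), w in edges.items()
                       for o in ((b if a == nxt else a),)
                       if nxt in (a, b) and o in placed)
        core = min(free, key=cost)
        placed[nxt] = core
        free.discard(core)
    return placed
```

The greedy order ensures that heavily-communicating tasks are placed early and near each other, which is the intuition behind optimizing intra-application latency.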
With the wide diffusion of parallel architectures, parallelism has become an indispensable factor in application design. However, the cost of the parallelization process of existing applications is still too high i...
Extending expertise in parallel computing is critical to all those using high performance computing to gain insights into science and engineering problems. Many campuses do not offer such a course because of course lo...
Graphics Processing Units (GPUs) are becoming increasingly popular not only across various scientific communities, but also as integrated data-parallel accelerators on existing multicore processors. Support for massive fine-grained parallelism in contemporary GPUs provides a tremendous amount of computing power. GPUs support thousands of lightweight threads to deliver high computational throughput. The popularity of GPUs is facilitated by easy-to-adopt programming models such as CUDA and OpenCL that aim to ease programmers' efforts while developing parallel GPU applications. However, designing and implementing correct and efficient GPU programs is still challenging, since programmers must consider the interaction between thousands of parallel threads. Therefore, addressing these challenges is essential for improving programmers' productivity as well as software reliability. Towards this end, this dissertation proposes mechanisms for improving the programmability of irregular applications and ensuring the correctness of compute kernels. Some applications possess abundant data-level parallelism, but are unable to take advantage of the GPU's parallelism because they exhibit irregular memory access patterns to shared data structures. Programming such applications on GPUs requires synchronization mechanisms such as locks, which significantly increase programming complexity. Coarse-grained locking, where a single lock controls all shared resources, reduces programming effort but can substantially serialize GPU threads. On the other hand, fine-grained locking, where each data element is protected by an independent lock, facilitates maximum parallelism but requires significant programming effort. To overcome these challenges, we propose transactional memory (TM) on the GPU that is able to achieve performance comparable to fine-grained locking, while requiring minimal programming effort. Transactional execution can incur runtime overheads due to activities such as detecting confl...
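The coarse- versus fine-grained trade-off described above can be illustrated with a toy sketch (CPU threads here, not the dissertation's GPU TM): one table serializes every update behind a single lock, the other holds one lock per element so only conflicting threads serialize. Both yield the same final state; they differ only in available parallelism and programming effort.

```python
# Illustrative sketch contrasting coarse- and fine-grained locking on a
# shared table of counters. Not the dissertation's implementation.
import threading

class CoarseTable:
    def __init__(self, n):
        self.data = [0] * n
        self.lock = threading.Lock()          # one lock serialises everything
    def add(self, i, v):
        with self.lock:
            self.data[i] += v

class FineTable:
    def __init__(self, n):
        self.data = [0] * n
        self.locks = [threading.Lock() for _ in range(n)]  # lock per element
    def add(self, i, v):
        with self.locks[i]:                   # only conflicting threads serialise
            self.data[i] += v

def hammer(table, threads=4, per_thread=1000):
    """Concurrently increment counters round-robin from several threads."""
    def worker():
        for k in range(per_thread):
            table.add(k % len(table.data), 1)
    ts = [threading.Thread(target=worker) for _ in range(threads)]
    for t in ts: t.start()
    for t in ts: t.join()
```

A TM system aims for the concurrency profile of `FineTable` with the single-line programming effort of `CoarseTable`, at the cost of runtime conflict detection.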
ISBN:
(Print) 9781450335867
We present a scalable parallel solver for numerical constraint satisfaction problems (NCSPs). Our parallelization scheme consists of homogeneous worker solvers, each of which runs on an available core and communicates with the others via the global load balancing (GLB) method. The search tree of the branch and prune algorithm is split and distributed through the two phases of GLB: a random workload stealing phase, and a workload distribution and termination phase based on a hypercube-shaped graph called a lifeline. The parallel solver is simply implemented with X10, which provides an implementation of GLB as a library. In experiments, NCSPs from the literature were solved, attaining up to a 516-fold speedup using 600 cores of the TSUBAME2.5 supercomputer. Optimal GLB configurations are analyzed.
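The lifeline topology mentioned above can be sketched as follows, under the common assumption that the lifeline graph is a hypercube: worker i is linked to every worker whose id differs from i in exactly one bit, so each of n workers has about log2(n) lifelines. This is a sketch of the topology only; the stealing protocol and X10 machinery are elided.

```python
# Sketch of a hypercube lifeline graph for GLB-style load balancing:
# idle workers first steal from random victims, then wait on their
# "lifeline" neighbours. Worker i's lifelines are the workers whose
# ids differ from i in exactly one bit.
def lifeline_neighbors(worker, n_workers):
    """Hypercube lifeline edges of `worker` among workers 0..n_workers-1."""
    out, bit = [], 1
    while bit < n_workers:
        peer = worker ^ bit          # flip one bit of the worker id
        if peer < n_workers:         # skip ids outside the worker range
            out.append(peer)
        bit <<= 1
    return out
```

Because every worker has only O(log n) lifelines, an idle worker that fails random stealing blocks on few peers, yet any new work can reach all workers in O(log n) hops.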
Teaching supercomputing technologies is a very important and complex task. The wide variety of computer systems and technologies complicates the training process. Educational and research systems can help. We are presenting tw...
ISBN:
(Print) 9781450332057
Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA), where accesses to remote memory locations take more time than local memory accesses. Optimizing NUMA memory system performance is difficult and costly for three principal reasons: (1) today's programming languages/libraries have no explicit support for NUMA systems, (2) NUMA optimizations are not portable, and (3) optimizations are not composable (i.e., they can become ineffective or worsen performance in environments that support composable parallel software). This paper presents TBB-NUMA, a parallel programming library based on Intel Threading Building Blocks (TBB) that supports portable and composable NUMA-aware programming. TBB-NUMA provides a model of task affinity that captures a programmer's insights on mapping tasks to resources. NUMA-awareness affects all layers of the library (i.e., resource management, task scheduling, and high-level parallel algorithm templates) and requires close coupling between all these layers. Optimizations implemented with TBB-NUMA (for a set of standard benchmark programs) result in up to 44% performance improvement over standard TBB, but more importantly, optimized programs are portable across different NUMA architectures and preserve data locality even when composed with other parallel computations.
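The task-affinity idea can be modelled with a toy scheduler (this is an assumption-laden sketch, not the TBB-NUMA API): each NUMA node owns a task queue, tasks carry an affinity hint, and a worker prefers its local queue before stealing from remote nodes, so data tends to stay where the hint placed it.

```python
# Toy model of affinity-aware scheduling: per-node queues, local-first
# dispatch, stealing as a fallback. Illustrative only; not TBB-NUMA.
from collections import deque

class AffinityScheduler:
    def __init__(self, n_nodes):
        self.queues = [deque() for _ in range(n_nodes)]

    def submit(self, task, affinity):
        # Honour the programmer's placement hint.
        self.queues[affinity].append(task)

    def next_task(self, node):
        if self.queues[node]:            # local first: data stays local
            return self.queues[node].popleft()
        for q in self.queues:            # fall back to stealing remotely
            if q:
                return q.popleft()
        return None
```

The composability point in the abstract corresponds to the fallback path: a computation never deadlocks waiting for "its" node, it merely loses locality when it must steal.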
ISBN:
(Print) 9781450332057
We present a new, concurrent, lock-free priority queue that relaxes the delete-min operation to allow deletion of any of the ρ+1 smallest keys instead of only a minimal one, where ρ is a parameter that can be configured at runtime. It is built from a logarithmic number of sorted arrays, similar to log-structured merge-trees (LSM). For keys added and removed by the same thread the behavior is identical to a non-relaxed priority queue. We compare to state-of-the-art lock-free priority queues with both relaxed and non-relaxed semantics, showing high performance and good scalability of our approach.
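The relaxed delete-min semantics can be modelled in a few lines (a sketch of the semantics only: the log-structured sorted arrays and lock-freedom that make the real queue scalable are elided, and the class name is invented here):

```python
# Model of relaxed delete-min: deletion may return ANY of the rho+1
# smallest keys, where rho is configured at construction time.
# Sequential sketch; the lock-free LSM machinery is elided.
import heapq
import random

class RelaxedPQ:
    def __init__(self, rho):
        self.rho = rho
        self.keys = []

    def insert(self, k):
        heapq.heappush(self.keys, k)

    def delete_min(self):
        # Pick uniformly among the rho+1 smallest keys; a concurrent
        # implementation exploits this freedom to avoid contention on
        # the single minimum.
        k = min(len(self.keys), self.rho + 1)
        candidates = heapq.nsmallest(k, self.keys)
        pick = random.choice(candidates)
        self.keys.remove(pick)
        heapq.heapify(self.keys)
        return pick
```

Setting `rho = 0` recovers exact delete-min, matching the abstract's claim that a single thread inserting and removing its own keys observes non-relaxed behavior.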
Nowadays, more and more program analysis tools adopt profiling approaches in order to obtain data dependences, because of their ability to track dynamically allocated memory, pointers, and array indices. However, ...
ISBN:
(Print) 9781450332385
We present Consequence, a deterministic multi-threading library. Consequence achieves deterministic execution via store buffering and strict ordering of synchronization operations. To ensure high performance under a wide variety of conditions, the ordering of synchronization operations is based on a deterministic clock [25], and store buffering is implemented using version-controlled memory [23]. Recent work on deterministic concurrency [14, 19] has proposed relaxing the consistency model beyond total store order (TSO). Through novel optimizations, Consequence achieves the same or better performance on the Phoenix, PARSEC and SPLASH-2 benchmark suites, while retaining TSO memory consistency. Across 19 benchmark programs, Consequence incurs a worst-case slowdown of 3.9× vs. pthreads, with 14 out of 19 programs at or below 2.5×. We believe this performance improvement takes parallel programming one step closer to "determinism by default."
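The two ingredients named in the abstract, store buffering and a deterministic logical clock, can be illustrated with a toy (this is not Consequence itself; the function and data layout are invented for illustration): threads buffer stores locally and publish them to shared memory only at deterministic clock ticks, so the final state is independent of the physical interleaving.

```python
# Toy illustration of deterministic execution via store buffering plus a
# deterministic logical clock. Each "thread" is a list of buffered
# stores; commits happen round-robin, one store per thread per tick,
# regardless of how the OS would actually schedule the threads.
def run_deterministic(programs):
    """programs: list of per-thread [(addr, value), ...] store lists."""
    shared = {}
    buffers = [list(p) for p in programs]   # private store buffers
    while any(buffers):
        # Deterministic clock tick: commit one buffered store per
        # thread, in fixed thread order.
        for buf in buffers:
            if buf:
                addr, val = buf.pop(0)
                shared[addr] = val
    return shared
```

Conflicting stores to the same address are resolved by the clock order, never by a race, which is why repeated runs of the same program produce identical memory states.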