ISBN (Print): 9781450347556
For many years GPUs have been components of HPC clusters (Titan and Piz Daint), while only in recent years has the Intel® Xeon Phi™ been included (Tianhe-2 and Stampede). For example, GPUs are in 14% of systems in the November 2015 Top500 list, while the Xeon Phi™ is in 6%. Intel® came out with the Xeon Phi™ to compete with NVIDIA GPUs by offering a unified environment that supports OpenMP and MPI, and by providing competitive and easier-to-utilize processing power with less energy consumption. Maximum Xeon Phi™ execution-time performance requires that programs have high data parallelism and good scalability, and use parallel algorithms. Moreover, improved Phi™ power performance and throughput can be achieved by reducing the number of cores employed for application execution. Accordingly, the objectives of this paper are to: (1) Demonstrate that some applications can be executed with fewer cores than are available to users with a negligible impact on execution time: for 59.3% of the 27 application instances studied, doing this results in better performance, and for 37%, using less than half of the available cores results in a performance degradation of not more than 10% in the worst case. (2) Develop a tool that provides the user with the optimal number of cores to employ: we designed an algorithm and developed a plugin for the Periscope Tuning Framework, an automatic performance tuner, that for a given application provides the user with an estimate of this number. (3) Understand whether performance metrics can be used to identify applications that can be executed with fewer cores with a negligible impact on execution time: via statistical analysis, we identified the following three metrics that are indicative of this, at least for the application instances studied: a low L1 compute-to-data-access ratio, i.e., the average number of computations performed per byte of data loaded/stored in the L1 cache, high use of data bandwidth, and, to a lesser extent, ...
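The Periscope plugin itself is not shown in the abstract; as a minimal sketch of the idea it automates, the following C++/OpenMP program times a toy workload at several core counts and reports the smallest count whose runtime stays within 10% of the best observed. The workload, the 10% tolerance, and the power-of-two sweep are illustrative assumptions, not the paper's algorithm.

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

// Time one run of the workload with a given thread count.
static double run_once(int nthreads, std::vector<double>& a) {
    double t0 = omp_get_wtime();
    #pragma omp parallel for num_threads(nthreads)
    for (long i = 0; i < (long)a.size(); ++i)
        a[i] = a[i] * 1.000001 + 0.5;           // stand-in for the real application
    return omp_get_wtime() - t0;
}

int main() {
    std::vector<double> a(1 << 24, 1.0);
    const int max_t = omp_get_max_threads();
    std::vector<int> counts;
    for (int t = 1; t < max_t; t *= 2) counts.push_back(t);
    counts.push_back(max_t);                     // always sample the full machine
    std::vector<double> times(counts.size());
    double best = 1e30;
    for (std::size_t k = 0; k < counts.size(); ++k) {
        times[k] = run_once(counts[k], a);
        if (times[k] < best) best = times[k];
    }
    // Report the smallest core count within 10% of the best time.
    for (std::size_t k = 0; k < counts.size(); ++k)
        if (times[k] <= 1.10 * best) {
            std::printf("suggested cores: %d (%.3f s vs. best %.3f s)\n",
                        counts[k], times[k], best);
            break;
        }
}
```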
ISBN (Print): 9781509052141
Chapel supports distributed computing with an underlying PGAS memory address space. While it provides abstractions for writing simple and elegant distributed code, the type system currently lacks a notion of locality, i.e., a description of an object's access behavior in relation to its actual location. This often necessitates programmer intervention to avoid redundant non-local data access. Moreover, due to insufficient locality information, the compiler ends up using "wide" pointers, which can point to non-local data, for objects referenced in an otherwise completely local manner, adding to the runtime overhead. In this work we describe CoMD-Chapel, our distributed Chapel implementation of the CoMD benchmark. We demonstrate that optimizing data access through replication and localization is crucial for achieving performance comparable to the reference implementation. We discuss limitations of existing scope-based locality optimizations and argue instead for a more general (and robust) type-based approach. Lastly, we also evaluate code performance and scaling characteristics. The fully optimized version of CoMD-Chapel performs to within 62%-87% of the reference implementation.
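The paper's fix is Chapel-specific (eliminating wide pointers from local accesses), but the replicate-then-compute pattern is general. A C++ sketch of the contrast, where `remote_fetch` is a hypothetical stand-in for a PGAS non-local read:

```cpp
#include <vector>
#include <cstdio>

// Hypothetical stand-in for a PGAS non-local read (a wide-pointer dereference
// in Chapel terms); imagine a network round-trip inside.
static double remote_fetch(const std::vector<double>& remote, std::size_t i) {
    return remote[i];
}

int main() {
    std::vector<double> remote(1 << 20, 1.0);

    // Unoptimized: every iteration goes through the possibly-remote accessor.
    double acc1 = 0.0;
    for (std::size_t i = 0; i < remote.size(); ++i)
        acc1 += remote_fetch(remote, i);

    // Replication/localization: one bulk transfer up front, then the hot loop
    // touches only local memory.
    std::vector<double> local(remote);           // stands in for a bulk GET
    double acc2 = 0.0;
    for (std::size_t i = 0; i < local.size(); ++i)
        acc2 += local[i];

    std::printf("%f %f\n", acc1, acc2);
}
```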
ISBN (Digital): 9789811001291
ISBN (Print): 9789811001291; 9789811001277
Over the past few decades, load flow algorithms for radial distribution networks have been an area of interest for researchers, leading to improvements in both the approach to and the results for the problem. Different procedures and algorithms have been pursued to enhance performance in terms of simplicity of implementation, execution time, and memory requirements. This paper discusses the implementation of a load flow algorithm for a radial distribution network using the CUDA parallel programming architecture. The computations involved in the serial algorithm, for load currents, branch impedances, etc., have been parallelized using the CUDA programming model. The end result is an improvement in the execution time of the algorithm compared to its running time on the CPU. Finally, a comparison is drawn between the serial and parallel approaches, showing an improvement in execution time for the functions involved in the computations.
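The paper maps per-bus computations to CUDA threads; as a sketch of the same data parallelism (one thread per bus), here is the load-current step I_i = conj(S_i / V_i) written with OpenMP in C++. The network data and sizes are made up for illustration.

```cpp
#include <complex>
#include <vector>
#include <cstdio>

int main() {
    const int nbus = 1 << 16;
    std::vector<std::complex<double>> S(nbus, {0.5, 0.2});   // bus power demands (p.u.)
    std::vector<std::complex<double>> V(nbus, {1.0, 0.0});   // bus voltages (p.u.)
    std::vector<std::complex<double>> I(nbus);
    // Each bus current depends only on its own S and V, so the loop is
    // embarrassingly parallel -- exactly what a CUDA kernel would exploit.
    #pragma omp parallel for
    for (int i = 0; i < nbus; ++i)
        I[i] = std::conj(S[i] / V[i]);
    std::printf("I[0] = (%f, %f)\n", I[0].real(), I[0].imag());
}
```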
ISBN (Print): 9781467387767
In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The proposed programming model is purely sequential, SPMD-free, and based on the high-level functional features introduced since the C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle, in which cluster nodes act as general interpreters of user-defined functional tasks over node-local portions of distributed data structures. We envision coupling a simple yet powerful programming model with a lightweight, locality-aware distributed runtime as a promising step along the road towards high-performance data analytics, in particular under the perspective of the upcoming exascale era. We implemented the proposed approach in SkeDaTo, a prototype C++ library of data-parallel skeletons that exploits cluster-as-accelerator at the bottom layer of the runtime software stack.
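SkeDaTo's API is not given in the abstract; as a toy illustration of the C++11 functional style it builds on, here is a node-local `map` skeleton applied to one node's partition of a distributed structure. All names are assumptions.

```cpp
#include <algorithm>
#include <vector>
#include <cstdio>

// Apply a user-defined function element-wise to this node's local partition.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& partition, F f) {
    std::vector<T> out(partition.size());
    std::transform(partition.begin(), partition.end(), out.begin(), f);
    return out;
}

int main() {
    std::vector<int> local_partition{1, 2, 3, 4};            // this node's data
    auto doubled = map_skeleton(local_partition, [](int x) { return 2 * x; });
    for (int v : doubled) std::printf("%d ", v);
    std::printf("\n");
}
```

In the cluster-as-accelerator scheme, each node would run such an interpreter over its own portion of the distributed data, with the sequential driver program dispatching the functional tasks.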
ISBN (Print): 9781509012244
Over the last two decades, researchers have developed many software, hardware, and hybrid Transactional Memories (TMs) with various APIs and semantics. However, reduced performance under high-contention loads is still the major downside of all TMs. Although many strategies and methods have been proposed, contention management and transaction scheduling remain open areas of research. An important unsolved piece of the contention-management puzzle is plausible estimation of transaction execution times. In this paper we propose two methods for estimating transaction execution times: one based on the log-normal distribution and one based on the gamma distribution. Experimental results presented in this paper indicate that the log-normal method has better estimation accuracy than the gamma method. Even more importantly, the log-normal method uses sliding windows that are 10 times shorter, and its complexity is much lower than that of the gamma method, so it is faster and requires less electrical power.
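A minimal sketch of the log-normal idea (not the paper's exact method): keep a sliding window of observed transaction times, fit the mean and variance in the log domain, and predict the expected next time as exp(mu + sigma^2/2), the mean of a log-normal distribution. The window size and sample data below are illustrative.

```cpp
#include <cmath>
#include <deque>
#include <cstdio>

class LogNormalEstimator {
    std::deque<double> window_;   // log-domain samples
    std::size_t cap_;
public:
    explicit LogNormalEstimator(std::size_t cap) : cap_(cap) {}
    void observe(double t) {
        window_.push_back(std::log(t));
        if (window_.size() > cap_) window_.pop_front();
    }
    double estimate() const {
        double mu = 0.0, var = 0.0;
        for (double x : window_) mu += x;
        mu /= window_.size();
        for (double x : window_) var += (x - mu) * (x - mu);
        var /= window_.size();
        return std::exp(mu + 0.5 * var);   // mean of a log-normal distribution
    }
};

int main() {
    LogNormalEstimator est(16);            // short window, as the paper favors
    for (double t : {1.2, 0.9, 1.5, 1.1, 2.0}) est.observe(t);
    std::printf("estimated time: %f\n", est.estimate());
}
```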
ISBN (Print): 9781509013227
The article presents a method of measuring the energy consumption of an NVIDIA graphics processing unit, and relates energy consumption to the number of operating units. The architecture of the graphics processing unit is considered, as well as the method of measuring GPU energy consumption. The experiment is based on matrix multiplication. Brief results and the dependence of computation time on the number of computing elements are also demonstrated. A simple way to understand the difference between a CPU and a GPU is to compare how they process tasks: the CPU consists of a few cores optimized for sequential serial processing, while the GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously.
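The article does not name its measurement interface; one common option is NVIDIA's NVML library, sketched here. Power (reported in milliwatts) is sampled while the GPU workload runs elsewhere, and energy is integrated as power times the sampling interval. The 10 ms period and 5 s window are assumptions; compile with -lnvidia-ml.

```cpp
#include <nvml.h>
#include <chrono>
#include <thread>
#include <cstdio>

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);
    double joules = 0.0;
    const double dt = 0.010;                       // 10 ms sampling period
    for (int i = 0; i < 500; ++i) {                // ~5 s measurement window
        unsigned int mw = 0;
        if (nvmlDeviceGetPowerUsage(dev, &mw) == NVML_SUCCESS)
            joules += (mw / 1000.0) * dt;          // mW -> W, then W x s = J
        std::this_thread::sleep_for(std::chrono::duration<double>(dt));
    }
    std::printf("estimated energy: %.2f J\n", joules);
    nvmlShutdown();
}
```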
ISBN (Print): 9783319321493; 9783319321486
In our previous paper [17], a parallel realization of Restricted Boltzmann Machines (RBMs) was discussed. That research confirmed the potential usefulness of the Intel MIC parallel architecture for implementing RBMs. In this work, we investigate how the Intel MIC and Intel CPU architectures can be applied to implement the complete learning process using Deep Belief Networks (DBNs), whose layers correspond to RBMs. The learning procedure is based on the matrix approach, where learning samples are grouped into packages and represented as matrices. This approach is applied to both the initial learning and fine-tuning stages. The influence of the package size on the accuracy of learning, as well as on the performance of the computations, is studied using conventional CPU and Intel Xeon Phi architectures.
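A sketch of the matrix ("package") idea: B learning samples are stacked into a B x V matrix so the hidden-layer activation of an RBM becomes one dense product, H = sigmoid(X * W), which vectorizes well on Xeon Phi. The sizes and the plain triple loop are illustrative; a real implementation would call an optimized GEMM.

```cpp
#include <cmath>
#include <vector>
#include <cstdio>

int main() {
    const int B = 4, V = 3, Hn = 2;                       // package, visible, hidden
    std::vector<double> X(B * V, 0.5);                    // one package of samples
    std::vector<double> W(V * Hn, 0.1);                   // RBM weights
    std::vector<double> H(B * Hn, 0.0);                   // hidden activations
    // H = sigmoid(X * W): the whole package is processed in one matrix product.
    for (int b = 0; b < B; ++b)
        for (int h = 0; h < Hn; ++h) {
            double s = 0.0;
            for (int v = 0; v < V; ++v) s += X[b * V + v] * W[v * Hn + h];
            H[b * Hn + h] = 1.0 / (1.0 + std::exp(-s));   // logistic activation
        }
    std::printf("H[0][0] = %f\n", H[0]);
}
```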
ISBN (Print): 9781450342056
In this paper, we propose a new technique to recommend to programmers high-quality parallel code that is similar to a given sequential code. This is done by transforming well-grounded parallel code A into its sequential equivalent B and storing the pair (A->B) in a database; then, given a sequential code C, we search the database for syntactically or semantically similar code B and retrieve its parallel version A, which can be used as a replacement or reference for the original code C. We also outline our solutions towards realizing this technique and present a preliminary study that shows promising results.
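A toy sketch of the A->B lookup described above: parallel snippets are keyed by a crude normalization of their sequential equivalents, and a query sequential snippet retrieves the stored parallel version on a match. Real syntactic/semantic similarity search would be far more involved; everything here is a stand-in.

```cpp
#include <map>
#include <string>
#include <cctype>
#include <cstdio>

// Collapse whitespace so trivially reformatted code still matches (a crude
// stand-in for real similarity matching).
static std::string normalize(const std::string& code) {
    std::string out;
    for (char c : code)
        if (!std::isspace(static_cast<unsigned char>(c))) out += c;
    return out;
}

int main() {
    std::map<std::string, std::string> db;   // sequential B -> parallel A
    db[normalize("for(i=0;i<n;i++) a[i]=b[i]+c[i];")] =
        "#pragma omp parallel for\nfor(i=0;i<n;i++) a[i]=b[i]+c[i];";
    // A query sequential code C, differing only in formatting.
    std::string query = "for (i = 0; i < n; i++) a[i] = b[i] + c[i];";
    auto it = db.find(normalize(query));
    std::printf("%s\n", it != db.end() ? it->second.c_str() : "no match");
}
```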
ISBN (Print): 9781467388153
Many real-world systems and networks are modeled and analyzed using various random graph models. These models must incorporate relevant properties such as degree distribution and clustering coefficient. Many models, such as the Chung-Lu (CL), stochastic Kronecker, stochastic block model (SBM), and block two-level Erdős–Rényi (BTER) models, have been devised to capture those properties. However, the generative algorithms for these models are mostly sequential and take a prohibitively long time to generate large-scale graphs. In this paper, we present a novel time- and space-efficient algorithmic method to generate random graphs using the CL, BTER, and SBM models. First, we present an efficient sequential algorithm and an efficient distributed-memory parallel algorithm for the CL model. Our sequential algorithm takes O(m) time and O(Λ) space, where m and Λ are the number of edges and distinct degrees, respectively, and our parallel algorithm takes O(m/P + Λ + P) time w.h.p. and O(Λ) space using P processors. These algorithms are almost time-optimal, since any sequential and parallel algorithm needs at least O(m) and O(m/P) time, respectively. Our algorithms outperform the best previously known algorithms by a significant margin in terms of both time and space. Experimental results on various large-scale networks show that both our sequential and parallel algorithms require 400-15000 times less memory than the existing sequential and parallel algorithms, respectively, making them suitable for generating very large-scale networks. Moreover, both of our algorithms are about 3-4 times faster than the existing sequential and parallel algorithms. Finally, we show how our algorithmic method also leads to efficient parallel and sequential algorithms for the SBM and BTER models.
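For reference only, here is the textbook O(n^2) Chung-Lu generator, where edge (u,v) appears with probability min(1, w_u * w_v / S) for expected degrees w and their sum S. The paper's contribution is an O(m)-time, O(Λ)-space algorithm; this naive loop merely pins down the model being generated, with made-up weights.

```cpp
#include <algorithm>
#include <random>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> w{3.0, 2.0, 2.0, 1.0};             // expected degrees
    double S = 0.0;
    for (double x : w) S += x;                             // sum of weights
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> U(0.0, 1.0);
    int edges = 0;
    for (std::size_t u = 0; u < w.size(); ++u)
        for (std::size_t v = u + 1; v < w.size(); ++v) {
            double p = std::min(1.0, w[u] * w[v] / S);     // CL edge probability
            if (U(gen) < p) { std::printf("%zu %zu\n", u, v); ++edges; }
        }
    std::printf("edges: %d\n", edges);
}
```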
ISBN (Print): 9781509019878
Multi-core processors are very common in the form of dual-core and quad-core processors. To take advantage of multiple cores, parallel programs must be written. Existing legacy applications are sequential and, even when run on multi-core machines, utilize only one core. Such applications should be either rewritten or parallelized to make efficient use of multiple cores. Manual parallelization requires huge effort in terms of time and money, hence the need for automatic parallelization. The Automatic Code Parallelizer using OpenMP automates the insertion of compiler directives to facilitate parallel processing on multi-core shared-memory machines. The proposed tool converts an input sequential C source code into multi-threaded parallel C source code, and supports multi-level parallelization with the generation of nested OpenMP constructs. The proposed scheme statically decomposes a sequential C program into coarse-grained tasks, analyzes the dependencies among tasks, and generates OpenMP parallel code. The focus is on coarse-grained task parallelism to improve performance beyond the limits of loop parallelism. Owing to the broad support for the OpenMP standard, the generated OpenMP code can run on a wide range of SMP machines and may yield a performance improvement.
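An illustration of the kind of transformation such a tool performs: a dependence-free sequential loop and the OpenMP directive an automatic parallelizer would insert once its analysis shows the iterations are independent. The loop itself is a made-up example, not output of the paper's tool.

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const int n = 1000000;
    static double a[1000000], b[1000000];
    for (int i = 0; i < n; ++i) b[i] = i;      // sequential setup

    // After dependence analysis shows the iterations are independent, the
    // tool can emit the directive below; each thread processes a chunk.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * b[i] + 1.0;

    std::printf("a[10] = %f (threads available: %d)\n",
                a[10], omp_get_max_threads());
}
```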