This paper addresses the optimization of parallel simulators for large-scale parallel systems and applications. Such simulators are often based on parallel discrete event simulation with conservative or optimistic protocols to synchronize the simulating processes. The paper considers how available future information about events and application behaviors can be efficiently extracted and exploited to improve the performance of adaptive optimistic protocols. First, we extract information about future events and their dependencies from application traces to guide adaptive adjustment of the time window in trace-driven parallel simulation. Second, we use information about application behaviors, specifically the iterative behavior found in many applications, to avoid unnecessary adjustments of the time window. These techniques are implemented in the BigSim simulator and tested with real-world and standard benchmark applications, including Jacobi3D and HPL. The results show that our optimizations reduce simulation execution times by 11% to 32%. Moreover, our methods are easy to implement and require neither compiler augmentation nor modification of the core code of parallel simulators.
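As a purely illustrative sketch of the idea (not BigSim's implementation), the following Python fragment shows how lookahead mined from a trace might drive adaptive time-window adjustment, and how a repeated iteration signature could suppress unnecessary readjustment; all names and the sizing heuristic are assumptions.

```python
# Hypothetical sketch: adaptive time window for optimistic trace-driven
# PDES, sized from lookahead found in the application trace.

def min_lookahead(trace_events, now, horizon):
    """Smallest gap between an event and its earliest dependent event
    within [now, now + horizon) -- a proxy for how far it is safe to
    simulate optimistically before rollbacks become likely."""
    gaps = [e.dep_time - e.time for e in trace_events
            if now <= e.time < now + horizon and e.dep_time is not None]
    return min(gaps, default=horizon)

class AdaptiveWindow:
    def __init__(self, initial, lo, hi):
        self.window, self.lo, self.hi = initial, lo, hi
        self.last_signature = None

    def adjust(self, trace_events, now, iteration_signature):
        # Iterative applications repeat the same communication pattern;
        # if the signature is unchanged, skip the readjustment entirely.
        if iteration_signature == self.last_signature:
            return self.window
        self.last_signature = iteration_signature
        # Otherwise size the window from the lookahead in the trace.
        la = min_lookahead(trace_events, now, self.window)
        self.window = max(self.lo, min(self.hi, la))
        return self.window
```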
By combining virtual machine technology, virtual computing can effectively aggregate widely distributed resources to provide services to users. We view the federation of multiple data centers and voluntary resources on the Internet as a very large-scale resource pool. Based on the tree structure of this pool, this paper proposes a virtual machine deployment algorithm, called iVDA, that considers users' requests, the capabilities of the physical resources, and the dynamic load. It implements an adaptive mechanism for scheduling servers to host virtual machines, forming virtual execution environments for various applications and supporting on-demand computing.
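The abstract leaves iVDA's internals unspecified; as a minimal sketch under assumed data structures, a placement pass over a tree-structured pool might weigh a request against node capability and current load like this (all names are illustrative):

```python
# Illustrative sketch only -- not the published iVDA algorithm.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu: float = 0.0                 # capability; 0 for internal nodes
    load: float = 0.0                # current utilization in [0, 1]
    children: list = field(default_factory=list)

def place_vm(root, demand):
    """Walk the resource tree and pick the server with the most spare
    capacity that still satisfies the request; None if the pool is full."""
    best = None
    stack = [root]
    while stack:
        n = stack.pop()
        stack.extend(n.children)
        spare = n.cpu * (1.0 - n.load)
        if n.cpu > 0 and spare >= demand:
            if best is None or spare > best[0]:
                best = (spare, n)
    if best is None:
        return None
    host = best[1]
    host.load += demand / host.cpu   # placement feeds back into the load
    return host
```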
On chip multiprocessor (CMP) platforms, multiple co-scheduled applications can severely degrade performance and quality of service (QoS) when they contend for last-level cache (LLC) resources. Whether an application will impose destructive interference on co-scheduled applications depends largely on its own inherent cache access behavior. In this work, we first present case studies showing how inter-application interference leads to undesirable performance in both shared and private LLC designs. We then propose a new online approach for identifying application cache behavior, based on detailed simulation and analysis with the SPEC CPU2006 benchmarks. We demonstrate that our approach identifies application cache behaviors more concisely. Moreover, the proposed approach can be implemented directly in hardware to identify application cache behaviors dynamically at runtime. Finally, we show with two case studies how the proposed approach can be adopted by both shared- and private-cache sharing mechanisms, i.e., cache partitioning algorithms (CPAs) and cache spilling techniques, for more concise cache resource management.
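The abstract does not spell out the classifier; as a hedged illustration of the kind of runtime heuristic such hardware could implement (thresholds and category names are assumptions, not the paper's):

```python
# Hypothetical heuristic, not the paper's classifier: bucket an
# application by LLC counters a hardware monitor could collect online.

def classify_cache_behavior(apki, mpki, mpki_with_extra_way):
    """apki: LLC accesses per kilo-instruction
       mpki: LLC misses per kilo-instruction
       mpki_with_extra_way: miss rate if granted one more cache way"""
    if apki < 1.0:
        return "core-bound"           # rarely touches the LLC
    if mpki > 10.0 and mpki_with_extra_way > 0.95 * mpki:
        return "streaming/thrashing"  # extra capacity barely helps
    if mpki_with_extra_way < 0.8 * mpki:
        return "capacity-sensitive"   # would benefit from more ways
    return "cache-friendly"           # fits well, low interference
```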
We investigate theoretically the effect of self and cross-coupling capacitances on the stability diagram of a metallic double-dot device. In the linear transport regime, cross-coupling capacitances affect the dimensions of the honeycomb cell and the distance between the two triple points, while self-capacitances only slightly broaden the cell boundary and bring the two triple points closer together. In the nonlinear transport regime, cross-coupling capacitances stretch the current and charge regions along the mid-line direction, while self-capacitances enlarge the current regions without changing the shape of the stability cells. Cross-coupling capacitances have a stronger impact on the dimensions of the stability diagram than self-capacitances, but the self-capacitance must be included in the current calculation when its value is not negligible relative to the other device parameters.
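For orientation, the constant-interaction electrostatics commonly used to derive such honeycomb stability diagrams expresses the charging energies through the self and mutual capacitances; the notation below is the standard textbook convention and an assumption on my part, not necessarily this paper's:

```latex
% Standard constant-interaction model for a metallic double dot
% (common convention; not necessarily the paper's notation).
U(N_1,N_2) = \tfrac{1}{2}N_1^2 E_{C1} + \tfrac{1}{2}N_2^2 E_{C2}
           + N_1 N_2 E_{Cm} + f(V_{g1},V_{g2}),
\qquad
E_{C1} = \frac{e^2}{C_1}\cdot\frac{1}{1 - C_m^2/(C_1 C_2)},
\qquad
E_{Cm} = \frac{e^2}{C_m}\cdot\frac{1}{C_1 C_2/C_m^2 - 1},
```

where $C_1$, $C_2$ are the total capacitances of the two dots, $C_m$ is the inter-dot (cross) capacitance, and $E_{C2}$ follows from $E_{C1}$ by exchanging indices. As $C_m \to 0$ the two triple points merge, consistent with the trends described above.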
ISBN (print): 9781424486670
Network calculus is a promising theory, based on min-plus algebra, for analyzing and modeling networks. Using network calculus, we propose arrival-curve and service-curve formulas for end-to-end communication, build the corresponding time model, and derive communication delay formulas for two scenarios of the model. We then take the fat-tree topology, widely used in InfiniBand interconnects, as an example to analyze the delay of one-to-all broadcast. As groundwork, this paper provides a new approach for network researchers to investigate communication delay in future research.
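The derived formulas are not reproduced in the abstract; for context, the textbook network-calculus delay bound for a flow with arrival curve $\alpha$ crossing a node offering service curve $\beta$ is the maximal horizontal deviation between the two curves, which has a closed form for the common token-bucket/rate-latency pair:

```latex
% Classical network-calculus delay bound (textbook form, for context):
D \le h(\alpha,\beta) = \sup_{t \ge 0}\,
      \inf\{\, d \ge 0 : \alpha(t) \le \beta(t+d) \,\}.
% With a token-bucket arrival curve and a rate-latency service curve,
\alpha(t) = b + r t, \qquad \beta(t) = R\,[t - T]^{+}, \qquad r \le R
\;\;\Longrightarrow\;\; D \le T + \frac{b}{R}.
```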
ISBN (print): 9781424465392
As one of the most popular many-core architectures, GPUs have demonstrated their power in many non-graphics applications. Traditional general-purpose computing systems tend to integrate the GPU as a co-processor to accelerate parallel computing tasks. Meanwhile, GPUs also incur high power consumption, which accounts for a large proportion of total system power. In this paper, we focus on power analysis and optimization for the GPU architecture. The main contributions are: first, we establish a GPU power research platform, extended from an existing GPU simulator with several power models; second, we validate that, as the gap between shader-core and memory speed grows, integrating more shader cores or raising clock frequencies may not bring better performance but does result in higher energy consumption; third, we show that traditional CPU power optimization methods, such as dynamic frequency scaling and concurrency throttling, can be effectively applied to GPU architectures for better power efficiency, especially for memory-intensive applications.
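A minimal first-order model of the kind often layered onto simulators illustrates the memory-bound argument: below the memory-bandwidth limit, lowering core frequency (and voltage with it) barely changes runtime but cuts dynamic power. The constants and the min() performance model here are assumptions for illustration, not the paper's power models.

```python
# First-order GPU energy sketch (illustrative assumptions throughout).

def gpu_energy(work, f_core, v_core, mem_rate,
               c_eff=100e-9, p_static=30.0):
    """Energy (J) to retire `work` core-cycles when throughput is
    limited by min(core frequency, effective memory service rate)."""
    rate = min(f_core, mem_rate)        # memory-bound once f > mem_rate
    t = work / rate                     # execution time (s)
    p_dyn = c_eff * v_core**2 * f_core  # dynamic power ~ C * V^2 * f
    return (p_dyn + p_static) * t

# Memory-bound kernel: same runtime at both operating points, less power.
base   = gpu_energy(1e9, f_core=1.5e9, v_core=1.1, mem_rate=1.0e9)
scaled = gpu_energy(1e9, f_core=1.0e9, v_core=0.9, mem_rate=1.0e9)
print(f"1.5 GHz: {base:.0f} J   1.0 GHz: {scaled:.0f} J")
```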
Multithreading is a promising technique, widely used in general-purpose processors, for hiding long-latency events such as cache misses. This paper proposes an embedded processor design with multithreading support based on the OR1200 processor. The multithreaded OR1200 supports interleaved execution of four threads in a round-robin fashion. The hardware design is evaluated through RTL simulation of the Verilog code. Results show that interleaved execution of multiple threads tolerates memory latency effectively, achieving an average speedup of 1.16.
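As a behavioral sketch of the scheme, not the actual OR1200 RTL, a four-thread barrel pipeline can be modeled by rotating the issue slot round-robin and skipping threads stalled on memory; the latency encoding below is an assumption:

```python
# Behavioral model of interleaved (barrel) multithreading: one issue
# slot per cycle rotates round-robin over four threads, skipping any
# thread still stalled on a long-latency event such as a cache miss.

NUM_THREADS = 4

def simulate(threads, cycles):
    """threads[i] is an iterator of per-instruction latencies:
       1 = single-cycle op, >1 = a miss stalling the thread that long."""
    stall_until = [0] * NUM_THREADS
    last, issued = -1, 0
    for cycle in range(cycles):
        for k in range(1, NUM_THREADS + 1):  # next ready thread in RR order
            t = (last + k) % NUM_THREADS
            if stall_until[t] <= cycle:
                stall_until[t] = cycle + next(threads[t], 1)
                last, issued = t, issued + 1
                break                         # only one issue slot per cycle
    return issued / cycles                    # pipeline utilization
```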
As a fast on-chip SRAM managed by software (the application and/or compiler), Scratchpad Memory (SPM) is widely used in many fields. This paper presents Sim-spm, a SimpleScalar-based architecture simulator for multi-level SPM memory hierarchies. We simulate the hardware of the multi-level SPM hierarchy by extending Sim-outorder, the out-of-order simulator from SimpleScalar. By simulating the memory system, the framework builds the multi-level SPM hierarchy on top of the existing ISA (Instruction Set Architecture), which largely removes the need to modify the existing compiler. Experimental results show that Sim-spm accurately simulates the running state of a processor with a multi-level SPM memory hierarchy, making it a promising tool for research on such architectures.
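One plausible reading of why no compiler change is needed (an assumption on my part, not a statement of Sim-spm's actual design) is that each SPM level occupies a fixed region of the existing address space, so data placement alone targets it:

```python
# Illustrative address-range model of a two-level SPM hierarchy; the
# base addresses, sizes, and latencies are invented for the sketch.

SPM_LEVELS = [
    # (base address, size in bytes, access latency in cycles)
    (0x1000_0000,  16 * 1024, 1),   # L1 SPM: small and fast
    (0x2000_0000, 256 * 1024, 4),   # L2 SPM: larger, slower
]
DRAM_LATENCY = 100

def access_latency(addr):
    """Latency of a load/store, decided purely by address decoding,
    so the ISA and the compiler stay unchanged."""
    for base, size, lat in SPM_LEVELS:
        if base <= addr < base + size:
            return lat
    return DRAM_LATENCY  # everything else falls through to main memory
```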
ISBN (print): 9781424465392; 9781424465422
As supercomputers grow in scale, the communication time during execution increases, drawing the interest of architecture researchers. In this paper, based on the fat-tree topology widely used in InfiniBand, we present a one-to-all broadcast communication time model. After classifying applications into two kinds, we establish an ideal model and a bandwidth-limited model on exponential-capacity binary fat-trees for the two kinds of applications. By analyzing the models, we obtain curves describing the relationship between communication time and processor count. The conclusions of this paper can help system designers make better design decisions.
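The paper's two models are not reproduced in the abstract; as a point of reference only, a classical tree-broadcast estimate on $P$ processors (an assumption, not the paper's derived formulas) takes the form:

```latex
% Classical log-depth broadcast estimate, for orientation only.
% \alpha = per-hop latency, m = message size, B = link bandwidth.
T_{\mathrm{ideal}}(P) \approx \lceil \log_2 P \rceil \,\alpha,
\qquad
T_{\mathrm{bw}}(P) \approx \lceil \log_2 P \rceil
      \left( \alpha + \frac{m}{B} \right).
```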
ISBN (print): 9781424497799
As one of the most popular accelerators, the Graphics Processing Unit (GPU) has demonstrated high computing power in several application fields. On the other hand, the GPU also has high power consumption and has become one of the largest power consumers in desktop and supercomputer systems. However, software power optimization methods targeted at GPUs have not been well studied. In this work, we propose a kernel fusion method to reduce energy consumption and improve power efficiency on the GPU architecture. By fusing two or more independent kernels, kernel fusion achieves higher utilization and a much more balanced demand for hardware resources, which opens more room for power optimizations such as dynamic voltage and frequency scaling (DVFS). Based on the CUDA programming model, this paper also gives several fusion methods targeted at different situations. To make judicious fusion decisions, we formulate the fusion of multiple independent kernels as a dynamic programming problem, which can be solved with many existing tools and easily embedded into a compiler or runtime system. To reduce the overhead introduced by kernel fusion, we also propose an effective method to reduce shared memory usage and coordinate the thread spaces of the kernels to be fused. Detailed experimental evaluation validates that the proposed kernel fusion method can reduce energy consumption without performance loss for several typical kernels.
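The abstract does not give the paper's exact dynamic program; as a toy formulation under assumed inputs (kernels pre-sorted by compute/memory ratio, a fixed per-launch saving, and a balance-based score), pairing decisions can be made like this:

```python
# Toy DP over which adjacent kernels (in sorted order) to fuse; the
# scoring and constants are assumptions, not the paper's formulation.
from functools import lru_cache

LAUNCH_SAVING = 1.0  # assumed fixed gain per avoided kernel launch

def fusion_plan(kernels):
    """kernels: list of (compute_intensity, memory_intensity) tuples,
    pre-sorted by compute/memory ratio. Returns (score, grouping)."""
    def benefit(a, b):
        ca, ma = kernels[a]
        cb, mb = kernels[b]
        # Fusing saves a launch, but pays off mainly when the pair's
        # compute and memory demands balance each other out.
        return LAUNCH_SAVING - abs((ca + cb) - (ma + mb))

    @lru_cache(maxsize=None)
    def dp(i):
        if i >= len(kernels):
            return 0.0, ()
        score_alone, plan_alone = dp(i + 1)           # kernel i runs alone
        best = (score_alone, ((i,),) + plan_alone)
        if i + 1 < len(kernels):                      # or fuse i with i+1
            score_pair, plan_pair = dp(i + 2)
            cand = (score_pair + benefit(i, i + 1),
                    ((i, i + 1),) + plan_pair)
            if cand[0] > best[0]:
                best = cand
        return best

    return dp(0)
```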