Search results - Inner Mongolia University Library

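Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016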
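
The query rules above can be sketched as a few string helpers. This is only an illustration of the documented syntax (field codes, uppercase operators, one space on each side); the helper names `term`, `any_of`, `all_of`, and `exclude` and the search values are hypothetical, and the code does not call any library-system API.

```python
# Hypothetical helpers that only assemble query text following the help rules.
VALID_FIELDS = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Format one field-qualified term such as K=GPU."""
    if field not in VALID_FIELDS:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def any_of(*terms: str) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND."""
    return " AND ".join(terms)


def exclude(query: str, unwanted: str) -> str:
    """Append a NOT clause to an existing query."""
    return f"{query} NOT {unwanted}"


# Same shape as Example 1: an OR group, an author, and a year range.
query = all_of(
    any_of(term("K", "CUDA"), term("K", "GPU")),
    term("A", "Nakano"),
    term("Y", "2010-2016"),
)
print(query)
```

Keeping the operators inside the helpers makes it harder to violate the uppercase and spacing rules by accident.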

Search condition: "Any field = Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing"

3,548 records in total; records 371-380 are shown below.

Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Tani, Kazuya; Takafuji, Daisuke; Nakano, Koji; Ito, Yasuaki
Affiliations: Hiroshima Univ Dept Informat Engn Kagamiyama 1-4-1 Higashihiroshima 7398527 Japan

ISBN: (print) 9781479941162

The Unified Memory Machine (UMM) is a theoretical parallel computing model that captures the essence of the global memory access of GPUs. A sequential algorithm is oblivious if the address accessed at each time step does not depend on the input data. Many important tasks, including matrix computation, signal processing, sorting, dynamic programming, and encryption/decryption, can be performed by oblivious sequential algorithms. Bulk execution of a sequential algorithm means executing it for many different inputs, either in turn or at the same time. The main contribution of this paper is to show that the bulk execution of an oblivious sequential algorithm can be implemented to run on the UMM very efficiently. More specifically, the bulk execution for p different inputs can be implemented to run in O(pt/w + lt) time units using p threads on the UMM with memory width w and memory access latency l, where t is the running time of the oblivious sequential algorithm. We also prove that this implementation is time optimal. Further, we have implemented two oblivious sequential algorithms: one computes the prefix-sums of an array of size n, and the other finds the optimal triangulation of a convex n-gon using dynamic programming. The prefix-sum algorithm is a simple example of an oblivious algorithm, while the optimal triangulation algorithm is rather complicated. Experimental results on a GeForce GTX Titan show that our implementations of the bulk execution of these two algorithms can be 150 times faster than a single CPU when there are many inputs. This implies that bulk execution of oblivious sequential algorithms is a potent and easy way to exploit the capability of CUDA-enabled GPUs.

Keywords: parallel algorithms oblivious sequential algorithm memory machine models coalesced memory access GPU CUDA
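
As a rough illustration of bulk execution, the sketch below runs the same oblivious prefix-sum on p inputs of length n in plain Python. The column-wise layout (element i of input j at address i * p + j) is an assumption made here so that, at each step, the p simulated threads touch consecutive addresses; the abstract does not spell out the layout, and the names `bulk_prefix_sums` and `umm_time_expression` are hypothetical. It is not the authors' GPU implementation.

```python
def bulk_prefix_sums(inputs):
    """Compute prefix sums of p equal-length inputs with one shared access pattern."""
    p, n = len(inputs), len(inputs[0])
    if any(len(arr) != n for arr in inputs):
        raise ValueError("all inputs must have the same length")

    # Assumed column-wise layout: element i of input j is stored at i * p + j.
    mem = [0] * (n * p)
    for j, arr in enumerate(inputs):
        for i, x in enumerate(arr):
            mem[i * p + j] = x

    # Step i reads and writes addresses that depend only on i and j, never on data.
    for i in range(1, n):
        for j in range(p):  # on a GPU these p iterations would be p threads
            mem[i * p + j] += mem[(i - 1) * p + j]

    return [[mem[i * p + j] for i in range(n)] for j in range(p)]


def umm_time_expression(p, t, w, l):
    """Evaluate pt/w + lt, the expression inside the O(.) bound (constants ignored)."""
    return p * t / w + l * t


print(bulk_prefix_sums([[1, 2, 3], [4, 5, 6]]))
print(umm_time_expression(p=1024, t=100, w=32, l=10))
```

Because the access pattern is identical for every input, the inner loop over j is the part that maps onto parallel threads.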

Nanoscale Cluster Detection in Massive Atom Probe Tomography Data

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Seal, Sudip K.; Yoginath, Srikanth B.; Miller, Michael K.
Affiliations: Oak Ridge Natl Lab Computat Sci & Engn Div Oak Ridge TN 37831 USA; Oak Ridge Natl Lab Ctr Nanophase Mat Sci Oak Ridge TN 37831 USA

ISBN: (print) 9781479941162

Recent technological advances in atom probe tomography (APT) have led to unprecedented data acquisition capabilities that routinely generate data sets containing hundreds of millions of atoms. Detecting nanoscale clusters of different atom types present in these enormous amounts of data and analyzing their spatial correlations with one another are fundamental to understanding the structural properties of the material from which the data is derived. Extant algorithms for nanoscale cluster detection do not scale to large data sets. Here, a scalable, CUDA-based implementation of an autocorrelation algorithm is presented. It isolates spatial correlations amongst atomic clusters present in massive APT data sets in linear time using a linear amount of storage. Correctness of the algorithm is demonstrated using large synthetically generated data with known spatial distributions. Benefits and limitations of using GPU-acceleration for autocorrelation-based APT data analyses are presented with supporting performance results on data sets with up to billions of atoms. To our knowledge, this is the first nanoscale cluster detection algorithm that scales to massive APT data sets and executes on commodity hardware.

Keywords: autocorrelation parallel algorithms atom probe tomography

Compactor: Optimization Framework at Staging I/O Nodes

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Venkatesan, Vishwanath; Chaarawi, Mohamad; Koziol, Quincey; Gabriel, Edgar
Affiliations: Univ Houston Dept Comp Sci Houston TX 77204 USA; HDF Grp Champaign IL USA

ISBN: (print) 9781479941162

Data-intensive applications are strongly influenced by I/O performance on HPC systems, and their scalability to exascale depends primarily on how well I/O performance scales on future HPC systems. To mitigate I/O bottlenecks, recent HPC systems use staging nodes to which I/O requests and in-situ data analysis are delegated. In this paper, we present the Compactor framework along with three optimizations that improve I/O performance at the data staging nodes. The first optimization performs collective buffering across requests from multiple processes. The second steals writes to service read requests at the staging node. The third "morphs" write requests from the same process. All optimizations were implemented as part of the Exascale FastForward I/O stack. We evaluated the optimizations on a PVFS2 file system using a micro-benchmark and the Flash I/O benchmark. Our results indicate significant performance benefits with our framework: in the best case, the Compactor provides up to 70% improvement in performance.

Keywords: Parallel I/O Staging I/O Optimizations Exascale FastForward I/O

Searching for the Optimal Data Partitioning Shape for Parallel Matrix Matrix Multiplication on 3 Heterogenous Processors

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: DeFlumere, Ashley; Lastovetsky, Alexey
Affiliations: Univ Coll Dublin Sch Comp Sci & Informat Dublin 4 Ireland

ISBN: (print) 9781479941162

Parallel Matrix-Matrix Multiplication (MMM) is a fundamental part of the linear algebra libraries used by scientific applications on high performance computers. As heterogeneous systems have emerged as high performance computing platforms, the traditional homogeneous algorithms have been adapted to these heterogeneous environments. Although heterogeneous systems have been in use for some time, how to optimally partition data on heterogeneous processors to minimize computation, communication, and execution time remains an open problem. While the question of how to subdivide these MMM problems among heterogeneous processors has been studied, prior work assumes that the data partition shape, i.e. the layout of the data within the matrix assigned to each processor, should be rectangular, so that each processor is assigned a rectangular portion of the matrix to compute. Our previous work in this area questioned the optimality of this traditional rectangular shape and studied the partition shape problem for two processors. In that work, we proposed a novel mathematical method for transforming partition shapes to decrease communication cost and an analytical technique for determining the optimal shape. In this work, we extend this technique to three and more heterogeneous processors. While applying this method to two processors is relatively straightforward, the complexity grows immensely when considering three processors. With this complexity in mind, we propose a hybrid of experimental and analytical techniques. We postulate that a small number of partition shapes are potentially optimal, and perform extensive testing using a computer-aided method to apply our previously developed analytical technique, without finding a counterexample. We identified six data partition shapes that are candidates to be the optimal three-processor shape.

Keywords: Parallel Matrix Multiplication Matrix Partitioning Heterogeneous Computing High Performance Computing

Online Monitoring System for Performance Fault Detection

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Gioiosa, Roberto; Kestor, Gokcen; Kerbyson, Darren J.
Affiliations: Pacific Northwest Natl Lab High Performance Comp Richland WA 99354 USA

ISBN: (print) 9781479941162

To achieve exaFLOPS performance within a contained power budget, next-generation supercomputers will feature hundreds of millions of components operating at low- and near-threshold voltage. As the probability that at least one of these components fails during the execution of an application approaches certainty, it seems unrealistic to expect that any run of a scientific application will not experience some performance faults. We believe that there is a need for a new generation of lightweight performance and debugging tools that can be used online, even during production runs of parallel applications, and that can identify performance anomalies during application execution. In this work we propose the design and implementation of a monitoring system that continuously inspects the evolution of running applications and the health of the system. To achieve minimum runtime overhead while maintaining the desired level of flexibility, we propose a decoupled approach in which accurate monitoring is performed at kernel level while performance anomaly disambiguation and corrective actions are performed at user level. We evaluate our monitoring system on a 32-core AMD Interlagos compute node. First, we show that the runtime overhead of the monitoring system is negligible (0-2%). Then we show how our system can be used to precisely identify performance faults in two different scenarios: in the first, we inject OS noise, while in the second we simulate the execution of a data analytics application next to a scientific simulation.

Keywords: Exascale Operating system Parallel applications Performance faults Reliability

High-Performance Zonal Histogramming on Large-Scale Geospatial Rasters Using GPUs and GPU-Accelerated Clusters

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Zhang, Jianting; Wang, Dali
Affiliations: CUNY City Coll Dept Comp Sci New York NY 10031 USA; Oak Ridge Natl Lab Environm Sci Div Oak Ridge TN USA

ISBN: (print) 9781479941162

Hardware accelerators are playing increasingly important roles in achieving desired performance from desktop to cluster computing. While General Purpose computing on Graphics Processing Units (GPGPU) technologies have been widely applied to compute-intensive applications, there is relatively little work on using GPUs and GPU-accelerated clusters for data-intensive computing, which typically involves significant irregular data accesses. In this study, we report our designs and implementations of a popular geospatial operation called Zonal Histogramming on Nvidia GPUs. Given a zonal dataset in the form of a collection of polygons and a geospatial raster that can be considered as a 2D grid, Zonal Histogramming computes, for each polygon, a histogram of the values of the raster cells that fall within the polygon. Our experiments on 3000+ US counties (polygons) over 20+ billion NASA Shuttle Radar Topography Mission (SRTM) 30 meter resolution Digital Elevation Model (DEM) raster cells have shown that an impressive 46-second end-to-end runtime can be achieved using a single Nvidia GTX Titan GPU device. The runtime is further reduced to about 10 seconds using 8 nodes on ORNL's Titan GPU-accelerated cluster. The desired high performance opens many possibilities for large-scale geospatial computing that is important for environmental and climate research.

Keywords: Zonal Histogramming Geospatial Rasters Point-in-Polygon Test Parallel Computing GPU
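
The operation defined in this abstract (for each polygon, a histogram of the raster cells that fall inside it) can be sketched sequentially as follows. This is a minimal CPU illustration, not the authors' GPU design: it assumes a cell belongs to a polygon when its center passes a ray-casting point-in-polygon test, and the names `point_in_polygon` and `zonal_histogram` are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_histogram(raster, polygon):
    """Count raster values whose cell centers lie inside the polygon.

    raster[row][col] holds the cell value; the cell center is assumed to be
    at (col + 0.5, row + 0.5) in the same coordinate system as the polygon.
    """
    hist = Counter()
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                hist[value] += 1
    return hist


raster = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 3],
]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(dict(zonal_histogram(raster, square)))
```

Each cell is tested independently of the others, which is what makes the operation a natural fit for data-parallel hardware.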

Prototyping the MBTAC processor for the REPLICA CMP

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Forsell, Martti; Roivainen, Jussi; Leppanen, Ville
Affiliations: VTT Tech Res Ctr Finland Comp Platforms Team Box 1100 FI-90571 Oulu Finland; Univ Turku Informat Technol Joukahaisenkatu 3-5 FI-20014 Turku Finland

ISBN: (print) 9781479941162

Current chip multiprocessors (CMP) have mostly been designed by replicating sequential/single-core processors and providing some support for operating them with a shared memory. As a result, they define an asynchronous computational model of threads, often require maximizing the locality of memory references to get decent performance, and feature high intercommunication overheads that make parallel programming tedious for general purpose functionalities. Most of these problems can be eliminated by designing the processor architecture for scalable general purpose computing from the very beginning, as is done in processors for configurable emulated shared memory (CESM) CMPs. They provide support for machine instruction-level synchronization, make use of multithreading to support latency-insensitive computation, and promote the concept of uniform synchronous shared memory for easy variable allocation and convenient data exchange. In our earlier work we proposed the first CESM architecture, TOTAL ECLIPSE, composed of early MBTAC processors making use of very low-overhead multithreading, a parallel-computing-savvy functional unit organization, support for fast synchronization between instructions and threads, and highly efficient multioperations. Unfortunately, certain key parts of these processors turned out to be hard to implement, and overall they lacked support for ordered multiprefix operations and full configurability of the CESM scheme. In this paper we introduce a new fully configurable version of the MBTAC processor for our new REPLICA CESM architecture, along with its first FPGA implementations. To evaluate it, we execute short test programs on it and make a preliminary comparison against Intel Core i7 and DLX processors. Our FPGA design flow and testing approach are described.

Keywords: parallel computing multithreaded processor chaining FPGA prototype PRAM NUMA

Parallel Heuristics for Scalable Community Detection

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Lu, Hao; Kalyanaraman, Ananth; Halappanavar, Mahantesh; Choudhury, Sutanay
Affiliations: Washington State Univ Sch Elect Engn & Comp Sci Pullman WA 99164 USA; Pacific Northwest Natl Lab Computat Sci & Math Div Richland WA 99352 USA

ISBN: (print) 9781479941162

Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real-world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed by Blondel et al. in 2008, the method has become increasingly popular owing to its ability to detect high-modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, which limits its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading and tested them on real-world graphs derived from multiple application domains (e.g., internet, citation, biological). Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable number of iterations, while providing real speedups of up to 8x using 32 threads. In addition, our parallel implementation was able to exhibit weak scaling properties on up to 32 threads.

Keywords: Community detection Graph coloring Louvain method Parallel graph algorithms Parallel heuristics
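
The quantity that the Louvain method optimizes can be computed directly for a given partition. The sketch below uses the standard modularity definition for an unweighted, undirected graph without self-loops, Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. It only scores a partition; it is neither the Louvain heuristic nor the parallel heuristics of the paper, and the function name `modularity` is chosen here for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each vertex to its community label.
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # sum of vertex degrees per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the graph at the bridge scores higher than lumping every vertex together, which is the kind of comparison a modularity-optimizing heuristic makes repeatedly.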

WECPAR: List Ranking Algorithm and Relative Computational Power

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Author: El-Boghdadi, Hatem M.
Affiliations: Cairo Univ Fac Engn Dept Comp Engn Giza Egypt

ISBN: (print) 9781479941162

Reconfigurable models have been shown to be very powerful, solving many problems faster than non-reconfigurable models. WECPAR W(M,N,k) is an M x N reconfigurable model that has point-to-point reconfigurable interconnection with k wires between neighboring processors. This paper studies several aspects of WECPAR. We first solve the list ranking problem on WECPAR. Some of the results obtained show that ranking one element in a list of N elements can be solved on a W(N,N,N) WECPAR in O(1) time. Also, on W(N,N,k), ranking a list L(N) of N elements can be done in O((log N)⌈log_{k+1} N⌉) time. To transfer a large body of algorithms to work on WECPAR and to assess its relative computational power, several simulation algorithms are introduced between WECPAR and well-known models such as PRAM and RMBM. The simulation algorithms show that a PRIORITY CRCW PRAM of N processors and S shared memory locations can be simulated by a W(S, N, k) WECPAR in O(⌈log_{k+1} N⌉ + ⌈log_{k+1} S⌉) time. Also, we show that a PRIORITY CRCW Basic-RMBM(P, B) of P processors and B buses can be simulated by a W(B, P+B, k) WECPAR in O(⌈log_{k+1} (P + B)⌉) time. This has the effect of migrating a large number of algorithms to work directly on WECPAR with the simulation overhead.

Keywords: parallel algorithms simulation algorithms list ranking
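
For readers unfamiliar with list ranking, the sketch below shows the classic PRAM-style pointer-jumping approach, simulated sequentially. It is a stand-in to illustrate the problem only: the WECPAR algorithm itself is not described in the abstract and is not reproduced here. The representation (a successor array in which the tail points to itself) and the name `list_rank` are assumptions for this example.

```python
def list_rank(succ):
    """Return, for each node, its distance to the tail of a linked list.

    succ[i] is the index of the node after i; the tail points to itself.
    Each round doubles the distance a pointer spans, so the loop runs
    O(log n) times. Every round is computed from the previous round's
    arrays, mimicking one synchronous parallel step.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


# List order 0 -> 2 -> 1 -> 3, with node 3 as the tail.
print(list_rank([2, 3, 1, 3]))
```

The invariant is that rank[i] always equals the distance from node i to nxt[i], so once every pointer reaches the tail the ranks are final.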

A Distributed Speech Algorithm for Large Scale Data Communication Systems

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Xiong, Naixue; Tong, Guoxiang; Guo, Wenzhong; Tan, Jian; Wu, Guanning
Affiliations: Hubei Univ Educ Hubei Coinnovat Ctr Basic Educ Technol Serv Sch Comp Sci Wuhan Hubei Peoples R China; Univ Shanghai Sci & Technol Sch Optelect & Comp Engn Shanghai Key Lab Modern Opt Syst Shanghai Peoples R China; Fuzhou Univ Coll Math & Comp Sci Fuzhou 350108 Peoples R China; Univ Shanghai Sci & Technol Sch Opt Elect & Comp Engn Shanghai Peoples R China

ISBN: (print) 9781479941162

Data-driven computing and the use of data for strategic advantage are exemplified by communication systems, in which speech intelligibility is generally degraded by interfering noise. This interference comes from environmental noise, which reduces intelligibility by masking the signal of interest [1, 2]. An important task in communication systems is to extract speech from noisy speech and suppress background noise. In this paper, subspace algorithm theory is introduced into a speech noise reduction system. We first analyze the principle of the LMS adaptive speech noise reduction algorithm together with the subspace algorithm, and then merge the subspace algorithm into the VS-LMS algorithm to propose a combined algorithm for an adaptive speech noise reduction system. Furthermore, we analyze the combined algorithm, which can decrease musical noise as well as generate a suitable step-size factor to resolve the contradiction. This issue cannot be resolved by the current LMS algorithm [31], which has slower convergence and larger residual noise than our system. Our simulation results demonstrate that our algorithm performs 3 to 10 times better than the original algorithm at low SNR (-5 to 0 dB) and high SNR (0 to +5 dB).

Keywords: Adaptive Filter Subspace noise reduction algorithm Speech noise reduction system digital signal processing (DSP) LMS (least mean square)
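
As background for the LMS (least mean square) filter named in the keywords, here is a minimal fixed-step LMS sketch in plain Python. It shows only the textbook update w <- w + mu * e * x; the variable-step VS-LMS variant and the subspace combination proposed in the paper are not reproduced. The function name `lms_filter`, the tap count, the step size, and the synthetic test signal are all assumptions for illustration.

```python
import random


def lms_filter(x, d, num_taps=2, mu=0.1):
    """Adapt FIR weights so that the filtered input x tracks the desired signal d.

    Returns (errors, weights). At each sample the filter output y is the dot
    product of the weights with the most recent inputs, the error is d - y,
    and the weights move along mu * error * input.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent samples first; samples before the start count as zero.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))
        e = d[n] - y
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]
        errors.append(e)
    return errors, w


# Synthetic check: the desired signal is the input passed through a known
# two-tap filter, so the adapted weights should approach [0.5, -0.3].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
errors, weights = lms_filter(x, d)
print(weights)
```

The fixed step size mu trades convergence speed against residual error, which is the tension that variable-step variants aim to ease.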

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021
