Search Results - Inner Mongolia University Library

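Help

Field codes:

T = title (book title, title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, year the degree was awarded, year the standard was issued)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be written in uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field C++ or Basic, excluding subject Visual, years 2011-2016)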
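
The field codes and operator rules above are regular enough to generate queries programmatically. The Python sketch below is a hypothetical helper (the names `term`, `year_range`, `any_of`, `all_of` and `exclude` are illustrative and not part of the library's system); it only encodes the documented rules: a known field code before `=`, uppercase `AND`/`OR`/`NOT`, and one space on each side of every operator.

```python
# Hypothetical helper for composing catalog queries in the documented syntax.
FIELD_CODES = {
    "T": "title", "A": "author", "K": "subject term", "P": "publication name",
    "PU": "publisher", "O": "institution", "L": "Chinese Library Classification",
    "C": "subject classification", "U": "all fields", "Y": "year",
}


def term(field: str, value: str) -> str:
    """Build one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def year_range(start: int, end: int) -> str:
    """Year clause in the Y=start-end form used by the examples."""
    return term("Y", f"{start}-{end}")


def any_of(*clauses: str) -> str:
    """Parenthesized OR group; operator is uppercase with single spaces."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


def exclude(query: str, clause: str) -> str:
    """Append a NOT clause to an existing query."""
    return f"{query} NOT {clause}"


if __name__ == "__main__":
    q1 = all_of(any_of(term("K", "图书馆学"), term("K", "情报学")),
                term("A", "范并思"), year_range(1982, 2016))
    print(q1)  # (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
```

Composed the same way, `all_of(exclude(all_of(term("P", "计算机应用与软件"), any_of(term("U", "C++"), term("U", "Basic"))), term("K", "Visual")), year_range(2011, 2016))` yields the Example 2 string exactly.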

Refine Search Results

Document Type

2,699 conference papers
58 books
54 journal articles

Holdings Scope

2,811 electronic documents
0 print holdings

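Subject Classification

1,852 Engineering
- 1,636 Computer Science and Technology...
- 847 Software Engineering
- 342 Electrical Engineering
- 222 Electronic Science and Technology (may...
- 216 Information and Communication Engineering
- 91 Control Science and Engineering
- 63 Optical Engineering
- 58 Mechanical Engineering
- 47 Instrument Science and Technology
- 39 Biomedical Engineering (may be conferred...
- 38 Bioengineering
- 31 Materials Science and Engineering (may...
- 27 Power Engineering and Engineering Thermo...
- 21 Chemical Engineering and Technology
- 20 Architecture
- 17 Cyberspace Security
- 15 Civil Engineering
- 13 Mechanics (may be conferred in Engineering, Sci...
506 Science
- 343 Mathematics
- 115 Physics
- 51 Systems Science
- 48 Biology
- 32 Chemistry
- 30 Statistics (may be conferred in Science, ...
177 Management
- 123 Management Science and Engineering (may...
- 62 Library, Information and Archives Manage...
- 49 Business Administration
44 Medicine
- 30 Clinical Medicine
- 14 Basic Medicine (may be conferred in Medicine...
15 Law
- 15 Sociology
9 Economics
9 Agriculture
8 Literature
2 Military Science
1 Education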

Topics

364 parallel process...
219 computer archite...
205 graphics process...
146 parallel archite...
136 graphics process...
129 hardware
116 parallel algorit...
112 image processing
99 computational mo...
94 concurrent compu...
87 instruction sets
86 field programmab...
83 algorithm design...
79 multicore proces...
77 signal processin...
76 parallel process...
66 parallel program...
60 throughput
60 gpu
59 kernel

Institutions

11 natl univ def te...
6 college of compu...
6 school of comput...
6 hosei univ dept ...
6 natl univ def te...
5 univ aizu dept c...
5 carleton univ sc...
5 school of comput...
5 computer science...
5 inria rennes
5 city university ...
4 chinese acad sci...
4 univ michigan ad...
4 institute of com...
4 univ chinese aca...
4 school of comput...
4 univ jaume 1 dep...
4 hainan internati...
4 tech univ cluj n...
4 department of co...

Authors

11 jack dongarra
10 roman wyrzykowsk...
9 konrad karczewsk...
9 quintana-orti en...
7 dongarra jack
7 kothapalli kisho...
6 hannig frank
6 liu jie
6 su jinshu
6 nakano koji
6 peng shietung
6 li yamin
6 chu wanming
6 wyrzykowski roma...
6 thulasiraman par...
5 ito yasuaki
5 jerzy waśniewski
5 wang guojun
5 geyong min
5 wanlei zhou

Language

2,737 English
49 Other
18 Chinese
11 Russian
2 Ukrainian
1 Spanish

Search query: "Any field = 10th International Conference on Algorithms and Architectures for Parallel Processing"

2,811 records in total; showing records 1341-1350

Asymmetry-aware scheduling in heterogeneous multi-core architectures

10th IFIP International Conference on Network and Parallel Computing, NPC 2013

Authors: Zhang, Tao; Pan, Xiaohui; Shu, Wei; Wu, Min-You (Shanghai Jiao Tong University, Shanghai, China; University of New Mexico, Albuquerque, NM, United States; Shanghai University of Political Science and Law, China)

ISBN (print): 9783642408199

As threads of execution in a multi-programmed computing environment have different characteristics and hardware resource requirements, heterogeneous multi-core processors can achieve higher performance and power efficiency than homogeneous multi-core processors. To fully tap into that potential, OS schedulers need to be heterogeneity-aware, so that they can match threads to cores according to the characteristics of both. We propose two heterogeneity-aware thread schedulers, PBS and LCSS. PBS makes scheduling decisions based on each application's sensitivity to large cores and assigns the large cores to the applications that can achieve the largest performance gains. LCSS balances the large-core resource among all applications. We have implemented these two schedulers in Linux and evaluated their performance with the PARSEC benchmark on different heterogeneous architectures. Overall, PBS outperforms the Linux scheduler by 13.3% on average and by up to 18%. LCSS achieves a speedup of 5.3% on average and up to 6% over the Linux scheduler. In addition, PBS performs well with both asymmetric and symmetric workloads, while LCSS is more suitable for scheduling symmetric workloads. In summary, PBS and LCSS provide repeatable performance measurements and better performance than the Linux OS scheduler. © 2013 IFIP International Federation for Information Processing.

Keywords: Scheduling
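
The two policies described in this abstract can be contrasted in a few lines. This is a minimal Python sketch, not the authors' Linux implementation: it assumes each application's large-core sensitivity is already known as a single speedup number (how PBS measures it is not stated here), and the functions `pbs_assign` and `lcss_schedule` are hypothetical. PBS-style assignment gives the large cores to the most sensitive applications; LCSS-style balancing rotates them so every application gets a similar share.

```python
from typing import Dict, List, Set


def pbs_assign(sensitivity: Dict[str, float], n_big: int) -> Set[str]:
    """PBS-style choice: give the large cores to the applications whose
    (assumed, precomputed) large-core speedup is highest."""
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    return set(ranked[:n_big])


def lcss_schedule(apps: List[str], n_big: int, epochs: int) -> List[Set[str]]:
    """LCSS-style balancing: rotate the large cores across all applications
    so that, over time, each one receives a similar share."""
    schedule = []
    for epoch in range(epochs):
        start = (epoch * n_big) % len(apps)
        # apps running on large cores during this epoch
        schedule.append({apps[(start + k) % len(apps)] for k in range(n_big)})
    return schedule
```

For example, with sensitivities `{"a": 1.8, "b": 1.1, "c": 1.5}` and one large core, `pbs_assign` picks `{"a"}`, while `lcss_schedule(["a", "b", "c"], 1, 3)` gives each application one epoch on the large core.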

Accelerating all-to-all protein structures comparison with TMalign using a NoC many-cores processor architecture

2013 IEEE 37th Annual Computer Software and Applications Conference, COMPSAC 2013

Authors: Sharma, Anuj; Papanikolaou, Antonis; Manolakos, Elias S. (Department of Informatics and Telecommunications, University of Athens, Athens, Greece; Institute of Communication and Computer Systems, National Technical University, Athens, Greece)

ISBN (print): 9780769549798

The computational challenges of the one-to-many and many-to-many protein structure comparison (PSC) problem result from several factors: constantly expanding, large structural proteomics databases; the high computational complexity of pairwise protein comparison algorithms; and the multitude of pairwise comparison approaches used in the field. Advances in processor architectures, such as manycore CPUs, have enabled them to support parallelism, making them of interest for speeding up PSC techniques. We present rckAlign, an implementation of the widely used TM-Align PSC algorithm designed for the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs. We developed a skeleton library, rckskel, and implemented a master-slaves variant of TM-Align to exploit the parallelism offered by the SCC. We evaluated rckAlign on the SCC and compared it with existing TM-Align software running on a dual-core AMD CPU (2.4 GHz) and on a single-core Intel P54C Pentium CPU (800 MHz). We observed an 11-fold speedup relative to the former and a 44-fold speedup relative to the latter. A key aspect of rckAlign's performance on the SCC is the almost linear speedup achieved with the number of SCC cores used as slaves. The method presented can easily be applied to other PSC algorithms and extended to running multiple PSC algorithms within the same SCC chip. © 2013 IEEE.

Keywords: parallel algorithms
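
The master-slaves decomposition mentioned above maps naturally onto a work queue: the master enumerates every unordered pair of structures and workers score the pairs independently, which is why speedup can grow almost linearly with the number of workers. This Python sketch assumes a placeholder scoring function (`toy_similarity` is hypothetical and is not TM-align) and uses a thread pool only to show the task structure; it is not rckAlign or the rckskel library.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a real structural score such as TM-align's:
    fraction of positions with identical residues."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def all_to_all(structures: Sequence[str],
               compare: Callable[[str, str], float],
               workers: int = 4) -> Dict[Tuple[int, int], float]:
    """Master-workers pattern: the master enumerates every unordered pair
    (i, j) with i < j and hands the comparisons to a pool of workers."""
    pairs = list(combinations(range(len(structures)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda ij: compare(structures[ij[0]], structures[ij[1]]), pairs)
        return dict(zip(pairs, scores))
```

Because CPython threads do not run CPU-bound Python code in parallel, real speedup would need a process pool with a module-level worker function in place of the lambda.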

On the automatic generation of GPU-oriented software applications from RTL IPs

11th ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2013

Authors: Bombieri, Nicola; Fummi, Franco; Vinco, Sara (Dipartimento di Informatica, Università di Verona, Italy)

ISBN (print): 9781479914173

Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation-intensive applications. In particular, combining GPUs with CPUs has proved to be an effective way to accelerate software execution, by mixing a few CPU cores optimized for serial processing with many smaller GPU cores designed for massively parallel computation. In addition, driven by the need for low power consumption as well as high performance, a recent trend is to combine GPUs and CPUs on a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). The good trade-off between computing capability and power consumption makes integrated GPUs a promising alternative for accelerating a wide range of software applications for embedded systems. Nevertheless, algorithms must be redesigned to take advantage of these architectures, and such manual parallelization often gives unsatisfactory results. This paper presents a methodology to automatically generate software applications for GPUs by reusing existing, preverified register-transfer level (RTL) intellectual properties (IPs). The methodology aims at exploiting the intrinsic parallelism of RTL IPs (such as process concurrency and pipelined micro-architecture) to generate a parallel software implementation of their functionality. The experimental results show that the performance obtained by running the RTL functionality as software applications on GPUs exceeds that provided by the RTL code mapped into a hardware accelerator. © 2013 IEEE.

Keywords: Graphics processing unit
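
The process concurrency of RTL that this methodology exploits comes from two-phase clock semantics: within a cycle, every process reads only the current register values and all results are committed together at the clock edge, so the processes can be evaluated in any order or in parallel. The Python sketch below illustrates that property on a hypothetical three-stage pipeline (`stage1` to `stage3` and `clock` are invented for illustration); it is not the paper's code generator.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

State = Dict[str, int]
Process = Callable[[State], State]


# Three hypothetical pipeline stages; each reads only the current snapshot
# and returns the register it drives at the next clock edge.
def stage1(s: State) -> State:
    return {"r1": s["in"] + 1}


def stage2(s: State) -> State:
    return {"r2": s["r1"] * 2}


def stage3(s: State) -> State:
    return {"out": s["r2"] - 3}


def clock(state: State, processes: List[Process], next_input: int) -> State:
    """One clock cycle with two-phase semantics: evaluate, then update."""
    # Evaluate phase: processes only read `state`, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        updates = list(pool.map(lambda proc: proc(state), processes))
    # Update phase: commit all registers at once, then drive the next input.
    new_state = dict(state)
    for upd in updates:
        new_state.update(upd)
    new_state["in"] = next_input
    return new_state
```

Starting from `{"in": 1, "r1": 0, "r2": 0, "out": 0}` and clocking three times with inputs 2, 3 and 4 leaves `out == 1`, that is (1 + 1) * 2 - 3, regardless of the order in which the stages are listed.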

Comparative analysis of DSP and Intel processors by implementing FFTs using parallel programming for underwater signal processing applications

International Bhurban Conference on Applied Sciences & Technology, IBCAST

Authors: Umar Hamid; Haroon Shahzad; Muhammad Irfan (Centres of Excellence in Science and Applied Technologies, Islamabad, Pakistan)

ISBN (print): 9781467344258

This paper presents Fast Fourier Transform (FFT) benchmark results to measure and compare the performance of various DSP and Intel processors for underwater signal processing applications. It aims to show the performance enhancement of Intel processors over DSP processors when parallel programming is used to implement signal processing functions in real time. The results show a significant decrease in FFT execution time on an Intel-based multicore processor using parallel programming. The comparative analysis of different processor architectures presented in this paper will therefore help system designers select an optimal processor for underwater signal processing applications.

Keywords: Digital signal processing; Program processors; Clocks; Multicore processing; Benchmark testing; Data acquisition; Antennas
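
Parallel programming for this kind of workload typically partitions independent work, such as the FFTs of separate sensor channels, across cores. The Python sketch below shows that decomposition with a textbook recursive radix-2 Cooley-Tukey FFT and one task per channel; the function names are hypothetical, no benchmark numbers from the paper are reproduced, and a thread pool in CPython illustrates the structure rather than delivering real multicore speedup.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle          # butterfly, upper half
        out[k + n // 2] = even[k] - twiddle  # butterfly, lower half
    return out


def fft_channels(channels: Sequence[Sequence[complex]],
                 workers: int = 4) -> List[List[complex]]:
    """Data-parallel pattern: each channel is transformed independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fft, channels))
```

An impulse `[1, 0, 0, 0]` transforms to four ones, and `[0, 1, 0, 0]` to `[1, -1j, -1, 1j]`, which is a quick sanity check before timing larger inputs.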

Programming real-time image processing for manycores in a high-level language

10th International Symposium on Advanced Parallel Processing Technologies, APPT 2013

Authors: Gebrewahid, Essayas; Zain-Ul-Abdin; Svensson, Bertil; Gaspes, Veronica; Jego, Bruno; Lavigueur, Bruno; Robart, Mathieu (Center for Research on Embedded Systems, Halmstad University, Halmstad, Sweden; STMicroelectronics - Advanced System Technology, Grenoble, France; STMicroelectronics - Advanced System Technology, Bristol, United Kingdom)

ISBN (print): 9783642452925

Manycore architectures are gaining attention as a means to meet the performance and power demands of high-performance embedded systems. However, their widespread adoption is sometimes constrained by the need to master proprietary programming languages that are low-level and hinder portability. We propose the use of the concurrent programming language occam-pi as a high-level language for programming an emerging class of manycore architectures. We show how to map occam-pi programs to the manycore architecture Platform 2012 (P2012). We describe the techniques used to translate the salient features of the language to the native programming model of P2012. We present the results of a case study on a representative algorithm in the domain of real-time image processing: a complex corner-detection algorithm called Features from Accelerated Segment Test (FAST). Our results show that the occam-pi program is much shorter, easier to adapt, and competitive in performance compared with versions programmed in the native programming model of P2012 and in OpenCL. © 2013 Springer-Verlag.

Keywords: High level languages
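
FAST, the case-study algorithm above, classifies a pixel as a corner with a segment test on a 16-pixel circle around it. The following Python sketch implements that test in its common form (radius-3 Bresenham circle, at least `n` contiguous circle pixels all brighter than the center plus a threshold or all darker than the center minus it); the defaults `threshold=20` and `n=12` are assumptions, and the sketch is not the occam-pi or P2012 implementation.

```python
from typing import List, Sequence, Tuple

# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3,
# listed in order around the circle.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img: Sequence[Sequence[int]], x: int, y: int,
                   threshold: int = 20, n: int = 12) -> bool:
    """Segment test: True if at least n contiguous circle pixels are all
    brighter than center + threshold or all darker than center - threshold."""
    center = img[y][x]
    labels = []
    for dx, dy in CIRCLE:
        p = img[y + dy][x + dx]
        labels.append(1 if p > center + threshold
                      else -1 if p < center - threshold else 0)
    for sign in (1, -1):
        run = 0
        for lab in labels + labels:  # doubled list handles arcs that wrap around
            run = run + 1 if lab == sign else 0
            if run >= n:
                return True
    return False


def fast_corners(img: Sequence[Sequence[int]], threshold: int = 20,
                 n: int = 12) -> List[Tuple[int, int]]:
    """Scan every pixel at least 3 pixels away from the image border."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, threshold, n)]
```

On a dark 9x9 image with a single bright pixel at (4, 4), only that pixel passes the test, since every one of its circle neighbours is darker by more than the threshold.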

(Very) fast (all) k-nearest neighbors in metric and non metric spaces without indexing

6th International Conference on Similarity Search and Applications, SISAP 2013

Authors: Miranda, Natalia; Chávez, Edgar; Piccoli, María Fabiana; Reyes, Nora (Universidad Nacional de San Luis, Argentina; Universidad Nacional Autónoma de México, Mexico)

ISBN (print): 9783642410611

Proximity queries consist of retrieving the objects near a given query. To avoid a brute-force scan over a large database, an index can be used. However, for some problems indexes are mostly useless (their running times are worse than a sequential scan). On the other hand, researchers have tried massively parallel hardware (such as GPGPUs) in the quest for faster query times. The results have been modest because current algorithms are cumbersome, while GPGPU architectures favor simple kernels, have a clear memory hierarchy, and need close to zero cross-talk between processing units. We have engineered very fast algorithms for proximity queries that take these principles into account, and all of them are presented in this paper. In our approach no index is built, cross-talk between threads is eliminated, and the higher (faster) levels of the memory hierarchy are used consistently. The absence of data structures allows all the available memory to be used for the database, and furthermore makes it possible to do stream processing on very large data collections. © 2013 Springer-Verlag.

Keywords: Query processing

Automatic Generation of Communications for Redundant Multi-dimensional Data Parallel Redistributions

IEEE International Conference on High Performance Computing and Communications (HPCC)

Authors: Corinne Ancourt; Teodora Petrisor; Francois Irigoin; Eric Lenormand (MINES ParisTech, CRI, Fontainebleau, France; THALES, Thales Research and Technology, Palaiseau Cedex)

In this paper we concentrate on embedded parallel architectures with heterogeneous memory management systems that combine shared and local memories; more precisely, we focus on efficient data communication between the various parts of the architecture. We formulate explicit data transfers in a polyhedral context and give several strategies for managing efficient communication of redundantly stored or read data. This allows automatic DMA-style code generation for a variety of data mappings onto parallel processing elements. Our approach is validated on a wide series of data redistribution examples linked with SpearDE, a domain-specific parallelization framework developed at Thales. We give the solution for efficient data transfers both mathematically and in the form of generated C code.

Keywords: Arrays; Data transfer; Parallel processing; Distributed databases; Program processors; Tiles
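
A simplified, one-dimensional version of this communication problem makes the idea concrete: for each element, check whether the destination processing element (PE) already holds a copy; if not, pick one source among its redundant holders, then compress each source-destination index list into strided, DMA-like descriptors. This Python sketch is a hypothetical illustration (block-to-cyclic redistribution, lowest-id source choice and greedy `(start, stride, count)` runs are all assumptions); it does not use the polyhedral formulation or SpearDE.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def block_owner(i: int, n: int, p: int) -> int:
    """Owner of element i under a block distribution of n elements on p PEs."""
    size = -(-n // p)  # ceiling division
    return i // size


def plan_transfers(n: int, holders: Callable[[int], Set[int]],
                   dst_owner: Callable[[int], int]) -> Dict[Tuple[int, int], List[int]]:
    """Map (src_pe, dst_pe) -> element indices to send. Elements already held
    by their destination are skipped; among redundant holders, the PE with
    the lowest id is chosen as the single source."""
    plan: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i in range(n):
        dst = dst_owner(i)
        src_set = holders(i)
        if dst in src_set:
            continue  # already local, no communication needed
        plan[(min(src_set), dst)].append(i)
    return dict(plan)


def to_descriptors(indices: List[int]) -> List[Tuple[int, int, int]]:
    """Greedily compress a sorted index list into (start, stride, count) runs,
    the shape of a strided DMA request."""
    descs = []
    i = 0
    while i < len(indices):
        if i + 1 == len(indices):
            descs.append((indices[i], 1, 1))
            break
        stride = indices[i + 1] - indices[i]
        count = 2
        while (i + count < len(indices)
               and indices[i + count] - indices[i + count - 1] == stride):
            count += 1
        descs.append((indices[i], stride, count))
        i += count
    return descs
```

For 8 elements on 2 PEs going from a block to a cyclic distribution, `plan_transfers(8, lambda i: {block_owner(i, 8, 2)}, lambda i: i % 2)` returns `{(0, 1): [1, 3], (1, 0): [4, 6]}`, and each list becomes a single descriptor with stride 2.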

Character of graph analysis workloads and recommended solutions on future parallel systems

13th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2013

Authors: Tanabe, Noboru; Tomimori, Sonoko; Takata, Masami; Joe, Kazuki (Toshiba Research and Development Center, Kawasaki, Kanagawa 212-8582, Japan; Nara Women's University, Nara, Nara 630-8506, Japan)

ISBN (print): 9783319038582

Graph500 is a benchmark suite for big data analysis. The matrices used in Graph500 inherit the properties of graph analysis workloads such as breadth-first search for social networking services (SNS) and PageRank for web search engines. Power saving is especially important when executing them on future massively parallel processors and clouds. The spatial locality of the sparse matrices used in Graph500 and their behavior in cache memory are investigated. The experimental results show that the spatial locality of these sparse matrices is very low. This problem is very difficult to solve with a software-only approach because of the huge data size and the randomness of the accesses. Therefore, we recommend hardwired scatter/gather functions on the memory side, which improve processing speed by an order of magnitude. To achieve both low power and high random-access throughput, we recommend implementing the hardwired scatter/gather functions on the logic base of the Hybrid Memory Cube (HMC). We also briefly consider the power saving for applications with low cache hit rates, such as Graph500. For example, when the hit rate is 15%, the power saving ratio of memory access is about 30-fold. © Springer International Publishing Switzerland 2013.

Keywords: Cache memory
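
The low spatial locality reported above can be quantified as the fraction of fetched cache-line bytes that a program actually uses. The following Python sketch computes that ratio for a list of element indices; it assumes 8-byte elements, 64-byte lines and no element straddling a line boundary, and `line_utilization` is a hypothetical helper, not the paper's measurement method.

```python
from typing import Iterable


def line_utilization(indices: Iterable[int], elem_size: int = 8,
                     line_size: int = 64) -> float:
    """Fraction of fetched cache-line bytes that are actually used.

    Each distinct element costs elem_size useful bytes; each distinct cache
    line touched costs line_size fetched bytes. Assumes aligned elements
    that never straddle a line boundary."""
    idx = set(indices)
    if not idx:
        return 0.0
    lines = {(i * elem_size) // line_size for i in idx}
    return len(idx) * elem_size / (len(lines) * line_size)
```

Sequential access over 64 consecutive elements gives 1.0, while touching indices 0, 8, 16 and 24 gives 0.125, one useful element per fetched line. Randomly scattered accesses such as those in graph traversal tend toward that lower bound, which is the situation memory-side gather is meant to address.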

Multi-core Computation of Transfer Matrices for Strip Lattices in the Potts Model

IEEE International Conference on High Performance Computing and Communications (HPCC)

Authors: Cristobal A. Navarro; Nancy Hitschfeld; Fabrizio Canfora (Department of Computer Science, Universidad de Chile, Santiago, Chile; Centro de Estudios Científicos (CECs), Valdivia, Chile)

The transfer-matrix technique is a convenient way to study strip lattices in the Potts model, since the computational cost depends only on the periodic part of the lattice rather than on the whole lattice. However, even with this reduction, the transfer-matrix technique is still an NP-hard problem, since the time T(|V|, |E|) needed to compute the matrix grows exponentially with the width of the graph. In this work, we present a parallel transfer-matrix implementation that scales on multi-core architectures. The construction of the matrix is based on several repetitions of the deletion-contraction technique, which exposes parallelism suitable for multi-core machines. Our experimental results show that the multi-core implementation achieves speedups of 3.7X with p = 4 processors and 5.7X with p = 8. The efficiency of the implementation lies between 60% and 95%, with the best balance of speedup and efficiency at p = 4 processors on current multi-core architectures. The algorithm also takes advantage of the lattice symmetry, making the transfer-matrix computation run up to 2X faster than its non-symmetric counterpart and use as little as a quarter of the original space.

Keywords: Lattices; Strips; Program processors; Computational modeling; Partitioning algorithms; Parallel processing; Buildings
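
The deletion-contraction recurrence that this implementation builds on can be shown on the Potts partition function itself. In the Fortuin-Kasteleyn form, Z(G; q, v) = sum over edge subsets A of q^k(A) * v^|A|, where k(A) is the number of connected components of (V, A), and Z(G) = Z(G - e) + v * Z(G / e) for any edge e. The Python sketch below evaluates Z for a small graph with that recurrence; `potts_z` is a hypothetical helper and does not build a transfer matrix. For reference, parallel efficiency is speedup divided by p, so 3.7X at p = 4 is about 92% and 5.7X at p = 8 about 71%.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]


def potts_z(vertices: FrozenSet[int], edges: List[Edge],
            q: float, v: float) -> float:
    """Potts partition function Z(G; q, v) in Fortuin-Kasteleyn form,
    evaluated by deletion-contraction: Z(G) = Z(G - e) + v * Z(G / e)."""
    if not edges:
        # Edgeless graph: every vertex is its own component.
        return q ** len(vertices)
    (a, b), rest = edges[0], edges[1:]
    deleted = potts_z(vertices, rest, q, v)
    if a == b:
        contracted = deleted  # contracting a loop is the same as deleting it
    else:
        # Merge b into a; any resulting loops or parallel edges are kept.
        merged = [(a if x == b else x, a if y == b else y) for x, y in rest]
        contracted = potts_z(vertices - {b}, merged, q, v)
    # The two branches are independent subproblems, which is what makes
    # deletion-contraction amenable to multi-core parallelism.
    return deleted + v * contracted
```

For the triangle K3 the recurrence reproduces q^3 + 3q^2 v + 3q v^2 + q v^3, for example 28 at q = 2, v = 1.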

A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors

10th IFIP International Conference on Network and Parallel Computing (NPC)

Authors: Zhang, Kai; Chen, ShuMing; Liu, Wei; Ning, Xi (National University of Defense Technology, School of Computer, Changsha 410073, Hunan, People's Republic of China)

ISBN (print): 9783642408199; 9783642408205

LU decomposition is a widely used method for solving dense linear systems in many scientific computing applications. In recent years, single instruction multiple data (SIMD) technology has been a popular way to accelerate LU decomposition. However, pipeline parallelism and memory bandwidth utilization are low when LU decomposition is mapped onto SIMD processors. This paper proposes a fine-grained pipelined implementation of LU decomposition on SIMD processors. The fine-grained algorithm makes good use of the data dependences of the native algorithm to exploit fine-grained parallelism among all the computation resources. By transforming non-coalesced memory accesses into coalesced ones, the proposed algorithm achieves high pipeline parallelism and highly efficient memory access. Experimental results show that the proposed technique achieves a speedup of 1.04x to 1.82x over the native algorithm and about 89% of the peak performance of the SIMD processor.

Keywords: parallel processing systems
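
For orientation, the unblocked LU algorithm that such SIMD implementations start from is short. The Python sketch below is a textbook Doolittle factorization without pivoting (an assumption; the abstract does not say whether pivoting is used), written so that the innermost loop walks two matrix rows with unit stride, the access pattern SIMD lanes favor. It is not the paper's fine-grained pipelined algorithm.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Doolittle LU without pivoting: returns unit-lower L and upper U
    with A = L U. Raises ZeroDivisionError on a zero pivot."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # work on a float copy
    for k in range(n):
        pivot = lu[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, n):
            factor = lu[i][k] / pivot
            lu[i][k] = factor  # store the L multiplier in place
            # Row update: reads row k and writes row i contiguously
            # (unit stride in row-major storage).
            for j in range(k + 1, n):
                lu[i][j] -= factor * lu[k][j]
    lower = [[lu[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper
```

For `[[4, 3], [6, 3]]` it returns L = `[[1.0, 0.0], [1.5, 1.0]]` and U = `[[4.0, 3.0], [0.0, -1.5]]`.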

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
