Search Results - Inner Mongolia University Library

Search condition: "Any field = 5th International Conference on Algorithms and Architectures for Parallel Processing"

3,311 records in total; showing records 661-670

A 5G-code based iterative Non-Binary LDPC decoder

IEEE International Conference on Electronics, Circuits and Systems (ICECS)

Authors: Dimitris Chytas; Vassilis Paliouras (University of Patras, Patras, Greece)

This paper proposes an iterative Non-Binary LDPC (NB-LDPC) decoder for non-binary codes constructed using the 5G base matrices. Motivated by the binary to non-binary replacement method, we construct NB-LDPC matrices devised directly from the 5G base matrices. Subsequently, we develop an iterative decoding scheme able to facilitate parallelism due to its low complexity and to offer high performance due to its fast convergence. BER plots comparing the Min-sum binary decoder (over 5G base matrices) to our proposed NB decoder reveal a performance gain of 0.5 dB in certain cases. Furthermore, hardware synthesis results obtained for a 45-nm ASIC technology are provided in order to quantify the throughput rate and area requirements of the proposed architecture. It is shown that the proposed decoding architecture, because of its independence of the lifting size factor, can offer a higher throughput rate than the binary ones for small codeword lengths and code rates. In addition, as the Galois Field (GF) order increases, the throughput rate increases too. Finally, the throughput-to-area ratio shows that the proposed NB architecture is generally suitable for small lifting size factors.

Keywords: 5G mobile communication; Performance gain; Parallel processing; Throughput; Hardware; Decoding; Parallel architectures

Parallel Searching on Biological Networks

27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Authors: Bombieri, Nicola; Bonnici, Vincenzo; Giugno, Rosalba (Univ Verona, Dipartimento Informat, Strata Grazie 15, I-37134 Verona, Italy)

ISBN: (print) 9781728116440

Software applications for biological network analysis rely on graphs to model the structure of interactions. Many of them require searching for subgraphs in a target graph or in collections of graphs. Even though very efficient algorithms have been defined to solve this subgraph isomorphism problem, the complexity of current real biological networks makes their sequential execution time prohibitive. On the other hand, parallel architectures, from multi-core to many-core, have become pervasive to deal with the problem of data size. Nevertheless, the sequential nature of graph searching algorithms makes their implementation for parallel architectures very challenging. This paper presents three different parallel solutions for the graph searching problem. The first two target the exact search for multi-core CPUs and many-core GPUs, respectively. The third one targets the approximate search for GPUs, which handles node, edge, and node label mismatches. The paper shows how different techniques have been developed in all the solutions to reduce the search space complexity. The paper also shows the performance of the proposed solutions on representative biological networks containing antiviral chemical compounds and protein interaction networks.

Keywords: Biology; Search problems; Complexity theory; Graphics processing units; Indexes; Topology

Research on low delay audio convolution algorithm combining mobile and cloud

21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019

Authors: Cai, Xilong; Hu, Wei; Wang, Yonghao (College of Computer Science and Technology, Wuhan University of Science and Technology, China; Hubei Province Key Laboratory of Intelligent Information Processing Real-time Industrial System, Wuhan, China; DMT Lab, Birmingham City University, Birmingham, United Kingdom)

ISBN: (print) 9781728120584

Nowadays, intelligent mobile devices have become the most widely used mobile multimedia terminals. However, limited by performance and battery life, traditional audio architectures and algorithms cannot meet increasingly complex processing requirements. In this paper, the traditional audio convolution algorithm and audio architecture are analyzed. Based on an optimization of the block convolution algorithm, and combined with a cloud GPU, a low-latency convolution algorithm with dynamic task allocation is proposed for mobile-cloud hybrid systems. The experimental results show that the convolution algorithm can effectively reduce the workload and CPU usage of the mobile terminal, and realize real-time audio mixing in complex scenes on mobile terminals with limited performance. © 2019 IEEE.

Keywords: Graphics processing unit
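The abstract above builds on block convolution. As a rough illustration only, the following is a minimal sketch of generic overlap-add block convolution in plain Python; it is not the paper's algorithm, and the dynamic task allocation between the mobile terminal and the cloud GPU is not modeled. The function names `direct_convolve` and `block_convolve` and the `block_size` parameter are hypothetical.

```python
def direct_convolve(signal, kernel):
    """Reference linear convolution, O(len(signal) * len(kernel))."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def block_convolve(signal, kernel, block_size):
    """Overlap-add: convolve each block on its own, then sum the overlapping tails.

    Each block is independent of the others, which is what makes this form
    attractive for low-latency or offloaded processing.
    """
    out = [0] * (len(signal) + len(kernel) - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        partial = direct_convolve(block, kernel)
        # The tail of this partial result overlaps the head of the next block's.
        for offset, value in enumerate(partial):
            out[start + offset] += value
    return out
```

Because every block is processed independently, the per-block work could in principle be dispatched to different devices; how the paper actually schedules that work is not stated in the abstract.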

A parallel implementation of an XDraw viewshed algorithm with Spark

Authors: Jianbo, Zhang; Caikun, Chen; Tingnan, Liang; Hao, Xia; Simin, Zhou (Faculty of Information Engineering, China University of Geosciences, Wuhan, China)

ISBN: (print) 9781728120584

Viewshed analysis is an indispensable part of digital terrain analysis and is widely used in many application domains. High-resolution raster DEM data bring significant computational challenges to the existing viewshed analysis algorithms, which are computationally intensive and require a large memory space and massive computing power. The viewshed analysis process can be accelerated through the use of Apache Spark. In this article, we present both a tile-based raster data storing strategy and an equal-volume computing strategy for distributed viewshed computation using Spark. The parallel implementation of the XDraw algorithm mainly consists of three parts: (1) partitioning a raster DEM file into square tile sets and reorganizing these tile sets to prevent tile overlap across data divisions of HDFS, (2) subdividing the DEM into multiple equal-volume data sectors according to the viewpoint position, and (3) retrieving the corresponding tile sets of each sector to perform the XDraw algorithm independently and efficiently. Experiments with real-world datasets show that the two proposed strategies can achieve higher speed-up and efficiency for XDraw viewshed analysis as the raster DEM data volume is dramatically increased. © 2019 IEEE.

Keywords: Sparks; Partitioning algorithms; Distributed databases; Graphics processing units; Geology; Parallel processing; Approximation algorithms

3D Coded SUMMA: Communication-Efficient and Robust Parallel Matrix Multiplication

26th International Conference on Parallel and Distributed Computing (Euro-Par)

Authors: Jeong, Haewon; Yang, Yaoqing; Gupta, Vipul; Engelmann, Christian; Low, Tze Meng; Cadambe, Viveck; Ramchandran, Kannan; Grover, Pulkit (Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; Univ Calif Berkeley, Berkeley, CA, USA; Oak Ridge Natl Lab, Oak Ridge, TN, USA; Penn State Univ, State Coll, PA, USA)

ISBN: (print) 9783030576752; 9783030576745

In this paper, we propose a novel fault-tolerant parallel matrix multiplication algorithm called 3D Coded SUMMA that achieves higher failure-tolerance than replication-based schemes for the same amount of redundancy. This work bridges the gap between recent developments in coded computing and fault-tolerance in high-performance computing (HPC). The core idea of coded computing is the same as that of algorithm-based fault-tolerance (ABFT), namely weaving redundancy into the computation using error-correcting codes. In particular, we show that MatDot codes, an innovative code construction for parallel matrix multiplications, can be integrated into three-dimensional SUMMA (Scalable Universal Matrix Multiplication Algorithm [30]) in a communication-avoiding manner. To tolerate any two node failures, the proposed 3D Coded SUMMA requires approximately 50% less redundancy than replication, while the overhead in execution time is only about 5-10%.

Keywords: Parallel matrix multiplication; Fault-tolerant algorithms; Algorithm-based fault tolerance; Coded computing; Communication-efficient algorithms; Error detection and correction

Efficient Implementation of Kyber on Mobile Devices

27th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2021

Authors: Zhao, Lirui; Zhang, Jipeng; Huang, Junhao; Liu, Zhe; Hancke, Gerhard (Nanjing University of Aeronautics and Astronautics, Jiangsu, China; Beijing Normal University-Hong Kong Baptist University United International College, Guangdong, China; State Key Laboratory of Cryptology, Beijing 100878, China; City University of Hong Kong, Hong Kong)

ISBN: (print) 9781665408783

Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explored optimizations of Kyber on high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we utilized the SIMD instruction set NEON on ARMv8-A to parallelize the core modules of Kyber, i.e., modular reduction and NTT. Specifically, we designed optimized implementations of the Barrett and Montgomery reduction algorithms based on the characteristics of the NEON instruction set. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reductions were 8.52 and 8.89 times faster than the reference implementation. As for NTT/INTT, we adopted the 2+5 layer merging strategy on ARMv8-A after carefully analyzing the register occupancy of various layer merging techniques. Thanks to the selected layer merging strategy, our NTT and INTT achieved 11.89 and 13.45 times speedups compared with the reference implementation. Our optimized software achieved 1.77×, 1.85×, and 2.16× speedups for key generation, encapsulation, and decapsulation compared with Kyber's reference implementation. © 2021 IEEE.

Keywords: Neon
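For readers unfamiliar with Barrett reduction, the sketch below shows the textbook scalar form for Kyber's modulus q = 3329, using a wider intermediate product. It is an illustration under stated assumptions, not the paper's method: the paper's contribution is a 16-bit NEON variant that avoids the 32-bit intermediate, which is not reproduced here. The names `KYBER_Q`, `BARRETT_SHIFT`, `BARRETT_V` and `barrett_reduce` are chosen for this example.

```python
KYBER_Q = 3329            # Kyber's prime modulus
BARRETT_SHIFT = 26        # precision of the precomputed reciprocal (assumption for this sketch)
# Rounded approximation of 2**26 / q, computed once ahead of time.
BARRETT_V = ((1 << BARRETT_SHIFT) + KYBER_Q // 2) // KYBER_Q


def barrett_reduce(a):
    """Reduce a signed 16-bit value modulo q without a division.

    t approximates round(a / q) via a multiply and a shift; subtracting t * q
    leaves a small representative congruent to a modulo q.
    """
    t = (BARRETT_V * a + (1 << (BARRETT_SHIFT - 1))) >> BARRETT_SHIFT
    return a - t * KYBER_Q
```

The multiply-and-shift replaces a costly division, which is why this operation is a natural target for SIMD lanes. Python integers are unbounded, so the intermediate product here is not limited to 32 bits as it would be in C.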

Parallel hybrid join algorithm on GPU

Authors: Guo, Chengxin; Chen, Hong; Zhang, Feng; Li, Cuiping (Key Lab of Data Engineering and Knowledge Engineering, Ministry of Education, School of Information, Renmin University of China, Beijing, China)

ISBN: (print) 9781728120584

In data analytics applications, join is a general and time-consuming operation. Optimizing join algorithms can significantly benefit query processing. The emergence of GPUs provides a massively parallel solution for improving the performance of the join operation. The hash join (HJ) and sort merge join (SMJ), two widely used join algorithms, have proved effective for efficient join processing on GPUs. Both algorithms have their own advantages and drawbacks, offering the chance of combining the advantages of HJ and SMJ on GPUs. When processing a join on GPUs, data need to be transmitted between the CPU and the GPU due to the discrete GPU memory design, which causes performance degradation because of the high PCIe data transfer overhead. As GPUs become more powerful, the performance gap between data transmission and GPU execution increases, which makes it even harder to implement an efficient join on GPUs. In this paper, we focus on the optimization of join algorithms on GPUs. We propose the Parallel Hybrid Join algorithm on GPUs (PHYJ) to combine the advantages of HJ and SMJ, and to overlap data communication and GPU execution with a pipeline mechanism. In our evaluation, PHYJ shows up to 1.72X and 1.55X speedup over the up-to-date HJ and SMJ algorithms respectively on an NVIDIA GTX 1080ti-Pascal GPU. On the TitanV-Volta GPU, up to 1.54X and 1.42X improvements can be achieved over the baseline HJ and SMJ algorithms respectively. © 2019 IEEE.

Keywords: Graphics processing unit
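The two baseline algorithms named in the abstract, hash join and sort merge join, can be sketched sequentially as follows. This is a minimal CPU-only illustration in Python, assuming rows are `(key, payload)` tuples; it does not reproduce PHYJ, its hybrid strategy, or its GPU pipelining, and the function names are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right):
    """Build a hash table on the keys of `left`, then probe it with each `right` row."""
    table = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    return [(key, l, r) for key, r in right for l in table.get(key, [])]


def sort_merge_join(left, right):
    """Sort both inputs by key, then merge them, pairing runs of equal keys."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, l in left[i:i_end]:
                for _, r in right[j:j_end]:
                    out.append((lk, l, r))
            i, j = i_end, j_end
    return out
```

Both functions return the same set of matched rows, possibly in a different order: the hash join depends on random access into a table, while the sort merge join depends on sorted, sequential scans.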

Controlled Asynchronous GVT: Accelerating Parallel Discrete Event Simulation on Many-Core Clusters

48th International Conference on Parallel Processing (ICPP)

Authors: Eker, Ali; Williams, Barry; Chiu, Kenneth; Ponomarev, Dmitry (Binghamton Univ, Binghamton, NY 13902, USA)

ISBN: (print) 9781450362955

In this paper, we investigate the performance of Parallel Discrete Event Simulation (PDES) on a cluster of many-core Intel KNL processors. Specifically, we analyze the impact of different Global Virtual Time (GVT) algorithms in this environment and contribute three significant results. First, we show that it is essential to isolate the thread performing MPI communications from the task of processing simulation events; otherwise the simulation is significantly imbalanced and performs poorly. This applies to both synchronous and asynchronous GVT algorithms. Second, we demonstrate that a synchronous GVT algorithm based on barrier synchronization is a better choice for communication-dominated models, while asynchronous GVT based on Mattern's algorithm performs better for computation-dominated scenarios. Third, we propose the Controlled Asynchronous GVT (CA-GVT) algorithm, which selectively adds synchronization to Mattern-style GVT based on simulation conditions. We demonstrate that CA-GVT outperforms both barrier and Mattern's GVT and achieves about 8% performance improvement on mixed computation-communication models. This is a reasonable improvement for a simple modification to a GVT algorithm.

Keywords: Parallel Discrete Event Simulation; Intel Xeon Phi Knights Landing; Manycore architectures; Performance; Global Virtual Time

G-match: A fast GPU-friendly data compression algorithm

Authors: Lu, Li; Hua, Bei (School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China)

ISBN: (print) 9781728120584

Data compression plays an important role in the era of big data; however, such compression is typically one of the bottlenecks of a massive data processing system due to intensive computing and memory access. In this paper, we propose a high-speed GPU-friendly data compression algorithm called G-match that takes full advantage of the GPU's parallel computing power to speed up the compression process. The greatest challenge here is to resolve the conflict between the high data dependency inherent in the compression algorithm and the GPU single-instruction multiple-thread operating model. G-match achieves a high degree of parallelism by eliminating fine-grained data dependency and all path divergences in the algorithm. Compared with other, similar work on GPUs, G-match is the first thoroughly parallelized data compression algorithm. Experiments comparing it with other GPU compression algorithms show that G-match achieves approximately 33% speedup over the current fastest implementation and the highest compression ratio. © 2019 IEEE.

Keywords: Graphics processing unit

Parallel Fully Vectorized Marsa-LFIB4: Algorithmic and Language-Based Optimization of Recursive Computations

13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019

Authors: Stpiczyński, Przemyslaw (Institute of Computer Science, Maria Curie–Sklodowska University, Akademicka 9/519, Lublin 20-033, Poland)

ISBN: (print) 9783030432218

The aim of this paper is to present a new high-performance implementation of Marsa-LFIB4, which is an example of high-quality multiple recursive pseudorandom number generators. We propose a new algorithmic approach that combines language-based vectorization techniques with a new divide-and-conquer method that exploits a special sparse structure of the matrix obtained from the recursive formula that defines the generator. We also show how the use of intrinsics for Intel AVX2 and AVX512 vector extensions can improve the performance. Our new implementation achieves good performance on several multicore architectures and it is much more energy-efficient than simple SIMD-optimized implementations. © 2020, Springer Nature Switzerland AG.

Keywords: Number theory

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021