ISBN:
(Print) 9798350307924
In the era of exascale computing, the adoption of large numbers of CPU cores and nodes by high-performance computing (HPC) applications has made MPI collective performance increasingly crucial. As the number of cores and nodes increases, the importance of optimizing MPI collective performance becomes more evident. Current collective algorithms, including kernel-assisted inter-process data exchange techniques and data-sharing-based shared-memory approaches, are prone to significant performance degradation due to the overhead of system calls and page faults or the cost of extra data-copy latency. These issues can negatively impact the efficiency and scalability of HPC applications. To address them, we propose PiP-MColl, a Process-in-Process-based Multi-object Interprocess MPI Collective design that maximizes small-message MPI collective performance at scale. We also present designs that boost performance for larger messages, yielding a comprehensive improvement across a range of message sizes beyond small messages. PiP-MColl features efficient multiple-sender and multiple-receiver collective algorithms and leverages Process-in-Process shared-memory techniques to eliminate unnecessary system call and page fault overhead and extra data copies, which results in improved intra- and inter-node message rate and throughput. Experimental results demonstrate that PiP-MColl significantly outperforms popular MPI libraries, including OpenMPI, MVAPICH2, and Intel MPI, by up to 4.6X for the MPI collectives MPI_Scatter, MPI_Allgather, and MPI_Allreduce.
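The copy-avoiding idea behind shared-memory collectives like the one above can be illustrated with a toy model: every "rank" writes its contribution into a disjoint slot of a single shared buffer, and the reduction is done in place out of that buffer, with no kernel-assisted inter-process copies. This is a minimal illustrative sketch, not the PiP-MColl algorithm; real ranks would be separate processes mapped into one address space, and the slot layout is an assumption.

```python
# Toy "shared-memory allreduce" (sum): disjoint slots, in-place reduce.
import struct

NRANKS, COUNT = 4, 8                      # "processes" and doubles per rank
shared = bytearray(NRANKS * COUNT * 8)    # stands in for a shared segment

def contribute(rank: int) -> None:
    """Rank writes its vector into its own slot (disjoint, so no lock)."""
    struct.pack_into(f"{COUNT}d", shared, rank * COUNT * 8,
                     *([float(rank + 1)] * COUNT))

def reduce_in_place() -> tuple:
    """Leader sums all slots directly out of the shared buffer."""
    total = [0.0] * COUNT
    for r in range(NRANKS):
        vals = struct.unpack_from(f"{COUNT}d", shared, r * COUNT * 8)
        total = [a + b for a, b in zip(total, vals)]
    struct.pack_into(f"{COUNT}d", shared, 0, *total)   # result in slot 0
    return tuple(total)

for r in range(NRANKS):
    contribute(r)
result = reduce_in_place()
print(result[0])   # 1 + 2 + 3 + 4 = 10.0
```

In a real design the buffer lives in memory visible to all ranks, so the reduce reads peer data directly instead of receiving copies of it.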
Many scientific high performance codes that simulate e.g. black holes, coastal waves, climate and weather, etc. rely on block-structured meshes and use finite differencing methods to solve the appropriate systems of d...
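The finite-differencing methods mentioned above replace derivatives with stencil sums over neighboring grid points. As a minimal, purely illustrative example (not from the paper), here is an explicit Euler step of the 1D heat equation u_t = u_xx on a uniform grid, i.e. a single "block" of a block-structured mesh, using the standard 3-point second-derivative stencil:

```python
# One explicit step of u_t = u_xx with the 3-point stencil
#   u_xx ~ (u[i-1] - 2 u[i] + u[i+1]) / dx^2 ; boundaries held fixed.
def heat_step(u, dt, dx):
    r = dt / dx**2                         # must satisfy r <= 0.5 for stability
    return [u[0]] + [
        u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]

u = [0.0] * 5 + [1.0] + [0.0] * 5          # initial spike at the center
for _ in range(50):
    u = heat_step(u, dt=0.1, dx=1.0)       # r = 0.1: stable, monotone
print(round(max(u), 4))                    # peak has diffused: 0 < max(u) < 1
```

Block-structured codes apply exactly this kind of update block by block, exchanging ghost cells at block boundaries.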
ISBN:
(Print) 9783030967727; 9783030967710
High performance computing (HPC) systems have become highly parallel aggregations of heterogeneous system elements. Different kinds of processors, memory regions, interconnects, and software resources constitute the modern HPC computing platform. This makes software development and efficient program execution a challenging task. Previously, we developed a platform description framework for describing multiple aspects of computing platforms. It enables tools and users to better cope with the complexities of heterogeneous platforms in a programming-model- and system-independent way. In this paper we present how our platform model can be used to describe program implementation variants that utilize different parallel programming models. We show that by matching platform models of program implementations to descriptions of a concrete heterogeneous system we can increase overall resource utilization. In addition, we show that our model featuring control relationships brings significant performance gains for finding platform patterns within a commonly used heterogeneous compute cluster configuration.
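The matching of implementation variants to a platform description can be sketched as a feasibility check plus a scoring rule. The attribute names ("cores", "gpus") and the utilization score below are illustrative assumptions, not the paper's actual platform model:

```python
# Pick the feasible implementation variant with the highest utilization.
platform = {"cores": 64, "gpus": 4, "numa_domains": 2}

variants = [
    {"name": "openmp",   "needs": {"cores": 16}},
    {"name": "cuda",     "needs": {"cores": 2, "gpus": 1}},
    {"name": "mpi+cuda", "needs": {"cores": 8, "gpus": 4}},
]

def feasible(v):
    # Every requested resource class must exist in sufficient quantity.
    return all(platform.get(k, 0) >= n for k, n in v["needs"].items())

def utilization(v):
    # Fraction of each requested resource class used, averaged.
    return sum(n / platform[k] for k, n in v["needs"].items()) / len(v["needs"])

best = max((v for v in variants if feasible(v)), key=utilization)
print(best["name"])   # mpi+cuda: the only variant using all 4 GPUs
```

A real platform model would also encode memory regions, interconnect topology, and the control relationships the paper describes.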
There is a great requirement for cryptographic systems to secure shared or transmitted data over the internet. Periodic modification of the currently used cryptographic scheme to encrypt the data is also suggested to ...
ISBN:
(Print) 9781665494236
Graph Neural Networks (GNNs) have emerged as a popular toolbox for solving complex problems on graph data structures. Graph neural networks use machine learning techniques to learn vector representations of nodes and/or edges. Learning these representations demands a huge amount of memory and computing power. Traditional shared-memory multiprocessors are insufficient to meet the computing requirements of real-world data; hence, research has gained momentum toward distributed GNNs. Scaling distributed GNNs poses the following challenges: (1) the input graph needs to be efficiently partitioned, (2) the cost of communication between compute nodes should be reduced, and (3) the sampling strategy should be chosen carefully to minimize the loss in accuracy. To address these challenges, we propose a joint partitioning and sampling algorithm, which partitions the input graph with weighted METIS and uses a biased sampling strategy to minimize total communication costs. We implemented our approach using the DistDGL framework and evaluated it using several real-world datasets. We observe that our approach (1) shows an average reduction in communication overhead of 53%, (2) requires less partitioning time to partition a graph, (3) shows improved accuracy, and (4) achieves a speedup of 1.5x on the OGB-Arxiv dataset, when compared to the state-of-the-art DistDGL implementation.
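The "biased sampling" idea can be sketched in a few lines: when sampling a node's neighbors, weight same-partition neighbors higher so that fewer sampled edges cross partitions, which is what drives communication. The bias weight and the tiny graph below are illustrative assumptions; the paper's exact scheme and its METIS weighting differ.

```python
# Partition-aware neighbor sampling: prefer local-partition neighbors.
import random
random.seed(0)

adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}          # a 2-way partition

def sample_neighbors(u, k, local_bias=4.0):
    nbrs = adj[u]
    # Same-partition neighbors get a higher sampling weight.
    w = [local_bias if part[v] == part[u] else 1.0 for v in nbrs]
    return random.choices(nbrs, weights=w, k=k)

samples = [v for _ in range(1000) for v in sample_neighbors(0, 2)]
local = sum(part[v] == part[0] for v in samples) / len(samples)
print(f"fraction of local samples: {local:.2f}")   # well above the unbiased 0.5
```

Node 0 has two local and two remote neighbors, so unbiased sampling crosses the partition half the time; the bias pushes the local fraction toward 4/(4+1) = 0.8.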
ISBN:
(Print) 9781665494236
Deep Learning (DL) has become a prominent machine learning technique due to the availability of efficient computational resources in the form of Graphics Processing Units (GPUs), large-scale datasets, and a variety of models. The newer generation of GPUs is being designed with special emphasis on optimizing performance for DL applications. Also, the availability of easy-to-use DL frameworks like PyTorch and TensorFlow has enhanced the productivity of domain experts working on custom DL applications from diverse domains. However, existing Deep Neural Network (DNN) training approaches may not fully utilize newly emerging powerful GPUs like the NVIDIA A100; this is the primary issue we address in this paper. Our motivating analyses show that GPU utilization on the NVIDIA A100 can be as low as 43% using traditional DNN training approaches for small-to-medium DL models and input data sizes. This paper proposes AccDP, a data-parallel distributed DNN training approach, to accelerate GPU-based DL applications. AccDP exploits the Message Passing Interface (MPI) communication library coupled with NVIDIA's Multi-Process Service (MPS) to increase the amount of work assigned to parallel GPUs, resulting in higher utilization of compute resources. We evaluate our proposed design on different small-to-medium DL models and input sizes on state-of-the-art HPC clusters. By injecting more parallelism into DNN training using our approach, the evaluation shows up to 58% improvement in training performance on a single GPU and up to 62% on 16 GPUs compared to regular DNN training. Furthermore, we conduct an in-depth characterization to determine the impact of several DNN training factors and best practices, including the batch size and the number of data-loading workers, on optimally utilizing GPU devices. To the best of our knowledge, this is the first work that explores the use of MPS and MPI to maximize the utilization of GPUs in distributed DNN training.
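The data-parallel pattern that AccDP builds on can be reduced to its essence: each worker computes gradients on its shard of the batch, and the gradients are averaged (an allreduce) before the update. In AccDP the workers are MPI processes co-scheduled on one GPU via MPS; in this conceptual sketch they are plain Python functions and the "model" is a single weight w in y = w*x. Everything here is illustrative, not the paper's implementation.

```python
# Data-parallel gradient averaging on a toy linear model y = w*x.
def grad(w, shard):
    # d/dw of the mean squared error 0.5*(w*x - y)^2 over the shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

data = [(x, 3.0 * x) for x in range(1, 9)]       # ground truth: w = 3
workers = [data[0:4], data[4:8]]                 # batch split across 2 workers

w = 0.0
for _ in range(200):
    # Each worker's gradient, then the "allreduce" average.
    g = sum(grad(w, shard) for shard in workers) / len(workers)
    w -= 0.01 * g                                # SGD update, same on all workers
print(round(w, 3))   # converges to 3.0
```

The paper's contribution is in how many such workers share a GPU, not in this textbook update rule.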
ISBN:
(Print) 9783031587337; 9783031587344
Minimizing the round complexity of byzantine broadcast is a fundamental question in distributed computing and cryptography. In this work, we present the first early stopping byzantine broadcast protocol that tolerates up to t = n - 1 malicious corruptions and terminates in O(min{f^2, t+1}) rounds for any execution with f <= t actual corruptions. Our protocol is deterministic, adaptively secure, and works assuming a plain public key infrastructure. Prior early-stopping protocols all either require honest majority or tolerate only up to t = (1 - epsilon)n malicious corruptions while requiring either trusted setup or strong number theoretic hardness assumptions. As our key contribution, we show a novel tool called a polariser that allows us to transfer certificate-based strategies from the honest majority setting to settings with a dishonest majority.
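For context, the classical baseline that early-stopping protocols improve on is Dolev-Strong broadcast, which always runs t+1 rounds regardless of how many faults actually occur. The toy simulation below shows its structure with "signatures" modeled as appended node ids (a real protocol uses a PKI, and this is not the paper's polariser-based protocol). With all nodes honest, every node extracts the sender's value in round 1 and outputs it after round t+1.

```python
# Toy Dolev-Strong broadcast: all-honest execution, t+1 rounds.
N, T = 4, 3
SENDER = 0

def run_broadcast(value):
    extracted = [set() for _ in range(N)]        # values accepted per node
    inbox = [[] for _ in range(N)]               # (value, signer chain)
    for i in range(N):
        inbox[i].append((value, (SENDER,)))      # sender "signs" and sends
    for r in range(1, T + 2):                    # rounds 1 .. t+1
        outbox = [[] for _ in range(N)]
        for i in range(N):
            for v, chain in inbox[i]:
                # Valid iff chain starts at the sender and has r distinct signers.
                if chain[0] == SENDER and len(set(chain)) == len(chain) == r \
                        and v not in extracted[i]:
                    extracted[i].add(v)
                    if i not in chain:           # append own "signature", relay
                        for j in range(N):
                            outbox[j].append((v, chain + (i,)))
        inbox = outbox
    # Output the unique extracted value, or None on ambiguity.
    return [next(iter(e)) if len(e) == 1 else None for e in extracted]

print(run_broadcast("attack"))   # every node outputs 'attack'
```

The point of early stopping is to terminate in rounds that depend on the actual fault count f rather than always paying the worst-case t+1, which is what the O(min{f^2, t+1}) bound above captures.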
ISBN:
(Print) 9781450397339
De novo genome assembly, i.e., rebuilding the sequence of an unknown genome from redundant and erroneous short sequences, is a key but computationally intensive step in many genomics pipelines. The exponential growth of genomic data is increasing the computational demand and requires scalable, high-performance approaches. In this work, we present a novel distributed-memory algorithm that, from a string graph representation of the genome and using sparse matrices, generates the contig set, i.e., overlapping sequences that form a map representing a region of a chromosome. Using matrix abstraction, we mask branches in the string graph and compute connected components to group genomic sequences that belong to the same linear chain (i.e., contig). Then, we perform multiway number partitioning to minimize the load imbalance in local assembly, i.e., concatenation of sequences from a given contig. Based on the assignment obtained by partitioning, we apply an induced-subgraph function to redistribute sequences between processes, resulting in a set of local sparse matrices. Finally, we traverse each matrix using depth-first search to concatenate sequences. Our algorithm shows good scaling with parallel efficiency up to 80% on 128 nodes, resulting in uniform genome coverage and showing promising results in terms of assembly quality. Our contig generation algorithm localizes the assembly process to significantly reduce the amount of computation spent on this step. Our work is a step forward for efficient de novo long-read assembly of large genomes in distributed memory.
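Two of the steps above, masking branch vertices then grouping linear chains via connected components, followed by multiway number partitioning, can be sketched serially on a toy string graph. The paper does this with distributed sparse matrices; this pure-Python version with union-find and a greedy longest-processing-time (LPT) heuristic is illustrative only.

```python
# (1) mask branch vertices (degree > 2), group chains by connected
# components; (2) balance chains across P processes with greedy LPT.
from collections import defaultdict

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5),   # vertex 3 is a branch
         (6, 7), (7, 8)]
nverts = 9

deg = [0] * nverts
for u, v in edges:
    deg[u] += 1; deg[v] += 1
mask = {v for v in range(nverts) if deg[v] > 2}    # branch vertices

parent = list(range(nverts))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]; x = parent[x]
    return x
for u, v in edges:
    if u not in mask and v not in mask:
        parent[find(u)] = find(v)                  # union unmasked endpoints

chains = defaultdict(list)
for v in range(nverts):
    if v not in mask:
        chains[find(v)].append(v)
contigs = sorted(chains.values(), key=len, reverse=True)

P = 2
loads, assign = [0] * P, [[] for _ in range(P)]
for c in contigs:                                  # LPT: largest first,
    p = loads.index(min(loads))                    # to the lightest process
    assign[p].append(c); loads[p] += len(c)
print(contigs, loads)
```

Masking vertex 3 splits the graph into two 3-vertex chains and two singletons, which LPT balances to equal loads of 4 per process.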
ISBN:
(Print) 9781665481557
In this paper, we propose a distributed-reservoir-computing-based parallel nonlinear equalization for 100 Gb/s vertical cavity surface emitting laser (VCSEL) enabled optical interconnects. The equalization performance of the proposed equalizer is compared with neural-network and Volterra-series-based equalizers; similar performance can be achieved, but with a much simpler, lower-complexity training process. Moreover, this approach, summarized as "small reservoirs make a mickle," is a scalable network-generation solution that is promising for parallel hardware implementation.
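The reservoir update at the heart of reservoir computing is s[t+1] = tanh(W s[t] + w_in u[t]), with W and w_in fixed and random; only a linear readout is trained, which is why training is so cheap compared to a full neural network. The sketch below shows the state update and the fading-memory ("echo state") property for a tiny 3-neuron reservoir; the weights are arbitrary small values (contraction well below 1), not from the paper.

```python
# Echo-state property: two reservoirs with different initial states
# converge once driven by the same input sequence.
import math, random
random.seed(1)

N = 3
W = [[0.2 * (random.random() - 0.5) for _ in range(N)] for _ in range(N)]
w_in = [0.5, -0.3, 0.8]

def step(s, u):
    # s' = tanh(W s + w_in * u), elementwise over the N neurons.
    return [math.tanh(sum(W[i][j] * s[j] for j in range(N)) + w_in[i] * u)
            for i in range(N)]

u_seq = [math.sin(0.3 * t) for t in range(200)]
s_a, s_b = [1.0] * N, [-1.0] * N          # very different initial states
for u in u_seq:
    s_a, s_b = step(s_a, u), step(s_b, u)
gap = max(abs(a - b) for a, b in zip(s_a, s_b))
print(gap)   # ~0: the state depends on recent inputs, not the start
```

In an equalizer, the trained linear readout maps these states back to transmitted symbols; the paper's contribution is running many such small reservoirs in parallel.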
Implementing the hardware structure of a filter architecture requires area, power, and delay efficiency. Memory complexity is also important in a 2D FIR filter architecture when used for image pr...
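A direct-form 2D FIR filter applies a K x K coefficient window to every pixel; the hardware architectures discussed above trade off area, power, delay, and line-buffer memory when mapping exactly this computation. Below is a minimal pure-Python reference (zero padding at the borders), illustrative only and not tied to any specific architecture from the paper.

```python
# Direct-form 2D FIR filtering: out[i][j] = sum_{a,b} h[a][b] * img[i+a-c][j+b-c].
def fir2d(img, coeffs):
    H, W = len(img), len(img[0])
    K = len(coeffs)
    off = K // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for a in range(K):
                for b in range(K):
                    y, x = i + a - off, j + b - off
                    if 0 <= y < H and 0 <= x < W:   # zero padding
                        acc += coeffs[a][b] * img[y][x]
            out[i][j] = acc
    return out

blur = [[1 / 9] * 3 for _ in range(3)]     # 3x3 moving-average kernel
img = [[9.0] * 4 for _ in range(4)]
out = fir2d(img, blur)
print(out[1][1])   # interior pixel: fully covered window, ~9.0
```

In hardware, the inner K x K loop becomes a multiply-accumulate array, and the row accesses `img[i+a-off]` are what line buffers exist to serve, which is where the memory complexity mentioned above comes from.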