Search Results - Inner Mongolia University Library

Document type

38 conference papers
3 books

Holdings

41 electronic documents
0 print holdings

Language

41 English

Search query: Any field = "IFIP WG10.3 Working Conference on Applications in Parallel and Distributed Computing"

41 records in total; showing records 1-10

DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies

20th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2024

Authors: Guo, Mingfeng; Deng, Liang; Dai, Zhe; Li, Ruitian; Lin, Gaofeng; Liu, Jie

Affiliations: Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN: (print) 9789819628292

Sparse triangular solve (SpTRSV) is a vital component in various scientific applications, and numerous GPU-based SpTRSV algorithms have been proposed. Synchronization-free SpTRSV is currently the mainstream algorithm on GPU due to its short preprocessing time and outstanding performance. However, we observed that this algorithm still has two performance bottlenecks. Firstly, the thread-level parallel mode can lead to thread divergence within GPU warps during the writing phase. Secondly, the thread-level and warp-level fusion mode may struggle to fully exploit GPU resources due to suboptimal mapping relationships between rows and threads. To address these issues, this paper proposes DaCPSpTRSV, a new synchronization-free algorithm with GPU-friendly data communication and parallelism strategies. Specifically, we first develop a fast-forward thread-level approach, incorporating an efficient global memory access pattern and a lightweight dependency control mechanism, to optimize data communication and alleviate thread divergence. A fine-grained fusion strategy is then proposed to maximize GPU parallelism by adaptively selecting the suitable thread-level or warp-level mode. Moreover, the commonly used compressed sparse row (CSR) format is employed in our DaCPSpTRSV, enhancing the versatility of our algorithm. We evaluate our approach using 245 matrices from the SuiteSparse Matrix Collection on two NVIDIA GPUs, demonstrating speedup ratios of up to 4.77×, 4.94×, 1.67×, and 1.62× compared to cuSPARSE, Sync-Free, CapelliniSpTRSV, and YuenyeungSpTRSV, respectively. The project is open-sourced at https://***/gmfff12334/DaCP. © IFIP International Federation for Information Processing 2025.

Keywords: Graphics processing unit

Dynamic Allocation of Processor Cores to Graph Applications on Commodity Servers

Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques

Authors: Lucia Pons; Julio Sahuquillo; Timothy M. Jones

Affiliations: Universitat Politècnica de València, Valencia, Spain; University of Cambridge, Cambridge, United Kingdom

ISBN: (print) 9798350342543

Graph processing is increasingly adopted to solve problems that span many application domains, including scientific computing, social networks, and big-data analytics. These applications present particular features (huge working sets and irregular scalability) that make the default Linux scheduler, which adopts a time-sharing policy to provide fair scheduling, perform poorly when co-locating multiple graph applications on the same processor. This work focuses on maximizing processor utilization, which is a major concern of current data centers. To this end, we propose AFAIR, a flexible scheduling policy that allocates multiple graph applications on the same processor and assigns a fraction of the cores exclusively to each application instead of sharing them. Moreover, AFAIR dynamically adds cores to and removes cores from the running applications, adapting the number of threads used for parallel execution to balance memory load. This allows AFAIR to achieve almost perfect fairness, on average 95%.

Keywords: Fair scheduling

Accelerate Model Parallel Deep Learning Training Using Effective Graph Traversal Order in Device Placement

22nd IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS), Held as Part of the 17th International Federated Conference on Distributed Computing Techniques (DisCoTec)

Authors: Wang, Tianze; Payberah, Amir H.; Hagos, Desta Haileselassie; Vlassov, Vladimir

Affiliation: KTH Royal Inst Technol, Stockholm, Sweden

ISBN: (print) 9783031160929; 9783031160912

Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most of the existing device placement solutions treat the problem as sequential decision-making by traversing neural network graphs and assigning their neurons to different devices. This work studies the impact of neural network graph traversal orders on device placement. In particular, we empirically study how different graph traversal orders of neural networks lead to different device placements, which in turn affect the training time of the neural network. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. In this work, we also provide recommendations on choosing effective graph traversal orders in device placement for various neural network families to improve the training time in model parallelization.

Keywords: Device Placement; Model Parallelization; Deep Learning; Graph Traversal Order

DROAllocator: A Dynamic Resource-Aware Operator Allocation Framework in Distributed Streaming Processing

17th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2020

Authors: Liu, Fan; Jin, Zongze; Mu, Weimin; Zhu, Weilin; Zhang, Yun; Wang, Weiping

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9783030794774

With the rapid development of Internet services and the Internet of Things (IoT), many studies focus on operator allocation to enhance the performance and resource utilization of data stream processing applications (DSPAs). However, existing approaches ignore dynamic changes in node resources when allocating operator instances to guarantee performance, which increases the number of migrations and leads to wasted resources and instability. To address these issues, we propose a framework named DROAllocator to select the appropriate nodes to allocate the operator instances. By capturing the change tendency of the node resources and the operator performance, our allocation mechanism decreases the number of migrations to enhance the performance. The experimental results show that DROAllocator not only decreases the number of migrations when allocating operator instances, ensuring end-to-end throughput and latency, but also enhances resource utilization. © 2021, IFIP International Federation for Information Processing.

Keywords: Internet of things

Distributed Quality-Aware Resource Allocation for Video Transmission in Wireless Networks

16th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2019

Authors: He, Chao; Xie, Zhidong; Tian, Chang

Affiliations: College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China; National Innovation Institute of Defense Technology, Academy of Military Sciences of PLA, Beijing 100071, China

ISBN: (print) 9783030307080

The rapid development of wireless networks makes it more convenient for people to enjoy high-quality multimedia. However, video applications are throughput-demanding, and radio resources always seem relatively insufficient. Hence, a distributed algorithm is designed in this paper to allocate the limited wireless resources among multiple users for video streaming. In order to distinguish multimedia service from ordinary data transmission, a QoE-oriented utility function is considered first. Then, a potential game model is formulated, and all the video receivers can update their rate strategies with very little information exchange. Through this kind of updating, the bandwidth allocation can be achieved intelligently. The algorithm converges to a set of correlated equilibria. Numerical simulation results indicate that it brings remarkable benefits to both the resource provider and the video users. © 2019, IFIP International Federation for Information Processing.

Keywords: Resource allocation

Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA

14th IFIP WG 10.3 International Conference on Network and Parallel Computing (NPC)

Authors: Qiao, Yuran; Shen, Junzhong; Huang, Dafei; Yang, Qianming; Wen, Mei; Zhang, Chunyuan

Affiliation: Natl Univ Def Technol, Coll Comp, Natl Key Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China

ISBN: (digital) 9783319682105

ISBN: (print) 9783319682105; 9783319682099

Nowadays, the rapid growth of data across the Internet has provided sufficient labeled data to train deep structured artificial neural networks. While deeper structured networks bring about significant precision gains in many applications, they also pose an urgent demand for higher computation capacity at the expense of power consumption. To this end, various FPGA-based deep neural network accelerators have been proposed for higher performance and lower energy consumption. However, the development cycle of an FPGA application is much longer than that of CPU and GPU applications. Although FPGA vendors such as Altera and Xilinx have released OpenCL frameworks to ease the programming, tuning OpenCL code for desirable performance on FPGAs is still challenging. In this paper, we look into the OpenCL implementation of Convolutional Neural Network (CNN) on FPGA. By analysing the execution behaviour of a CPU/GPU-oriented version on FPGA, we find the causes of the performance difference between FPGA and CPU/GPU and locate the performance bottlenecks. According to our analysis, we put forward a corresponding optimization method focusing on external memory transfers. We implement a prototype system on an Altera Stratix V A7 FPGA, which brings a considerable 4.76x speedup over the original version. To the best of our knowledge, this implementation outperforms most of the previous OpenCL implementations on FPGA by a large margin.

Keywords: Field programmable gate arrays (FPGA)

Holistic Shuffler for the Parallel Processing of SQL Window Functions

16th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS), Held as Part of the 11th International Federated Conference on Distributed Computing Techniques (DisCoTec)

Authors: Coelho, Fabio; Pereira, Jose; Vilaca, Ricardo; Oliveira, Rui

Affiliations: INESC TEC, Braga, Portugal; Univ Minho, Braga, Portugal

ISBN: (digital) 9783319395777

ISBN: (print) 9783319395777; 9783319395760

Window functions are a sub-class of analytical operators that allow data to be handled in a derived view of a given relation, while taking into account their neighboring tuples. Currently, systems bypass parallelization opportunities, which become especially relevant when considering Big Data, as data is naturally partitioned. We present a shuffling technique to improve the parallel execution of window functions when data is naturally partitioned and the query holds a partitioning clause that does not match the natural partitioning of the relation. We evaluated this technique with a non-cumulative ranking function and were able to reduce data transfer among parallel workers by 85% when compared to a naive approach.

Keywords: Big data

An RDMA Middleware for Asynchronous Multi-stage Shuffling in Analytical Processing

Authors: Goncalves, Rui C.; Pereira, Jose; Jimenez-Peris, Ricardo

Affiliations: INESC TEC & U Minho, HASLab, Braga, Portugal; Univ Politecn Madrid, Madrid, Spain; LeanXcale, Madrid, Spain

ISBN: (digital) 9783319395777

ISBN: (print) 9783319395777; 9783319395760

A key component in large-scale distributed analytical processing is shuffling, the distribution of data to multiple nodes such that the computation can be done in parallel. In this paper we describe the design and implementation of a communication middleware to support data shuffling for executing multi-stage analytical processing operations in parallel. The middleware relies on RDMA (Remote Direct Memory Access) to provide basic operations to asynchronously exchange data among multiple machines. Experimental results show that the RDMA-based middleware can provide a 75% reduction in the cost of communication operations on parallel analytical processing tasks when compared with a sockets middleware.

Keywords: Distributed databases; OLAP; Middleware; RDMA

The new territory of lightweight security in a cloud computing environment

11th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2014

Authors: Wang, Shu-Ching; Tseng, Shih-Chi; Chuan, Hsin-Met; Yan, Kuo-Qin; Tsai, Szu-Hao

Affiliations: Chaoyang University of Technology, Taiwan; Hsing-Kuo University, Taiwan

ISBN: (print) 9783662449165

Cloud computing is an Internet-based resource sharing system in which virtualized resources are provided over the Internet. Cloud computing refers to a class of systems and applications that employ distributed resources for use in various applications; these computing resources are utilized over a network to facilitate the execution of tasks. However, cloud computing resources are heterogeneous and dynamic, connecting a broad range of resources. Thus, there are a large number of applications and data centers in the cloud computing environment. Therefore, the security issues of authentication and communication in application services and data centers need to be considered in the cloud computing environment. © 2014 IFIP International Federation for Information Processing.

Keywords: Cloud security

Fault-tolerant storage servers for the databases of redundant web servers in a computing grid

11th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2014

Author: Ok, MinHwan

Affiliation: Korea Railroad Research Institute, Woulam, Uiwang, Gyeonggi, Korea, Republic of

ISBN: (print) 9783662449165

The Computing Grid in this paper is a Grid computing environment that supplies applications which run in a local computing site only, without any modification or adaptation for running globally in the Grid computing environment. Each stage of a running application is transcribed at all the management databases coupled with respective Web servers. Consistency is maintained by double-checking every acknowledgement against a write to all the management databases and a circulated read response from either database. The storage spaces could be integrated into a single one by storage managers within a computing site. The modification of a file is broadcast to the storage managers sharing the storage space, and their allocation tables are updated immediately. The system architecture is of a distributed control type, potentially the best match for Cloud computing. © 2014 IFIP International Federation for Information Processing.

Keywords: Managers
