ISBN (print): 9798350305487
To meet the increased computational demands and stricter power constraints of modern applications, architectures have evolved to include domain-specific accelerators. Designing efficient accelerators requires addressing three main challenges: compute, memory, and control. Moreover, since SoCs usually contain multiple accelerators, selecting the right one for each task also becomes crucial. This is especially relevant for Flexible Processing Units (xPUs), processing units that provide multiple functionalities with the same hardware. While it is possible to use shared support components for all functionalities, doing so leads to sub-optimal performance. In this work, we take one example of such an xPU and analyze the aspects that have not yet been fully addressed, showing that there is more potential to be exploited. By understanding the required memory patterns, we achieve speedup gains of up to 72% compared to using memory support optimized for a different functionality. Furthermore, we present an in-depth analysis of the different functionalities provided by the xPU. We then leverage the insights obtained from this analysis to provide a mechanism that selects the right functionality, maximizing hardware utilization.
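The abstract leaves the functionality-selection mechanism at a high level. As a rough illustration only (the functionality names, the utilization model, and all numbers below are hypothetical, not from the paper), a dispatcher could score each mode of the xPU against a task's memory pattern and pick the one with the highest estimated utilization:

```python
# Hypothetical sketch: choosing among xPU functionalities by estimated
# hardware utilization for a task's memory-access pattern. The profile
# numbers and functionality names are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class Functionality:
    name: str
    peak_ops_per_cycle: float      # compute capability of this mode
    preferred_stride: int          # memory stride the mode's support is tuned for

def estimated_utilization(func: Functionality, task_stride: int,
                          task_ops_per_byte: float) -> float:
    """Crude model: utilization drops when the task's stride does not match
    the stride the functionality's memory support is optimized for."""
    stride_penalty = min(func.preferred_stride, task_stride) / max(
        func.preferred_stride, task_stride)
    return min(1.0, task_ops_per_byte / func.peak_ops_per_cycle) * stride_penalty

def select_functionality(funcs, task_stride, task_ops_per_byte):
    # Pick the mode with the highest estimated utilization for this task.
    return max(funcs, key=lambda f: estimated_utilization(
        f, task_stride, task_ops_per_byte))

if __name__ == "__main__":
    modes = [Functionality("gemm-mode", 64.0, 1),
             Functionality("conv-mode", 32.0, 4)]
    best = select_functionality(modes, task_stride=4, task_ops_per_byte=16.0)
    print("selected:", best.name)
```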
ISBN (print): 9798350381603
Cloud computing allows users to access large computing infrastructures quickly. In the high-performance computing (HPC) context, public cloud resources emerge as an economical alternative, allowing institutions and research groups to use highly parallel infrastructures in the cloud. However, parallel runtime systems and software optimizations proposed over the years to improve the performance and scalability of HPC applications targeted traditional on-premise HPC clusters, where developers have direct access to the underlying hardware without any kind of virtualization. In this paper, we analyze the performance and scalability of HPC applications from the NAS Parallel Benchmarks suite when running on a virtualized HPC cluster built on top of Amazon Web Services (AWS), contrasting them with the results obtained with the same applications running on a traditional on-premise HPC cluster from Grid'5000. Our results show that CPU-bound applications achieve similar results on both platforms, whereas communication-bound applications may be impacted by the limited network bandwidth in the cloud. The cloud infrastructure demonstrated better performance under workloads with moderate communication and medium-sized messages.
This research introduces an automated road extraction model utilizing deep learning techniques for high-resolution aerial imagery. Focused on applications in urban planning, disaster management, and logistics, the stu...
ISBN (print): 9783031592348
The proceedings contain 17 papers. The special focus of this conference is on Engineering Interactive Computing Systems. The topics include: Evaluation of a Social Robot System for Performance-Oriented Stroke Therapy; MUMR-MIODMIT: A Generic Architecture Extending Standard Interactive Systems Architecture to Address Engineering Issues for Rehabilitation; Serious Game for Company Governance: Supporting Integration, Prevention of Professional Disintegration and Job Retention of People with Disabilities; Two Concepts of Domain-Specific Languages for Therapists to Control a Humanoid Robot; An Approach to Leverage Artificial Intelligence for Car-Parking Related Mobile Applications; Engineering AI-Similar Designs: Should I Engineer My Interactive System with AI Technologies?; Explaining Through the Right Reasoning Style: Lessons Learnt; Exploring AI-Enhanced Shared Control for an Assistive Robotic Arm; Hidden Figures: Architectural Challenges to Expose Parameters Lost in Code; Not What I was Trained for – Out-of-Distribution-Tests for Interactive AIs; End User Development for Extended Reality; Exertion Trainer: Smartphone Exergame Design to Support Children's Kinesthetic Learning through Playful Feedback; Explaining Temporal Logic Model Checking Counterexamples through the Use of Structured Natural Language; Merging Creativity with Computation in Sketch-to-Code Transitions; UX Data Visualization: Supporting Software Professionals in Exploring Users' Interaction Data.
ISBN (print): 9798350305487
Seismic imaging techniques like Reverse Time Migration (RTM) are time-consuming and data-intensive activities in the field of geophysical exploration. The computational cost associated with the stability and dispersion conditions in the discrete two-way wave equation makes RTM time-consuming. Additionally, RTM is data-intensive due to the need to manage a considerable amount of information, such as the forward-propagated wavefields (source wavefield), to build the final migrated seismic image according to an imaging condition. In this context, we introduce lossy and lossless wavefield compression for parallel multi-core and GPU-based RTM to alleviate the data transfer between processor and disk. We use OpenACC to enable GPU parallelism and the ZFP library, combined with decimation based on the Nyquist sampling theorem, to reduce storage. We experimentally study the effects of wavefield compression for both GPU-based and optimized OpenMP+vectorization RTM versions. The multi-core and GPU-based RTM versions have been linked to the ZFP library to compress the source wavefield on the fly once it has been decimated according to the Nyquist sampling theorem to calculate the imaging condition. This approach can drastically reduce the persistent storage required by the technique. However, it is essential to understand the impact of using compressed wavefields on the migration process that builds the seismic image. In this context, we show how much storage can be reduced without compromising the seismic image's accuracy and quality.
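The storage pipeline described above (decimate the source wavefield to the Nyquist rate of the source's maximum frequency, then compress each retained snapshot before writing it out) can be sketched as follows, assuming the zfpy Python bindings of the ZFP library; the time step, maximum frequency, and tolerance values are illustrative only:

```python
# Sketch, not the paper's code: decimate forward-propagated snapshots to the
# Nyquist rate and compress them with ZFP (lossy, fixed-accuracy mode).
# Assumes the zfpy bindings are installed (pip install zfpy).
import numpy as np
import zfpy

def nyquist_stride(dt: float, f_max: float) -> int:
    """Keep one snapshot per Nyquist interval 1/(2*f_max) of the source's
    maximum frequency; dt is the simulation time step."""
    return max(1, int((1.0 / (2.0 * f_max)) / dt))

def store_wavefield(snapshots, dt, f_max, tolerance=1e-4):
    """Return compressed snapshots kept at the Nyquist rate."""
    stride = nyquist_stride(dt, f_max)
    return [zfpy.compress_numpy(s, tolerance=tolerance)
            for s in snapshots[::stride]]

def load_wavefield(blobs):
    """Decompress the retained snapshots for the imaging condition."""
    return [zfpy.decompress_numpy(b) for b in blobs]

if __name__ == "__main__":
    # Toy wavefield: 200 time steps of a 128x128 grid.
    snaps = np.random.rand(200, 128, 128).astype(np.float32)
    blobs = store_wavefield(snaps, dt=1e-3, f_max=25.0, tolerance=1e-3)
    restored = load_wavefield(blobs)
    print(len(blobs), "snapshots kept,", restored[0].shape)
```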
ISBN (print): 9798350305487
The pursuit of energy efficiency has been driving the development of techniques to optimize hardware resource usage in high-performance computing (HPC) servers. On multicore architectures, thread-level parallelism (TLP) exploitation, dynamic voltage and frequency scaling (DVFS), and uncore frequency scaling (UFS) are three popular methods applied to improve the trade-off between performance and energy consumption, represented by the energy-delay product (EDP). However, the complexity of selecting the optimal configuration (TLP degree, DVFS, and UFS) for each application poses a challenge to software developers and end-users due to the massive number of possible configurations. To tackle this challenge, we propose NeurOPar, an optimization strategy for parallel workloads driven by an artificial neural network (ANN). It uses representative hardware and software metrics to build and train an ANN model that predicts combinations of thread count and core/uncore frequency levels that provide optimal EDP results. Through experiments on four multicore processors using twenty-five applications, we demonstrate that NeurOPar predicts combinations that yield EDP values close to the best ones achieved by an exhaustive search and improve the overall EDP by 42% compared to the default execution of HPC applications. We also show that NeurOPar can enhance the execution of parallel applications without incurring the performance and energy penalties associated with online methods by comparing it with two state-of-the-art strategies.
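The optimization target is the energy-delay product, EDP = energy × execution time. The sketch below illustrates the general idea under stated assumptions (synthetic training data, scikit-learn's MLPRegressor as a stand-in for the paper's ANN, and a hypothetical feature set):

```python
# Sketch of the NeurOPar idea with hypothetical data: train a small neural
# network to predict EDP (energy * runtime) from a configuration
# (thread count, core frequency, uncore frequency) plus an application metric,
# then pick the candidate configuration with the lowest predicted EDP.
import itertools
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic training set: [threads, core_GHz, uncore_GHz, mem_intensity] -> EDP
X = rng.uniform([1, 1.0, 1.0, 0.0], [32, 3.5, 2.4, 1.0], size=(200, 4))
energy = X[:, 1] ** 2 + 0.5 * X[:, 2] + 0.1 * X[:, 0]     # toy energy model
runtime = (1.0 + X[:, 3]) / (X[:, 0] * X[:, 1])           # toy runtime model
y = energy * runtime                                       # EDP

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(X, y)

def best_configuration(mem_intensity: float):
    """Enumerate candidate (threads, core, uncore) settings and return the
    one with the lowest predicted EDP for this application profile."""
    candidates = [(t, c, u, mem_intensity)
                  for t, c, u in itertools.product([4, 8, 16, 32],
                                                   [1.2, 2.0, 2.8, 3.5],
                                                   [1.2, 1.8, 2.4])]
    preds = model.predict(np.array(candidates))
    return candidates[int(np.argmin(preds))]

print("predicted best config:", best_configuration(mem_intensity=0.7))
```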
ISBN (print): 9798350393132; 9798350393149
Graph neural networks (GNNs), one of the most popular neural network models, are extensively applied in graph-related fields, including drug discovery, recommendation systems, etc. Unsupervised graph learning, one type of GNN training, plays a crucial role in various graph-related missions like node classification and edge prediction. However, with the increasing size of real-world graph datasets, processing such massive graphs in host memory becomes impractical, and GNN training demands a substantial storage volume to accommodate the vast amount of graph data. Consequently, GNN training results in significant I/O migration between the host and storage. Although state-of-the-art frameworks have made strides in mitigating I/O overhead by considering embedding locality, their GNN frameworks still suffer from long training times. In this paper, we propose a fully out-of-core framework, called Celeritas, which speeds up unsupervised GNN training on a single machine by co-designing the GNN algorithm and the storage system. First, based on theoretical analysis, we propose a new partial combination operation to enable embedding updates across GNN layers. This cross-layer computing performs future computation for the embeddings stored in memory to save data migration. Second, due to the dependency between embeddings and edges, we consider their data locality together. Based on the cross-layer computing property, we propose a new loading order to fully utilize the data stored in main memory and save I/O. Finally, a new sampling scheme called two-level sampling, together with a new partition algorithm, is proposed to further reduce data migration and computation overhead while maintaining similar training accuracy. Real-system experiments indicate that the proposed Celeritas can reduce the total training time of different GNN models by 44.76% to 73.85% compared to state-of-the-art schemes for different graph datasets.
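Celeritas' loading order is described only at a high level above. As an illustration of the general idea (this greedy heuristic is an assumption, not the paper's algorithm), one could pick the next partition to load as the one sharing the most cross-partition edges with the partitions already resident in memory:

```python
# Illustrative sketch only (not Celeritas' actual algorithm): choose the order
# in which graph partitions are loaded from disk so that each newly loaded
# partition shares as many cross-partition edges as possible with partitions
# already resident in memory, reducing repeated I/O for embeddings.
from collections import defaultdict

def greedy_loading_order(num_parts, cross_edges, memory_slots):
    """cross_edges[(i, j)] = number of edges between partitions i and j (i < j)."""
    weight = defaultdict(int, cross_edges)

    def affinity(p, resident):
        return sum(weight[tuple(sorted((p, r)))] for r in resident)

    order, resident = [], []
    remaining = set(range(num_parts))
    while remaining:
        nxt = max(remaining, key=lambda p: affinity(p, resident))
        remaining.discard(nxt)
        order.append(nxt)
        resident.append(nxt)
        if len(resident) > memory_slots:   # evict the oldest resident partition
            resident.pop(0)
    return order

if __name__ == "__main__":
    edges = {(0, 1): 50, (1, 2): 40, (0, 3): 5, (2, 3): 30}
    print(greedy_loading_order(4, edges, memory_slots=2))
```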
ISBN (print): 9798350381603
The significant growth in demand for neural network solutions has created an urgent need for efficient implementations across a wide array of environments and platforms. As industries increasingly rely on AI-driven technologies, optimizing the performance and effectiveness of these networks has become crucial. While numerous studies have achieved promising results in this field, the process of fine-tuning and identifying optimal architectures for specific problem domains remains a complex and resource-intensive task. As such, there is a pressing need to explore and evaluate techniques that can improve this optimization process, reducing costs and time-to-deployment while maximizing the overall performance of neural networks. This work focuses on evaluating the optimization process of NetAdapt for two neural networks on an Nvidia Jetson device. We observe a performance decay for the larger network when the algorithm tries to meet the latency constraint. Furthermore, we propose potential alternatives to optimize this tool. In particular, we propose an alternative configuration search procedure that enhances the optimization process, achieving speedups of up to ~7x.
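NetAdapt-style optimization iteratively shrinks a network until it fits a latency budget; the alternative configuration search mentioned above is not detailed in the abstract, so the following is only a generic sketch of such a latency-constrained search loop, with placeholder latency and accuracy models:

```python
# Generic sketch of a NetAdapt-style latency-constrained search loop; the
# latency/accuracy functions below are placeholders, not the tool's API.
def measured_latency(config):
    """Placeholder: on a real device (e.g., a Jetson board) this would run the
    network and time it; here latency is just proportional to channel count."""
    return sum(config)

def short_term_accuracy(config):
    """Placeholder for briefly fine-tuning a candidate and measuring accuracy."""
    return sum(c ** 0.5 for c in config)   # toy proxy: more channels, more accuracy

def netadapt_like_search(config, latency_budget, step=8):
    config = list(config)
    while measured_latency(config) > latency_budget:
        # Propose one candidate per layer: shrink that layer by `step` channels.
        candidates = []
        for i, c in enumerate(config):
            if c > step:
                cand = config.copy()
                cand[i] = c - step
                candidates.append(cand)
        if not candidates:
            break
        # Keep the shrunken candidate with the best short-term accuracy.
        config = max(candidates, key=short_term_accuracy)
    return config

print(netadapt_like_search([64, 128, 256, 512], latency_budget=600))
```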
ISBN (print): 9798350393132; 9798350393149
Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a DRAM array's massive internal parallelism to execute very wide (e.g., 16,384- to 262,144-bit-wide) data-parallel operations in a single-instruction multiple-data (SIMD) fashion. However, the large and rigid granularity of DRAM rows limits the effectiveness and applicability of PUD in three ways. First, since applications have varying degrees of SIMD parallelism (often smaller than the DRAM row granularity), PUD execution often leads to underutilization, throughput loss, and energy waste. Second, due to the high area cost of implementing interconnects that connect columns in a wide DRAM row, most PUD architectures are limited to the execution of parallel map operations, where a single operation is performed over equally sized input and output arrays. Third, the need to feed the wide DRAM row with tens of thousands of data elements, combined with the lack of adequate compiler support for PUD systems, creates a programmability barrier, since programmers need to manually extract SIMD parallelism from an application and map computation to the PUD hardware. Our goal is to design a flexible PUD system that overcomes the limitations caused by the large and rigid granularity of PUD. To this end, we propose MIMDRAM, a hardware/software co-designed PUD system that introduces new mechanisms to allocate and control only the necessary resources for a given PUD operation. The key idea of MIMDRAM is to leverage fine-grained DRAM (i.e., the ability to independently access smaller segments of a large DRAM row) for PUD computation. MIMDRAM exploits this key idea to enable a multiple-instruction multiple-data (MIMD) execution model in each DRAM subarray (and SIMD execution within each DRAM row segment). We evaluate MIMDRAM using twelve real-world applications and 495 multi-programmed application mixes. Our evaluation shows that MIMDRAM provides 34x the performance, 14.3x the energy efficiency, and 1.7x the throughput...
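A back-of-the-envelope calculation illustrates the underutilization problem that motivates fine-grained row segments (the row width and segment count below are example values, not MIMDRAM's actual parameters):

```python
# Back-of-the-envelope illustration of why rigid DRAM-row granularity wastes
# PUD throughput, and how fine-grained row segments help. The 65,536-bit row
# and 8 segments are example values, not MIMDRAM's actual parameters.
ROW_BITS = 65_536
SEGMENTS = 8
SEGMENT_BITS = ROW_BITS // SEGMENTS

def full_row_utilization(simd_bits: int) -> float:
    """Conventional PUD: every operation occupies the whole row."""
    return min(simd_bits, ROW_BITS) / ROW_BITS

def segmented_utilization(simd_bits: int) -> float:
    """Fine-grained PUD: allocate only as many segments as the operation needs;
    the remaining segments stay free for other operations (MIMD-style)."""
    needed = -(-min(simd_bits, ROW_BITS) // SEGMENT_BITS)   # ceiling division
    return min(simd_bits, ROW_BITS) / (needed * SEGMENT_BITS)

for width in (2_048, 8_192, 40_000):
    print(f"SIMD width {width:>6}: full-row {full_row_utilization(width):5.1%}, "
          f"segmented {segmented_utilization(width):5.1%}")
```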
ISBN (print): 9798350393132; 9798350393149
With growing problem sizes for GPU computing, multi-GPU systems with fine-grained memory sharing have emerged to improve the current coarse-grained unified memory support based on page migration. Such multi-GPU systems with shared memory pose a new challenge in securing CPU-GPU and inter-GPU communications, as the cost of secure data transfers adds a significant performance overhead. There are two overheads of secure communication in multi-GPU systems: first, extra overhead is added to generate one-time pads (OTPs) for authenticated encryption; second, the security metadata, such as MACs and counters, passed along with encrypted data consumes precious network bandwidth. This study investigates the performance impact of secure communication in multi-GPU systems and evaluates prior CPU-oriented OTP precomputation schemes adapted for multi-GPU systems. Our investigation identifies the challenge posed by the limited OTP buffers for inter-GPU communication and the opportunity to reduce security-metadata traffic given the bursty communication patterns of GPUs. Based on this analysis, this paper proposes a new dynamic OTP buffer allocation technique, which adjusts the buffer assignment for each source-destination pair to reflect the communication patterns. To address the bandwidth problem caused by the extra security metadata, the study employs a dynamic batching scheme that transfers only a single set of metadata for each batched group of data responses. The proposed design constantly tracks the communication pattern from each GPU, periodically adjusts the allocated buffer size, and dynamically forms batches of data transfers. Our evaluation shows that in a 16-GPU system, the proposed scheme can improve performance by 13.2% and 17.5% on average over the prior cached and private schemes, respectively.
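The dynamic OTP buffer allocation is described only at a high level. A minimal sketch of the idea, with a hypothetical interface (not the paper's design), re-divides a fixed pool of precomputed-OTP buffer entries among source-destination pairs in proportion to recently observed traffic:

```python
# Minimal sketch (hypothetical interface, not the paper's design): periodically
# re-divide a fixed pool of precomputed-OTP buffer entries among GPU
# source-destination pairs in proportion to the traffic observed since the
# last adjustment, so busy links get more precomputed pads.
from collections import Counter

class OTPBufferAllocator:
    def __init__(self, pairs, total_entries, min_entries=1):
        self.pairs = list(pairs)
        self.total = total_entries
        self.min = min_entries
        self.traffic = Counter()
        # Start with an even split across all pairs.
        self.alloc = {p: total_entries // len(self.pairs) for p in self.pairs}

    def record_transfer(self, src, dst, nbytes):
        self.traffic[(src, dst)] += nbytes

    def rebalance(self):
        """Called at the end of each epoch: proportional share with a floor."""
        total_traffic = sum(self.traffic.values()) or 1
        spendable = self.total - self.min * len(self.pairs)
        self.alloc = {
            p: self.min + int(spendable * self.traffic[p] / total_traffic)
            for p in self.pairs
        }
        self.traffic.clear()
        return self.alloc

if __name__ == "__main__":
    alloc = OTPBufferAllocator(pairs=[(0, 1), (0, 2), (1, 2)], total_entries=96)
    alloc.record_transfer(0, 1, 4096)
    alloc.record_transfer(0, 2, 1024)
    print(alloc.rebalance())
```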