ISBN:
(Print) 9798400704161
The proceedings contain 54 papers. The topics discussed include: expediting hazard pointers with bounded RCU critical sections; Alock: asymmetric lock primitive for RDMA systems; when is parallelism fearless and zero-cost with Rust?; efficient parallel reinforcement learning framework using the reactor model; parallel best arm identification in heterogeneous environments; brief announcement: lock-free learned search data structure; brief announcement: LIT: lookup interlocked table for range queries; brief announcement: a fast scalable detectable unrolled lock-based linked list; scheduling out-trees online to optimize maximum flow; optimizing dynamic data center provisioning through speed scaling: a primal-dual perspective; scheduling jobs with work-inefficient parallel solutions; and multi bucket queues: efficient concurrent priority scheduling.
ISBN:
(Print) 9781450395458
The proceedings contain 47 papers. The topics discussed include: Quancurrent: a concurrent quantiles sketch; an efficient scheduler for task-parallel interactive applications; efficient synchronization-light work stealing; balanced allocations in batches: the tower of two choices; massively parallel tree embeddings for high dimensional spaces; deterministic massively parallel symmetry breaking for sparse graphs; an associativity threshold phenomenon in set-associative caches; increment-and-freeze: every cache, everywhere, all of the time; multidimensional approximate agreement with asynchronous fallback; a tight characterization of fast failover routing: resiliency to two link failures is possible; releasing memory with optimistic access: a hybrid approach to memory reclamation and allocation in lock-free programs; transactional composition of nonblocking data structures; applying hazard pointers to more concurrent data structures; and nearly optimal parallel algorithms for longest increasing subsequence.
ISBN:
(Print) 9781450391467
The proceedings contain 44 papers. The topics discussed include: deterministic distributed sparse and ultra-sparse spanners and connectivity certificates; fully polynomial-time distributed computation in low-treewidth graphs; adaptive massively parallel algorithms for cut problems; preparing for disaster: leveraging precomputation to efficiently repair graph structures upon failures; the energy complexity of Las Vegas leader election; a fully-distributed peer-to-peer protocol for Byzantine-resilient distributed hash tables; brief announcement: the (limited) power of multiple identities: asynchronous Byzantine reliable broadcast with improved resilience through collusion; brief announcement: composable dynamic secure emulation; and robust and optimal contention resolution without collision detection.
Vector search has drawn rapidly increasing interest in the research community due to its use in novel AI applications. Maximizing its performance is essential for many tasks but remains only preliminarily understood...
ISBN:
(Print) 9781450391467
In recent years, the ever-increasing impact of memory access bottlenecks has brought forth a renewed interest in near-memory processing (NMP) architectures. In this work, we propose and empirically evaluate hybrid data structures, which are concurrent data structures custom-designed for these new NMP architectures. We focus on cache-optimized data structures, such as skiplists and B+ trees, that are often used as index structures in online transaction processing (OLTP) systems to enable fast key-based lookups. These data structures are hierarchical: lookups begin at a small number of top-level nodes and diverge to many different node paths as they move down the hierarchy, so nodes in higher levels benefit more from caching. Our proposed hybrid data structures split traditional hierarchical data structures into a host-managed portion consisting of higher-level nodes and an NMP-managed portion consisting of the remaining lower-level nodes, thus retaining and further enhancing the cache-conscious optimizations of their conventional implementations. Although the idea might seem relatively simple, splitting the data structure prompts new synchronization problems, and careful implementation is required to ensure high concurrency and correctness. We provide implementations of a hybrid skiplist and a hybrid B+ tree, and we empirically evaluate them on a cycle-accurate full-system architecture simulator. Our results show that the hybrid data structures have the potential to improve performance by more than 2× compared to state-of-the-art concurrent data structures.
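The host/NMP split the abstract describes is easy to picture in miniature. Below is a minimal, single-threaded sketch of a hybrid skiplist lookup: the host walks the cache-friendly upper levels, then hands the descent off to the portion that the paper would place near memory. The names HOST_LEVELS and NMPPortion are illustrative inventions, the NMP side is only simulated in host memory, and the paper's synchronization machinery, which is the hard part, is omitted entirely.

```python
import random

HOST_LEVELS = 2  # top levels kept host-side; levels below this live "near memory"

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] = successor at level i

class NMPPortion:
    """Stand-in for the lower levels managed by the near-memory unit."""
    def lookup(self, start, key):
        # Continue the descent where the host left off, over levels < HOST_LEVELS.
        node = start
        for level in range(HOST_LEVELS - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key <= key:
                node = node.next[level]
        return node.key == key

class HybridSkiplist:
    def __init__(self, max_height=8):
        self.head = Node(float("-inf"), max_height)
        self.max_height = max_height
        self.nmp = NMPPortion()

    def insert(self, key):
        # Classic skiplist insert with a geometric height distribution.
        height = 1
        while height < self.max_height and random.random() < 0.5:
            height += 1
        new = Node(key, height)
        node = self.head
        for level in range(self.max_height - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            if level < height:  # splice into every level the new node spans
                new.next[level] = node.next[level]
                node.next[level] = new

    def contains(self, key):
        # Host traverses the cache-friendly upper levels...
        node = self.head
        for level in range(self.max_height - 1, HOST_LEVELS - 1, -1):
            while node.next[level] is not None and node.next[level].key <= key:
                node = node.next[level]
        # ...then hands the rest of the descent to the NMP-managed portion.
        return self.nmp.lookup(node, key)

sl = HybridSkiplist()
for k in (3, 1, 4, 15, 9, 2, 6):
    sl.insert(k)
assert sl.contains(9) and not sl.contains(7)
```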
ISBN:
(Print) 9781665405843
The proceedings contain 28 papers. The topics discussed include: a compiler framework for optimizing dynamic parallelism on GPUs; a compiler for sound floating-point computations using affine arithmetic; aggregate update problem for multi-clocked dataflow languages; Palmed: throughput characterization for superscalar architectures; automatic generation of debug headers through Blackbox equivalence checking; gadgets splicing: dynamic binary transformation for precise rewriting; lambda the ultimate SSA: optimizing functional programs in SSA; and HECATE: performance-aware scale optimization for homomorphic encryption compiler.
ISBN:
(Print) 9781665420273
To efficiently deploy state-of-the-art deep neural network (DNN) workloads with growing computational intensity and structural complexity, scalable DNN accelerators featuring multiple tensor engines and distributed on-chip buffers have been proposed in recent years. Such spatial architectures significantly expand the scheduling space in terms of parallelism and data-reuse potential, which demands delicate workload orchestration. Previous work on the DNN hardware-mapping problem mainly focuses on operator-level loop transformation for a single array, which is insufficient for this new challenge. Resource-partitioning methods for multiple engines, such as CNN-partition and inter-layer pipelining, have been studied; however, their intrinsic disadvantages of workload imbalance and pipeline delay still prevent scalable accelerators from realizing their full potential. In this paper, we propose atomic dataflow, a novel graph-level scheduling and mapping approach developed for DNN inference. Instead of partitioning hardware resources into fixed regions and binding each DNN layer to a certain region sequentially, atomic dataflow schedules the DNN computation graph at a workload-specific granularity (atoms) to ensure PE-array utilization, supports flexible atom ordering to exploit parallelism, and orchestrates atom-engine mapping to optimize data reuse between spatially connected tensor engines. First, we propose a simulated-annealing-based atomic tensor generation algorithm to minimize load imbalance. Second, we develop a dynamic-programming-based atomic DAG scheduling algorithm to systematically explore the massive ordering space. Finally, to improve data locality and reduce expensive off-chip memory accesses, we present mapping and buffering strategies that efficiently utilize distributed on-chip storage. With an automated optimization framework established, experimental results show significant improvements over baseline approaches in terms of performance, har…
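To give a feel for the simulated-annealing step, here is a toy sketch under stated assumptions: atoms are modeled as fixed-cost tiles to be assigned across tensor engines, and load imbalance is measured as the maximum engine load. The abstract does not specify the paper's actual objective or move set, so anneal_atoms, tile_costs, and num_engines are hypothetical names for illustration only.

```python
import math
import random

def anneal_atoms(tile_costs, num_engines, steps=20000, t0=1.0, cooling=0.9995):
    """Assign tiles to engines via simulated annealing, minimizing max load."""
    assign = [random.randrange(num_engines) for _ in tile_costs]
    loads = [0.0] * num_engines
    for c, e in zip(tile_costs, assign):
        loads[e] += c

    best, best_assign = max(loads), assign[:]
    temp = t0
    for _ in range(steps):
        i = random.randrange(len(tile_costs))
        src, dst = assign[i], random.randrange(num_engines)
        if src == dst:
            continue
        old = max(loads)
        loads[src] -= tile_costs[i]  # tentatively move tile i to engine dst
        loads[dst] += tile_costs[i]
        assign[i] = dst
        new = max(loads)
        # Metropolis rule: always keep improvements; keep regressions with
        # probability exp(-delta / temp), which shrinks as the system cools.
        if new > old and random.random() >= math.exp((old - new) / temp):
            loads[dst] -= tile_costs[i]  # revert the move
            loads[src] += tile_costs[i]
            assign[i] = src
        elif new < best:
            best, best_assign = new, assign[:]
        temp *= cooling
    return best_assign, best

# Usage: balance 64 random-cost tiles across 4 engines.
costs = [random.uniform(1, 10) for _ in range(64)]
assignment, makespan = anneal_atoms(costs, num_engines=4)
print(f"max engine load after annealing: {makespan:.2f}")
```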
ISBN:
(Print) 9781665420273
Graph convolutional networks (GCNs) are promising for enabling machine learning on graphs. GCNs exhibit mixed computational kernels, involving regular neural-network-like computing and irregular graph-analytics-like processing. Existing GCN accelerators follow a divide-and-conquer philosophy, architecting two separate types of hardware to accelerate these two types of GCN kernels, respectively. This hybrid architecture improves intra-kernel efficiency but gives little holistic consideration to inter-kernel interactions for improving overall efficiency. In this paper, we present a new GCN accelerator, REFLIP, with three key innovations in terms of architecture design, algorithm mapping, and practical implementation. First, REFLIP leverages PIM-featured crossbar architectures to build a unified architecture for supporting the two types of GCN kernels simultaneously. Second, REFLIP adopts novel algorithm mappings that maximize the potential performance gains reaped from the unified architecture by exploiting the massive crossbar-structured parallelism. Third, REFLIP assembles software/hardware co-optimizations to process real-world graphs efficiently. Compared to state-of-the-art software frameworks running on an Intel Xeon E5-2680v4 CPU and an NVIDIA Tesla V100 GPU, REFLIP achieves average speedups of 6,432× and 86.32× and average energy savings of 9,817× and 302.44×, respectively. In addition, REFLIP outperforms a state-of-the-art GCN hardware accelerator, AWB-GCN, achieving an average speedup of 5.06× and an average energy saving of 15.63×.
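The two kernel types the abstract contrasts are visible in the standard GCN layer equation H' = ReLU(Â·H·W): the combination H·W is a dense, neural-network-like product, while the aggregation Â·(H·W) multiplies by a sparse, irregular adjacency matrix. The sketch below is not REFLIP's mapping, just a minimal NumPy rendering of the two kernels it unifies; since both reduce to matrix-vector products, a single crossbar fabric (the native PIM operation) can in principle serve both. All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f_in, f_out = 6, 4, 3

A = (rng.random((n, n)) < 0.3).astype(float)  # random sparse adjacency
A_hat = A + np.eye(n)                         # add self-loops
A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalize

H = rng.random((n, f_in))   # node feature matrix
W = rng.random((f_in, f_out))  # layer weights

combination = H @ W              # regular, NN-like kernel (dense GEMM)
aggregation = A_hat @ combination  # irregular, graph-analytics-like kernel
H_next = np.maximum(aggregation, 0.0)  # ReLU
print(H_next.shape)  # (6, 3)
```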
ISBN:
(Print) 9781665420273
Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., the Fujitsu A64FX with 512-bit ARM SVE vector support, or for long vectors, e.g., the NEC Aurora Tsubasa with a 16-Kbit Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one hand, short-vector VP designs struggle to provide high efficiency for applications featuring long vectors with high Data-Level Parallelism (DLP). On the other hand, long-vector VP designs waste resources and underutilize the Vector Register File (VRF) when executing low-DLP applications with short vector lengths. Therefore, those long-vector VP implementations are limited to a specialized subset of applications, where relatively high DLP must be present to achieve excellent performance with high efficiency. Modern scientific applications are growing more diverse, and the vector lengths in those applications vary widely. To overcome these limitations, we propose an Adaptable Vector Architecture (AVA) that achieves the best of both worlds. AVA is designed for short vectors (MVL = 16 elements) and is thus area- and energy-efficient. However, AVA can reconfigure the MVL, thereby exploiting the benefits of a longer-vector microarchitecture of up to 128 elements when abundant DLP is present. We model AVA on the gem5 simulator and evaluate its performance with six applications taken from the RiVEC Benchmark Suite. To obtain area and power consumption metrics, we model AVA on McPAT for 22-nm technology. Our results show that by reconfiguring our small VRF (8 KB) together with our novel issue-queue scheme, AVA yields a 2× speedup over the default configuration for short vectors. Additionally, AVA shows competitive performance compared to a long-vector VP while saving 50% of the area.
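One way to see the resource trade-off behind MVL reconfiguration: with a fixed-size VRF, lengthening the vectors means grouping physical registers into longer logical ones, so fewer logical registers remain (reminiscent of RISC-V V's LMUL register grouping). The toy model below uses the paper's 8 KB VRF and 16-to-128-element MVL range, but the 64-bit element width and the exact grouping scheme are assumptions, not AVA's documented design.

```python
VRF_BYTES = 8 * 1024  # 8 KB VRF, as evaluated in the paper
ELEM_BYTES = 8        # assumption: 64-bit elements (width not stated here)
BASE_MVL = 16         # default short-vector configuration
NUM_LOGICAL_REGS = VRF_BYTES // (BASE_MVL * ELEM_BYTES)  # 64 regs at MVL=16

def reconfigure(mvl):
    """Return the logical vector-register count for a requested MVL."""
    if mvl % BASE_MVL != 0 or not BASE_MVL <= mvl <= 128:
        raise ValueError("MVL must be a multiple of 16 in [16, 128]")
    group = mvl // BASE_MVL  # physical registers fused per logical register
    return NUM_LOGICAL_REGS // group

for mvl in (16, 32, 64, 128):
    print(f"MVL={mvl:4d} -> {reconfigure(mvl):2d} logical vector registers")
```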
Uncertain or probabilistic graphs are ubiquitous in many emerging applications. Previously, CPU-based techniques were proposed that use sampling, but they suffer from (1) low computation efficiency and large memor...