Search Results - Inner Mongolia University Library

50th Annual International Symposium on Computer Architecture (ISCA)

Authors: Krishnan, Srivatsan; Yazdanbaksh, Amir; Prakash, Shvetank; Jabbour, Jason; Uchendu, Ikechukwu; Ghosh, Susobhan; Boroujerdian, Behzad; Richins, Daniel; Tripathy, Devashree; Faust, Aleksandra; Reddi, Vijay Janapa
Affiliations: Harvard Univ, Cambridge, MA 02138, USA; Google Res, Brain Team, Mountain View, CA, USA; UT Austin, Austin, TX, USA; IIT Bhubaneswar, Bhubaneswar, Odisha, India

ISBN: (print) 9798400700958

Machine learning (ML) has become a prevalent approach to taming the complexity of design space exploration for domain-specific architectures. While appealing, using ML for design space exploration poses several challenges. First, it is not straightforward to identify the most suitable algorithm from an ever-increasing pool of ML methods. Second, the trade-offs between performance and sample efficiency across these methods remain inconclusive. Finally, the lack of a holistic framework for fair, reproducible, and objective comparison across these methods hinders the adoption of ML-aided architecture design space exploration and impedes the creation of repeatable artifacts. To mitigate these challenges, we introduce ArchGym, an open-source gymnasium and easy-to-extend framework that connects a diverse range of search algorithms to architecture simulators. To demonstrate its utility, we evaluate ArchGym across multiple vanilla and domain-specific search algorithms in the design of a custom memory controller, deep neural network accelerators, and a custom SoC for AR/VR workloads, collectively encompassing over 21K experiments. The results suggest that, given an unlimited number of samples, ML algorithms are equally favorable for meeting the user-defined target specification if their hyperparameters are tuned thoroughly; no one solution is necessarily better than another (e.g., reinforcement learning vs. Bayesian methods). We coin the term "hyperparameter lottery" to describe the relatively high chance that a search algorithm finds an optimal design, provided its hyperparameters are meticulously selected. Additionally, the ease of data collection and aggregation in ArchGym facilitates research in ML-aided architecture design space exploration. As a case study, we show this advantage by developing a proxy cost model with an RMSE of 0.61% that offers a 2,000-fold reduction in simulation time. Code and data for ArchGym are available at https://***/ArchGym.
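The abstract describes ArchGym as a layer that connects search algorithms to architecture simulators through a common interface. The sketch below illustrates that idea with a simplified Gym-style `reset`/`step` loop (returning a three-tuple rather than Gymnasium's five-tuple). Everything in it is a hypothetical illustration, not ArchGym's actual API: the class `ToyMemCtrlEnv`, its two design knobs, the closed-form latency model standing in for a real simulator, and the target-distance reward. Random search plays the role of a "vanilla" search algorithm, and every evaluated design is logged, mirroring the data-collection use case the abstract mentions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyMemCtrlEnv:
    """Hypothetical Gym-style environment around a stand-in "simulator".

    The design knobs (queue_depth, banks), the analytical latency model and
    the reward are illustrative assumptions, not ArchGym's actual interface.
    """
    target_latency: float = 35.0  # user-defined target specification (assumed, in ns)
    log: list = field(default_factory=list)  # (action, latency) pairs kept for reuse

    def reset(self):
        return {"latency": None}

    def _simulate(self, queue_depth, banks):
        # Toy closed-form model standing in for a cycle-accurate simulator.
        return 100.0 / banks + 2.0 * abs(queue_depth - 16) + 10.0

    def step(self, action):
        latency = self._simulate(action["queue_depth"], action["banks"])
        reward = -abs(latency - self.target_latency)  # 0 means the target is met exactly
        self.log.append((dict(action), latency))
        return {"latency": latency}, reward, reward == 0.0


def random_search(env, budget, seed=0):
    """Vanilla search agent: samples designs uniformly and keeps the best one."""
    rng = random.Random(seed)
    env.reset()
    best_action, best_reward = None, float("-inf")
    for _ in range(budget):
        action = {"queue_depth": rng.randint(1, 64),
                  "banks": rng.choice([1, 2, 4, 8, 16])}
        _, reward, done = env.step(action)
        if reward > best_reward:
            best_action, best_reward = action, reward
        if done:  # stop early once the target specification is met
            break
    return best_action, best_reward
```

For example, `random_search(ToyMemCtrlEnv(), budget=200, seed=1)` returns the best design found within 200 simulator calls. Because the agent only talks to `reset`/`step`, a Bayesian-optimization or RL agent could be swapped in against the same environment, which is the kind of like-for-like comparison the framework is meant to make repeatable; the accumulated `env.log` is the sort of dataset a proxy cost model could be trained on.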

Keywords: machine learning; machine learning for Computer Architecture; machine learning for system; Reinforcement learning; Bayesian Optimization; Open Source; Baselines; Reproducibility


Core Placement Optimization for Multi-chip Many-core Neural Network Systems with Reinforcement Learning


ACM Transactions on Design Automation of Electronic Systems, 2021, Vol. 26, No. 2, pp. 1–27

Authors: Wu, Nan; Deng, Lei; Li, Guoqi; Xie, Yuan
Affiliations: Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106, USA; Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrument, Beijing 100084, Peoples R China

Multi-chip many-core neural network systems provide high parallelism owing to decentralized execution, and they can be scaled to very large systems at reasonable fabrication cost. As multi-chip many-core systems scale up, communication latency plays an increasingly important role in system performance. While previous work mainly focuses on core placement within a single chip, two principal issues remain unresolved: the communication problems caused by the non-uniform, hierarchical on-/off-chip communication capability of multi-chip systems, and the scalability of heuristic-based approaches in a factorially growing search space. To this end, we propose a reinforcement-learning-based method that automatically optimizes core placement through deep deterministic policy gradient (DDPG); it gathers information about the environment by performing a series of trials (i.e., placements) and uses convolutional neural networks to extract spatial features of different placements. Experimental results indicate that, compared with a naive sequential placement, the proposed method achieves a 1.99x increase in throughput and a 50.5% reduction in latency; compared with simulated annealing, an effective technique for approximating the global optimum in an extremely large search space, our method improves throughput by 1.22x and reduces latency by 18.6%. We further demonstrate that the proposed method can find optimal placements that take advantage of the different communication properties arising from different system configurations, and that it works in a topology-agnostic manner.
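The paper's DDPG agent is beyond a short sketch, but the problem it optimizes and the two baselines it is compared against (naive sequential placement and simulated annealing) can be illustrated. The sketch below assumes a hypothetical system of four 2x2-mesh chips in a row, a cost equal to traffic volume times Manhattan hops plus an assumed surcharge per off-chip crossing (a crude stand-in for the non-uniform on-/off-chip communication the abstract describes), and a random layer-to-layer traffic graph. All constants and helper names (`hop_cost`, `placement_cost`, `random_traffic`, `simulated_annealing`) are illustrative assumptions, not the authors' cost model.

```python
import math
import random

GRID = 2              # each chip is a GRID x GRID mesh of cores (assumed)
N_CHIPS = 4           # chips connected in a 1-D row (assumed topology)
OFFCHIP_PENALTY = 10  # extra cost per chip-boundary crossing (assumed)
N_SLOTS = GRID * GRID * N_CHIPS


def coords(slot):
    """Map a physical slot index to (chip, global_x, global_y)."""
    chip, local = divmod(slot, GRID * GRID)
    return chip, chip * GRID + local % GRID, local // GRID


def hop_cost(a, b):
    ca, xa, ya = coords(a)
    cb, xb, yb = coords(b)
    # Manhattan hops on the mesh, plus a surcharge for every off-chip link crossed.
    return abs(xa - xb) + abs(ya - yb) + OFFCHIP_PENALTY * abs(ca - cb)


def placement_cost(placement, traffic):
    """placement[i] is the slot holding logical core i; traffic maps (i, j) -> volume."""
    return sum(v * hop_cost(placement[i], placement[j]) for (i, j), v in traffic.items())


def random_traffic(n_cores, n_edges, seed=1):
    """Pipeline edges between consecutive cores plus n_edges random skip edges."""
    rng = random.Random(seed)
    traffic = {(i, i + 1): rng.randint(1, 10) for i in range(n_cores - 1)}
    while len(traffic) < n_cores - 1 + n_edges:
        i, j = rng.sample(range(n_cores), 2)
        traffic.setdefault((min(i, j), max(i, j)), rng.randint(1, 10))
    return traffic


def simulated_annealing(traffic, n_cores, steps=20000, t0=50.0, seed=0):
    """Swap-based annealing over placements of n_cores (<= N_SLOTS) cores."""
    rng = random.Random(seed)
    current = list(range(n_cores))  # start from the naive sequential placement
    cur_cost = placement_cost(current, traffic)
    best, best_cost = current[:], cur_cost
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9  # linear cooling schedule
        i, j = rng.sample(range(n_cores), 2)
        current[i], current[j] = current[j], current[i]  # propose: swap two cores
        new_cost = placement_cost(current, traffic)
        delta = new_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo the swap
    return best, best_cost
```

Calling `placement_cost(list(range(N_SLOTS)), traffic)` gives the sequential baseline, and `simulated_annealing(traffic, N_SLOTS)` can only match or improve on it, since it starts from that placement and tracks the best cost seen. Because each evaluation explores one permutation out of `N_SLOTS!`, heuristics like this one scale poorly as the system grows, which is the motivation the abstract gives for a learned placement policy.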

Keywords: Multi-chip many-core architecture; neural network accelerator; core placement optimization; machine learning for system
