Search query: subject term = "Algorithm-System Co-design"
1 record
COMET: Towards Practical W4A4KV4 LLMs Serving
30th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Authors: Liu, Lian; Cheng, Long; Ren, Haimeng; Xu, Zhaohui; Pan, Yudong; Wang, Mengdi; Li, Xiaowei; Han, Yinhe; Wang, Ying
Affiliations: Univ Chinese Acad Sci, CAS, Inst Comp Technol, Beijing, Peoples R China; North China Elect Power Univ, Beijing, Peoples R China; ShanghaiTech Univ, Shanghai, Peoples R China; Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
Quantization is a widely used compression technology to reduce the overhead of serving large language models (LLMs) on terminal devices and in cloud data centers. However, prevalent quantization methods, such as 8-bit...