Tool learning has generated widespread interest as a vital means of interaction between Large Language Models (LLMs) and the physical world. Current research predominantly emphasizes LLMs' capacity to utilize tool...
Reinforcement Learning from Human Feedback (RLHF) is a crucial approach to aligning language models with human values and intentions. A fundamental challenge in this method lies in ensuring that the reward model accur...
Large language models (LLMs) have achieved tremendous success in understanding language and processing text. However, question-answering (QA) on lengthy documents faces challenges of resource constraints and a high pr...
Existing evaluations of tool learning primarily focus on validating the alignment of selected tools (e.g., various APIs) for large language models (LLMs) with expected outcomes. However, these approaches rely on a lim...
Large language models (LLMs) have achieved impressive performance in numerous domains but often struggle to process lengthy inputs effectively and efficiently due to limited length generalization and attention's q...
There are a wide variety of intelligence accelerators with promising performance and energy efficiency, deployed in a broad range of applications such as computer vision and speech ***, programming productivity hinders the deployment of deep learning *** low-level library invoked in the high-level deep learning framework which supports the end-to-end execution with a given model, is designed to reduce the programming burden on the intelligence ***, it is inflexible for developers to build a network model for every deep learning application, which probably brings unnecessary repetitive *** this paper, a flexible and efficient programming framework for deep learning accelerators, FlexPDA, is proposed, which provides more optimization opportunities than the low-level library and realizes quick transplantation of applications to intelligence accelerators for fast *** evaluate FlexPDA by using 10 representative operators selected from deep learning algorithms and an end-to-end *** experimental results validate the effectiveness of FlexPDA, which achieves an end-to-end performance improvement of 1.620x over the low-level library.
With the advent of virtualization techniques and software-defined networking (SDN), network function virtualization (NFV) shifts network functions (NFs) from hardware implementations to software appliances, between which exists a performance *** to narrow the gap is an essential issue of current NFV ***, the cumbersomeness of deployment, the water pipe effect of virtual network function (VNF) chains, and the complexity of the system software stack together make it tough to figure out the cause of low performance in the NFV *** pinpoint the NFV system performance, we propose NfvInsight, a framework for automatic deployment and benchmarking VNF *** framework tackles the challenges in NFV performance *** framework components include chain graph generation, automatic deployment, and fine granularity *** design and implementation of each component have their *** the best of our knowledge, we make the first attempt to collect rules forming a knowledge base for generating reasonable chain *** deploys the generated chain graphs automatically, which frees the network operators from executing at least 391 lines of bash commands for a single *** diagnose the performance bottleneck, NfvInsight collects metrics from multiple layers of the software ***, we collect the network stack latency distribution ingeniously, introducing only less than 2.2% *** showcase the convenience and usability of NfvInsight in finding bottlenecks for both VNF chains and the underlying *** our framework, we find several design flaws of the network stack, which are unsuitable for packet forwarding inside one single server under the NFV *** optimization for these flaws gains at most 3x performance improvement.
1 Introduction Most real-world graphs are large-scale but unstructured and *** of the most notable characteristics of real-world graphs is the skewed power law degree distribution [1]: most vertices have a few neighbors while a few own a large number of *** characteristics present challenges for efficient parallel graph processing, such as load imbalance, poor locality, and redundant *** from modifying the graph programming abstraction or changing the execution models on different architectures, reducing the irregularity of graph data also improves the performance of graph processing [2]. For example, it is well known that BFS has a bad temporal locality, but it is possible to transform irregular graphs to more regular ones to improve spatial locality and gain more performance.
Uniform memory multicore neural network accelerators (UNNAs) furnish huge computing power to emerging neural network ***, with neural network architectures going deeper and wider, the limited memory capacity has become a constraint to deploy models on UNNA *** how to efficiently manage memory space and how to reduce workload footprints are urgently *** this paper, we propose Tetris: a heuristic static memory management framework for UNNA *** reconstructs execution flows and synchronization relationships among cores to analyze each tensor’s liveness *** the memory management problem is converted to a sequence permutation *** uses a genetic algorithm to explore the permutation space to optimize the memory management strategy and reduce memory *** evaluate several typical neural networks and the experimental results demonstrate that Tetris outperforms the state-of-the-art memory allocation methods, and achieves an average memory reduction ratio of 91.9% and 87.9% for a quad-core and a 16-core Cambricon-X platform, respectively.
Graph convolutional networks (GCNs) have received significant attention from various research fields due to the excellent performance in learning graph *** GCN performs well compared with other methods, it still faces *** a GCN model for large-scale graphs in a conventional way requires high computation and storage ***, motivated by an urgent need in terms of efficiency and scalability in training GCN, sampling methods have been proposed and achieved a significant *** this paper, we categorize sampling methods based on the sampling mechanisms and provide a comprehensive survey of sampling methods for efficient training of *** highlight the characteristics and differences of sampling methods, we present a detailed comparison within each category and further give an overall comparative analysis for the sampling methods in all ***, we discuss some challenges and future research directions of the sampling methods.