ISBN: (digital) 9798350387957
ISBN: (print) 9798350387964
While large language models (LLMs) are usually deployed on powerful servers, there is growing interest in deploying them on local machines for better real-time performance, service stability, privacy, and flexibility. Unfortunately, the GPU memory on local machines is often insufficient to accommodate the entire LLM. Although running an LLM on such a GPU device is still possible by swapping data between the limited GPU memory and the abundant main memory, the slow speed of data swapping significantly increases inference time, rendering this approach impractical. In this paper, we propose RTiL, a systematic solution to address the above challenge. RTiL utilizes collaborative inference, which combines a lightweight LLM with the default powerful LLM. The lightweight LLM generates output tokens, which are then validated for quality by the powerful LLM. This approach allows RTiL to significantly speed up inference while maintaining the same output quality as when using the powerful LLM alone. Additionally, by delegating part of the inference workload to the CPU and optimizing data movement between main and GPU memory, we further enhance the efficiency of the inference process. Furthermore, we extend RTiL to handle requests with real-time requirements, enabling it to meet such demands by slightly trading off output quality. Through extensive experiments, we demonstrate notable improvements in inference efficiency and the ability to fulfill real-time requirements while minimizing degradation in output quality.
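The draft-then-validate idea in this abstract can be illustrated with a minimal sketch. It is not RTiL's actual algorithm: the abstract does not state the acceptance rule, so this example assumes greedy decoding with exact-match verification, and the names `collaborative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical stand-ins for real models. Under that assumption, the output is identical to what the powerful model would produce alone.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is a function mapping a token context to its greedy next token.
Model = Callable[[List[Token]], Token]


def greedy_decode(model: Model, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: decode with a single model, one token at a time."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(model(out))
    return out[len(prompt):]


def collaborative_decode(draft: Model, target: Model, prompt: List[Token],
                         max_new_tokens: int, draft_len: int = 4) -> List[Token]:
    """The lightweight model proposes tokens; the powerful model validates them."""
    out = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(out) < goal:
        k = min(draft_len, goal - len(out))
        # 1) The lightweight model drafts k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The powerful model checks each drafted token in order.
        #    (A real system would score all k positions in one batched pass;
        #    here the target is simply called once per position.)
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        out.extend(accepted)  # at least one token is added per round
    return out[len(prompt):]


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'powerful' model: a deterministic function of the context."""
    return (31 * sum(ctx) + len(ctx)) % 50


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical 'lightweight' model: agrees with the target most of the time."""
    tok = toy_target(ctx)
    return tok if len(ctx) % 3 else (tok + 1) % 50


if __name__ == "__main__":
    prompt = [3, 1, 4]
    fast = collaborative_decode(toy_draft, toy_target, prompt, 12)
    slow = greedy_decode(toy_target, prompt, 12)
    print(fast == slow)
```

The speed-up in practice comes from the powerful model validating several drafted tokens per invocation instead of generating one token per invocation; the toy functions here only demonstrate that the accepted sequence matches the powerful model's own greedy output.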
Light field (LF) is a representative Metaverse multimedia format, and its huge data scale and heavy rendering burden have brought great challenges to networks and user devices. Although edge-based light field Metave...
In the rapidly advancing field of autonomous technologies, particularly in transportation and surveillance, there is a critical need for accurate and reliable pedestrian detection systems. While deep learning models h...
ISBN: (print) 9789819746569
The proceedings contain 35 papers. The special focus in this conference is on VLSI, Signal Processing, Power Electronics, IoT, Communication and Embedded Systems. The topics include: Sensor Perception Reliability in Automated Vehicles; Design and Implementation of 16-Bit Posit Arithmetic; Emergence, Evolution, and Applications of Medical Cyber-Physical Systems; Smart Insole Design for Foot Pressure Monitoring; An Efficient Cryptographic Technique to Improve Data Privacy and Security in E-Commerce; An IoT-Based Transmission Line Fault Monitoring System; Performance Analysis of the System of IoT Architecture; Real-Time Implementation of IoT-Based Home Automation System on Arduino Platform; Data Security Paradigms: Paillier Homomorphic Encryption and AES in the Context of Privacy-Preserving Computation; A Review on Watermarking Techniques of Digital Image; An Improved Pre-charging Strategy for Asymmetrical Mixed-Cell Submodule (SM) Based Modular Multilevel Converter; Classification of Remotely Sensed Data Using Fisher’s Linear Discriminant; Design of Smart Wheelchair for Disabilities with Quadriplegia Through Brain–Computer Interface; Automatic Load Distribution of Power During Low Power Generation Using PLC; AI-Driven Lung Cancer Detection for Rapid Analysis of Medical Imaging Data; Comparative Study of Transfer Learning Models for Mushroom Classification; Automated Software Defect Prediction Model: AdaBoost-Based Support Vector Machine Approach; Preprocessing of MRI Data for the Early Detection of Alzheimer’s Disease Using DWT; Classification of Soybean Seed Using Support Vector Machine with Image Enhancement Techniques; From Deployment to Drift: A Comprehensive Approach to ML Model Monitoring with Evidently AI; A Real-Time Computer Vision-Based Green Intensity Estimation Method for the Arecanut Tree Pesticide Spraying System; Key Enabling Technologies for Next-Generation Wireless Systems.
With the proliferation of decentralised finance (DeFi) protocols, automated market makers (AMMs) on blockchains have also gained momentum in the past few years. This is mostly attributable to the fact that they require...
This article proposes the concept of a platform for the development, accumulation and use of specialised applications - chatbots - that automate functions related to providing information, placing orders and fulfillin...
Traditional maintenance (TM) practices in industrial companies reveal several drawbacks. It is predominantly conducted on fixed schedules rather than based on actual equipment conditions, resulting in issues such as d...
Drowsiness detection is pivotal to ensuring the safety of both drivers and passengers, as it helps prevent accidents caused by tired or drowsy individuals behind the wheel. This research intr...
In this study, the practical application of technologies related to IoT and machine learning in the robotics education context is examined based on the use of a robot with two proximity sensors. The trial system was d...
Many embedded systems have hard resource constraints that make schedules found by list scheduling heuristics infeasible. One of the main challenges posed by memory constraints and the high degree of parallelism is d...