Reinforcement Learning from Human Feedback (RLHF) is key to aligning Large Language Models (LLMs), typically paired with the Proximal Policy Optimization (PPO) algorithm. While PPO is a powerful method designed for general reinforcement learning tasks, it is overly sophisticated for LLMs, leading to laborious hyper-parameter tuning and significant computational burdens. To make RLHF efficient, we present ReMax, which leverages three properties of RLHF: fast simulation, deterministic transitions, and trajectory-level rewards. These properties are not exploited in PPO, making it less suitable for RLHF. Building on the renowned REINFORCE algorithm, ReMax does not require training an additional value model as in PPO and is further enhanced with a new variance reduction technique. ReMax offers several benefits over PPO: it is simpler to implement, eliminates more than 4 hyper-parameters in PPO, reduces GPU memory usage, and shortens training time. ReMax can save about 46% of GPU memory compared with PPO when training a 7B model and enables training on A800-80GB GPUs without the memory-saving offloading technique needed by PPO. Applying ReMax to a Mistral-7B model resulted in a 94.78% win rate on the AlpacaEval leaderboard and a 7.739 score on MT-bench, setting a new SOTA for open-source 7B models. These results show the effectiveness of ReMax while addressing the limitations of PPO in LLMs. Copyright 2024 by the author(s)
The ubiquity of spatio-temporal data in the real world presents challenges for predictive modeling due to the intricate interplay of temporal trends and spatial correlations. In this paper, we address these challenges...
The substitution box, often known as an S-box, is a nonlinear component that is a part of several block *** purpose is to protect cryptographic algorithms from a variety of cryptanalytic assaults. A Multi-Criteria Decision Making (MCDM) problem has a complex selection procedure because of having many options and criteria to choose *** of this, statistical methods are necessary to assess the performance score of each S-box and decide which option is the best one available based on this *** the Pythagorean Fuzzy-based Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method, the major objective of this investigation is to select the optimal S-box to be implemented from a pool of twelve key *** the help of the Pythagorean fuzzy set (PFS), the purpose of this article is to evaluate whether this nonlinear component is suitable for use in a variety of encryption *** this article, we have considered various characteristics of S-boxes, including nonlinearity, algebraic degree, strict avalanche criterion (SAC), absolute indicator, bit independent criterion (BIC), sum of square indicator, algebraic immunity, transparency order, robustness to differential cryptanalysis, composite algebraic immunity, signal to noise ratio-differential power attack (SNR-DPA), and confusion coefficient variance on some standard S-boxes that are Advanced Encryption Following this, the findings of the investigation are converted into Pythagorean fuzzy numbers in the shape of a *** matrix is then subjected to an analysis using the TOPSIS method, which is dependent on the Pythagorean fuzzy set, to rank the most suitable S-box for use in encryption applications.
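The ranking step described in this abstract (a matrix of Pythagorean fuzzy numbers passed through TOPSIS) might look roughly like the sketch below. It is not the authors' procedure. It assumes the commonly used formulation in which a Pythagorean fuzzy number is a pair (mu, nu) with mu^2 + nu^2 <= 1, scored as mu^2 - nu^2, with a Hamming-style distance over squared membership, non-membership, and hesitancy, and the classical closeness coefficient. All criteria are treated as benefit-type, and the function names, weights, and three-alternative matrix are hypothetical illustrations, not data from the paper.

```python
# Minimal Pythagorean fuzzy TOPSIS sketch (assumed formulation, hypothetical data).

def score(pfn):
    """Score of a Pythagorean fuzzy number (mu, nu): mu^2 - nu^2."""
    mu, nu = pfn
    return mu ** 2 - nu ** 2

def distance(p, q):
    """Hamming-style distance over squared membership, non-membership, hesitancy."""
    hes_p = 1 - p[0] ** 2 - p[1] ** 2  # squared hesitancy degree of p
    hes_q = 1 - q[0] ** 2 - q[1] ** 2  # squared hesitancy degree of q
    return 0.5 * (abs(p[0] ** 2 - q[0] ** 2)
                  + abs(p[1] ** 2 - q[1] ** 2)
                  + abs(hes_p - hes_q))

def pf_topsis(matrix, weights):
    """Return one closeness coefficient per alternative (row); higher is better."""
    n_criteria = len(weights)
    # Ideal solutions per criterion: the entry with the highest / lowest score.
    positive = [max((row[j] for row in matrix), key=score) for j in range(n_criteria)]
    negative = [min((row[j] for row in matrix), key=score) for j in range(n_criteria)]
    closeness = []
    for row in matrix:
        d_pos = sum(w * distance(x, p) for w, x, p in zip(weights, row, positive))
        d_neg = sum(w * distance(x, n) for w, x, n in zip(weights, row, negative))
        total = d_pos + d_neg
        # If every alternative is identical, both distances are zero.
        closeness.append(d_neg / total if total > 0 else 0.5)
    return closeness

# Hypothetical decision matrix: 3 alternatives x 2 benefit criteria.
matrix = [
    [(0.9, 0.1), (0.8, 0.2)],
    [(0.5, 0.5), (0.6, 0.4)],
    [(0.2, 0.9), (0.3, 0.8)],
]
weights = [0.6, 0.4]  # hypothetical criterion weights summing to 1
closeness = pf_topsis(matrix, weights)
ranking = sorted(range(len(matrix)), key=lambda i: closeness[i], reverse=True)
```

Here `ranking` lists the alternative indices from best to worst. A cost-type criterion would need its ideal solutions swapped, which this sketch does not handle.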
This study uses the Weinbaum-Jiji bioheat model to numerically simulate heat transfer through skin tissue that is exposed to a constant heat flux. The classical Weinbaum-Jiji equation is modified by including the Capu...
We rigorously study the joint evolution of training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for mu...
We conducted a study on the electron stopping power of protons in aluminum at finite electron temperatures, utilizing time-dependent density functional theory nonadiabatically coupled with molecular dynamics. Our investigation focused on protons with initial velocities ranging from 0.1 to 1.0 a.u., providing a wealth of detailed information on the electronic states involved in the stopping process, with exceptional spatial and temporal resolution. Our results show that the electron temperature can significantly affect the electron stopping power. A quantum-blocking mechanism, based on a physical picture of electronic transitions between energy levels, is proposed to explain why the electron stopping power decreases as the target electron temperature increases.
Analysis of import substitution processes is one of the urgent problems for many countries in the context of deglobalization of the world economy. Mathematical methods of convex analysis make it possible to construct ...
The COVID-19 pandemic has necessitated large-scale vaccination campaigns to control the virus's spread. Given the scale of these efforts, it's crucial to scrutinize potential adverse effects, especially in vul...
Typically, prostate evaluation is done using different magnetic resonance imaging sequences. Dynamic contrast enhancement, one such scanning modality, allows one to spot higher vascular permeability and ...
Recent research on nanostructures has demonstrated their importance and application in a variety of *** are used directly or indirectly in drug delivery systems, medicine and pharmaceuticals, biological sensors, photodetectors, transistors, optical and electronic devices, and so *** discovery of carbon nanotubes with Y-shaped junctions is motivated by the development of future advanced electronic *** of their interaction with Y-junctions, electronic switches, amplifiers, and three-terminal transistors are of particular *** is a concept that determines the uncertainty of a system or *** concepts are also used in biology, chemistry, and applied *** on the requirements, entropy in the form of a graph can be classified into several *** 1955, graph-based entropy was *** of the types of entropy is edge-weighted *** examined the abstract form of Y-shaped junctions in this *** edge-weight-based entropy formulas for the generic view of Y-shaped junctions were created, and some edge-weighted and topological index-based concepts for Y-shaped junctions were discussed in the present paper.