Despite the plethora of reinforcement learning algorithms in machine learning and control, most work in this area relies on discrete-time formulations of stochastic dynamics. In this work we present a new policy gradient algorithm for reinforcement learning in continuous state-action spaces and continuous time for free energy-like cost functions. The derivation is based on successive application of Girsanov's theorem and the use of the Radon-Nikodým derivative as formulated for Markov diffusion processes. The resulting policy gradient is reward weighted. The use of the Radon-Nikodým derivative extends the analysis and results to more general models of stochasticity in which jump diffusion processes are considered. We apply the resulting algorithm to two simple examples of learning attractor landscapes in rhythmic and discrete movements.
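To make "reward weighted" concrete, here is a minimal sketch. It is not the algorithm derived in the paper. It assumes a finite batch of sampled rollouts, each with a scalar trajectory cost and the parameter perturbation that produced it, and it uses the standard free-energy expression -lam * log E[exp(-cost / lam)] with exponentiated-cost weights. The function names, the temperature `lam`, and the sample values are hypothetical.

```python
import math

def reward_weights(costs, lam):
    """Normalized weights proportional to exp(-cost / lam) (assumed weighting)."""
    c_min = min(costs)  # shift for numerical stability; cancels after normalization
    unnorm = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def free_energy(costs, lam):
    """Sample estimate of -lam * log E[exp(-cost / lam)]."""
    c_min = min(costs)
    mean_exp = sum(math.exp(-(c - c_min) / lam) for c in costs) / len(costs)
    return c_min - lam * math.log(mean_exp)

def weighted_update(perturbations, costs, lam):
    """Reward-weighted average of the sampled parameter perturbations."""
    w = reward_weights(costs, lam)
    dim = len(perturbations[0])
    return [sum(w[i] * perturbations[i][d] for i in range(len(w)))
            for d in range(dim)]

# Hypothetical batch of three rollouts with two policy parameters each.
costs = [1.0, 2.0, 4.0]
perturbations = [[0.1, 0.0], [0.0, 0.2], [-0.3, 0.1]]
delta_theta = weighted_update(perturbations, costs, lam=1.0)
```

Low-cost rollouts dominate the update, and as `lam` shrinks the weights concentrate on the single best rollout.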
ISBN: 9781467357159 (print)
We derive Policy Gradients (PGs) with time-varying parameterizations for nonlinear diffusion processes affine in noise. The resulting policies have the form of a reward-weighted gradient. The analysis is in continuous time and covers both linear and nonlinear parameterizations. Examples on stochastic control problems for diffusion processes are provided.
Authors: Imran Shafique Ansari, Ferkan Yilmaz, Mohamed-Slim Alouini
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division at King Abdullah University of Science and Technology (KAUST), Al-Khawarizmi Applied Math. Building (Bldg. #1), Thuwal 23955-6900, Makkah Province, Kingdom of Saudi Arabia
ISBN: 9781467363365 (print)
The probability density function (PDF) and cumulative distribution function of the sum of L independent but not necessarily identically distributed squared η-μ variates, applicable to the output statistics of a maximal ratio combining (MRC) receiver operating over η-μ fading channels, which include the Hoyt and the Nakagami-m models as special cases, are presented in closed form in terms of the Fox's H function. Further analysis, particularly of the bit error rate via a PDF-based approach, is also presented in closed form in terms of the extended Fox's H function (H). The proposed new analytical results complement previous results and are illustrated by extensive numerical and Monte Carlo simulation results.
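The kind of Monte Carlo check mentioned above can be sketched for the Nakagami-m special case only. This is not the paper's closed-form result and does not cover general η-μ fading. It relies on two standard facts: the MRC output SNR is the sum of the branch SNRs, and a squared Nakagami-m envelope with fading parameter m and mean power Ω is gamma distributed with shape m and scale Ω/m. The function names and the branch parameters are hypothetical.

```python
import random

def mrc_snr_samples(m_list, omega_list, n_samples, seed=0):
    """Draw samples of the sum of L independent, non-identical gamma variates.

    Branch l has shape m_list[l] and scale omega_list[l] / m_list[l],
    i.e. a squared Nakagami-m envelope with mean power omega_list[l].
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        total = 0.0
        for m, omega in zip(m_list, omega_list):
            total += rng.gammavariate(m, omega / m)  # (shape, scale)
        samples.append(total)
    return samples

def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)

# Hypothetical three-branch receiver with unequal fading parameters and powers.
m_list = [1.0, 2.0, 3.5]
omega_list = [1.0, 0.5, 2.0]
samples = mrc_snr_samples(m_list, omega_list, n_samples=20000)
outage_estimate = empirical_cdf(samples, 1.0)  # P(sum <= 1.0), simulated
```

An analytical CDF evaluated at the same threshold could then be compared against `outage_estimate`.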
Power-hungry graphics processing unit (GPU) accelerators are ubiquitous in high-performance computing data centers today. GPU virtualization frameworks introduce new opportunities for effective management of GPU resources by decoupling them from application execution. However, power management of GPU-enabled server clusters faces significant challenges. The underlying system infrastructure shows complex power consumption characteristics that depend on the placement of GPU workloads across the compute nodes, power phases and cabinets in a data center. GPU resources need to be scheduled dynamically in the face of time-varying resource demand and peak power constraints. We propose and develop a power-aware virtual OpenCL (pVOCL) framework that controls the peak power consumption and improves the energy efficiency of the underlying server system through dynamic consolidation and power-phase-topology-aware placement of GPU workloads. Experimental results show that pVOCL achieves significant energy savings compared to existing power management techniques for GPU-enabled server clusters, while incurring negligible impact on performance. It drives the system towards energy-efficient configurations by taking an optimal sequence of adaptation actions in a virtualized GPU environment while keeping the power consumption below the peak power budget.
Software testing is a critical process for achieving product quality. Its importance is increasingly recognized, and there is growing interest in improving how this process is carried out. In this context, Knowledge Management emerges as an important supporting tool. However, managing relevant knowledge for reuse is difficult, and it requires some means to represent a large volume of test information and associate semantics with it. To address this problem, we have developed a Reference Ontology on Software Testing (ROoST). ROoST is built by reusing ontology patterns from the Software Process Ontology Pattern Language (SP-OPL). In this paper, we discuss how ROoST was developed, and present a fragment of ROoST that concerns the software testing process, its activities, artifacts, and procedures.
In dynamic spectrum access, cognitive radio networks compete with one another to acquire under-utilized and idle channels for data transmission. Each network strives to acquire as many channels as possible to satisfy its maximum channel requirement. The main goal of a network is to maximize its utility while minimizing the contention it experiences with other secondary networks. In this paper we study channel acquisition and contention handling mechanisms that a system of networks can adopt to maximize utility and minimize contention. A system of networks becomes stable when all networks have acquired enough channels to satisfy their channel requirements without any contention between them. A network's utility therefore depends on the stability of the system and on how fast that stability is attained. The mechanisms proposed in this paper ensure fast convergence of the system by minimizing the contention experienced until a state of equilibrium is attained. Simulation experiments were conducted taking into consideration the unpredictable behavior of the primary users, who are the licensed owners of the spectrum bands.
Interface types in OO languages support polymorphism, abstraction and information hiding by separating interfaces from their implementations. The separation enhances modularity of programs, however, it causes also cha...
The discovery of miRNAs had a great impact on traditional biology. Typically, miRNAs have the potential to bind to the 3' untranslated region (UTR) of their mRNA target genes for cleavage or translational repression....
Many Android apps have a legitimate need to communicate over the Internet and are then responsible for protecting potentially sensitive data during transit. This paper seeks to better understand the potential security...
Mobile technologies and Web 2.0 have led to explosions of communication, with a resulting increased need for people to process and utilize that new communication. This paper presents a new approach to create an integr...