ISBN: 9781728150895 (print)
In this work, we exploit the potential benefits of a multi-armed bandit scheme in cooperative multiple-input multiple-output (MIMO) wireless networks. In particular, we consider an online policy for amplify-and-forward MIMO relay selection (RS), where relays are provided with uncertain channel state information (CSI). We design the RS policy as a sequential, experience-driven learning algorithm with a contextual bandit (CB) approach: the algorithm learns to select an optimal relay node using the imperfect CSI provided as a context vector and the past rewards obtained under the current policy, with the aim of maximizing the cumulative mean reward over time. Further, through extensive simulation results, we demonstrate that the proposed CB-based RS policy achieves superior performance gains compared to the conventional Gram-Schmidt method.
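The abstract does not say which contextual bandit algorithm is used, so the following is only a minimal sketch of the general idea, assuming an epsilon-greedy rule with one linear reward model per relay. The class name, the noisy-gain context, and the reward definition are all hypothetical and are not taken from the paper.

```python
import random


class EpsilonGreedyRelaySelector:
    """Hypothetical contextual bandit: one linear reward model per relay."""

    def __init__(self, n_relays, context_dim, epsilon=0.1, step=0.05, seed=0):
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.step = step        # learning rate of the per-relay model (assumed)
        self.rng = random.Random(seed)
        self.weights = [[0.0] * context_dim for _ in range(n_relays)]

    def predict(self, relay, context):
        # Estimated reward of a relay for the given context vector.
        return sum(w * x for w, x in zip(self.weights[relay], context))

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        scores = [self.predict(r, context) for r in range(len(self.weights))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, relay, context, reward):
        # Least-mean-squares step towards the observed reward.
        error = reward - self.predict(relay, context)
        self.weights[relay] = [
            w + self.step * error * x
            for w, x in zip(self.weights[relay], context)
        ]


def simulate(rounds=2000, n_relays=3, noise=0.1, seed=1):
    """Toy loop: the context is a noisy copy of the true channel gains."""
    rng = random.Random(seed)
    agent = EpsilonGreedyRelaySelector(n_relays, n_relays, seed=seed)
    total = 0.0
    for _ in range(rounds):
        true_gain = [rng.random() for _ in range(n_relays)]
        context = [g + rng.gauss(0.0, noise) for g in true_gain]  # imperfect CSI
        relay = agent.select(context)
        reward = true_gain[relay]  # stand-in for the achieved rate
        agent.update(relay, context, reward)
        total += reward
    return total / rounds
```

Calling `simulate()` returns the average reward per round for this toy setting; it says nothing about the gains reported in the paper.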
ISBN: 9781728190938 (print)
In recent years, there has been increasing interest in the study of human-robot interaction (HRI). In HRI tasks, the strengths of both human and robot can be utilized in a complementary way during task execution. Different human-robot interaction scenarios usually impose different task requirements and adopt different external sensory systems or configurations. The majority of existing works focus on developing control methods for specific applications or specific sensors, and few results have been presented that formulate different interaction task requirements and various sensory models in a general way. In this paper, a human-robot interaction task variable that can describe various interaction task requirements in a unified way is integrated with a general sensory model obtained from an offline neural-network-based learning algorithm, so that various external sensors can be used directly in the interaction control system to provide sensory information and enhance its perception capability. We present a robot controller that combines the human-robot interaction task variable with the general sensory model, so that various human-robot interaction tasks based on various external sensors can be achieved by simply adjusting the task parameters and training the system, without having to modify the sensory models or the controller. A convergence analysis of the proposed offline neural-network-based learning algorithm is given, and experimental results are presented to illustrate the performance of the proposed method.
ISBN: 9781713821120 (print)
We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard multi-agent reinforcement learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results, we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.
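To make the role of symmetries concrete, here is a small sketch on a toy matching game rather than Hanabi. The payoffs, function names, and restriction to deterministic policies are illustrative assumptions. Two players score `payoffs[a]` only if both pick the same option `a`; options with equal payoff are treated as interchangeable, so a partner trained independently may have settled on any member of that symmetry class.

```python
from collections import Counter

# Hypothetical payoffs: three interchangeable options worth 1.0, one worth 0.9.
PAYOFFS = [1.0, 1.0, 1.0, 0.9]


def self_play_value(payoffs, action):
    # Paired with an exact copy of itself, the agent always matches.
    return payoffs[action]


def other_play_value(payoffs, action):
    # The partner's choice is relabelled by a uniformly random symmetry,
    # so it lands on the same option with probability 1 / class size.
    class_size = Counter(payoffs)[payoffs[action]]
    return payoffs[action] / class_size


def best_action(payoffs, value_fn):
    # Deterministic policies only; ties go to the lowest index.
    return max(range(len(payoffs)), key=lambda a: value_fn(payoffs, a))
```

Under these assumptions, self-play is content with an arbitrary 1.0 option, a convention a new partner cannot guess, while the other-play objective prefers the unique 0.9 option because its value survives relabelling.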
ISBN: 9781728180755 (print)
This paper discusses the main types of applications for natural language (NL) processing. Depending on the purpose of the semantic system, a natural language model is built, and an application is developed on the basis of this model. The model built is called a linguistic processor. Certain models imply the use of specific algorithms, which typically consume a lot of computing and information resources. This article describes a new approach to building language processors: the use of systems with predefined semantics. Definitions are given for the key concepts in such systems: seme, semantic pattern, recognition tree, semantic spectrum, and elementary and compound solutions. The main algorithms for implementing these systems are also reviewed. The system's capabilities are checked with a software prototype. The tests have shown the following results:
- almost all main features of NL are taken into account (polysemy, homonymy, synonymy, the dependence of the meaning of the text on the context, etc.);
- the system is practically universal, since it allows applications designed for any purpose to be implemented;
- the suggested approach has been tested by implementing applications such as command systems, text classification systems and text generating systems (e.g., question-and-answer systems);
- the system takes significantly less information and computing resources than existing applications for NL text processing;
- the only disadvantage of the suggested approach is the need for time-consuming training of the system being built before it can operate within a certain domain.
Malaria is a contagious disease that affects millions of lives every year. Traditional laboratory diagnosis of malaria requires an experienced person and careful inspection to discriminate between healthy and infected red blood cells (RBCs). It is also very time-consuming and may produce inaccurate reports due to human error. Cognitive computing and deep learning algorithms simulate human intelligence to support better decisions in applications such as sentiment analysis, speech recognition, face detection, disease detection, and prediction. Owing to advances in cognitive computing and machine learning techniques, they are now widely used in healthcare to detect and predict early disease symptoms. With early prediction results, healthcare professionals can make better decisions about patient diagnosis and treatment. Machine learning algorithms also help humans process huge and complex medical datasets and turn them into clinical insights. This paper seeks to leverage deep learning algorithms to detect a deadly disease, malaria, by building an effective mobile healthcare system for patients. The objective of this paper is to show how a deep learning architecture such as a convolutional neural network (CNN) can be used to detect malaria effectively and accurately in real time from input images, and to reduce manual labor with a mobile application. To this end, we evaluate the performance of a custom CNN model using a cyclical stochastic gradient descent (SGD) optimizer with an automatic learning rate finder and obtain an accuracy of 97.30% in classifying healthy and infected cell images, with a high degree of precision and sensitivity. The outcome of this paper will help bring microscopy-based diagnosis of malaria to a mobile application, so that problems with the reliability of treatment and the lack of medical expertise can be addressed.
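The abstract does not specify the exact cyclical schedule, so the sketch below assumes the common triangular policy, in which the learning rate ramps linearly between a lower and an upper bound. The function name and the bounds used in the example are hypothetical.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate (an assumed schedule).

    The rate rises from base_lr to max_lr over step_size iterations,
    then falls back to base_lr over the next step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Position within the current cycle: 1 at the ends, 0 at the peak.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Example: one full cycle sampled every 50 iterations.
schedule = [triangular_lr(i, 0.001, 0.01, 100) for i in range(0, 201, 50)]
```

In practice, the bounds `base_lr` and `max_lr` would come from a learning rate range test, which is what a learning rate finder automates.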
The continuous progress of computer science and technology has accelerated the informatization of the medical system. Medical technology has developed rapidly in various research directions, and the construction of medical IT systems has been continuously improved. The widespread use of electronic medical records has produced massive amounts of medical data during the medical process. At the same time, medical practice relies more and more on data to make relevant judgments. The coverage of medical equipment is becoming more extensive, the accuracy of data is constantly improving, and clinical diagnosis is gradually shifting from qualitative judgment to quantitative analysis. Based on the analysis of electronic medical record data, this article studies the risk factors leading to diabetes. By analyzing the characteristic variables, the risk factors significantly related to diabetes are identified and used as the input variables of the BP neural network model. For complex problems, machine learning algorithms have higher accuracy and stronger generalization capabilities. Based on the BP artificial neural network model, this paper builds a machine learning simulation to predict diabetes.
ISBN: 9781713829546 (print)
Many learning algorithms, such as stochastic gradient descent, are affected by the order in which training examples are used. It is generally believed that sampling the training examples without replacement, also known as random reshuffling, causes learning algorithms to converge faster. We give a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture that relates to the performance of random reshuffling in learning algorithms [19]. We use this to give an example of a learning task and algorithm for which with-replacement random sampling outperforms random reshuffling.
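The two sampling schemes being compared can be sketched as follows. This is only an illustration of the orderings on a hypothetical one-dimensional least-squares problem; it does not reproduce the paper's counterexample and makes no claim about which scheme converges faster.

```python
import random


def with_replacement_order(n, rng):
    # Each step draws an index independently; repeats and omissions are possible.
    return [rng.randrange(n) for _ in range(n)]


def random_reshuffling_order(n, rng):
    # Each epoch visits every index exactly once, in a fresh random order.
    order = list(range(n))
    rng.shuffle(order)
    return order


def sgd_epoch(w, data, order, lr):
    """One pass of SGD on the 1-D loss 0.5 * (w * x - y) ** 2."""
    for i in order:
        x, y = data[i]
        w -= lr * (w * x - y) * x
    return w


# Example usage on hypothetical data with true slope 2.
rng = random.Random(0)
data = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]
w_reshuffled = sgd_epoch(0.0, data, random_reshuffling_order(len(data), rng), lr=0.1)
w_replaced = sgd_epoch(0.0, data, with_replacement_order(len(data), rng), lr=0.1)
```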
ISBN: 9781728153650 (print)
This paper presents a decentralized learning algorithm that learns how to coordinate an automated team of actuated parts designed to build several types of user-specified structures on a plane surface. The algorithm learns from environment feedback and agent behavior. The problem is defined as a Markov decision process in which the agents (actuated parts) are modeled as small cube-shaped robots subject to Bellman's equation (Q-learning). The Q-learning algorithm takes into account the communication and conflict resolution models between the agents, which lead to the emergence of intelligent global behavior in a non-stationary stochastic environment. The main contribution of this paper is a self-assembly approach capable of randomly generating the navigation routes of the multiple agents while learning the structure shape according to the hazardous dispersion area that must be isolated in the environment. Simulation trials show the feasibility of merging the multi-agent coordination process with the anti-collision strategy; different case studies are analysed and discussed.
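For reference, the standard tabular Q-learning update derived from Bellman's equation can be sketched as below. This is a single-agent sketch: the state and action labels and the parameter values are hypothetical, and the paper's communication and conflict resolution models are not represented.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the updated value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Example: in a decentralized setting, each agent could keep its own table.
actions = ["left", "right", "up", "down"]
q_table = defaultdict(float)  # unseen state-action pairs default to 0.0
q_update(q_table, (0, 0), "right", reward=1.0, next_state=(0, 1), actions=actions)
```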
ISBN: 9781713829546 (print)
Reward decomposition, which aims to decompose the full reward into multiple sub-rewards, has been proven beneficial for improving sample efficiency in reinforcement learning. Existing works on discovering reward decomposition are mostly policy dependent, which constrains diversified or disentangled behavior between the different policies induced by different sub-rewards. In this work, we propose a set of novel policy-independent reward decomposition principles by constraining the uniqueness and compactness of the different state representations relevant to different sub-rewards. Our principles encourage sub-rewards with minimal relevant features, while maintaining the uniqueness of each sub-reward. We derive a deep learning algorithm based on our principles and refer to our method as RD2, since we learn reward decomposition and disentangled representation jointly. RD2 is evaluated on a toy case, where we have the true reward structure, and on chosen Atari environments, where the reward structure exists but is unknown to the agent, to demonstrate the effectiveness of RD2 against existing reward decomposition methods.
ISBN: 9781728163956 (print)
In many video coding systems, separable transforms (such as two-dimensional DCT-2) have been used to code block residual signals obtained after prediction. This paper proposes a parametric approach to build graph-based separable transforms (GBSTs) for video coding. Specifically, a GBST is derived from a pair of line graphs, whose weights are determined based on two non-negative parameters. As certain choices of those parameters correspond to the discrete sine and cosine transform types used in recent video coding standards (including DCT-2, DST-7 and DCT-8), this paper further optimizes these graph parameters to better capture residual block statistics and improve video coding efficiency. The proposed GBSTs are tested on the Versatile Video Coding (VVC) reference software, and the experimental results show that about 0.4% average coding gain is achieved over the existing set of separable transforms constructed based on DCT-2, DST-7 and DCT-8 in VVC.
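The link between line graphs and the DCT/DST can be checked numerically. The sketch below assumes unit edge weights and assumes that the two non-negative parameters act as self-loop weights on the two end vertices; the abstract does not state where the parameters enter, so this is an illustration and not the paper's construction. With no self-loops, the DCT-2 basis vectors are eigenvectors of the graph Laplacian; with a self-loop of weight 1 on the first vertex, the DST-7 basis vectors are.

```python
import math


def line_graph_laplacian(n, left_loop=0.0, right_loop=0.0):
    """Laplacian of an n-vertex path with unit edges and end self-loops."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    lap[0][0] += left_loop           # assumed role of the first parameter
    lap[n - 1][n - 1] += right_loop  # assumed role of the second parameter
    return lap


def dct2_vector(n, k):
    # Unnormalized k-th DCT-2 basis vector.
    return [math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)]


def dst7_vector(n, k):
    # Unnormalized k-th DST-7 basis vector.
    return [math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1)) for j in range(n)]


def eigen_residual(lap, vec, eigenvalue):
    """Largest entry of |L v - lambda v|; near zero for an eigenvector."""
    n = len(vec)
    return max(
        abs(sum(lap[i][j] * vec[j] for j in range(n)) - eigenvalue * vec[i])
        for i in range(n)
    )


# Example: residuals for every basis vector of an 8-point transform.
N = 8
dct2_residuals = [
    eigen_residual(line_graph_laplacian(N), dct2_vector(N, k),
                   2 - 2 * math.cos(math.pi * k / N))
    for k in range(N)
]
dst7_residuals = [
    eigen_residual(line_graph_laplacian(N, left_loop=1.0), dst7_vector(N, k),
                   2 - 2 * math.cos(math.pi * (2 * k + 1) / (2 * N + 1)))
    for k in range(N)
]
```

Other non-negative self-loop weights give Laplacians whose eigenvectors fall between these known transforms, which is the kind of parametric family that can then be tuned to residual statistics.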