We propose an algorithm to approximate the semivalues of general transferable-utility cooperative games that involve a large set of players 1, ..., |N| and possibly depend on uncertain parameters. We first encode the game's utility function using a low-rank tensor decomposition, namely the tensor train (TT) model, which requires a limited number of function evaluations. The TT format casts the utility as a compressed tensor of shape 2^|N| (one binary index per player) and makes it possible to work efficiently with the exponentially sized set of all possible coalitions of players. Given a game compressed in this manner, the proposed algorithm obtains arbitrary semivalues without incurring additional error, in particular the Shapley values and Banzhaf-Coleman indices, which are two of the most important allocation rules in cooperative game theory. Our algorithm takes O(|N| R^2) operations per semivalue, where R is the game's TT rank. We show experimentally that many classical games can be compressed at low error with a moderate TT rank, making our algorithm more sample-efficient than Monte Carlo-based estimation. We also give a theoretical bound for the error of the semivalues obtained through our algorithm. Last, when the game depends on randomly distributed parameters, we are able to compute the expected semivalues efficiently. (c) 2021 Elsevier Inc. All rights reserved.
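To make the quantities in this abstract concrete, the following minimal sketch computes Shapley values and Banzhaf-Coleman indices by brute-force enumeration of all coalitions. It is not the tensor-train algorithm the abstract describes: it costs on the order of |N| 2^|N| utility evaluations and is only practical for a handful of players. The function names and the three-player weighted majority game are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations
from math import comb


def semivalue(n, utility, weight):
    """Brute-force semivalue for players 0..n-1.

    For each player i, sums weight(|S|) * (v(S + i) - v(S)) over every
    coalition S that does not contain i. Exponential in n by design.
    """
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight(size) * (utility(s | {i}) - utility(s))
        values.append(total)
    return values


def shapley(n, utility):
    # Shapley weight: |S|! (n - |S| - 1)! / n! = 1 / (n * C(n-1, |S|)).
    return semivalue(n, utility, lambda s: 1.0 / (n * comb(n - 1, s)))


def banzhaf(n, utility):
    # Banzhaf-Coleman weight: uniform over the 2^(n-1) coalitions without i.
    return semivalue(n, utility, lambda s: 1.0 / 2 ** (n - 1))


# Hypothetical example game: weighted majority with votes (2, 1, 1), quota 3.
VOTES = (2, 1, 1)
QUOTA = 3


def majority(coalition):
    return 1.0 if sum(VOTES[j] for j in coalition) >= QUOTA else 0.0


shapley_values = shapley(3, majority)   # approximately (2/3, 1/6, 1/6)
banzhaf_values = banzhaf(3, majority)   # approximately (3/4, 1/4, 1/4)
```

Both allocation rules share the same marginal-contribution sum and differ only in the weight given to each coalition size, which is the sense in which they are both semivalues.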
Sampled fictitious play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. For games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of best response actions that players in SFP play is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests wherein Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, no guarantees of convergence to a globally optimal Nash equilibrium have been established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players, whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also offer empirical results suggesting that our SFP variant is effective in practice for small to moderate-sized model-free problems. (C) 2011 Elsevier Ltd. All rights reserved.
We present a functional framework for automated Bayesian and worst-case mechanism design, based on a two-stage game model of strategic interaction between the designer and the mechanism participants. At the core of our framework is a black-box optimization algorithm which guides the process of evaluating candidate mechanisms. We apply the approach to several classes of two-player infinite games of incomplete information, producing optimal or nearly optimal mechanisms using various objective functions. By comparing our results with known optimal mechanisms, and in some cases improving on the best known mechanisms, we provide evidence that ours is a promising approach to parametrized mechanism design for infinite Bayesian games.
A simultaneous non-zero-sum game is modeled to extend the classical network interdiction problem. In this model, an interdictor (e.g., an enforcement agent) decides how much of an inspection resource to spend along each arc in the network to capture a smuggler. The smuggler (randomly) selects a commodity to smuggle (a source and destination pair of nodes) and also a corresponding path for traveling between the given pair of nodes. This model is motivated by a terrorist organization that can mobilize its human, financial, or weapon resources to carry out an attack at one of several potential target destinations. The probability of evading each of the network arcs decreases nonlinearly in the amount of resource that the interdictor spends on its inspection. We show that under reasonable assumptions with respect to the evasion probability functions, (approximate) Nash equilibria of this game can be determined in polynomial time; depending on whether the evasion functions are exponential or general logarithmically-convex functions, exact Nash equilibria or approximate Nash equilibria, respectively, are computed. (c) 2017 Wiley Periodicals, Inc.
This paper proposes an approach to control of discrete systems with incomplete information and sensing capabilities, with respect to temporal logic constraints. The approach introduces active sensing to alleviate computational effort in control design for systems interacting with uncontrollable environments under incomplete information. Particularly, it transforms a deterministic controller under complete information into a randomized, observation-based controller. Interleaving the latter with strategic queries to sensors, the temporal logic specification is proven to be satisfied almost surely. The effectiveness of the method is demonstrated with robotic motion planning examples.
The paper proposes a general game theoretical model, called capacity demand game, for treating simultaneous capacity requests in scarce-resource cognitive radio (CR) environments. The approach is that of non-cooperative games describing CR interactions in terms of radio resource access. Experiments reveal stable states (equilibria) that favour an equitable usage of radio resources to the benefit of all participants. Several equilibria are detected and discussed: Nash (NE), Pareto, joint Nash-Pareto, and Lorenz equilibrium.
We study the route-guidance system proposed by Jahn, Mohring, Schulz, and Stier-Moses [Operations Research 53 (2005), 600-616] from a theoretical perspective. As system-optimal guidance is known to be problematic, this approach computes a traffic pattern that minimizes the total travel time subject to user constraints. These constraints are designed to ensure that routes suggested to users are not much longer than shortest paths for the prevailing network conditions. To calibrate the system, a certain measure, called normal length, must be selected. We show that when this length is defined as the travel time at equilibrium, the resulting traffic assignment is provably efficient and close to fair. To measure efficiency, we compare the output to the best solution without guidance and to user equilibria. To measure unfairness, we compare travel times of different users, and show that they do not differ too much. Inefficient or unfair traffic assignments cause users to travel too long or discourage people from accepting the system; either consequence would jeopardize the potential impact of a route-guidance system. (C) 2006 Wiley Periodicals, Inc.
The game of poker has been identified as a beneficial domain for current AI research because of the properties it possesses, such as the need to deal with hidden information and stochasticity. The identification of poker as a useful research domain has inevitably resulted in increased attention from academic researchers, who have pursued many separate avenues of research in the area of computer poker. The poker domain has often featured in previous review papers that focus on games in general; however, a comprehensive review paper with a specific focus on computer poker has so far been lacking in the literature. In this paper, we present a review of recent algorithms and approaches in the area of computer poker, along with a survey of the autonomous poker agents that have resulted from this research. We begin with the first serious attempts to create strong computerised poker players by constructing knowledge-based and simulation-based systems. This is followed by the use of computational game theory to construct robust poker agents and the advances that have been made in this area. Approaches to constructing exploitive agents are reviewed and the challenging problems of creating accurate and dynamic opponent models are addressed. Finally, we conclude with a selection of alternative approaches that have received attention in previously published material and the interesting problems that they pose. (C) 2011 Elsevier B.V. All rights reserved.
In a blockchain network, to mine new blocks, as in cryptocurrencies or secure IoT networks, each node or player chooses an amount of computational power as its strategy by trading off cost against expected utility. Since the strategies of all players affect the expected utility of others through the probability of success, in this article, we first formulate the mining competition among the players in a blockchain network as a noncooperative game. The existence and uniqueness of the Nash equilibrium (NE) point of the game are proven. We consider a gradient learning strategy for the players, which preserves their private information, as a bounded rational learning model. Furthermore, the convergence of this learning strategy to the epsilon-NE point of the game is studied analytically using the concept of mean field (MF) game theory. While conventional analytical tools face problems in dealing with a large number of participants, which is a key feature in many IoT networks, deploying MF game theory facilitates analyzing the behavior of a large population of players by encapsulating the network behavior in an MF term. As the number of players becomes larger, the accuracy of the MF method becomes greater. Moreover, in the MF approach, no information exchange among the agents is needed for optimal decision making and the privacy of the players is preserved. The minimal information exchange is also a proper motivation for using the MF approach in IoT networks.
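The gradient learning idea in this abstract can be illustrated with a small sketch. The abstract does not state the utility function, so the code assumes a common contest-style form: player i with computational power x_i wins a block reward R with probability x_i / sum(x) and pays a linear cost c_i * x_i. Under that assumption each player's gradient depends only on its own power, its own cost, and the aggregate power, which loosely mirrors the limited information exchange mentioned above. All names, the utility form, and the parameter values are hypothetical and are not taken from the paper.

```python
def gradient_play(costs, reward, x0, step=1.0, iters=5000, floor=1e-9):
    """Simultaneous gradient ascent on the assumed utility
    u_i(x) = reward * x_i / sum(x) - costs[i] * x_i.
    """
    x = list(x0)
    for _ in range(iters):
        total = sum(x)  # aggregate power: the only shared quantity
        # d u_i / d x_i = reward * (total - x_i) / total^2 - c_i
        grads = [reward * (total - xi) / total ** 2 - ci
                 for xi, ci in zip(x, costs)]
        # Clamp to a small positive floor so the aggregate never reaches zero.
        x = [max(floor, xi + step * g) for xi, g in zip(x, grads)]
    return x


def symmetric_equilibrium(n, cost, reward):
    """Closed-form NE of the assumed game when all n players share one cost.

    Follows from the first-order condition reward * (n - 1) * x / (n * x)^2 = cost.
    """
    return reward * (n - 1) / (n * n * cost)


# Five identical miners starting from different power levels.
powers = gradient_play(costs=[1.0] * 5, reward=100.0,
                       x0=[1.0, 2.0, 3.0, 4.0, 5.0])
target = symmetric_equilibrium(5, 1.0, 100.0)
```

In this symmetric setting the learned power levels settle at the closed-form equilibrium; with heterogeneous costs the same update can be run, but the closed form above no longer applies.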