We propose an algorithm to approximate the semivalues of general transferable-utility cooperative games that involve a large set of players 1, . .., |N| and possibly depend on uncertain parameters. We first encode the...
详细信息
We propose an algorithm to approximate the semivalues of general transferable-utility cooperative games that involve a large set of players 1, . .., |N| and possibly depend on uncertain parameters. We first encode the game's utility function using a low-rank tensor decomposition, namely the tensor train (TT) model, which requires a limited number of function evaluations. The TT format casts the utility as a compressed tensor of shape 2|N| and makes it possible to efficiently work with the exponentially-sized set of all possible coalitions of players. Given a game compressed in this manner, the proposed algorithm obtains arbitrary semivalues without incurring additional error, in particular the Shapley values and Banzhaf-Coleman indices, which are two of the most important allocation rules in cooperative gametheory. Our algorithm takes O(|N|R2) operations per semivalue, where R is the game's TT rank. We show experimentally that many classical games can be compressed at low error with a moderate TT rank, making our algorithm more sample-efficient than Monte Carlo-based estimation. We also give a theoretical bound for the error of the semivalues obtained through our algorithm. Last, when the game depends on randomly distributed parameters, we are able to compute the expected semivalues efficiently. (c) 2021 Elsevier Inc. All rights reserved.
Weighted timed games are zero-sum games played by two players on a timed automaton equipped with weights, where one player wants to minimise the cumulative weight while reaching a target. Used in a reactive synthesis ...
详细信息
Weighted timed games are zero-sum games played by two players on a timed automaton equipped with weights, where one player wants to minimise the cumulative weight while reaching a target. Used in a reactive synthesis perspective, this quantitative extension of timed games allows one to measure the quality of controllers in real-time systems. Weighted timed games are notoriously difficult and quickly undecidable, even when restricted to non-negative weights. For non-negative weights, a fragment of weighted timed games that can be analysed has been introduced by Bouyer, Jaziri and Markey in 2015. Though the value problem is undecidable even in this fragment, the authors show how to approximate the value by considering regions with a refined granularity. In this work, we extend this class to incorporate negative weights, allowing one to model energy for instance, and prove that the value can still be approximated, with the same complexity (provided that clocks are bounded). A restriction also allows us to obtain a class of decidable weighted timed games with negative weights and an arbitrary number of clocks. In addition, we show that a symbolic algorithm, relying on the paradigm of value iteration, can be used as an approximation/computation schema over these classes. We also consider the special case of untimed weighted games, where the same fragments are solvable in polynomial time: this contrasts with the pseudo-polynomial complexity, known so far, for weighted games without restrictions.
In previous work on higher-order games, we accounted for finite games of unbounded length by working with continuous outcome functions, which carry implicit game trees. In this work we make such trees explicit. We use...
详细信息
In previous work on higher-order games, we accounted for finite games of unbounded length by working with continuous outcome functions, which carry implicit game trees. In this work we make such trees explicit. We use concepts from dependent type theory to capture history-dependent games, where the set of available moves at a given position in the game depends on the moves played up to that point. In particular, games are modelled by a W-type, which is essentially the same type used by Aczel to model constructive Zermelo-Frankel set theory (CZF). We have also implemented all our definitions, constructions, results and proofs in the dependently-typed programming language Agda, which, in particular, allows us to run concrete examples of computations of optimal strategies, that is, strategies in subgame perfect equillibrium.(c) 2023 The Author(s). Published by Elsevier B.V.
When learning to play an imperfect information game, it is often easier to first start with the basic mechanics of the game rules. For example, one can play several example rounds with private cards revealed to all pl...
详细信息
ISBN:
(纸本)9798400714269
When learning to play an imperfect information game, it is often easier to first start with the basic mechanics of the game rules. For example, one can play several example rounds with private cards revealed to all players to better understand the basic actions and their effects. Building on this intuition, this paper introduces progressive hiding, an algorithm that balances learning the basic mechanics of an imperfect information game and satisfying the information constraints. Progressive hiding is inspired by methods from stochastic multistage optimization such as scenario decomposition and progressive hedging. We prove that it enables the adaptation of counterfactual regret minimization to games where perfect recall is not satisfied. Numerical experiments illustrate that progressive hiding produces notable improvements in several settings.
My Ph.D. research introduces an approach to learning families of parametrically related symmetric game instances in both normal-form and Bayesian settings. This approach eliminates the need to model each game instance...
详细信息
ISBN:
(纸本)9798400714269
My Ph.D. research introduces an approach to learning families of parametrically related symmetric game instances in both normal-form and Bayesian settings. This approach eliminates the need to model each game instance separately, improving data efficiency and enabling broader exploration of the parameter space. game-family models support more comprehensive empirical mechanism design, and facilitate iterative generation of piecewise best-response strategies in the Bayesian setting. Overall, my work aims to expand model expressiveness while prioritizing compactness and data efficiency in simulation-based game environments, promoting diverse real-world insights.
game-theoretic models of influence in networks often assume the network structure to be static. In this paper, we allow the network structure to vary according to the underlying behavioral context. This leads to sever...
详细信息
My Ph.D. research focuses on developing practical algorithms in computer games by assembling a variety of artificial intelligence methods (game-tree search, machine learning, graphical models, etc.). In this extended ...
详细信息
ISBN:
(纸本)9781450394321
My Ph.D. research focuses on developing practical algorithms in computer games by assembling a variety of artificial intelligence methods (game-tree search, machine learning, graphical models, etc.). In this extended abstract, I will briefly review three of my previous works that studied normal-form games, Bayesian games, and extensive-form games through modern AI lenses. Then I will cast three possible future directions that I am dedicating to.
In a blockchain network, to mine new blocks like in cryptocurrencies or secure IoT networks, each node or player specifies the amount of computational power as its strategy by compromising between the cost and expecte...
详细信息
In a blockchain network, to mine new blocks like in cryptocurrencies or secure IoT networks, each node or player specifies the amount of computational power as its strategy by compromising between the cost and expected utility. Since the strategies of all players affect the expected utility of others through the probability of success, in this article, we first formulate the mining competition among the players in a blockchain network as a noncooperative game. The existence and uniqueness of the Nash equilibrium (NE) point of the game are proven. We consider a gradient learning strategy for the players while preserving their private information as a bounded rational learning model. Furthermore, the convergence of this learning strategy to the epsilon-NE point of the game is studied analytically using the concept of the mean field (MF) gametheory. While conventional analytical tools face problems in dealing with a large number of participants, which is a key feature in many IoT networks, deploying the MF gametheory facilitates analyzing the behavior of a large population of players by encapsulating the network behavior in an MF term. As the number of players becomes larger, the accuracy of the MF method becomes greater. Moreover, in the MF approach, no information exchange among the agents is needed for optimal decision making and the privacy of the players is preserved. The minimal information exchange is also a proper motivation for using the MF approach in the IoT networks.
We develop a game-theoretic model to guide the choice of the reward structure in customer loyalty programs. We model a duopoly market in which one firm adopts a loyalty program. Firms independently and simultaneously ...
详细信息
We develop a game-theoretic model to guide the choice of the reward structure in customer loyalty programs. We model a duopoly market in which one firm adopts a loyalty program. Firms independently and simultaneously set the prices and rewards. Heterogeneous customers buy homogeneous products in a multi-period setting. Customers are segmented into three groups based on their level of strategic behavior, which is expressed in terms of their degree of forward-lookingness. We use two exogenous parameters to represent the size of each segment. A third parameter captures the point pressure effect, which refers to the increase in customer spending as they approach a reward threshold. In each period, customers choose the firm that maximizes their utility, which is a function of offered prices, rewards, and the distance to the next reward. We use the logit model to model the customer choice behavior. Customers' accumulated purchases evolve as a Markov chain. We derive the limiting distribution of accumulated purchases, which is subsequently used to formulate the firm's expected revenue functions. We develop two algorithms to find the Nash equilibrium for both the linear and nonlinear rewards in term of the three parameters. Using a thorough numerical analysis, we show that the choice of the structure becomes more critical as the size of the strategic segment increases. The nonlinear scheme is superior when the size of the highly-strategic segment is very small. The linear rewards is superior in markets where the size of the highly-strategic segment and the sensitivity to distance are simultaneously not small.
暂无评论