Published January 1, 2021
| Version v1
Conference paper
Open
Combinatorial Gaussian Process Bandits with Probabilistically Triggered Arms
Description
Combinatorial bandit models and algorithms are used in many sequential decision-making tasks, ranging from item list recommendation to influence maximization. Typical algorithms proposed for combinatorial bandits, including combinatorial UCB (CUCB) and combinatorial Thompson sampling (CTS), do not exploit correlations between base arms during the learning process. Moreover, their regret is usually analyzed under independent base arm outcomes. In this paper, we use Gaussian processes (GPs) to model correlations between base arms. In particular, we consider a combinatorial bandit model with probabilistically triggered arms, and assume that the expected base arm outcome function is a sample from a GP. We assume that the learner has access to an exact computation oracle, which returns an optimal solution given expected base arm outcomes, and analyze the regret of the Combinatorial Gaussian Process Upper Confidence Bound (ComGP-UCB) algorithm in this setting. Under a (triggering probability modulated) Lipschitz continuity assumption on the expected reward function, we derive $O(\sqrt{mT\log T\,\gamma_{PTA}(T,\mu)})$ and $O(m\sqrt{T\log T}/p^*)$ upper bounds on the regret of ComGP-UCB that hold with high probability, where $m$ denotes the number of base arms, $p^*$ denotes the minimum non-zero triggering probability, and $\gamma_{PTA}(T,\mu)$ denotes the pseudo-information gain. Finally, we show via simulations that when the correlations between base arm outcomes are strong, ComGP-UCB significantly outperforms CUCB and CTS.
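As a rough illustration of the ComGP-UCB idea described in the abstract (maintain a GP posterior over base arm outcomes, pass optimistic indices to the exact oracle each round), here is a minimal sketch. The RBF kernel, the top-K oracle, the noise level, and the deterministic triggering of all selected arms are toy assumptions for illustration, not the paper's actual setup or bounds:

```python
import numpy as np

def rbf_kernel(X, Y, ls=0.5):
    # Squared-exponential kernel between rows of X and Y (toy choice).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X_obs, y_obs, X_all, noise=0.1):
    # Exact GP posterior mean/variance at every base-arm feature vector.
    K = rbf_kernel(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    Ks = rbf_kernel(X_all, X_obs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs
    # Prior variance is 1 for the RBF kernel; subtract the explained part.
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, Kinv, Ks)
    return mu, np.maximum(var, 1e-12)

def comgp_ucb(X_arms, sample_fn, oracle, T=50, beta=2.0):
    # ComGP-UCB sketch: optimistic GP indices fed to a combinatorial oracle.
    # Here every selected base arm is observed (no probabilistic triggering).
    m = len(X_arms)
    X_obs, y_obs, history = [], [], []
    for t in range(T):
        if X_obs:
            mu, var = gp_posterior(np.array(X_obs), np.array(y_obs), X_arms)
        else:
            mu, var = np.zeros(m), np.ones(m)  # GP prior before any data
        ucb = mu + beta * np.sqrt(var)         # optimistic base-arm indices
        S = oracle(ucb)                        # super-arm from the oracle
        for i in S:
            X_obs.append(X_arms[i])
            y_obs.append(sample_fn(i))         # noisy base-arm outcome
        history.append(list(S))
    return history
```

A usage example with a top-3 oracle: `comgp_ucb(X_arms, sample_fn, lambda u: np.argsort(u)[-3:], T=20)`. The point of the GP posterior here is exactly the correlation-sharing the abstract highlights: observing one base arm shrinks the uncertainty of nearby (kernel-correlated) arms, which CUCB and CTS do not exploit.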
Files
bib-c4316a52-1514-4896-9610-d50fffb39be1.txt (191 Bytes)
md5:d021ac2d638e89a266f99a910b90520f