Stochastic Multi-Armed Bandit With Knapsack & Distributing AP/Server Selection Problem

In this talk, Ekram Hossain discusses a bandits with knapsacks (BwK) model and shows its application to the distributed access point (AP) or server selection problem. Hossain also discusses a linear contextual bandit with knapsacks model for the same problem.

The multi-armed bandit (MAB) framework is a popular sequential decision-making technique, well suited to decision-making under uncertainty when there is no prior knowledge of the environment. It uses the history of previous decisions and observations, as well as side information if available, to arrive at the current decision.

Classic MAB algorithms, such as the upper confidence bound (UCB) algorithm, are concerned with learning the single optimal action among a set of candidate actions with unknown rewards. Unlike traditional bandits, bandits with knapsacks (BwK) can model more sophisticated distributed decision-making problems under global constraints.
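
To make the contrast concrete, here is a minimal Python sketch of the standard UCB1 rule on Bernoulli arms, with one added global resource budget that ends the run once it is exhausted. This is not the model or algorithm presented in the talk: the function name `ucb1_with_budget`, the per-pull costs, and the stop-on-budget rule are all illustrative assumptions, and a real BwK algorithm would also account for resource consumption when choosing arms.

```python
import math
import random


def ucb1_with_budget(reward_probs, cost_per_pull, budget, horizon, seed=0):
    """Run UCB1 on Bernoulli arms until the horizon or the budget runs out.

    All parameters are hypothetical and chosen for illustration:
    reward_probs[a]  - success probability of arm a (unknown to the learner)
    cost_per_pull[a] - resource consumed each time arm a is pulled
    budget           - total resource available across all pulls
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm
    spent = 0.0
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull each arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )

        # Global constraint (assumed rule): stop when the next pull
        # would exceed the remaining budget.
        if spent + cost_per_pull[arm] > budget:
            break

        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        spent += cost_per_pull[arm]
        total_reward += reward

    return counts, total_reward, spent


counts, total_reward, spent = ucb1_with_budget(
    reward_probs=[0.1, 0.9],
    cost_per_pull=[1.0, 1.0],
    budget=500.0,
    horizon=1000,
)
print(counts, total_reward, spent)
```

In this toy run the budget, not the horizon, ends the process, and the learner concentrates its pulls on the arm with the higher success probability. The point of BwK is that the budget becomes part of the decision itself rather than only a stopping condition.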
