Kubnal Bridge

Bandit Optimization

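Bandit optimization takes its name from the multi-armed bandit problem, in which a gambler chooses among several slot machines ("one-armed bandits"). It seeks to maximize cumulative reward when the payoff of each action is uncertain and has to be learned by trying it. At each step, the algorithm must decide whether to exploit the option with the best observed results so far or to explore less-tried options that might turn out to be better.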
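The explore/exploit decision described above can be sketched with an epsilon-greedy rule. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy`, the simulated Bernoulli (0 or 1) rewards, and the default `epsilon=0.1` are all assumptions made for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on arms with Bernoulli rewards.

    true_means holds hypothetical success probabilities, one per arm.
    The algorithm never reads them directly; they only generate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("cumulative reward:", total)
```

With a fixed `epsilon`, a constant fraction of steps is always spent exploring. That keeps the rule simple, but some reward is given up even after the best arm is clear.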

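Bandit algorithms such as UCB (Upper Confidence Bound), Thompson Sampling, and Epsilon-Greedy are used in A/B testing, recommendation systems, ad placement, and hyperparameter tuning. In hyperparameter tuning they are typically more sample-efficient than grid search: grid search gives every configuration the same evaluation budget, whereas a bandit algorithm shifts trials toward configurations that look promising as results come in.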
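To make the allocation behaviour concrete, here is a hedged sketch of two of the named algorithms, UCB1 (one common UCB variant) and Thompson Sampling with a Beta posterior, run on simulated success/failure outcomes. The helper names (`ucb1_select`, `thompson_select`, `run_bandit`) and the success rates are hypothetical stand-ins for, say, how often each configuration beats a baseline.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """UCB1: pick the arm with the highest mean plus confidence bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # try every arm once before using the formula
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def thompson_select(counts, sums, rng):
    """Thompson Sampling: draw a plausible success rate per arm from its
    Beta(1 + successes, 1 + failures) posterior and pick the largest draw."""
    samples = [
        rng.betavariate(1.0 + sums[a], 1.0 + counts[a] - sums[a])
        for a in range(len(counts))
    ]
    return max(range(len(samples)), key=lambda a: samples[a])


def run_bandit(policy, success_rates, trials=3000, seed=0):
    """Return how many trials each arm received under the given policy.

    success_rates are assumed values used only to simulate 0/1 outcomes.
    """
    rng = random.Random(seed)
    counts = [0] * len(success_rates)  # trials per arm
    sums = [0.0] * len(success_rates)  # successes per arm
    for t in range(1, trials + 1):
        if policy == "ucb1":
            arm = ucb1_select(counts, sums, t)
        else:
            arm = thompson_select(counts, sums, rng)
        reward = 1.0 if rng.random() < success_rates[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    rates = [0.3, 0.5, 0.7]  # hypothetical configurations
    print("UCB1 trials per arm:    ", run_bandit("ucb1", rates))
    print("Thompson trials per arm:", run_bandit("thompson", rates))
```

A grid search over the same three configurations would give each one 1000 of the 3000 trials. Both policies here are expected to give most of the trials to the strongest arm, which is the efficiency argument made above.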
