Kubnal Bridge

Techniques & Methods

Markov Decision Process

An MDP formalizes decision-making as a tuple of states, actions, transition probabilities, and rewards. An agent observes the current state, takes an action, transitions to a new state with some probability, and receives a reward. The goal is to learn a policy maximizing cumulative reward.

MDPs provide the mathematical foundation for all model-based reinforcement learning. The Markov property—that future states depend only on the current state, not on history—is the key simplifying assumption that makes MDPs tractable.

Authority Links

Related Terms