巡り廻る
Tagged

reinforcement learning

A collection of 1 post

The Multi-Armed Bandit Problem

Problem Setup

The multi-armed bandit problem can be described as a Markov decision process, a tuple $\langle \mathcal{X}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$, with only one state.

  • $\mathcal{X} = \{x\}$ is a finite set of states
  • $\mathcal{A}$ is a finite set of actions
  • $\mathcal{P}$ …
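
As a rough illustration of the single-state view, here is a minimal Python sketch of a bandit as an MDP whose state never changes. The class name `SingleStateBandit`, the `step` method, and the Bernoulli reward model are assumptions made for this example; they are not taken from the post.

```python
import random


class SingleStateBandit:
    """A bandit viewed as an MDP with exactly one state.

    Assumption: each action (arm) pays a Bernoulli reward, 1.0 with a fixed
    success probability and 0.0 otherwise. This is one common choice of
    reward model, used here only for illustration.
    """

    STATE = "x"  # the only state, so every transition returns to it

    def __init__(self, success_probs, seed=None):
        self.success_probs = list(success_probs)
        # The action set is the set of arm indices 0..K-1.
        self.actions = range(len(self.success_probs))
        self._rng = random.Random(seed)

    def step(self, action):
        """Pull one arm; return (next_state, reward)."""
        p = self.success_probs[action]
        reward = 1.0 if self._rng.random() < p else 0.0
        return self.STATE, reward


if __name__ == "__main__":
    bandit = SingleStateBandit([0.2, 0.5, 0.8], seed=0)
    for arm in bandit.actions:
        next_state, reward = bandit.step(arm)
        print(arm, next_state, reward)
```

Because the next state is always the same, only the reward depends on the chosen action, which is what separates a bandit from a general MDP.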

Chenglin Lu • Oct 20, 2018 • 2 min read
巡り廻る © 2022