diff --git a/units/en/unit2/mid-way-recap.mdx b/units/en/unit2/mid-way-recap.mdx
index b6440407..bd488c39 100644
--- a/units/en/unit2/mid-way-recap.mdx
+++ b/units/en/unit2/mid-way-recap.mdx
@@ -6,7 +6,7 @@ We have two types of value-based functions:
 
 - State-value function: outputs the expected return if **the agent starts at a given state and acts according to the policy forever after.**
 - Action-value function: outputs the expected return if **the agent starts in a given state, takes a given action at that state** and then acts accordingly to the policy forever after.
-- In value-based methods, rather than learning the policy, **we define the policy by hand** and we learn a value function. If we have an optimal value function, we **will have an optimal policy.**
+- In value-based methods, rather than learning the policy, **we focus on learning a value function**. An optimal value function **will lead us to an optimal policy.**
 
 There are two types of methods to update the value function:
 
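
For reference, here is a minimal sketch of the standard definitions behind the wording in this hunk, using conventional RL notation rather than anything taken from the patched file ($G_t$ is the discounted return from step $t$; $V_{\pi}$, $Q_{\pi}$, and $\pi^{*}$ are the usual symbols):

```latex
% State-value function: expected return when the agent starts in
% state s and follows policy \pi forever after.
V_{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]

% Action-value function: expected return when the agent starts in
% state s, takes action a, then follows \pi forever after.
Q_{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]

% The link asserted by the rewritten bullet: given an optimal
% action-value function Q*, acting greedily with respect to it
% yields an optimal policy.
\pi^{*}(s) = \operatorname*{arg\,max}_{a} Q^{*}(s, a)
```

The last line is why the added bullet can say an optimal value function "will lead us to an optimal policy": once $Q^{*}$ is known, no separate policy-learning step is needed.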