@@ -734,7 +734,7 @@ The above graphs indicates that
## Employed Worker Can't Quit
-The preceding version of temporal difference Q-learning described in equation system (4) lets an an employed worker quit, i.e., reject her wage as an incumbent and instead receive unemployment compensation this period
+The preceding version of temporal difference Q-learning described in equation system {eq}`eq:old4` lets an employed worker quit, i.e., reject her wage as an incumbent and instead receive unemployment compensation this period
and draw a new offer next period.
This is an option that the McCall worker described in {doc}`this quantecon lecture <mccall_model>` would not take.
@@ -756,11 +756,11 @@ $$
\end{aligned}
$$ (eq:temp-diff)
-It turns out that formulas {eq}`eq:temp-diff` combined with our Q-learning recursion (3) can lead our agent to eventually learn the optimal value function as well as in the case where an option to redraw can be exercised.
+It turns out that formulas {eq}`eq:temp-diff` combined with our Q-learning recursion {eq}`eq:old3` can lead our agent to eventually learn the optimal value function as well as in the case where an option to redraw can be exercised.
But learning is slower because an agent who ends up accepting a wage offer prematurely loses the option to explore new states in the same episode and to adjust the value associated with that state.
-This can leads to inferior outcomes when the number of epochs/episods is low.
+This can lead to inferior outcomes when the number of epochs/episodes is low.
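The no-quit modification can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the names `w_grid`, `p`, `beta`, `c`, and `alpha` are assumed parameters, and the only substantive change from the original recursion is that the accept branch uses the continuation value `Q(w, accept)` itself rather than a max over actions, since accepting is now absorbing.

```python
import numpy as np

# Hedged sketch of one temporal-difference Q-learning step for the
# "employed worker can't quit" variant.  All parameter values below
# are illustrative assumptions, not taken from the lecture.
rng = np.random.default_rng(0)
w_grid = np.array([10.0, 20.0, 30.0])  # possible wage offers (assumed)
p = np.ones(3) / 3                     # offer distribution (assumed uniform)
beta, c, alpha = 0.95, 15.0, 0.1       # discount, compensation, step size

Q = np.zeros((len(w_grid), 2))         # action 0 = accept, action 1 = reject

def td_step(i, a):
    """One TD update at wage index i for action a; returns next wage index."""
    if a == 0:
        # Accepting is absorbing: the continuation value is Q[i, 0] itself,
        # not a max over actions -- the worker cannot quit next period.
        td = w_grid[i] + beta * Q[i, 0] - Q[i, 0]
        j = i
    else:
        # Rejecting: receive compensation c, then draw a fresh offer.
        j = rng.choice(len(w_grid), p=p)
        td = c + beta * Q[j].max() - Q[i, 1]
    Q[i, a] += alpha * td
    return j
```

Because the accept branch no longer explores alternative actions, an agent who accepts early stops generating informative updates for other states in that episode, which is the slower-learning effect described above.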