lectures/mccall_q.md
+5 −5 (5 additions, 5 deletions)
```diff
@@ -34,7 +34,7 @@ Relative to the dynamic programming formulation of the McCall worker model that
 
 The Q-learning algorithm invokes a statistical learning model to learn about these things.
 
-Statistical learning often comes down to some version of least squares, and it will here too.
+Statistical learning often comes down to some version of least squares, and it will be here too.
 
 Any time we say _statistical learning_, we have to say what object is being learned.
 
```
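A side note on the least-squares remark in the hunk above: the standard way to see the connection is that the Q-learning update is a stochastic-gradient (least-squares) step on the squared temporal-difference error. In generic notation (not the lecture's exact equation numbers), with learning rate $\alpha$ and discount factor $\beta$,

$$
\widetilde{TD} = R + \beta \max_{a'} \widetilde{Q}^{old}(w', a') - \widetilde{Q}^{old}(w, a),
\qquad
\widetilde{Q}^{new}(w, a) = \widetilde{Q}^{old}(w, a) + \alpha \, \widetilde{TD},
$$

which is a step of size $\alpha$ down the gradient of $\tfrac{1}{2}\widetilde{TD}^2$ with respect to $\widetilde{Q}^{old}(w, a)$, holding the target $R + \beta \max_{a'} \widetilde{Q}^{old}(w', a')$ fixed.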
```diff
@@ -484,7 +484,7 @@ pseudo-code for our McCall worker to do Q-learning:
 
 4. Update the state associated with the chosen action and compute $\widetilde{TD}$ according to {eq}`eq:old4` and update $\widetilde{Q}$ according to {eq}`eq:old3`.
 
-5. Either draw a new state $w'$ if required or else take existing wage if and update the Q-table again again according to {eq}`eq:old3`.
+5. Either draw a new state $w'$ if required or else take existing wage if and update the Q-table again according to {eq}`eq:old3`.
 
 6. Stop when the old and new Q-tables are close enough, i.e., $\lVert\tilde{Q}^{new}-\tilde{Q}^{old}\rVert_{\infty}\leq\delta$ for given $\delta$ or if the worker keeps accepting for $T$ periods for a prescribed $T$.
 
```
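Steps 4–6 of the pseudo-code in the hunk above translate directly into a short Python sketch. This is a minimal illustration under assumed names (`Q` for the Q-table, `alpha` for the learning rate, `beta` for the discount factor, `delta` for the tolerance), not the lecture's actual class:

```python
import numpy as np

def td_update(Q, s, a, reward, s_next, alpha, beta):
    """One temporal-difference update of the Q-table (sketch of steps 4-5)."""
    # TD error: reward plus discounted best continuation value, minus current estimate
    td = reward + beta * np.max(Q[s_next, :]) - Q[s, a]
    Q_new = Q.copy()
    Q_new[s, a] += alpha * td      # move the estimate a step of size alpha toward the target
    return Q_new

def close_enough(Q_old, Q_new, delta=1e-5):
    """Stopping test of step 6: sup-norm distance between successive Q-tables."""
    return np.max(np.abs(Q_new - Q_old)) <= delta
```

In an episode loop, `td_update` would be called once per period and `close_enough` checked afterwards, mirroring the stopping rule in step 6.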
```diff
@@ -508,7 +508,7 @@ For example, an agent who has accepted a wage offer based on her Q-table will be
 
 By using the $\epsilon$-greedy method and also by increasing the number of episodes, the Q-learning algorithm balances gains from exploration and from exploitation.
 
-**Remark:** Notice that $\widetilde{TD}$ associated with an optimal Q-table defined in equation (2) automatically above satisfies $\widetilde{TD}=0$ for all state action pairs. Whether a limit of our Q-learning algorithm converges to an optimal Q-table depends on whether the algorithm visits all state, action pairs often enough.
+**Remark:** Notice that $\widetilde{TD}$ associated with an optimal Q-table defined in equation (2) automatically above satisfies $\widetilde{TD}=0$ for all state action pairs. Whether a limit of our Q-learning algorithm converges to an optimal Q-table depends on whether the algorithm visits all state-action pairs often enough.
 
 We implement this pseudo code in a Python class.
 
```
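For the $\epsilon$-greedy method referred to in the hunk above, here is a minimal sketch, assuming the Q-table is a 2-D array with one row per state and one column per action (names are illustrative, not the lecture's):

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng=None):
    """Explore with probability epsilon, otherwise exploit the current Q-table."""
    if rng is None:
        rng = np.random.default_rng()
    n_actions = Q.shape[1]
    if rng.uniform() < epsilon:
        return int(rng.integers(n_actions))   # explore: uniform random action
    return int(np.argmax(Q[s, :]))            # exploit: greedy action
```

Letting `epsilon` shrink across episodes is one common way to tilt the balance from exploration toward exploitation as the Q-table improves.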
````diff
@@ -662,7 +662,7 @@ ax.legend()
 plt.show()
 ```
 
-Now, let us compute the case with a larger state space: $n=20$ instead of $n=10$.
+Now, let us compute the case with a larger state space: $n=30$ instead of $n=10$.
 
 ```{code-cell} ipython3
 n, a, b = 30, 200, 100 # default parameters
````
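For context on the last context line of the hunk above: in the McCall lectures the $n+1$ possible wage offers are typically drawn from a Beta-Binomial distribution with parameters `n, a, b`. A hedged sketch of that construction (the wage-grid bounds `w_min`, `w_max` here are illustrative assumptions, not taken from this diff):

```python
import numpy as np
from quantecon.distributions import BetaBinomial

n, a, b = 30, 200, 100                      # parameters shown in the hunk above
w_min, w_max = 10, 60                       # illustrative wage bounds (assumption)
w_grid = np.linspace(w_min, w_max, n + 1)   # n + 1 possible wage offers
q = BetaBinomial(n, a, b).pdf()             # probability attached to each offer
```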
```diff
@@ -739,7 +739,7 @@ and draw a new offer next period.
 
 This is an option that the McCall worker described in {doc}`this quantecon lecture <mccall_model>` would not take.
 
-See {cite}`Ljungqvist2012`, chapter 7 on search, for a proof.
+See {cite}`Ljungqvist2012`, chapter 6 on search, for a proof.
 
 But in the context of Q-learning, giving the worker the option to quit and get unemployment compensation while
 unemployed turns out to accelerate the learning process by promoting experimentation vis a vis premature
```