
Commit 7b0e7b3

fix typos 2
1 parent d79e714 commit 7b0e7b3

File tree

1 file changed: +5 -5 lines changed


lectures/mccall_q.md

Lines changed: 5 additions & 5 deletions
@@ -34,7 +34,7 @@ Relative to the dynamic programming formulation of the McCall worker model that

The Q-learning algorithm invokes a statistical learning model to learn about these things.

- Statistical learning often comes down to some version of least squares, and it will here too.
+ Statistical learning often comes down to some version of least squares, and it will be here too.

Any time we say _statistical learning_, we have to say what object is being learned.

@@ -484,7 +484,7 @@ pseudo-code for our McCall worker to do Q-learning:

4. Update the state associated with the chosen action and compute $\widetilde{TD}$ according to {eq}`eq:old4` and update $\widetilde{Q}$ according to {eq}`eq:old3`.

- 5. Either draw a new state $w'$ if required or else take existing wage if and update the Q-table again again according to {eq}`eq:old3`.
+ 5. Either draw a new state $w'$ if required or else take existing wage if and update the Q-table again according to {eq}`eq:old3`.

6. Stop when the old and new Q-tables are close enough, i.e., $\lVert\tilde{Q}^{new}-\tilde{Q}^{old}\rVert_{\infty}\leq\delta$ for given $\delta$ or if the worker keeps accepting for $T$ periods for a prescribed $T$.

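Steps 4 through 6 of this hunk amount to a temporal-difference update of the Q-table followed by a sup-norm convergence check. The sketch below is illustrative only, not the lecture's actual class: it assumes the standard forms that {eq}`eq:old4` and {eq}`eq:old3` usually take (a TD error equal to the realized reward plus the discounted maximum of the next state's Q-row, minus the current entry, and an update that moves the entry by a learning rate `alpha` times that error); the function and parameter names are hypothetical.

```python
import numpy as np

def td_update(Q, s, a, reward, s_next, alpha=0.1, beta=0.99):
    """One Q-table update in the spirit of steps 4-5 (illustrative names).

    Assumes a standard TD error and learning-rate update; not the
    lecture's own code.
    """
    td = reward + beta * Q[s_next, :].max() - Q[s, a]   # temporal difference
    Q[s, a] += alpha * td                               # move entry toward the TD target
    return Q, td

def close_enough(Q_new, Q_old, delta=1e-5):
    """Step 6 stopping rule: sup-norm distance between successive Q-tables."""
    return np.max(np.abs(Q_new - Q_old)) <= delta
```
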
@@ -508,7 +508,7 @@ For example, an agent who has accepted a wage offer based on her Q-table will be

By using the $\epsilon$-greedy method and also by increasing the number of episodes, the Q-learning algorithm balances gains from exploration and from exploitation.

- **Remark:** Notice that $\widetilde{TD}$ associated with an optimal Q-table defined in equation (2) automatically above satisfies $\widetilde{TD}=0$ for all state action pairs. Whether a limit of our Q-learning algorithm converges to an optimal Q-table depends on whether the algorithm visits all state, action pairs often enough.
+ **Remark:** Notice that $\widetilde{TD}$ associated with an optimal Q-table defined in equation (2) automatically above satisfies $\widetilde{TD}=0$ for all state action pairs. Whether a limit of our Q-learning algorithm converges to an optimal Q-table depends on whether the algorithm visits all state-action pairs often enough.

We implement this pseudo code in a Python class.

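The $\epsilon$-greedy rule mentioned in this hunk chooses a random action with probability $\epsilon$ and otherwise the action that maximizes the current Q-row, which is what lets the algorithm keep visiting state-action pairs it would otherwise ignore. A minimal sketch under those assumptions (names are illustrative, not the lecture's implementation):

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon=0.1, rng=None):
    """With probability epsilon explore (uniform random action);
    otherwise exploit the best action in state s under the current Q-table."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.uniform() < epsilon:
        return int(rng.integers(Q.shape[1]))   # explore: random action
    return int(np.argmax(Q[s, :]))             # exploit: greedy action
```
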
@@ -662,7 +662,7 @@ ax.legend()

plt.show()
```

- Now, let us compute the case with a larger state space: $n=20$ instead of $n=10$.
+ Now, let us compute the case with a larger state space: $n=30$ instead of $n=10$.

```{code-cell} ipython3
n, a, b = 30, 200, 100 # default parameters

@@ -739,7 +739,7 @@ and draw a new offer next period.

This is an option that the McCall worker described in {doc}`this quantecon lecture <mccall_model>` would not take.

- See {cite}`Ljungqvist2012`, chapter 7 on search, for a proof.
+ See {cite}`Ljungqvist2012`, chapter 6 on search, for a proof.

But in the context of Q-learning, giving the worker the option to quit and get unemployment compensation while
unemployed turns out to accelerate the learning process by promoting experimentation vis a vis premature
