Commit 71bd205

Fix small typos

1 parent 98f5bef

1 file changed: +4 -7 lines


lectures/mccall_q.md (4 additions, 7 deletions)
@@ -23,7 +23,7 @@ The Q-learning algorithm combines ideas from

 * dynamic programming

-* a recursive version of least squares known as **temporal difference learning**
+* a recursive version of least squares known as [temporal difference learning](https://en.wikipedia.org/wiki/Temporal_difference_learning).

 This lecture applies a Q-learning algorithm to the situation faced by a McCall worker.
2929

@@ -101,9 +101,6 @@ The worker's income $y_t$ equals his wage $w$ if he is employed, and unemploymen
 An optimal value $V\left(w\right)$ for a McCall worker who has just received a wage offer $w$ and is deciding whether
 to accept or reject it satisfies the Bellman equation

-
-
-
 $$
 V\left(w\right)=\max_{\text{accept, reject}}\;\left\{ \frac{w}{1-\beta},c+\beta\int V\left(w'\right)dF\left(w'\right)\right\}
 $$ (eq_mccallbellman)
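As an aside, the Bellman equation in the hunk above can be solved by value function iteration on a discrete wage grid. The sketch below is not part of the commit or of the lecture's own code; the discount factor `beta`, compensation `c`, wage grid, and uniform offer distribution are all assumed for illustration.

```python
import numpy as np

# Minimal sketch (assumed parameters): solve
#   V(w) = max{ w/(1-beta), c + beta * sum_{w'} V(w') f(w') }
# by value function iteration on a discrete wage grid.
beta, c = 0.95, 25.0                      # discount factor, unemployment compensation (assumed)
w_grid = np.linspace(10, 60, 51)          # wage grid (assumed)
f = np.ones_like(w_grid) / len(w_grid)    # uniform offer distribution F (assumed)

V = w_grid / (1 - beta)                   # initial guess
for _ in range(1000):
    accept = w_grid / (1 - beta)          # value of accepting w forever
    reject = c + beta * V @ f             # value of rejecting; independent of w
    V_new = np.maximum(accept, reject)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# reservation wage: smallest wage at which accepting is (weakly) optimal
w_bar = w_grid[np.argmax(accept >= reject)]
```

Because the wage offer is accepted forever once taken, the accept branch is the closed form $w/(1-\beta)$, so only the reject branch needs iteration.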
@@ -281,15 +278,15 @@ These equations are aligned with the Bellman equation for the worker's optimal
 Evidently, the optimal value function $V(w)$ described in that lecture is related to our Q-function by

 $$
-V(w) = \max_{\textrm{accept},\textrm{reject}} \left\{ Q(w, \text{accept} \right), Q\left(w,\text{reject} \right\}
+V(w) = \max_{\textrm{accept},\textrm{reject}} \left\{ Q\left(w, \text{accept}\right), Q\left(w,\text{reject}\right) \right\}
 $$

 If we stare at the second equation of system {eq}`eq:impliedq`, we notice that since the wage process is identically and independently distributed over time,
 $Q\left(w,\text{reject}\right)$, the right side of the equation is independent of the current state $w$.

 So we can denote it as a scalar

-$$ Q_r=Q\left(w,\text{reject}\right),\forall w\in\mathcal{W}.
+$$ Q_r := Q\left(w,\text{reject}\right) \quad \forall \, w\in\mathcal{W}.
 $$

 This fact provides us with an
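The scalar $Q_r$ defined in the hunk above can be learned by a temporal difference recursion on a single number, since $Q(w, \text{reject})$ does not depend on $w$. The sketch below is not part of the commit and uses assumed parameters (discount factor, compensation, learning rate, uniform offer distribution); the accept branch is taken in closed form as $w/(1-\beta)$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, c, alpha = 0.95, 25.0, 0.01         # discount, compensation, learning rate (all assumed)
w_grid = np.linspace(10, 60, 51)          # wage grid (assumed)

Q_accept = w_grid / (1 - beta)            # Q(w, accept): take wage w forever
Q_r = 0.0                                 # scalar Q(w, reject), identical for every w

for _ in range(20_000):
    i = rng.integers(len(w_grid))         # draw next period's offer uniformly
    td_target = c + beta * max(Q_accept[i], Q_r)   # sampled Bellman target for rejecting
    Q_r += alpha * (td_target - Q_r)      # temporal difference update on the scalar
```

With $Q_r$ in hand, the implied decision rule is to accept exactly those offers with $w/(1-\beta) \ge Q_r$.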
@@ -737,7 +734,7 @@ The above graphs indicates that
 ## Employed Worker Can't Quit


-The preceding version of temporal difference Q-learning described in equation system (4) lets an an employed worker quit, i.e., reject her wage as an incumbent and instead accept receive unemployment compensation this period
+The preceding version of temporal difference Q-learning described in equation system (4) lets an employed worker quit, i.e., reject her wage as an incumbent and instead receive unemployment compensation this period
 and draw a new offer next period.

 This is an option that the McCall worker described in {doc}`this quantecon lecture <mccall_model>` would not take.
