@@ -101,7 +101,7 @@ class FirstByDowning(Player):
101101 S, P(C_o | C_s) and the conditional probability that O will choose C
102102 following D by S, P(C_o | D_s)."
103103
104- Throughout the paper the strategy (S) assumes that the opponent (D ) is
104+ Throughout the paper the strategy (S) assumes that the opponent (O) is
105105 playing a reactive strategy defined by these two conditional probabilities.
106106
107107 The strategy aims to maximise the long run utility against such a strategy
@@ -125,13 +125,16 @@ class FirstByDowning(Player):
125125
126126 EV_TOT = #CC(EV_CC) + #CD(EV_CD) + #DC(EV_DC) + #DD(EV_DD)
127127
128- I.E. The player aims to maximise the expected value of being in each state
129- weighted by the number of times we expect to be in that state.
128+ This differs from the more modern literature, where #CC, #CD, #DC and #DD
129+ would denote the counts of both players playing C, or of the first
130+ playing C and the second D, etc.
131+ In this case the author uses an argument based on the sequence of plays by
132+ the player (S), so #CC denotes the number of times the player plays C twice
133+ in a row.
130134
131- On the second page of the appendix, figure 4 (page 390) supposedly
132- identifies an expression for EV_TOT however it is not clear how some of the
133- steps are carried out. It seems like an asymptotic
134- argument is being used. Furthermore, a specific term is made to disappear in
135+ On the second page of the appendix, figure 4 (page 390)
136+ identifies an expression for EV_TOT.
137+ A specific term is made to disappear in
135138 the case of T - R = P - S (which is not the case for the standard
136139 (R, P, S, T) = (3, 1, 0, 5)):
137140
@@ -142,8 +145,12 @@ class FirstByDowning(Player):
142145 in the abstract) and as such the final expression (with only V as unknown)
143146 can be used to decide if V should indicate that S always cooperates or not.
144147
145- Given the lack of usable details in this paper, the following interpretation
146- is used to implement this strategy:
148+ This final expression is used to show that EV_TOT is linear in the number of
149+ cooperations by the player, thus justifying the fact that the player will
150+ either always cooperate or always defect.
151+
152+ All of the above details are used to give the following interpretation of
153+ the strategy:
147154
148155 1. On any given turn, the strategy will estimate alpha = P(C_o | C_s) and
149156 beta = P(C_o | D_s).
@@ -180,6 +187,13 @@ class FirstByDowning(Player):
180187 > "Initially, they are both assumed to be .5, which amounts to the
181188 > pessimistic assumption that the other player is not responsive."
182189
190+ Note that if alpha = beta = 1 / 2 then:
191+
192+ E_C = alpha R + alpha S
193+ E_D = alpha T + alpha P
194+
195+ And from the defining properties of the Prisoner's Dilemma (T > R > P > S)
196+ this gives: E_D > E_C.
183197 Thus, the player opens with a defection in the first two rounds. Note that
184198 from the Axelrod publications alone there is nothing to indicate defections
185199 on the first two rounds, although a defection in the opening round is clear.
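
The opening-defection argument in the docstring can be checked numerically. The sketch below is illustrative only: `expected_values` is a hypothetical helper, not part of the library, and it assumes the general forms E_C = alpha R + (1 - alpha) S and E_D = beta T + (1 - beta) P, which reduce to the expressions in the docstring when alpha = beta = 1 / 2. The payoffs are the standard (R, P, S, T) = (3, 1, 0, 5) mentioned earlier in the docstring.

```python
# Standard Prisoner's Dilemma payoffs quoted in the docstring.
R, P, S, T = 3, 1, 0, 5


def expected_values(alpha, beta, payoffs=(R, P, S, T)):
    """Return (E_C, E_D) for alpha = P(C_o | C_s) and beta = P(C_o | D_s).

    Hypothetical helper: assumes E_C = alpha R + (1 - alpha) S and
    E_D = beta T + (1 - beta) P.
    """
    r, p, s, t = payoffs
    e_c = alpha * r + (1 - alpha) * s  # expected payoff when S cooperates
    e_d = beta * t + (1 - beta) * p    # expected payoff when S defects
    return e_c, e_d


# Initial, "pessimistic" estimates: the opponent is assumed not responsive.
e_c, e_d = expected_values(0.5, 0.5)
print(e_c, e_d)  # 1.5 3.0
```

With the initial estimates E_D exceeds E_C, which is consistent with the player opening with a defection.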