Commit 632b1c6

Added codex solutions to the README.md
Made some puzzles harder
Merged game_theory.py and games.py
Added taint_date, the date at which a puzzle was added to the repo. AI systems trained after that date may have seen the puzzle and its solution.
1 parent ec68ca5 commit 632b1c6

15 files changed

+19321
-12507
lines changed

README.md

Lines changed: 98 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,10 @@
11
# Python Programming Puzzles (P3)
22

33
This repo contains a dataset of Python programming puzzles which can be used to teach and evaluate
4-
an AI's programming proficiency. We hope this dataset will **grow rapidly**, and it is already diverse in
5-
terms of problem difficulty, domain,
4+
an AI's programming proficiency. We present code generated by OpenAI's recently released
5+
[codex](https://arxiv.org/abs/2107.03374) 12-billion parameter neural network
6+
solving many of these puzzles. We hope this dataset will
7+
**grow rapidly**, and it is already diverse in terms of problem difficulty, domain,
68
and algorithmic tools needed to solve the problems. Please
79
[propose a new puzzle](../../issues/new?assignees=akalai&labels=New-puzzle&template=new-puzzle.md&title=New+puzzle)
810
or [browse newly proposed puzzles](../../issues?q=is%3Aopen+is%3Aissue+label%3ANew-puzzle)
@@ -33,48 +35,125 @@ your programming compares.
3335
## What is a Python programming puzzle?
3436

3537
Each puzzle takes the form of a Python function that takes an answer as an argument.
36-
The goal is to find an answer which makes the function return `True`.
38+
The answer is an input which makes the function return `True`.
3739
This is called *satisfying* the puzzle, and that is why the puzzles are all named `sat`.
3840

3941
```python
4042
def sat(s: str):
4143
return "Hello " + s == "Hello world"
4244
```
4345

44-
The answer to the above puzzle is the string `"world"` because `sat("world")` returns `True`. The puzzles range from trivial problems like this, to classic puzzles,
46+
The answer to the above puzzle is the string `"world"` because `sat("world")` returns `True`. The puzzles range from
47+
trivial problems like this, to classic puzzles,
4548
to programming competition problems, all the way through open problems in algorithms and mathematics.
46-
A slightly harder example is:
49+
50+
The classic [Towers of Hanoi](https://en.wikipedia.org/wiki/Tower_of_Hanoi) puzzle can be written as follows:
4751
```python
48-
def sat(s: str):
49-
"""find a string with 1000 o's but no consecutive o's."""
50-
return s.count("o") == 1000 and s.count("oo") == 0
52+
def sat(moves: List[List[int]]):
53+
"""
54+
Eight disks of sizes 1-8 are stacked on three towers, with each tower having disks in order of largest to
55+
smallest. Move [i, j] corresponds to taking the smallest disk off tower i and putting it on tower j, and it
56+
is legal as long as the towers remain in sorted order. Find a sequence of moves that moves all the disks
57+
from the first to last towers.
58+
"""
59+
rods = ([8, 7, 6, 5, 4, 3, 2, 1], [], [])
60+
for [i, j] in moves:
61+
rods[j].append(rods[i].pop())
62+
assert rods[j][-1] == min(rods[j]), "larger disk on top of smaller disk"
63+
return rods[0] == rods[1] == []
5164
```
65+
The shortest answer is a list of 255 moves, so instead we ask for the AI to generate *code* that outputs an answer. In
66+
this case, the [codex API](https://beta.openai.com/) generated the following code:
67+
```python
68+
def sol():
69+
# taken from https://www.geeksforgeeks.org/c-program-for-tower-of-hanoi/
70+
moves = []
71+
def hanoi(n, source, temp, dest):
72+
if n > 0:
73+
hanoi(n - 1, source, dest, temp)
74+
moves.append([source, dest])
75+
hanoi(n - 1, temp, source, dest)
76+
hanoi(8, 0, 1, 2)
77+
return moves
78+
```
79+
This was not on its first try, but that is one of the advantages of puzzles---it is easy for the computer to check
80+
its answers so it can generate many answers until it finds one. For this puzzle, about 1 in 1,000 solutions were
81+
satisfactory. Clearly, codex has seen this problem before in other input formats---it even generated a url!
82+
(Upon closer inspection, the website exists and contains Python Tower-of-Hanoi code in a completely different format
83+
with different variable names.)
84+
On a harder, less-standard [Hanoi puzzle variant](puzzles/README.md#towersofhanoiarbitrary) that
85+
requires moving from particular start to end positions, codex didn't solve it in 10,000 attempts.
86+
87+
Next, consider a puzzle inspired by [this easy competitive programming problem](https://codeforces.com/problemset/problem/58/A)
88+
from the [codeforces.com](https://codeforces.com) website:
89+
```python
90+
def sat(inds: List[int], string="Sssuubbstrissiingg"):
91+
"""Find increasing indices to make the substring "substring"""
92+
return inds == sorted(inds) and "".join(string[i] for i in inds) == "substring"
93+
```
94+
Codex generated the code below, which when run gives the valid answer `[1, 3, 5, 7, 8, 9, 10, 15, 16]`.
95+
This satisfies the puzzle because the indices are increasing and joining the characters of
96+
`"Sssuubbstrissiingg"` at these indices yields `"substring"`.
97+
```python
98+
def sol(string="Sssuubbstrissiingg"):
99+
x = "substring"
100+
pos = string.index(x[0])
101+
inds = [pos]
102+
while True:
103+
x = x[1:]
104+
if not x:
105+
return inds
106+
pos = string.find(x[0], pos+1)
107+
if pos == -1:
108+
return inds
109+
inds.append(pos)
110+
```
111+
Again, there are multiple valid answers, and again this was out of many attempts (only 1 success in 10k).
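The stated answer can be checked by running the two functions from this hunk together. The sketch below copies `sat` and `sol` unchanged except for the added `typing` import and a tidied docstring; the variable name `answer` is introduced only for illustration.

```python
from typing import List


def sat(inds: List[int], string="Sssuubbstrissiingg"):
    """Find increasing indices whose characters spell "substring"."""
    return inds == sorted(inds) and "".join(string[i] for i in inds) == "substring"


def sol(string="Sssuubbstrissiingg"):
    """Codex-generated greedy scan: take the next occurrence of each target letter."""
    x = "substring"
    pos = string.index(x[0])
    inds = [pos]
    while True:
        x = x[1:]
        if not x:
            return inds
        pos = string.find(x[0], pos + 1)
        if pos == -1:
            return inds
        inds.append(pos)


answer = sol()
print(answer, sat(answer))
```

The greedy scan always takes the earliest usable occurrence of each letter, which is why it skips the capital `S` at index 0 and starts at index 1.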
112+
52113

53114
A more challenging puzzle that requires [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming) is the
54115
[longest increasing subsequence](https://en.wikipedia.org/wiki/Longest_increasing_subsequence) problem
55116
which we can also describe with strings:
56117
```python
57-
from typing import List
118+
def f(x: List[int], length=20, s="Dynamic programming solves this classic job-interview puzzle!!!"):
119+
"""Find the indices of the longest subsequence with characters in sorted order"""
120+
return all(s[x[i]] <= s[x[i + 1]] and x[i + 1] > x[i] for i in range(length - 1))
58121

59-
def sat(x: List[int], s="Dynamic programming solves this classic job-interview puzzle!!!"):
60-
"""Find the indexes (possibly negative!) of the longest monotonic subsequence"""
61-
return all(s[x[i]] <= s[x[i+1]] and x[i+1] > x[i] for i in range(25))
62122
```
123+
Codex didn't solve this one.
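For reference, here is one way a quadratic dynamic program could satisfy this puzzle. It is a sketch written for this note, not the repository's reference solution. It relies on an assumption hinted at by the earlier docstring removed in this diff ("possibly negative!"): indices from `-len(s)` to `len(s) - 1` are all valid, so the search runs over the string concatenated with itself. The puzzle function keeps the name `f` used in the hunk above.

```python
from typing import List

S = "Dynamic programming solves this classic job-interview puzzle!!!"


def f(x: List[int], length=20, s=S):
    """Puzzle copied from the README hunk above."""
    return all(s[x[i]] <= s[x[i + 1]] and x[i + 1] > x[i] for i in range(length - 1))


def sol(length=20, s=S):
    """Sketch: longest non-decreasing subsequence over s + s, mapped back to indices of s."""
    n = len(s)
    t = s + s  # position k in t corresponds to index k - n in s (negative for k < n)
    best = [1] * len(t)   # best[k]: length of the longest valid chain ending at k
    prev = [-1] * len(t)  # prev[k]: predecessor of k in that chain, or -1
    for k in range(len(t)):
        for j in range(k):
            if t[j] <= t[k] and best[j] + 1 > best[k]:
                best[k] = best[j] + 1
                prev[k] = j
    # Walk back from the end of the longest chain.
    k = max(range(len(t)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(k - n)
        k = prev[k]
    chain.reverse()
    return chain[:length]  # the puzzle only inspects the first `length` indices


answer = sol()
print(len(answer), f(answer))
```

If the longest chain were shorter than `length`, `f` would raise an `IndexError` rather than return `False`, so this sketch only applies to instances where a long enough subsequence exists.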
63124

64-
The classic [Towers of Hanoi](https://en.wikipedia.org/wiki/Tower_of_Hanoi) puzzle can be written as follows:
125+
The dataset also has a number of open problems in computer science and mathematics. For example,
126+
[Conway's 99-graph problem](https://en.wikipedia.org/w/index.php?title=Conway%27s_99-graph_problem)
127+
is an unsolved problem in graph theory
128+
(see also [Five $1,000 Problems (Update 2017)](https://oeis.org/A248380/a248380.pdf))
65129
```python
66-
def sat(moves: List[List[int]]):
67-
"""moves is list of [from, to] pairs"""
68-
t = ([8, 7, 6, 5, 4, 3, 2, 1], [], []) # towers state
69-
return all(t[j].append(t[i].pop()) or t[j][-1] == min(t[j]) for i, j in moves) and t[0] == t[1]
70-
130+
def sat(edges: List[List[int]]):
131+
"""
132+
Find an undirected graph with 99 vertices, in which each two adjacent vertices have exactly one common
133+
neighbor, and in which each two non-adjacent vertices have exactly two common neighbors.
134+
"""
135+
# first compute neighbors sets, N:
136+
N = {i: {j for j in range(99) if j != i and ([i, j] in edges or [j, i] in edges)} for i in range(99)}
137+
return all(len(N[i].intersection(N[j])) == (1 if j in N[i] else 2) for i in range(99) for j in range(i))
71138
```
72139

140+
Why puzzles? One reason is that, if we can solve them better than human programmers,
141+
then we could make progress on some important algorithms problems.
142+
But until then, a second reason is that they can be valuable for training and evaluating AI systems.
143+
Many programming datasets have been proposed over the years, and several have problems of a similar nature
144+
(like programming competition problems). In puzzles, the spec is defined by code, while
145+
other datasets usually use a combination of English and a hidden test set of input-output pairs. English-based
146+
specs are notoriously ambiguous and test the system's understanding of English.
147+
And with input-output test cases, you would have to have solved a puzzle before you pose it,
148+
so what is the use there? Code-based specs
149+
have the advantage that they are unambiguous: there is no need to debug the AI-generated code or to fear that it
150+
doesn't do what you want. If it solved the puzzle, then it succeeded by definition.
151+
73152
For more information on the motivation and how programming puzzles can help AI learn to program, see
74153
the paper:
75154
*Programming Puzzles*, by Tal Schuster, Ashwin Kalyan, Alex Polozov, and Adam Tauman Kalai. 2021 (Link to be added shortly)
76155

77-
# [Click here to browse the puzzles](/puzzles/README.md)
156+
# [Click here to browse the puzzles and solutions](/puzzles/README.md)
78157

79158
The problems in this repo are based on:
80159
* Wikipedia articles about [algorithms](https://en.wikipedia.org/wiki/List_of_algorithms), [puzzles](https://en.wikipedia.org/wiki/Category:Logic_puzzles),

generators/IMO.py

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -108,7 +108,7 @@ class NoRelativePrimes(PuzzleGenerator):
108108

109109

110110
@staticmethod
111-
def sat(nums: List[int], b=6, m=2):
111+
def sat(nums: List[int], b=7, m=6):
112112
"""
113113
Let P(n) = n^2 + n + 1.
114114
@@ -337,7 +337,7 @@ class HalfTag(PuzzleGenerator):
337337
taint_date = [2020, 9, 19]
338338

339339
@staticmethod
340-
def sat(li: List[int], n=3, tags=[0, 1, 2, 0, 0, 1, 1, 1, 2, 2, 0, 2]):
340+
def sat(li: List[int], tags=[3, 0, 3, 2, 0, 1, 0, 3, 1, 1, 2, 2, 0, 2, 1, 3]):
341341
"""
342342
The input tags is a list of 4n integer tags each in range(n) with each tag occurring 4 times.
343343
The goal is to find a subset (list) li of half the indices such that:
@@ -353,12 +353,14 @@ def sat(li: List[int], n=3, tags=[0, 1, 2, 0, 0, 1, 1, 1, 2, 2, 0, 2]):
353353
354354
Note the sum of the output is 33 = (0+1+2+...+11)/2 and the selected tags are [0, 0, 1, 1, 2, 2]
355355
"""
356+
n = max(tags) + 1
356357
assert sorted(tags) == sorted(list(range(n)) * 4), "hint: each tag occurs exactly four times"
357358
assert len(li) == len(set(li)) and min(li) >= 0
358359
return sum(li) * 2 == sum(range(4 * n)) and sorted([tags[i] for i in li]) == [i // 2 for i in range(2 * n)]
359360

360361
@staticmethod
361-
def sol(n, tags):
362+
def sol(tags):
363+
n = max(tags) + 1
362364
pairs = {(i, 4 * n - i - 1) for i in range(2 * n)}
363365
by_tag = {tag: [] for tag in range(n)}
364366
for p in pairs:
@@ -419,7 +421,7 @@ def gen_random(self):
419421
tags = [i // 4 for i in range(4 * n)]
420422
self.random.shuffle(tags)
421423
# print(self.__class__, n, tick())
422-
self.add(dict(n=n, tags=tags))
424+
self.add(dict(tags=tags))
423425

424426

425427

generators/__init__.py

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,6 @@
1010
from . import compression
1111
from . import conways_game_of_life
1212
from . import games
13-
from . import game_theory
1413
from . import graphs
1514
from . import ICPC
1615
from . import IMO

generators/classic_puzzles.py

Lines changed: 20 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -190,10 +190,6 @@ def sat(quine: str):
190190
def sol():
191191
return "(lambda x: f'({x})({chr(34)}{x}{chr(34)})')(\"lambda x: f'({x})({chr(34)}{x}{chr(34)})'\")"
192192

193-
@staticmethod
194-
def sol2(): # thanks for this simple solution, GPT-3!
195-
return 'quine'
196-
197193

198194
class RevQuine(PuzzleGenerator):
199195
"""Reverse [Quine](https://en.wikipedia.org/wiki/Quine_%28computing%29). The solution we give is from GPT3."""
@@ -252,25 +248,26 @@ class ClockAngle(PuzzleGenerator):
252248
@staticmethod
253249
def sat(hands: List[int], target_angle=45):
254250
"""Find clock hands = [hour, min] such that the angle is target_angle degrees."""
255-
hour, min = hands
256-
return 0 < hour <= 12 and 0 <= min < 60 and ((60 * hour + min) - 12 * min) % 720 == 2 * target_angle
251+
h, m = hands
252+
assert 0 < h <= 12 and 0 <= m < 60
253+
hour_angle = 30 * h + m / 2
254+
minute_angle = 6 * m
255+
return abs(hour_angle - minute_angle) in [target_angle, 360 - target_angle]
257256

258257
@staticmethod
259258
def sol(target_angle):
260-
for hour in range(1, 13):
261-
for min in range(60):
262-
if ((60 * hour + min) - 12 * min) % 720 == 2 * target_angle:
263-
return [hour, min]
259+
for h in range(1, 13):
260+
for m in range(60):
261+
hour_angle = 30 * h + m / 2
262+
minute_angle = 6 * m
263+
if abs(hour_angle - minute_angle) % 360 in [target_angle, 360 - target_angle]:
264+
return [h, m]
265+
266+
def gen_random(self):
267+
target_angle = self.random.randrange(0, 360)
268+
if self.sol(target_angle):
269+
self.add(dict(target_angle=target_angle))
264270

265-
def gen(self, target_num_instances):
266-
for hour in range(1, 13):
267-
for min in range(60):
268-
if len(self.instances) == target_num_instances:
269-
return
270-
double_angle = ((60 * hour + min) - 12 * min) % 720
271-
if double_angle % 2 == 0:
272-
target_angle = double_angle // 2
273-
self.add(dict(target_angle=target_angle))
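A quick way to sanity-check the rewritten ClockAngle code is to confirm that every answer `sol` returns is accepted by `sat`. The sketch below copies both functions from the added lines of this hunk as plain functions (the `@staticmethod` decorators and the `PuzzleGenerator` class are omitted), and loops over the same `range(360)` that `gen_random` samples from. The name `solvable` is introduced here only for illustration.

```python
from typing import List


def sat(hands: List[int], target_angle=45):
    """Find clock hands = [hour, min] such that the angle is target_angle degrees."""
    h, m = hands
    assert 0 < h <= 12 and 0 <= m < 60
    hour_angle = 30 * h + m / 2
    minute_angle = 6 * m
    return abs(hour_angle - minute_angle) in [target_angle, 360 - target_angle]


def sol(target_angle):
    """Brute-force search over all 12 * 60 hand positions; returns None if none match."""
    for h in range(1, 13):
        for m in range(60):
            hour_angle = 30 * h + m / 2
            minute_angle = 6 * m
            if abs(hour_angle - minute_angle) % 360 in [target_angle, 360 - target_angle]:
                return [h, m]


# Targets for which the brute-force search finds some hand position.
solvable = [t for t in range(360) if sol(t) is not None]

# Every answer produced by sol should be accepted by sat for the same target.
print(all(sat(sol(t), t) for t in solvable))
```

Note that `sol` reduces the difference modulo 360 while `sat` does not, so a check like this is a cheap guard against the two drifting apart.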
274271

275272
class Kirkman(PuzzleGenerator):
276273
"""[Kirkman's problem](https://en.wikipedia.org/wiki/Kirkman%27s_schoolgirl_problem)"""
@@ -408,7 +405,6 @@ def mirror(coords): # rotate to all four corners
408405
return next(list(mirror(coords)) for coords in combinations(grid, side // 2) if
409406
test(coords) and test(mirror(coords)))
410407

411-
412408
def gen(self, target_num_instances):
413409
for easy in range(47):
414410
for side in range(47):
@@ -449,12 +445,7 @@ def gen_random(self):
449445
class SquaringTheSquare(PuzzleGenerator):
450446
"""[Squaring the square](https://en.wikipedia.org/wiki/Squaring_the_square)
451447
Wikipedia gives a minimal [solution with 21 squares](https://en.wikipedia.org/wiki/Squaring_the_square)
452-
due to Duijvestijn (1978):
453-
```python
454-
[[0, 0, 50], [0, 50, 29], [0, 79, 33], [29, 50, 25], [29, 75, 4], [33, 75, 37], [50, 0, 35],
455-
[50, 35, 15], [54, 50, 9], [54, 59, 16], [63, 50, 2], [63, 52, 7], [65, 35, 17], [70, 52, 18],
456-
[70, 70, 42], [82, 35, 11], [82, 46, 6], [85, 0, 27], [85, 27, 8], [88, 46, 24], [93, 27, 19]]
457-
```
448+
due to Duijvestijn (1978).
458449
"""
459450

460451
@staticmethod
@@ -484,7 +475,7 @@ class NecklaceSplit(PuzzleGenerator):
484475
"""
485476

486477
@staticmethod
487-
def sat(n: int, lace="bbbbrrbrbrbbrrrr"):
478+
def sat(n: int, lace="bbrbrbbbbbbrrrrrrrbrrrrbbbrbrrbbbrbrrrbrrbrrbrbbrrrrrbrbbbrrrbbbrbbrbbbrbrbb"):
488479
"""
489480
Find a split dividing the given red/blue necklace in half at n so that each piece has an equal number of
490481
reds and blues.
@@ -633,15 +624,11 @@ class WaterPouring(PuzzleGenerator):
633624
"""[Water pouring puzzle](https://en.wikipedia.org/w/index.php?title=Water_pouring_puzzle&oldid=985741928)"""
634625

635626
@staticmethod
636-
def sat(
637-
moves: List[List[int]],
638-
capacities=[8, 5, 3],
639-
init=[8, 0, 0],
640-
goal=[4, 4, 0]
641-
): # moves is list of [from, to] pairs
627+
def sat(moves: List[List[int]], capacities=[8, 5, 3], init=[8, 0, 0], goal=[4, 4, 0]):
642628
"""
643629
Given an initial state of water quantities in jugs and jug capacities, find a sequence of moves (pouring
644630
one jug into another until it is full or the first is empty) that reaches the given goal state.
631+
moves is a list of [from, to] pairs
645632
"""
646633
state = init.copy()
647634

generators/codeforces.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
"""Problems inspired by [codeforces](https://codeforces.com)."""
1+
"""Problems inspired by the popular programming competition site [codeforces.com](https://codeforces.com)"""
22

33
from puzzle_generator import PuzzleGenerator
44
from typing import List
@@ -656,7 +656,7 @@ class Sssuubbstriiingg(PuzzleGenerator):
656656
"""Inspired by [Codeforces Problem 58 A](https://codeforces.com/problemset/problem/58/A)"""
657657

658658
@staticmethod
659-
def sat(inds: List[int], string="Sssuubbstriiingg"):
659+
def sat(inds: List[int], string="Sssuubbstrissiingg"):
660660
"""Find increasing indices to make the substring "substring"""
661661
return inds == sorted(inds) and "".join(string[i] for i in inds) == "substring"
662662

@@ -725,7 +725,7 @@ class Moving0s(PuzzleGenerator):
725725
@staticmethod
726726
def sat(seq: List[int], target=[1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0], n_steps=4):
727727
"""
728-
Find a sequence of 0's and 1's so that, after n_steps of swapping each adjacent (0, 1), target target sequence
728+
Find a sequence of 0's and 1's so that, after n_steps of swapping each adjacent (0, 1), the target sequence
729729
is achieved.
730730
"""
731731
s = seq[:] # copy
