|
1 | 1 | # Python Programming Puzzles (P3) |
2 | 2 |
|
3 | 3 | This repo contains a dataset of Python programming puzzles which can be used to teach and evaluate |
4 | | -an AI's programming proficiency. We hope this dataset will **grow rapidly**, and it is already diverse in |
5 | | -terms of problem difficulty, domain, |
| 4 | +an AI's programming proficiency. We present code generated by OpenAI's recently released |
| 5 | +[codex](https://arxiv.org/abs/2107.03374) 12-billion parameter neural network |
| 6 | +solving many of these puzzles. We hope this dataset will |
| 7 | +**grow rapidly**, and it is already diverse in terms of problem difficulty, domain, |
6 | 8 | and algorithmic tools needed to solve the problems. Please |
7 | 9 | [propose a new puzzle](../../issues/new?assignees=akalai&labels=New-puzzle&template=new-puzzle.md&title=New+puzzle) |
8 | 10 | or [browse newly proposed puzzles](../../issues?q=is%3Aopen+is%3Aissue+label%3ANew-puzzle) |
@@ -33,48 +35,125 @@ your programming compares. |
33 | 35 | ## What is a Python programming puzzle? |
34 | 36 |
|
35 | 37 | Each puzzle takes the form of a Python function that takes an answer as an argument. |
36 | | -The goal is to find an answer which makes the function return `True`. |
| 38 | +The answer is an input which makes the function return `True`. |
37 | 39 | This is called *satisfying* the puzzle, and that is why the puzzles are all named `sat`. |
38 | 40 |
|
39 | 41 | ```python |
40 | 42 | def sat(s: str): |
41 | 43 |     return "Hello " + s == "Hello world" |
42 | 44 | ``` |
43 | 45 |
|
44 | | -The answer to the above puzzle is the string `"world"` because `sat("world")` returns `True`. The puzzles range from trivial problems like this, to classic puzzles, |
| 46 | +The answer to the above puzzle is the string `"world"` because `sat("world")` returns `True`. The puzzles range from |
| 47 | +trivial problems like this, to classic puzzles, |
45 | 48 | to programming competition problems, all the way through open problems in algorithms and mathematics. |
46 | | -A slightly harder example is: |
| 49 | + |
| 50 | +The classic [Towers of Hanoi](https://en.wikipedia.org/wiki/Tower_of_Hanoi) puzzle can be written as follows: |
47 | 51 | ```python |
48 | | -def sat(s: str): |
49 | | -    """find a string with 1000 o's but no consecutive o's.""" |
50 | | -    return s.count("o") == 1000 and s.count("oo") == 0 |
| 52 | +def sat(moves: List[List[int]]): |
| 53 | +    """ |
| 54 | +    Eight disks of sizes 1-8 are stacked on three towers, with each tower having disks in order of largest to |
| 55 | +    smallest. Move [i, j] corresponds to taking the smallest disk off tower i and putting it on tower j, and it |
| 56 | +    is legal as long as the towers remain in sorted order. Find a sequence of moves that moves all the disks |
| 57 | +    from the first to last towers. |
| 58 | +    """ |
| 59 | +    rods = ([8, 7, 6, 5, 4, 3, 2, 1], [], []) |
| 60 | +    for [i, j] in moves: |
| 61 | +        rods[j].append(rods[i].pop()) |
| 62 | +        assert rods[j][-1] == min(rods[j]), "larger disk on top of smaller disk" |
| 63 | +    return rods[0] == rods[1] == [] |
51 | 64 | ``` |
| 65 | +The shortest answer is a list of 255 moves, so instead we ask the AI to generate *code* that outputs an answer. In |
| 66 | +this case, the [codex API](https://beta.openai.com/) generated the following code: |
| 67 | +```python |
| 68 | +def sol(): |
| 69 | +    # taken from https://www.geeksforgeeks.org/c-program-for-tower-of-hanoi/ |
| 70 | +    moves = [] |
| 71 | +    def hanoi(n, source, temp, dest): |
| 72 | +        if n > 0: |
| 73 | +            hanoi(n - 1, source, dest, temp) |
| 74 | +            moves.append([source, dest]) |
| 75 | +            hanoi(n - 1, temp, source, dest) |
| 76 | +    hanoi(8, 0, 1, 2) |
| 77 | +    return moves |
| 78 | +``` |
| 79 | +This was not its first attempt, but that is one of the advantages of puzzles: the computer can easily check |
| 80 | +its answers, so it can keep generating candidates until one works. For this puzzle, about 1 in 1,000 generated |
| 81 | +solutions was satisfactory. Clearly, codex has seen this problem before in other input formats; it even generated a URL! |
| 82 | +(Upon closer inspection, the website exists and contains Python Tower-of-Hanoi code in a completely different format |
| 83 | +with different variable names.) |
| 84 | +On a harder, less-standard [Hanoi puzzle variant](puzzles/README.md#towersofhanoiarbitrary) that |
| 85 | +requires moving from particular start to end positions, codex did not find a solution in 10,000 attempts. |
| 86 | + |
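The generate-and-check loop described above can be sketched in a few lines. This is only an illustration under our own assumptions: `first_passing`, `bad_sol`, and `good_sol` are hypothetical names that are not part of this repo, and the two hand-written candidates stand in for samples drawn from a model.

```python
from typing import Callable, Iterable, List, Optional


def sat(moves: List[List[int]]):
    """Towers of Hanoi puzzle, copied from the README text above."""
    rods = ([8, 7, 6, 5, 4, 3, 2, 1], [], [])
    for [i, j] in moves:
        rods[j].append(rods[i].pop())
        assert rods[j][-1] == min(rods[j]), "larger disk on top of smaller disk"
    return rods[0] == rods[1] == []


def bad_sol():
    # Illegal candidate: stacks disk 2 on top of disk 1 on the last tower.
    return [[0, 2]] * 8


def good_sol():
    # Standard recursive solution, same shape as the generated code above.
    moves = []

    def hanoi(n, source, temp, dest):
        if n > 0:
            hanoi(n - 1, source, dest, temp)
            moves.append([source, dest])
            hanoi(n - 1, temp, source, dest)

    hanoi(8, 0, 1, 2)
    return moves


def first_passing(puzzle: Callable, candidates: Iterable[Callable]) -> Optional[Callable]:
    """Return the first candidate whose output satisfies the puzzle (hypothetical helper)."""
    for sol in candidates:
        try:
            if puzzle(sol()) is True:
                return sol
        except Exception:
            # A crash or failed assertion just means this candidate is wrong.
            continue
    return None


winner = first_passing(sat, [bad_sol, good_sol])
```

Because the puzzle itself is the verifier, the loop needs no reference answer; a real pipeline would likely also sandbox each candidate and add a timeout, which this sketch omits.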
| 87 | +Next, consider a puzzle inspired by [this easy competitive programming problem](https://codeforces.com/problemset/problem/58/A) |
| 88 | +from the [codeforces.com](https://codeforces.com) website: |
| 89 | +```python |
| 90 | +def sat(inds: List[int], string="Sssuubbstrissiingg"): |
| 91 | +    """Find increasing indices to make the substring 'substring'""" |
| 92 | +    return inds == sorted(inds) and "".join(string[i] for i in inds) == "substring" |
| 93 | +``` |
| 94 | +Codex generated the code below, which when run gives the valid answer `[1, 3, 5, 7, 8, 9, 10, 15, 16]`. |
| 95 | +This satisfies the puzzle because it is an increasing list of indices, and joining the |
| 96 | +characters of `"Sssuubbstrissiingg"` at these indices gives `"substring"`. |
| 97 | +```python |
| 98 | +def sol(string="Sssuubbstrissiingg"): |
| 99 | +    x = "substring" |
| 100 | +    pos = string.index(x[0]) |
| 101 | +    inds = [pos] |
| 102 | +    while True: |
| 103 | +        x = x[1:] |
| 104 | +        if not x: |
| 105 | +            return inds |
| 106 | +        pos = string.find(x[0], pos+1) |
| 107 | +        if pos == -1: |
| 108 | +            return inds |
| 109 | +        inds.append(pos) |
| 110 | +``` |
| 111 | +Again, there are multiple valid answers, and again this took many attempts (only 1 success in 10,000). |
| 112 | + |
52 | 113 |
|
53 | 114 | A more challenging puzzle that requires [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming) is the |
54 | 115 | [longest increasing subsequence](https://en.wikipedia.org/wiki/Longest_increasing_subsequence) problem |
55 | 116 | which we can also describe with strings: |
56 | 117 | ```python |
57 | | -from typing import List |
| 118 | +def sat(x: List[int], length=20, s="Dynamic programming solves this classic job-interview puzzle!!!"): |
| 119 | +    """Find the indices of the longest subsequence with characters in sorted order""" |
| 120 | +    return all(s[x[i]] <= s[x[i + 1]] and x[i + 1] > x[i] for i in range(length - 1)) |
58 | 121 |
|
59 | | -def sat(x: List[int], s="Dynamic programming solves this classic job-interview puzzle!!!"): |
60 | | -    """Find the indexes (possibly negative!) of the longest monotonic subsequence""" |
61 | | -    return all(s[x[i]] <= s[x[i+1]] and x[i+1] > x[i] for i in range(25)) |
62 | 122 | ``` |
| 123 | +Codex didn't solve this one. |
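One way to attack this puzzle is the standard quadratic dynamic program for the longest non-decreasing subsequence. The sketch below is ours, not codex output, and `longest_sorted_indices` and `check` are hypothetical names. It assumes negative indices are allowed (the earlier docstring said "possibly negative!", and Python accepts `s[-k]`), so candidate positions run from `-len(s)` to `len(s) - 1`.

```python
from typing import List

S = "Dynamic programming solves this classic job-interview puzzle!!!"


def check(x: List[int], length=20, s=S):
    """Same test as the puzzle above, under a neutral name."""
    return all(s[x[i]] <= s[x[i + 1]] and x[i + 1] > x[i] for i in range(length - 1))


def longest_sorted_indices(s: str) -> List[int]:
    """Longest chain of increasing indices whose characters never decrease."""
    if not s:
        return []
    # Assumption: negative indices count, so each character appears twice.
    idx = list(range(-len(s), len(s)))
    best = [1] * len(idx)   # best[k]: longest valid chain ending at idx[k]
    prev = [-1] * len(idx)  # prev[k]: predecessor of idx[k] in that chain
    for k in range(len(idx)):
        for j in range(k):
            if s[idx[j]] <= s[idx[k]] and best[j] + 1 > best[k]:
                best[k] = best[j] + 1
                prev[k] = j
    # Walk back from the end of the longest chain found.
    k = max(range(len(idx)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(idx[k])
        k = prev[k]
    return chain[::-1]


x = longest_sorted_indices(S)
```

The double loop is O(n²) in the number of candidate positions, which is small here; a patience-sorting variant would bring that down to O(n log n) if longer strings were needed.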
63 | 124 |
|
64 | | -The classic [Towers of Hanoi](https://en.wikipedia.org/wiki/Tower_of_Hanoi) puzzle can be written as follows: |
| 125 | +The dataset also has a number of open problems in computer science and mathematics. For example, |
| 126 | +[Conway's 99-graph problem](https://en.wikipedia.org/w/index.php?title=Conway%27s_99-graph_problem) |
| 127 | +is an unsolved problem in graph theory |
| 128 | +(see also [Five $1,000 Problems (Update 2017)](https://oeis.org/A248380/a248380.pdf)): |
65 | 129 | ```python |
66 | | -def sat(moves: List[List[int]]): |
67 | | -    """moves is list of [from, to] pairs""" |
68 | | -    t = ([8, 7, 6, 5, 4, 3, 2, 1], [], []) # towers state |
69 | | -    return all(t[j].append(t[i].pop()) or t[j][-1] == min(t[j]) for i, j in moves) and t[0] == t[1] |
70 | | - |
| 130 | +def sat(edges: List[List[int]]): |
| 131 | +    """ |
| 132 | +    Find an undirected graph with 99 vertices, in which each two adjacent vertices have exactly one common |
| 133 | +    neighbor, and in which each two non-adjacent vertices have exactly two common neighbors. |
| 134 | +    """ |
| 135 | +    # first compute neighbors sets, N: |
| 136 | +    N = {i: {j for j in range(99) if j != i and ([i, j] in edges or [j, i] in edges)} for i in range(99)} |
| 137 | +    return all(len(N[i].intersection(N[j])) == (1 if j in N[i] else 2) for i in range(99) for j in range(i)) |
71 | 138 | ``` |
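The checker can be sanity-tested on a smaller graph with the same local property. As a sketch, `sat_n` below is a hypothetical generalization of the puzzle to `n` vertices (the original hard-codes 99), and the 3x3 rook's graph, where two cells are adjacent when they share a row or a column, is a known 9-vertex graph that meets both conditions.

```python
from typing import List


def sat_n(edges: List[List[int]], n: int) -> bool:
    """Same test as the 99-graph puzzle, with the vertex count as a parameter."""
    # Neighbor sets, built exactly as in the puzzle above.
    N = {i: {j for j in range(n) if j != i and ([i, j] in edges or [j, i] in edges)} for i in range(n)}
    return all(len(N[i].intersection(N[j])) == (1 if j in N[i] else 2) for i in range(n) for j in range(i))


# 3x3 rook's graph: vertex v sits at row v // 3, column v % 3.
rook_edges = [
    [a, b]
    for a in range(9)
    for b in range(a)
    if a // 3 == b // 3 or a % 3 == b % 3
]

ok = sat_n(rook_edges, 9)
```

This only confirms that the test logic accepts a small known example; it says nothing about whether a 99-vertex graph exists.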
72 | 139 |
|
| 140 | +Why puzzles? One reason is that, if we can solve them better than human programmers, |
| 141 | +then we could make progress on some important algorithmic problems. |
| 142 | +But until then, a second reason is that they can be valuable for training and evaluating AI systems. |
| 143 | +Many programming datasets have been proposed over the years, and several have problems of a similar nature |
| 144 | +(like programming competition problems). In puzzles, the spec is defined by code, while |
| 145 | +other datasets usually use a combination of English and a hidden test set of input-output pairs. English-based |
| 146 | +specs are notoriously ambiguous and test the system's understanding of English. |
| 147 | +And with input-output test cases, you would have to have solved a puzzle before you pose it, |
| 148 | +so what is the use there? Code-based specs |
| 149 | +have the advantage that they are unambiguous: there is no need to debug the AI-generated code or to fear that it |
| 150 | +does not do what you want. If it solved the puzzle, then it succeeded by definition. |
| 151 | + |
73 | 152 | For more information on the motivation and how programming puzzles can help AI learn to program, see |
74 | 153 | the paper: |
75 | 154 | *Programming Puzzles*, by Tal Schuster, Ashwin Kalyan, Alex Polozov, and Adam Tauman Kalai. 2021 (Link to be added shortly) |
76 | 155 |
|
77 | | -# [Click here to browse the puzzles](/puzzles/README.md) |
| 156 | +# [Click here to browse the puzzles and solutions](/puzzles/README.md) |
78 | 157 |
|
79 | 158 | The problems in this repo are based on: |
80 | 159 | * Wikipedia articles about [algorithms](https://en.wikipedia.org/wiki/List_of_algorithms), [puzzles](https://en.wikipedia.org/wiki/Category:Logic_puzzles), |
|