|
2 | 2 | "cells": [ |
3 | 3 | { |
4 | 4 | "cell_type": "markdown", |
5 | | - "id": "cell-0", |
6 | 5 | "metadata": {}, |
7 | 6 | "source": [ |
8 | 7 | "<div align=\"center\">\n", |
|
50 | 49 | "\n", |
51 | 50 | "What do you do beyond Cartpole?\n", |
52 | 51 | "\n", |
53 | | - "Fast forward to 2025, GRPO is awesome and this time it's not JUST in theory, it works well in practise and is really here! \n", |
| 52 | + "Fast-forward to 2025, GRPO is awesome and this time it's not JUST in theory, it works well in practise and is really here!\n", |
54 | 53 | "\n", |
55 | 54 | "The problem still remains, how do you take these RL algorithms and take them beyond Cartpole?\n", |
56 | 55 | "\n", |
57 | 56 | "A huge part of RL is giving your algorithms environment access to learn. \n", |
58 | 57 | "\n", |
59 | | - "We are excited to introduce an Environement Spec for adding Open Environments for RL Training. This will allow you to focus on your experiments and allow everyone to bring their environments. \n", |
| 58 | + "We are excited to introduce an Environment Spec for adding Open Environments for RL Training. This will allow you to focus on your experiments and allow everyone to bring their environments.\n", |
60 | 59 | "\n", |
61 | 60 | "Focus on experiments, use OpenEnvironments, and build agents that go beyond Cartpole on a single spec.\n", |
62 | 61 | "\n", |
|
65 | 64 | }, |
66 | 65 | { |
67 | 66 | "cell_type": "markdown", |
68 | | - "id": "cell-1", |
69 | 67 | "metadata": {}, |
70 | 68 | "source": [ |
71 | 69 | "## 📋 What You'll Learn\n", |
|
116 | 114 | }, |
117 | 115 | { |
118 | 116 | "cell_type": "markdown", |
119 | | - "id": "cell-2", |
120 | 117 | "metadata": {}, |
121 | 118 | "source": [ |
122 | 119 | "---\n", |
|
156 | 153 | }, |
157 | 154 | { |
158 | 155 | "cell_type": "markdown", |
159 | | - "id": "cell-3", |
160 | 156 | "metadata": {}, |
161 | 157 | "source": [ |
162 | 158 | "---\n", |
|
187 | 183 | { |
188 | 184 | "cell_type": "code", |
189 | 185 | "execution_count": 3, |
190 | | - "id": "cell-4", |
191 | 186 | "metadata": {}, |
192 | 187 | "outputs": [ |
193 | 188 | { |
|
253 | 248 | }, |
254 | 249 | { |
255 | 250 | "cell_type": "markdown", |
256 | | - "id": "cell-5", |
257 | 251 | "metadata": {}, |
258 | 252 | "source": [ |
259 | 253 | "---\n", |
|
327 | 321 | }, |
328 | 322 | { |
329 | 323 | "cell_type": "markdown", |
330 | | - "id": "cell-6", |
331 | 324 | "metadata": {}, |
332 | 325 | "source": [ |
333 | 326 | "### The Architecture\n", |
|
375 | 368 | }, |
376 | 369 | { |
377 | 370 | "cell_type": "markdown", |
378 | | - "id": "cell-7", |
379 | 371 | "metadata": {}, |
380 | 372 | "source": [ |
381 | 373 | "---\n", |
|
394 | 386 | { |
395 | 387 | "cell_type": "code", |
396 | 388 | "execution_count": 4, |
397 | | - "id": "cell-8", |
398 | 389 | "metadata": {}, |
399 | 390 | "outputs": [ |
400 | 391 | { |
|
443 | 434 | }, |
444 | 435 | { |
445 | 436 | "cell_type": "markdown", |
446 | | - "id": "cell-9", |
447 | 437 | "metadata": {}, |
448 | 438 | "source": [ |
449 | 439 | "---\n", |
|
477 | 467 | { |
478 | 468 | "cell_type": "code", |
479 | 469 | "execution_count": 5, |
480 | | - "id": "cell-10", |
481 | 470 | "metadata": {}, |
482 | 471 | "outputs": [ |
483 | 472 | { |
|
576 | 565 | }, |
577 | 566 | { |
578 | 567 | "cell_type": "markdown", |
579 | | - "id": "cell-11", |
580 | 568 | "metadata": {}, |
581 | 569 | "source": [ |
582 | 570 | "---\n", |
|
622 | 610 | { |
623 | 611 | "cell_type": "code", |
624 | 612 | "execution_count": 6, |
625 | | - "id": "cell-12", |
626 | 613 | "metadata": {}, |
627 | 614 | "outputs": [ |
628 | 615 | { |
|
722 | 709 | { |
723 | 710 | "cell_type": "code", |
724 | 711 | "execution_count": 7, |
725 | | - "id": "cell-13", |
726 | 712 | "metadata": {}, |
727 | 713 | "outputs": [ |
728 | 714 | { |
|
810 | 796 | }, |
811 | 797 | { |
812 | 798 | "cell_type": "markdown", |
813 | | - "id": "cell-14", |
814 | 799 | "metadata": {}, |
815 | 800 | "source": [ |
816 | 801 | "### How the Client Works\n", |
|
830 | 815 | }, |
831 | 816 | { |
832 | 817 | "cell_type": "markdown", |
833 | | - "id": "cell-15", |
834 | 818 | "metadata": {}, |
835 | 819 | "source": [ |
836 | 820 | "---\n", |
|
857 | 841 | }, |
858 | 842 | { |
859 | 843 | "cell_type": "markdown", |
860 | | - "id": "cell-16", |
861 | 844 | "metadata": {}, |
862 | 845 | "source": [ |
863 | 846 | "## The Game: Catch 🔴🏓\n", |
|
918 | 901 | { |
919 | 902 | "cell_type": "code", |
920 | 903 | "execution_count": 8, |
921 | | - "id": "cell-17", |
922 | 904 | "metadata": {}, |
923 | 905 | "outputs": [ |
924 | 906 | { |
|
990 | 972 | { |
991 | 973 | "cell_type": "code", |
992 | 974 | "execution_count": 9, |
993 | | - "id": "cell-18", |
994 | 975 | "metadata": {}, |
995 | 976 | "outputs": [ |
996 | 977 | { |
|
1009 | 990 | "evalue": "Command '['d:\\\\ANACONDA\\\\envs\\\\openenv\\\\python.exe', '-m', 'pip', 'install', '-q', 'open_spiel']' returned non-zero exit status 1.", |
1010 | 991 | "output_type": "error", |
1011 | 992 | "traceback": [ |
1012 | | - "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", |
1013 | | - "\u001b[1;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", |
1014 | | - "Cell \u001b[1;32mIn[9], line 12\u001b[0m\n\u001b[0;32m 11\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 12\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mpyspiel\u001b[39;00m\n\u001b[0;32m 13\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m✅ OpenSpiel is installed!\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", |
1015 | | - "\u001b[1;31mModuleNotFoundError\u001b[0m: No module named 'pyspiel'", |
| 993 | + "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m", |
| 994 | + "\u001B[1;31mModuleNotFoundError\u001B[0m Traceback (most recent call last)", |
| 995 | + "Cell \u001B[1;32mIn[9], line 12\u001B[0m\n\u001B[0;32m 11\u001B[0m \u001B[38;5;28;01mtry\u001B[39;00m:\n\u001B[1;32m---> 12\u001B[0m \u001B[38;5;28;01mimport\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[38;5;21;01mpyspiel\u001B[39;00m\n\u001B[0;32m 13\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m✅ OpenSpiel is installed!\u001B[39m\u001B[38;5;130;01m\\n\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n", |
| 996 | + "\u001B[1;31mModuleNotFoundError\u001B[0m: No module named 'pyspiel'", |
1016 | 997 | "\nDuring handling of the above exception, another exception occurred:\n", |
1017 | | - "\u001b[1;31mCalledProcessError\u001b[0m Traceback (most recent call last)", |
1018 | | - "Cell \u001b[1;32mIn[9], line 17\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m⚠️ OpenSpiel not found. Installing...\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 16\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01msubprocess\u001b[39;00m\n\u001b[1;32m---> 17\u001b[0m \u001b[43msubprocess\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcheck_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[43msys\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecutable\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m-m\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mpip\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43minstall\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m-q\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mopen_spiel\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 18\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m✅ OpenSpiel installed!\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 20\u001b[0m \u001b[38;5;66;03m# Start the OpenSpiel server in background\u001b[39;00m\n", |
1019 | | - "File \u001b[1;32md:\\ANACONDA\\envs\\openenv\\Lib\\subprocess.py:413\u001b[0m, in \u001b[0;36mcheck_call\u001b[1;34m(*popenargs, **kwargs)\u001b[0m\n\u001b[0;32m 411\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m cmd \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m 412\u001b[0m cmd \u001b[38;5;241m=\u001b[39m popenargs[\u001b[38;5;241m0\u001b[39m]\n\u001b[1;32m--> 413\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m CalledProcessError(retcode, cmd)\n\u001b[0;32m 414\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;241m0\u001b[39m\n", |
1020 | | - "\u001b[1;31mCalledProcessError\u001b[0m: Command '['d:\\\\ANACONDA\\\\envs\\\\openenv\\\\python.exe', '-m', 'pip', 'install', '-q', 'open_spiel']' returned non-zero exit status 1." |
| 998 | + "\u001B[1;31mCalledProcessError\u001B[0m Traceback (most recent call last)", |
| 999 | + "Cell \u001B[1;32mIn[9], line 17\u001B[0m\n\u001B[0;32m 15\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m⚠️ OpenSpiel not found. Installing...\u001B[39m\u001B[38;5;124m\"\u001B[39m)\n\u001B[0;32m 16\u001B[0m \u001B[38;5;28;01mimport\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[38;5;21;01msubprocess\u001B[39;00m\n\u001B[1;32m---> 17\u001B[0m \u001B[43msubprocess\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mcheck_call\u001B[49m\u001B[43m(\u001B[49m\u001B[43m[\u001B[49m\u001B[43msys\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mexecutable\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43m-m\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mpip\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43minstall\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43m-q\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mopen_spiel\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m]\u001B[49m\u001B[43m)\u001B[49m\n\u001B[0;32m 18\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m✅ OpenSpiel installed!\u001B[39m\u001B[38;5;130;01m\\n\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n\u001B[0;32m 20\u001B[0m \u001B[38;5;66;03m# Start the OpenSpiel server in background\u001B[39;00m\n", |
| 1000 | + "File \u001B[1;32md:\\ANACONDA\\envs\\openenv\\Lib\\subprocess.py:413\u001B[0m, in \u001B[0;36mcheck_call\u001B[1;34m(*popenargs, **kwargs)\u001B[0m\n\u001B[0;32m 411\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m cmd \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[0;32m 412\u001B[0m cmd \u001B[38;5;241m=\u001B[39m popenargs[\u001B[38;5;241m0\u001B[39m]\n\u001B[1;32m--> 413\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m CalledProcessError(retcode, cmd)\n\u001B[0;32m 414\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;241m0\u001B[39m\n", |
| 1001 | + "\u001B[1;31mCalledProcessError\u001B[0m: Command '['d:\\\\ANACONDA\\\\envs\\\\openenv\\\\python.exe', '-m', 'pip', 'install', '-q', 'open_spiel']' returned non-zero exit status 1." |
1021 | 1002 | ] |
1022 | 1003 | } |
1023 | 1004 | ], |
|
1102 | 1083 | { |
1103 | 1084 | "cell_type": "code", |
1104 | 1085 | "execution_count": null, |
1105 | | - "id": "cell-19", |
1106 | 1086 | "metadata": {}, |
1107 | 1087 | "outputs": [], |
1108 | 1088 | "source": [ |
|
1124 | 1104 | { |
1125 | 1105 | "cell_type": "code", |
1126 | 1106 | "execution_count": null, |
1127 | | - "id": "cell-20", |
1128 | 1107 | "metadata": {}, |
1129 | 1108 | "outputs": [], |
1130 | 1109 | "source": [ |
|
1173 | 1152 | }, |
1174 | 1153 | { |
1175 | 1154 | "cell_type": "markdown", |
1176 | | - "id": "cell-21", |
1177 | 1155 | "metadata": {}, |
1178 | 1156 | "source": [ |
1179 | 1157 | "---\n", |
|
1220 | 1198 | { |
1221 | 1199 | "cell_type": "code", |
1222 | 1200 | "execution_count": null, |
1223 | | - "id": "cell-22", |
1224 | 1201 | "metadata": {}, |
1225 | 1202 | "outputs": [], |
1226 | 1203 | "source": [ |
|
1319 | 1296 | }, |
1320 | 1297 | { |
1321 | 1298 | "cell_type": "markdown", |
1322 | | - "id": "cell-23", |
1323 | 1299 | "metadata": {}, |
1324 | 1300 | "source": [ |
1325 | 1301 | "### Watch a Policy Play!" |
|
1328 | 1304 | { |
1329 | 1305 | "cell_type": "code", |
1330 | 1306 | "execution_count": null, |
1331 | | - "id": "cell-24", |
1332 | 1307 | "metadata": {}, |
1333 | 1308 | "outputs": [], |
1334 | 1309 | "source": [ |
|
1397 | 1372 | }, |
1398 | 1373 | { |
1399 | 1374 | "cell_type": "markdown", |
1400 | | - "id": "cell-25", |
1401 | 1375 | "metadata": {}, |
1402 | 1376 | "source": [ |
1403 | 1377 | "---\n", |
|
1416 | 1390 | { |
1417 | 1391 | "cell_type": "code", |
1418 | 1392 | "execution_count": null, |
1419 | | - "id": "cell-26", |
1420 | 1393 | "metadata": {}, |
1421 | 1394 | "outputs": [], |
1422 | 1395 | "source": [ |
|
1477 | 1450 | }, |
1478 | 1451 | { |
1479 | 1452 | "cell_type": "markdown", |
1480 | | - "id": "cell-27", |
1481 | 1453 | "metadata": {}, |
1482 | 1454 | "source": [ |
1483 | 1455 | "---\n", |
|
1576 | 1548 | }, |
1577 | 1549 | { |
1578 | 1550 | "cell_type": "markdown", |
1579 | | - "id": "cell-28", |
1580 | 1551 | "metadata": {}, |
1581 | 1552 | "source": [ |
1582 | 1553 | "---\n", |
|
1711 | 1682 | }, |
1712 | 1683 | { |
1713 | 1684 | "cell_type": "markdown", |
1714 | | - "id": "cell-29", |
1715 | 1685 | "metadata": {}, |
1716 | 1686 | "source": [ |
1717 | 1687 | "---\n", |
|
1725 | 1695 | }, |
1726 | 1696 | { |
1727 | 1697 | "cell_type": "markdown", |
1728 | | - "id": "cell-30", |
1729 | 1698 | "metadata": {}, |
1730 | 1699 | "source": [ |
1731 | 1700 | "## What You Learned\n", |
|
1778 | 1747 | }, |
1779 | 1748 | { |
1780 | 1749 | "cell_type": "markdown", |
1781 | | - "id": "cell-31", |
1782 | 1750 | "metadata": {}, |
1783 | 1751 | "source": [ |
1784 | 1752 | "## OpenEnv vs Traditional RL\n", |
|
1845 | 1813 | }, |
1846 | 1814 | { |
1847 | 1815 | "cell_type": "markdown", |
1848 | | - "id": "cell-32", |
1849 | 1816 | "metadata": {}, |
1850 | 1817 | "source": [ |
1851 | 1818 | "<a id=\"resources\"></a>\n", |
|