Read the Docs build information
Build id: 219867
Project: openai-education-spinningup
Version: latest
Commit: da53ce34fe57bffc8ea08af975bf60e228b632b9
Date: 2019-06-28T21:44:06.721198Z
State: finished
Success: True

[rtd-command-info] start-time: 2019-06-28T21:44:07.200010Z, end-time: 2019-06-28T21:44:07.205738Z, duration: 0, exit-code: 0
git remote set-url origin git@github.com:openai/spinningup.git

[rtd-command-info] start-time: 2019-06-28T21:44:07.287652Z, end-time: 2019-06-28T21:44:07.704530Z, duration: 0, exit-code: 0
git fetch --tags --prune --prune-tags --depth 50
From github.com:openai/spinningup
1cbb9a7..da53ce3 master -> origin/master

[rtd-command-info] start-time: 2019-06-28T21:44:07.984364Z, end-time: 2019-06-28T21:44:07.992506Z, duration: 0, exit-code: 0
git checkout --force origin/master
Previous HEAD position was 1cbb9a7 Merge pull request #165 from abinitio888/master
HEAD is now at da53ce3 Merge pull request #88 from rootulp/patch-2

[rtd-command-info] start-time: 2019-06-28T21:44:08.079547Z, end-time: 2019-06-28T21:44:08.086781Z, duration: 0, exit-code: 0
git clean -d -f -f

[rtd-command-info] start-time: 2019-06-28T21:44:09.083147Z, end-time: 2019-06-28T21:44:13.771652Z, duration: 4, exit-code: 0
python3.6 -mvirtualenv --no-site-packages --no-download
Using base prefix '/home/docs/.pyenv/versions/3.6.8'
New python executable in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/bin/python3.6
Not overwriting existing python script /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/bin/python (you must use /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/bin/python3.6)
Installing setuptools, pip, wheel... done.

[rtd-command-info] start-time: 2019-06-28T21:44:13.869227Z, end-time: 2019-06-28T21:44:15.126491Z, duration: 1, exit-code: 0
python -m pip install --upgrade --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/.cache/pip pip
Requirement already up-to-date: pip in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (19.1.1)

[rtd-command-info] start-time: 2019-06-28T21:44:15.217849Z, end-time: 2019-06-28T21:44:19.545698Z, duration: 4, exit-code: 0
python -m pip install --upgrade --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/.cache/pip Pygments==2.3.1 setuptools==41.0.1 docutils==0.14 mock==1.0.1 pillow==5.4.1 alabaster>=0.7,<0.8,!=0.7.5 commonmark==0.8.1 recommonmark==0.5.0 sphinx<2 sphinx-rtd-theme<0.5 readthedocs-sphinx-ext<0.6
Requirement already up-to-date: Pygments==2.3.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (2.3.1)
Requirement already up-to-date: setuptools==41.0.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (41.0.1)
Requirement already up-to-date: docutils==0.14 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (0.14)
Requirement already up-to-date: mock==1.0.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (1.0.1)
Requirement already up-to-date: pillow==5.4.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (5.4.1)
Requirement already up-to-date: alabaster!=0.7.5,<0.8,>=0.7 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (0.7.12)
Requirement already up-to-date: commonmark==0.8.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (0.8.1)
Requirement already up-to-date: recommonmark==0.5.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (0.5.0)
Collecting sphinx<2
Using cached https://files.pythonhosted.org/packages/7d/66/a4af242b4348b729b9d46ce5db23943ce9bca7da9bbe2ece60dc27f26420/Sphinx-1.8.5-py2.py3-none-any.whl
Collecting sphinx-rtd-theme<0.5
Using cached https://files.pythonhosted.org/packages/60/b4/4df37087a1d36755e3a3bfd2a30263f358d2dea21938240fa02313d45f51/sphinx_rtd_theme-0.4.3-py2.py3-none-any.whl
Requirement already up-to-date: readthedocs-sphinx-ext<0.6 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (0.5.17)
Requirement already satisfied, skipping upgrade: future in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from commonmark==0.8.1) (0.17.1)
Requirement already satisfied, skipping upgrade: requests>=2.0.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (2.22.0)
Requirement already satisfied, skipping upgrade: packaging in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (19.0)
Requirement already satisfied, skipping upgrade: sphinxcontrib-websupport in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (1.1.2)
Requirement already satisfied, skipping upgrade: snowballstemmer>=1.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (1.9.0)
Requirement already satisfied, skipping upgrade: babel!=2.0,>=1.3 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (2.7.0)
Requirement already satisfied, skipping upgrade: imagesize in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (1.1.0)
Requirement already satisfied, skipping upgrade: six>=1.5 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (1.12.0)
Requirement already satisfied, skipping upgrade: Jinja2>=2.3 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx<2) (2.10.1)
Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx<2) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx<2) (2.8)
Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx<2) (1.25.3)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx<2) (2019.6.16)
Requirement already satisfied, skipping upgrade: pyparsing>=2.0.2 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from packaging->sphinx<2) (2.4.0)
Requirement already satisfied, skipping upgrade: pytz>=2015.7 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from babel!=2.0,>=1.3->sphinx<2) (2019.1)
Requirement already satisfied, skipping upgrade: MarkupSafe>=0.23 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from Jinja2>=2.3->sphinx<2) (1.1.1)
Installing collected packages: sphinx, sphinx-rtd-theme
Found existing installation: Sphinx 1.5.6
Uninstalling Sphinx-1.5.6:
Successfully uninstalled Sphinx-1.5.6
Found existing installation: sphinx-rtd-theme 0.4.1
Uninstalling sphinx-rtd-theme-0.4.1:
Successfully uninstalled sphinx-rtd-theme-0.4.1
Successfully installed sphinx-1.8.5 sphinx-rtd-theme-0.4.3

[rtd-command-info] start-time: 2019-06-28T21:44:19.627240Z, end-time: 2019-06-28T21:44:22.669969Z, duration: 3, exit-code: 0
python -m pip install --exists-action=w --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/.cache/pip -r docs/docs_requirements.txt
Requirement already satisfied: cloudpickle==0.5.2 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 1)) (0.5.2)
Requirement already satisfied: gym>=0.10.8 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 2)) (0.13.0)
Requirement already satisfied: ipython in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 3)) (7.5.0)
Requirement already satisfied: joblib in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 4)) (0.13.2)
Requirement already satisfied: matplotlib in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 5)) (3.1.0)
Requirement already satisfied: numpy in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 6)) (1.16.4)
Requirement already satisfied: pandas in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 7)) (0.24.2)
Requirement already satisfied: pytest in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 8)) (4.6.3)
Requirement already satisfied: psutil in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 9)) (5.6.3)
Requirement already satisfied: scipy in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 10)) (1.3.0)
Requirement already satisfied: seaborn==0.8.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 11)) (0.8.1)
Collecting sphinx==1.5.6 (from -r docs/docs_requirements.txt (line 12))
Using cached https://files.pythonhosted.org/packages/cd/c3/3fc2985e07f6111b47328be116df9e05d5c2f246a050e2e2ebf6bdc9c692/Sphinx-1.5.6-py2.py3-none-any.whl
Requirement already satisfied: sphinx-autobuild==0.7.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 13)) (0.7.1)
Collecting sphinx-rtd-theme==0.4.1 (from -r docs/docs_requirements.txt (line 14))
Using cached https://files.pythonhosted.org/packages/87/30/7460f7b77b6e8a080dd3688f750fe5d5666c49358f8941449c5b128fa97d/sphinx_rtd_theme-0.4.1-py2.py3-none-any.whl
Requirement already satisfied: tensorflow>=1.8.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 15)) (1.14.0)
Requirement already satisfied: tqdm in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from -r docs/docs_requirements.txt (line 16)) (4.32.2)
Requirement already satisfied: pyglet>=1.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from gym>=0.10.8->-r docs/docs_requirements.txt (line 2)) (1.3.2)
Requirement already satisfied: six in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from gym>=0.10.8->-r docs/docs_requirements.txt (line 2)) (1.12.0)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (2.0.9)
Requirement already satisfied: backcall in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (0.1.0)
Requirement already satisfied: pygments in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (2.3.1)
Requirement already satisfied: traitlets>=4.2 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (4.3.2)
Requirement already satisfied: jedi>=0.10 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (0.14.0)
Requirement already satisfied: setuptools>=18.5 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (41.0.1)
Requirement already satisfied: pickleshare in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (0.7.5)
Requirement already satisfied: decorator in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (4.4.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from ipython->-r docs/docs_requirements.txt (line 3)) (4.7.0)
Requirement already satisfied: python-dateutil>=2.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from matplotlib->-r docs/docs_requirements.txt (line 5)) (2.8.0)
Requirement already satisfied: cycler>=0.10 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from matplotlib->-r docs/docs_requirements.txt (line 5)) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from matplotlib->-r docs/docs_requirements.txt (line 5)) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from matplotlib->-r docs/docs_requirements.txt (line 5)) (2.4.0)
Requirement already satisfied: pytz>=2011k in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pandas->-r docs/docs_requirements.txt (line 7)) (2019.1)
Requirement already satisfied: importlib-metadata>=0.12 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (0.18)
Requirement already satisfied: packaging in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (19.0)
Requirement already satisfied: py>=1.5.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (1.8.0)
Requirement already satisfied: pluggy<1.0,>=0.12 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (0.12.0)
Requirement already satisfied: wcwidth in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (0.1.7)
Requirement already satisfied: more-itertools>=4.0.0; python_version > "2.7" in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (7.1.0)
Requirement already satisfied: attrs>=17.4.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (19.1.0)
Requirement already satisfied: atomicwrites>=1.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pytest->-r docs/docs_requirements.txt (line 8)) (1.3.0)
Requirement already satisfied: alabaster<0.8,>=0.7 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (0.7.12)
Requirement already satisfied: imagesize in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (1.1.0)
Requirement already satisfied: Jinja2>=2.3 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (2.10.1)
Requirement already satisfied: docutils>=0.11 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (0.14)
Requirement already satisfied: babel!=2.0,>=1.3 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (2.7.0)
Requirement already satisfied: snowballstemmer>=1.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (1.9.0)
Requirement already satisfied: requests>=2.0.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (2.22.0)
Requirement already satisfied: PyYAML>=3.10 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx-autobuild==0.7.1->-r docs/docs_requirements.txt (line 13)) (5.1.1)
Requirement already satisfied: watchdog>=0.7.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx-autobuild==0.7.1->-r docs/docs_requirements.txt (line 13)) (0.9.0)
Requirement already satisfied: argh>=0.24.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx-autobuild==0.7.1->-r docs/docs_requirements.txt (line 13)) (0.26.2)
Requirement already satisfied: livereload>=2.3.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx-autobuild==0.7.1->-r docs/docs_requirements.txt (line 13)) (2.6.1)
Requirement already satisfied: port-for==0.3.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx-autobuild==0.7.1->-r docs/docs_requirements.txt (line 13)) (0.3.1)
Requirement already satisfied: pathtools>=0.1.2 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx-autobuild==0.7.1->-r docs/docs_requirements.txt (line 13)) (0.1.2)
Requirement already satisfied: tornado>=3.2 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from sphinx-autobuild==0.7.1->-r docs/docs_requirements.txt (line 13)) (6.0.3)
Requirement already satisfied: termcolor>=1.1.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (1.1.0)
Requirement already satisfied: grpcio>=1.8.6 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (1.21.1)
Requirement already satisfied: keras-applications>=1.0.6 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (1.0.8)
Requirement already satisfied: wrapt>=1.11.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (1.11.2)
Requirement already satisfied: absl-py>=0.7.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (0.7.1)
Requirement already satisfied: astor>=0.6.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (0.8.0)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (1.1.0)
Requirement already satisfied: gast>=0.2.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (0.2.2)
Requirement already satisfied: wheel>=0.26 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (0.33.4)
Requirement already satisfied: tensorboard<1.15.0,>=1.14.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (1.14.0)
Requirement already satisfied: protobuf>=3.6.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (3.8.0)
Requirement already satisfied: google-pasta>=0.1.6 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (0.1.7)
Requirement already satisfied: tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (1.14.0)
Requirement already satisfied: future in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pyglet>=1.2.0->gym>=0.10.8->-r docs/docs_requirements.txt (line 2)) (0.17.1)
Requirement already satisfied: ipython-genutils in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from traitlets>=4.2->ipython->-r docs/docs_requirements.txt (line 3)) (0.2.0)
Requirement already satisfied: parso>=0.3.0 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from jedi>=0.10->ipython->-r docs/docs_requirements.txt (line 3)) (0.5.0)
Requirement already satisfied: ptyprocess>=0.5 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from pexpect; sys_platform != "win32"->ipython->-r docs/docs_requirements.txt (line 3)) (0.6.0)
Requirement already satisfied: zipp>=0.5 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from importlib-metadata>=0.12->pytest->-r docs/docs_requirements.txt (line 8)) (0.5.1)
Requirement already satisfied: MarkupSafe>=0.23 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from Jinja2>=2.3->sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (1.1.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (2019.6.16)
Requirement already satisfied: idna<2.9,>=2.5 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (2.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (1.25.3)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from requests>=2.0.0->sphinx==1.5.6->-r docs/docs_requirements.txt (line 12)) (3.0.4)
Requirement already satisfied: h5py in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (2.9.0)
Requirement already satisfied: markdown>=2.6.8 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow>=1.8.0->-r
docs/docs_requirements.txt (line 15)) (3.1.1) Requirement already satisfied: werkzeug>=0.11.15 in /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/envs/latest/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow>=1.8.0->-r docs/docs_requirements.txt (line 15)) (0.15.4) Installing collected packages: sphinx, sphinx-rtd-theme Found existing installation: Sphinx 1.8.5 Uninstalling Sphinx-1.8.5: Successfully uninstalled Sphinx-1.8.5 Found existing installation: sphinx-rtd-theme 0.4.3 Uninstalling sphinx-rtd-theme-0.4.3: Successfully uninstalled sphinx-rtd-theme-0.4.3 Successfully installed sphinx-1.5.6 sphinx-rtd-theme-0.4.1 [rtd-command-info] start-time: 2019-06-28T21:44:23.344844Z, end-time: 2019-06-28T21:44:23.653568Z, duration: 0, exit-code: 0 cat docs/conf.py #!/usr/bin/env python3 # -*- coding: utf-8 -*- # # Spinning Up documentation build configuration file, created by # sphinx-quickstart on Wed Aug 15 04:21:07 2018. # # This file is execfile()d with the current directory set to its # containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. 
# import os import sys # Make sure spinup is accessible without going through setup.py dirname = os.path.dirname sys.path.insert(0, dirname(dirname(__file__))) # Mock mpi4py to get around having to install it on RTD server (which fails) from unittest.mock import MagicMock class Mock(MagicMock): @classmethod def __getattr__(cls, name): return MagicMock() MOCK_MODULES = ['mpi4py'] sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES) # Finish imports import spinup from recommonmark.parser import CommonMarkParser source_parsers = { '.md': CommonMarkParser, } # -- General configuration ------------------------------------------------ # If your documentation needs a minimal Sphinx version, state it here. # # needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = ['sphinx.ext.imgmath', 'sphinx.ext.viewcode', 'sphinx.ext.autodoc', 'sphinx.ext.napoleon'] #'sphinx.ext.mathjax', ?? # imgmath settings imgmath_image_format = 'svg' imgmath_font_size = 14 # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # source_suffix = ['.rst', '.md'] # source_suffix = '.rst' # The master toctree document. master_doc = 'index' # General information about the project. project = 'Spinning Up' copyright = '2018, OpenAI' author = 'Joshua Achiam' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = '' # The full version, including alpha/beta/rc tags. release = '' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. 
# Usually you set "language" from the command line for these cases. language = None # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This patterns also effect to html_static_path and html_extra_path exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'default' #'sphinx' # If true, `todo` and `todoList` produce output, else they produce nothing. todo_include_todos = False # -- Options for HTML output ---------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # # html_theme = 'alabaster' html_theme = "sphinx_rtd_theme" # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # # html_theme_options = {} # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] html_logo = 'images/spinning-up-logo2.png' html_theme_options = { 'logo_only': True } #html_favicon = 'openai-favicon2_32x32.ico' html_favicon = 'openai_icon.ico' # -- Options for HTMLHelp output ------------------------------------------ # Output file base name for HTML help builder. htmlhelp_basename = 'SpinningUpdoc' # -- Options for LaTeX output --------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). # # 'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). # # 'pointsize': '10pt', # Additional stuff for the LaTeX preamble. 
    #
    # 'preamble': '',

    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

imgmath_latex_preamble = r'''
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{cancel}

\usepackage[verbose=true,letterpaper]{geometry}
\geometry{
    textheight=12in,
    textwidth=6.5in,
    top=1in,
    headheight=12pt,
    headsep=25pt,
    footskip=30pt
}

\newcommand{\E}{{\mathrm E}}

\newcommand{\underE}[2]{\underset{\begin{subarray}{c}#1 \end{subarray}}{\E}\left[ #2 \right]}

\newcommand{\Epi}[1]{\underset{\begin{subarray}{c}\tau \sim \pi \end{subarray}}{\E}\left[ #1 \right]}
'''

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
    (master_doc, 'SpinningUp.tex', 'Spinning Up Documentation',
     'Joshua Achiam', 'manual'),
]

# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    (master_doc, 'spinningup', 'Spinning Up Documentation',
     [author], 1)
]

# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
    (master_doc, 'SpinningUp', 'Spinning Up Documentation',
     author, 'SpinningUp', 'One line description of project.',
     'Miscellaneous'),
]

def setup(app):
    app.add_stylesheet('css/modify.css')

###########################################################################
# auto-created readthedocs.org specific configuration #
###########################################################################
#
# The following code was added during an automated build on readthedocs.org
# It is auto created and injected for every build. The result is based on the
# conf.py.tmpl file found in the readthedocs.org codebase:
# https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/doc_builder/templates/doc_builder/conf.py.tmpl
#

import importlib
import sys
import os.path
from six import string_types

from sphinx import version_info

# Get suffix for proper linking to GitHub
# This is deprecated in Sphinx 1.3+,
# as each page can have its own suffix
if globals().get('source_suffix', False):
    if isinstance(source_suffix, string_types):
        SUFFIX = source_suffix
    elif isinstance(source_suffix, (list, tuple)):
        # Sphinx >= 1.3 supports list/tuple to define multiple suffixes
        SUFFIX = source_suffix[0]
    elif isinstance(source_suffix, dict):
        # Sphinx >= 1.8 supports a mapping dictionary for multiple suffixes
        SUFFIX = list(source_suffix.keys())[0]  # make a ``list()`` for py2/py3 compatibility
    else:
        # default to .rst
        SUFFIX = '.rst'
else:
    SUFFIX = '.rst'

# Add RTD Static Path. Add to the end because it overwrites previous files.
if not 'html_static_path' in globals():
    html_static_path = []
if os.path.exists('_static'):
    html_static_path.append('_static')

# Add RTD Theme only if they aren't overriding it already
using_rtd_theme = (
    (
        'html_theme' in globals() and
        html_theme in ['default'] and
        # Allow people to bail with a hack of having an html_style
        'html_style' not in globals()
    ) or 'html_theme' not in globals()
)
if using_rtd_theme:
    theme = importlib.import_module('sphinx_rtd_theme')
    html_theme = 'sphinx_rtd_theme'
    html_style = None
    html_theme_options = {}
    if 'html_theme_path' in globals():
        html_theme_path.append(theme.get_html_theme_path())
    else:
        html_theme_path = [theme.get_html_theme_path()]

if globals().get('websupport2_base_url', False):
    websupport2_base_url = 'https://readthedocs.com/websupport'
    websupport2_static_url = 'https://media.readthedocs.com/'

#Add project information to the template context.
context = {
    'using_theme': using_rtd_theme,
    'html_theme': html_theme,
    'current_version': "latest",
    'version_slug': "latest",
    'MEDIA_URL': "https://media.readthedocs.com/media/",
    'STATIC_URL': "https://media.readthedocs.com/",
    'PRODUCTION_DOMAIN': "readthedocs.com",
    'versions': [
        ("latest", "/en/latest/"),
    ],
    'downloads': [
        ("pdf", "//readthedocs.com/projects/openai-education-spinningup/downloads/pdf/latest/"),
        ("html", "//readthedocs.com/projects/openai-education-spinningup/downloads/htmlzip/latest/"),
        ("epub", "//readthedocs.com/projects/openai-education-spinningup/downloads/epub/latest/"),
    ],
    'subprojects': [
    ],
    'slug': 'openai-education-spinningup',
    'name': u'spinningup',
    'rtd_language': u'en',
    'programming_language': u'words',
    'canonical_url': 'https://spinningup.openai.com/en/latest/',
    'analytics_code': 'UA-129132782-1',
    'single_version': False,
    'conf_py_path': '/docs/',
    'api_host': 'https://readthedocs.com',
    'github_user': 'openai',
    'github_repo': 'spinningup',
    'github_version': 'master',
    'display_github': True,
    'bitbucket_user': 'None',
    'bitbucket_repo': 'None',
    'bitbucket_version': 'master',
    'display_bitbucket': False,
    'gitlab_user': 'None',
    'gitlab_repo': 'None',
    'gitlab_version': 'master',
    'display_gitlab': False,
    'READTHEDOCS': True,
    'using_theme': (html_theme == "default"),
    'new_theme': (html_theme == "sphinx_rtd_theme"),
    'source_suffix': SUFFIX,
    'ad_free': False,
    'user_analytics_code': 'UA-129132782-1',
    'global_analytics_code': 'UA-17997319-2',
    'commit': 'da53ce34',
}

if 'html_context' in globals():
    html_context.update(context)
else:
    html_context = context

# Add custom RTD extension
if 'extensions' in globals():
    # Insert at the beginning because it can interfere
    # with other extensions.
    # See https://github.com/rtfd/readthedocs.org/pull/4054
    extensions.insert(0, "readthedocs_ext.readthedocs")
else:
    extensions = ["readthedocs_ext.readthedocs"]

project_language = 'en'

# User's Sphinx configurations
language_user = globals().get('language', None)
latex_engine_user = globals().get('latex_engine', None)
latex_elements_user = globals().get('latex_elements', None)

# Remove this once xindy gets installed in Docker image and XINDYOPS
# env variable is supported
# https://github.com/rtfd/readthedocs-docker-images/pull/98
latex_use_xindy = False

chinese = any([
    language_user in ('zh_CN', 'zh_TW'),
    project_language in ('zh_CN', 'zh_TW'),
])

japanese = any([
    language_user == 'ja',
    project_language == 'ja',
])

if chinese:
    latex_engine = latex_engine_user or 'xelatex'

    latex_elements_rtd = {
        'preamble': '\\usepackage[UTF8]{ctex}\n',
    }
    latex_elements = latex_elements_user or latex_elements_rtd
elif japanese:
    latex_engine = latex_engine_user or 'platex'

[rtd-command-info] start-time: 2019-06-28T21:44:23.750093Z, end-time: 2019-06-28T21:45:51.440911Z, duration: 87, exit-code: 0
python sphinx-build -T -E -b readthedocs -d _build/doctrees-readthedocs -D language=en . _build/html
Running Sphinx v1.5.6
making output directory...
WARNING: Logging before flag parsing goes to stderr.
W0628 21:44:26.274164 139987855175808 deprecation_wrapper.py:119] From /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/spinup/utils/mpi_tf.py:29: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
loading translations [en]... done
building [mo]: targets for 0 po files that are out of date
building [readthedocs]: targets for 31 source files that are out of date
updating environment: 31 added, 0 changed, 0 removed
reading sources... [ 3%] algorithms/ddpg
reading sources... [ 6%] algorithms/ppo
reading sources... [ 9%] algorithms/sac
reading sources... [ 12%] algorithms/td3
reading sources... [ 16%] algorithms/trpo
reading sources... [ 19%] algorithms/vpg
reading sources... [ 22%] etc/acknowledgements
reading sources... [ 25%] etc/author
reading sources... [ 29%] index
reading sources... [ 32%] spinningup/bench
reading sources... [ 35%] spinningup/exercise2_1_soln
reading sources... [ 38%] spinningup/exercise2_2_soln
reading sources... [ 41%] spinningup/exercises
reading sources... [ 45%] spinningup/extra_pg_proof1
reading sources... [ 48%] spinningup/extra_pg_proof2
reading sources... [ 51%] spinningup/keypapers
reading sources... [ 54%] spinningup/rl_intro
reading sources... [ 58%] spinningup/rl_intro2
reading sources... [ 61%] spinningup/rl_intro3
reading sources... [ 64%] spinningup/rl_intro4
reading sources... [ 67%] spinningup/spinningup
reading sources... [ 70%] user/algorithms
reading sources... [ 74%] user/installation
reading sources... [ 77%] user/introduction
reading sources... [ 80%] user/plotting
reading sources... [ 83%] user/running
reading sources... [ 87%] user/saving_and_loading
reading sources... [ 90%] utils/logger
reading sources... [ 93%] utils/mpi
reading sources... [ 96%] utils/plotter
reading sources... [100%] utils/run_utils
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercises.rst:3: WARNING: Duplicate explicit target name: "solution available here.".
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/rl_intro3.rst:3: WARNING: Duplicate explicit target name: "on github".
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/rl_intro3.rst:380: WARNING: Line block ends without a blank line.
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/user/plotting.rst:55: WARNING: Duplicate explicit target name: "cmdoption-arg-default".
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/user/plotting.rst:63: WARNING: Duplicate explicit target name: "cmdoption-arg-default". /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/user/running.rst:144: WARNING: Duplicate explicit target name: "cmdoption--ac_kwargs". /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/user/running.rst:285: WARNING: Inline strong start-string without end-string. /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/user/saving_and_loading.rst:119: WARNING: Duplicate explicit target name: "cmdoption-arg-default". /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/user/saving_and_loading.rst:146: WARNING: Duplicate explicit target name: "cmdoption-arg-default". looking for now-outdated files... none found pickling environment... done checking consistency... /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_1_soln.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_2_soln.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof1.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof2.rst:: WARNING: document isn't included in any toctree done preparing documents... /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/rl_intro4.rst:: WARNING: document isn't included in any toctree done writing output... 
[ 3%] algorithms/ddpg writing output... [ 6%] algorithms/ppo writing output... [ 9%] algorithms/sac writing output... [ 12%] algorithms/td3 writing output... [ 16%] algorithms/trpo writing output... [ 19%] algorithms/vpg writing output... [ 22%] etc/acknowledgements writing output... [ 25%] etc/author writing output... [ 29%] index writing output... [ 32%] spinningup/bench writing output... [ 35%] spinningup/exercise2_1_soln writing output... [ 38%] spinningup/exercise2_2_soln writing output... [ 41%] spinningup/exercises writing output... [ 45%] spinningup/extra_pg_proof1 writing output... [ 48%] spinningup/extra_pg_proof2 writing output... [ 51%] spinningup/keypapers writing output... [ 54%] spinningup/rl_intro writing output... [ 58%] spinningup/rl_intro2 writing output... [ 61%] spinningup/rl_intro3 writing output... [ 64%] spinningup/rl_intro4 writing output... [ 67%] spinningup/spinningup writing output... [ 70%] user/algorithms writing output... [ 74%] user/installation writing output... [ 77%] user/introduction writing output... [ 80%] user/plotting writing output... [ 83%] user/running writing output... [ 87%] user/saving_and_loading writing output... [ 90%] utils/logger writing output... [ 93%] utils/mpi writing output... [ 96%] utils/plotter writing output... [100%] utils/run_utils generating indices... genindex py-modindex highlighting module code... [ 10%] spinup.algos.ddpg.ddpg highlighting module code... [ 20%] spinup.algos.ppo.ppo highlighting module code... [ 30%] spinup.algos.sac.sac highlighting module code... [ 40%] spinup.algos.td3.td3 highlighting module code... [ 50%] spinup.algos.trpo.trpo highlighting module code... [ 60%] spinup.algos.vpg.vpg highlighting module code... [ 70%] spinup.utils.logx highlighting module code... [ 80%] spinup.utils.mpi_tools highlighting module code... [ 90%] spinup.utils.mpi_tf highlighting module code... [100%] spinup.utils.run_utils writing additional pages... search copying images... 
[ 10%] images/spinning-up-in-rl.png copying images... [ 20%] spinningup/../images/bench/bench_halfcheetah.svg copying images... [ 30%] spinningup/../images/bench/bench_hopper.svg copying images... [ 40%] spinningup/../images/bench/bench_walker.svg copying images... [ 50%] spinningup/../images/bench/bench_swim.svg copying images... [ 60%] spinningup/../images/bench/bench_ant.svg copying images... [ 70%] spinningup/../images/ex2-1_trpo_hopper.png copying images... [ 80%] spinningup/../images/ex2-2_ddpg_bug.svg copying images... [ 90%] spinningup/../images/rl_diagram_transparent_bg.png copying images... [100%] spinningup/../images/rl_algorithms_9_15.svg copying static files... WARNING: favicon file 'openai_icon.ico' does not exist done copying extra files... done dumping search index in English (code: en) ... done dumping object inventory... done build succeeded, 15 warnings. [rtd-command-info] start-time: 2019-06-28T21:45:51.655013Z, end-time: 2019-06-28T21:47:03.193622Z, duration: 71, exit-code: 0 python sphinx-build -T -b readthedocssinglehtmllocalmedia -d _build/doctrees-readthedocssinglehtmllocalmedia -D language=en . _build/localmedia Running Sphinx v1.5.6 making output directory... WARNING: Logging before flag parsing goes to stderr. W0628 21:45:54.407126 139787790471296 deprecation_wrapper.py:119] From /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/spinup/utils/mpi_tf.py:29: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. loading translations [en]... done loading pickled environment... done building [mo]: targets for 0 po files that are out of date building [readthedocssinglehtmllocalmedia]: all documents updating environment: 0 added, 2 changed, 0 removed reading sources... [ 50%] user/algorithms reading sources... [100%] user/installation looking for now-outdated files... none found pickling environment... done checking consistency... 
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_1_soln.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_2_soln.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof1.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof2.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/rl_intro4.rst:: WARNING: document isn't included in any toctree done preparing documents... done assembling single document... user/introduction user/installation user/algorithms user/running user/saving_and_loading user/plotting spinningup/rl_intro spinningup/rl_intro2 spinningup/rl_intro3 spinningup/spinningup spinningup/keypapers spinningup/exercises spinningup/bench algorithms/vpg algorithms/trpo algorithms/ppo algorithms/ddpg algorithms/td3 algorithms/sac utils/logger utils/plotter utils/mpi utils/run_utils etc/acknowledgements etc/author writing... done writing additional files... copying images... [ 12%] images/spinning-up-in-rl.png copying images... [ 25%] spinningup/../images/rl_diagram_transparent_bg.png copying images... [ 37%] spinningup/../images/rl_algorithms_9_15.svg copying images... [ 50%] spinningup/../images/bench/bench_halfcheetah.svg copying images... [ 62%] spinningup/../images/bench/bench_hopper.svg copying images... [ 75%] spinningup/../images/bench/bench_walker.svg copying images... [ 87%] spinningup/../images/bench/bench_swim.svg copying images... [100%] spinningup/../images/bench/bench_ant.svg copying static files... 
WARNING: favicon file 'openai_icon.ico' does not exist done copying extra files... done dumping object inventory... done build succeeded, 6 warnings. [rtd-command-info] start-time: 2019-06-28T21:47:03.418794Z, end-time: 2019-06-28T21:47:07.358175Z, duration: 3, exit-code: 0 python sphinx-build -b latex -D language=en -d _build/doctrees . _build/latex Running Sphinx v1.5.6 making output directory... WARNING: Logging before flag parsing goes to stderr. W0628 21:47:05.958384 139776752234624 deprecation_wrapper.py:119] From /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/spinup/utils/mpi_tf.py:29: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. loading translations [en]... done loading pickled environment... done building [mo]: targets for 0 po files that are out of date building [latex]: all documents updating environment: 0 added, 2 changed, 0 removed reading sources... [ 50%] user/algorithms reading sources... [100%] user/installation looking for now-outdated files... none found pickling environment... done checking consistency... 
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_1_soln.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_2_soln.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof1.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof2.rst:: WARNING: document isn't included in any toctree /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/rl_intro4.rst:: WARNING: document isn't included in any toctree done processing SpinningUp.tex... index user/introduction user/installation user/algorithms user/running user/saving_and_loading user/plotting spinningup/rl_intro spinningup/rl_intro2 spinningup/rl_intro3 spinningup/spinningup spinningup/keypapers spinningup/exercises spinningup/bench algorithms/vpg algorithms/trpo algorithms/ppo algorithms/ddpg algorithms/td3 algorithms/sac utils/logger utils/plotter utils/mpi utils/run_utils etc/acknowledgements etc/author resolving references... writing... done copying images... images/spinning-up-in-rl.png spinningup/../images/rl_diagram_transparent_bg.png spinningup/../images/rl_algorithms_9_15.svg spinningup/../images/bench/bench_halfcheetah.svg spinningup/../images/bench/bench_hopper.svg spinningup/../images/bench/bench_walker.svg spinningup/../images/bench/bench_swim.svg spinningup/../images/bench/bench_ant.svg copying TeX support files... done build succeeded, 5 warnings. 
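The pdflatex run that follows stops repeatedly on `! Undefined control sequence.` at `\underE`. In the conf.py dump earlier in this log, `\E`, `\underE`, and `\Epi` are defined only in `imgmath_latex_preamble`, which Sphinx uses for the math snippets it renders as images, while the `'preamble'` key of `latex_elements` is commented out. A minimal sketch of one possible remedy, assuming the PDF build is meant to share those macros; the name `MATH_MACROS` is hypothetical, and this change is not part of the logged commit:

```python
# Hypothetical addition to docs/conf.py; not part of the logged configuration.
# The three macro definitions are copied from imgmath_latex_preamble as it
# appears in the conf.py dump in this log.
MATH_MACROS = r'''
\newcommand{\E}{{\mathrm E}}
\newcommand{\underE}[2]{\underset{\begin{subarray}{c}#1 \end{subarray}}{\E}\left[ #2 \right]}
\newcommand{\Epi}[1]{\underset{\begin{subarray}{c}\tau \sim \pi \end{subarray}}{\E}\left[ #1 \right]}
'''

# Assumption: the LaTeX builder should see the same macros as imgmath.
# 'preamble' is the latex_elements key that conf.py currently leaves
# commented out.
latex_elements = {
    'preamble': MATH_MACROS,
}
```

This sketch addresses only the undefined macros. The `Unknown graphics extension: .svg` errors later in the run have a separate cause and would not be affected.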
[rtd-command-info] start-time: 2019-06-28T21:47:07.800509Z, end-time: 2019-06-28T21:47:09.148927Z, duration: 1, exit-code: 0 pdflatex -interaction=nonstopmode /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/_build/latex/SpinningUp.tex This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=pdflatex) restricted \write18 enabled. entering extended mode (/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/c heckouts/latest/docs/_build/latex/SpinningUp.tex LaTeX2e <2017-04-15> Babel <3.18> and hyphenation patterns for 84 language(s) loaded. (./sphinxmanual.cls Document Class: sphinxmanual 2017/03/26 v1.5.4 Document class (Sphinx manual) (/usr/share/texlive/texmf-dist/tex/latex/base/report.cls Document Class: report 2014/09/29 v1.4h Standard LaTeX document class (/usr/share/texlive/texmf-dist/tex/latex/base/size10.clo))) (/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty (/usr/share/texlive/texmf-dist/tex/latex/base/utf8.def (/usr/share/texlive/texmf-dist/tex/latex/base/t1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/ot1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/omsenc.dfu))) (/usr/share/texlive/texmf-dist/tex/latex/cmap/cmap.sty) (/usr/share/texlive/texmf-dist/tex/latex/base/fontenc.sty (/usr/share/texlive/texmf-dist/tex/latex/base/t1enc.def)<>) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. 
(/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amssymb.sty (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty)) (/usr/share/texlive/texmf-dist/tex/generic/babel/babel.sty (/usr/share/texlive/texmf-dist/tex/generic/babel/switch.def) (/usr/share/texlive/texmf-dist/tex/generic/babel-english/english.ldf (/usr/share/texlive/texmf-dist/tex/generic/babel/babel.def (/usr/share/texlive/texmf-dist/tex/generic/babel/txtbabel.def)))) (/usr/share/texlive/texmf-dist/tex/latex/psnfss/times.sty) (/usr/share/texlive/texmf-dist/tex/latex/fncychap/fncychap.sty) (/usr/share/texlive/texmf-dist/tex/latex/tools/longtable.sty) (./sphinx.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics/graphicx.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics/keyval.sty) (/usr/share/texlive/texmf-dist/tex/latex/graphics/graphics.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics/trig.sty) (/usr/share/texlive/texmf-dist/tex/latex/graphics-cfg/graphics.cfg) (/usr/share/texlive/texmf-dist/tex/latex/graphics-def/pdftex.def))) (/usr/share/texlive/texmf-dist/tex/latex/fancyhdr/fancyhdr.sty) (/usr/share/texlive/texmf-dist/tex/latex/base/textcomp.sty (/usr/share/texlive/texmf-dist/tex/latex/base/ts1enc.def (/usr/share/texlive/texmf-dist/tex/latex/base/ts1enc.dfu))) (/usr/share/texlive/texmf-dist/tex/latex/titlesec/titlesec.sty) (/usr/share/texlive/texmf-dist/tex/latex/tabulary/tabulary.sty (/usr/share/texlive/texmf-dist/tex/latex/tools/array.sty)) (/usr/share/texlive/texmf-dist/tex/latex/base/makeidx.sty) (/usr/share/texlive/texmf-dist/tex/latex/framed/framed.sty) (/usr/share/texlive/texmf-dist/tex/latex/xcolor/xcolor.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics-cfg/color.cfg)) 
(/usr/share/texlive/texmf-dist/tex/latex/fancyvrb/fancyvrb.sty Style option: `fancyvrb' v2.7a, with DG/SPQR fixes, and firstline=lastline fix <2008/02/07> (tvz)) (/usr/share/texlive/texmf-dist/tex/latex/threeparttable/threeparttable.sty) (./footnotehyper-sphinx.sty (/usr/share/texlive/texmf-dist/tex/latex/mdwtools/footnote.sty)) (/usr/share/texlive/texmf-dist/tex/latex/float/float.sty) (/usr/share/texlive/texmf-dist/tex/latex/wrapfig/wrapfig.sty) (/usr/share/texlive/texmf-dist/tex/latex/parskip/parskip.sty) (/usr/share/texlive/texmf-dist/tex/latex/base/alltt.sty) (/usr/share/texlive/texmf-dist/tex/latex/upquote/upquote.sty) (/usr/share/texlive/texmf-dist/tex/latex/capt-of/capt-of.sty) (./needspace.sty) (./sphinxhighlight.sty) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/kvoptions.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ltxcmds.sty) (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/kvsetkeys.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/infwarerr.sty) (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/etexcmds.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ifluatex.sty)))) (/usr/share/texlive/texmf-dist/tex/generic/pdftex/pdfcolor.tex) ** (sphinx) defining (legacy) text style macros without \sphinx prefix ** if clashes with packages, set latex_keep_old_macro_names=False in conf.py ) (/usr/share/texlive/texmf-dist/tex/latex/geometry/geometry.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ifpdf.sty) (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ifvtex.sty) (/usr/share/texlive/texmf-dist/tex/generic/ifxetex/ifxetex.sty)) (/usr/share/texlive/texmf-dist/tex/latex/multirow/multirow.sty) (/usr/share/texlive/texmf-dist/tex/latex/eqparbox/eqparbox.sty (/usr/share/texlive/texmf-dist/tex/latex/environ/environ.sty (/usr/share/texlive/texmf-dist/tex/latex/trimspaces/trimspaces.sty))) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/hyperref.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/hobsub-hyperref.sty 
(/usr/share/texlive/texmf-dist/tex/generic/oberdiek/hobsub-generic.sty)) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/auxhook.sty) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/pd1enc.def) (/usr/share/texlive/texmf-dist/tex/latex/latexconfig/hyperref.cfg) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/puenc.def) (/usr/share/texlive/texmf-dist/tex/latex/url/url.sty)) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/hpdftex.def (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/rerunfilecheck.sty)) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/hypcap.sty) Writing index file SpinningUp.idx No file SpinningUp.aux. (/usr/share/texlive/texmf-dist/tex/latex/base/ts1cmr.fd) (/usr/share/texlive/texmf-dist/tex/latex/psnfss/t1ptm.fd) (/usr/share/texlive/texmf-dist/tex/context/base/mkii/supp-pdf.mkii [Loading MPS to PDF converter (version 2006.09.02).] ) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/epstopdf-base.sty (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/grfext.sty) (/usr/share/texlive/texmf-dist/tex/latex/latexconfig/epstopdf-sys.cfg)) *geometry* driver: auto-detecting *geometry* detected driver: pdftex (/usr/share/texlive/texmf-dist/tex/latex/hyperref/nameref.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/gettitlestring.sty)) (/usr/share/texlive/texmf-dist/tex/latex/psnfss/t1phv.fd)<><><><> (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsb.fd) [1{/var/lib/texmf/fo nts/map/pdftex/updmap/pdftex.map}] [2] [1] [2] (/usr/share/texlive/texmf-dist/tex/latex/psnfss/t1pcr.fd) [1 <./spinning-up-in- rl.png>] [2] Chapter 1. (/usr/share/texlive/texmf-dist/tex/latex/psnfss/ts1ptm.fd) LaTeX Warning: Hyper reference `user/introduction:introduction' on page 3 undef ined on input line 68. LaTeX Warning: Hyper reference `user/introduction:what-this-is' on page 3 undef ined on input line 71. 
LaTeX Warning: Hyper reference `user/introduction:why-we-built-this' on page 3 undefined on input line 74.
LaTeX Warning: Hyper reference `user/introduction:how-this-serves-our-mission' on page 3 undefined on input line 77.
LaTeX Warning: Hyper reference `user/introduction:code-design-philosophy' on page 3 undefined on input line 80.
LaTeX Warning: Hyper reference `user/introduction:support-plan' on page 3 undefined on input line 83.
[3] [4] [5] [6]
Chapter 2.
LaTeX Warning: Hyper reference `user/installation:installation' on page 7 undefined on input line 207.
LaTeX Warning: Hyper reference `user/installation:installing-python' on page 7 undefined on input line 210.
LaTeX Warning: Hyper reference `user/installation:installing-openmpi' on page 7 undefined on input line 213.
LaTeX Warning: Hyper reference `user/installation:ubuntu' on page 7 undefined on input line 216.
LaTeX Warning: Hyper reference `user/installation:mac-os-x' on page 7 undefined on input line 219.
LaTeX Warning: Hyper reference `user/installation:installing-spinning-up' on page 7 undefined on input line 224.
LaTeX Warning: Hyper reference `user/installation:check-your-install' on page 7 undefined on input line 227.
LaTeX Warning: Hyper reference `user/installation:installing-mujoco-optional' on page 7 undefined on input line 230.
[7] [8] [9] [10]
Chapter 3.
LaTeX Warning: Hyper reference `user/algorithms:algorithms' on page 11 undefined on input line 361.
LaTeX Warning: Hyper reference `user/algorithms:what-s-included' on page 11 undefined on input line 364.
LaTeX Warning: Hyper reference `user/algorithms:why-these-algorithms' on page 11 undefined on input line 367.
LaTeX Warning: Hyper reference `user/algorithms:the-on-policy-algorithms' on page 11 undefined on input line 370.
LaTeX Warning: Hyper reference `user/algorithms:the-off-policy-algorithms' on page 11 undefined on input line 373.
LaTeX Warning: Hyper reference `user/algorithms:code-format' on page 11 undefined on input line 378.
LaTeX Warning: Hyper reference `user/algorithms:the-algorithm-file' on page 11 undefined on input line 381.
LaTeX Warning: Hyper reference `user/algorithms:the-core-file' on page 11 undefined on input line 384.
[11] [12] [13] [14]
Chapter 4.
LaTeX Warning: Hyper reference `user/running:running-experiments' on page 15 undefined on input line 530.
LaTeX Warning: Hyper reference `user/running:launching-from-the-command-line' on page 15 undefined on input line 533.
LaTeX Warning: Hyper reference `user/running:setting-hyperparameters-from-the-command-line' on page 15 undefined on input line 536.
LaTeX Warning: Hyper reference `user/running:launching-multiple-experiments-at-once' on page 15 undefined on input line 539.
LaTeX Warning: Hyper reference `user/running:special-flags' on page 15 undefined on input line 542.
LaTeX Warning: Hyper reference `user/running:environment-flag' on page 15 undefined on input line 545.
LaTeX Warning: Hyper reference `user/running:shortcut-flags' on page 15 undefined on input line 548.
LaTeX Warning: Hyper reference `user/running:config-flags' on page 15 undefined on input line 551.
LaTeX Warning: Hyper reference `user/running:where-results-are-saved' on page 15 undefined on input line 556.
LaTeX Warning: Hyper reference `user/running:how-is-suffix-determined' on page 15 undefined on input line 559.
LaTeX Warning: Hyper reference `user/running:extra' on page 15 undefined on input line 564.
LaTeX Warning: Hyper reference `user/running:launching-from-scripts' on page 15 undefined on input line 569.
LaTeX Warning: Hyper reference `user/running:using-experimentgrid' on page 15 undefined on input line 572.
[15] [16] [17] [18] (/usr/share/texlive/texmf-dist/tex/latex/psnfss/ts1pcr.fd) [19] [20]
Chapter 5.
LaTeX Warning: Hyper reference `user/saving_and_loading:experiment-outputs' on page 21 undefined on input line 900.
LaTeX Warning: Hyper reference `user/saving_and_loading:algorithm-outputs' on page 21 undefined on input line 903.
LaTeX Warning: Hyper reference `user/saving_and_loading:save-directory-location' on page 21 undefined on input line 906.
LaTeX Warning: Hyper reference `user/saving_and_loading:loading-and-running-trained-policies' on page 21 undefined on input line 909.
LaTeX Warning: Hyper reference `user/saving_and_loading:if-environment-saves-successfully' on page 21 undefined on input line 912.
LaTeX Warning: Hyper reference `user/saving_and_loading:environment-not-found-error' on page 21 undefined on input line 915.
LaTeX Warning: Hyper reference `user/saving_and_loading:using-trained-value-functions' on page 21 undefined on input line 918.
[21]
LaTeX Warning: Hyper reference `user/saving_and_loading:details-below' on page 22 undefined on input line 963.
[22] [23] [24] [25] [26]
Chapter 6.
[27] [28]
Chapter 7.
LaTeX Warning: Hyper reference `spinningup/rl_intro:part-1-key-concepts-in-rl' on page 29 undefined on input line 1279.
LaTeX Warning: Hyper reference `spinningup/rl_intro:what-can-rl-do' on page 29 undefined on input line 1282.
LaTeX Warning: Hyper reference `spinningup/rl_intro:key-concepts-and-terminology' on page 29 undefined on input line 1285.
LaTeX Warning: Hyper reference `spinningup/rl_intro:optional-formalism' on page 29 undefined on input line 1288.
[29] [30 <./rl_diagram_transparent_bg.png>] [31] [32] [33]
! Undefined control sequence.
... } P(\tau |\pi ) R(\tau ) = \underE {\tau \sim \pi }{R(\tau )}...
l.1531 ...nderE{\tau\sim \pi}{R(\tau)}.\end{split}
! Undefined control sequence.
... } P(\tau |\pi ) R(\tau ) = \underE {\tau \sim \pi }{R(\tau )}...
l.1531 ...nderE{\tau\sim \pi}{R(\tau)}.\end{split}
! Undefined control sequence.
...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{R(\tau )\...
l.1550 ...R(\tau)\left| s_0 = s\right.}\end{split}
! Undefined control sequence.
...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{R(\tau )\... l.1550 ...R(\tau)\left| s_0 = s\right.}\end{split} [34] ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{R(\tau )\... l.1557 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{R(\tau )\... l.1557 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...split}V^*(s) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1564 ...R(\tau)\left| s_0 = s\right.}\end{split} ! Undefined control sequence. ...split}V^*(s) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1564 ...R(\tau)\left| s_0 = s\right.}\end{split} ! Undefined control sequence. ...lit}Q^*(s,a) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1571 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...lit}Q^*(s,a) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1571 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {a\sim \pi }{Q^{\pi }(s,a)... l.1585 ...erE{a\sim \pi}{Q^{\pi}(s,a)},\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {a\sim \pi }{Q^{\pi }(s,a)... l.1585 ...erE{a\sim \pi}{Q^{\pi}(s,a)},\end{split} [35] ! Undefined control sequence. V^{\pi }(s) &= \underE {a \sim \pi \\ s'\sim P}{r(s,a) + \gamma ... l.1618 \end{align*} ! Missing } inserted. } l.1618 \end{align*} ! Missing { inserted. { l.1618 \end{align*} ! Undefined control sequence. ...}(s')}, \\ Q^{\pi }(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1618 \end{align*} ! Undefined control sequence. ... {s'\sim P}{r(s,a) + \gamma \underE {a'\sim \pi }{Q^{\pi }(s',... l.1618 \end{align*} ! Undefined control sequence. V^{\pi }(s) &= \underE {a \sim \pi \\ s'\sim P}{r(s,a) + \gamma ... l.1618 \end{align*} ! Missing } inserted. } l.1618 \end{align*} ! Missing \endgroup inserted. \endgroup l.1618 \end{align*} ! 
Misplaced \omit. \math@cr@@@ ...@ \@ne \add@amps \maxfields@ \omit \kern -\alignsep@ \iftag@ ... l.1618 \end{align*} ! Missing { inserted. { l.1618 \end{align*} ! Undefined control sequence. ...}(s')}, \\ Q^{\pi }(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1618 \end{align*} ! Undefined control sequence. ... {s'\sim P}{r(s,a) + \gamma \underE {a'\sim \pi }{Q^{\pi }(s',... l.1618 \end{align*} ! Undefined control sequence. V^*(s) &= \max _a \underE {s'\sim P}{r(s,a) + \gamma V^*(s')}, \... l.1625 \end{align*} ! Undefined control sequence. ...ma V^*(s')}, \\ Q^*(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1625 \end{align*} ! Undefined control sequence. V^*(s) &= \max _a \underE {s'\sim P}{r(s,a) + \gamma V^*(s')}, \... l.1625 \end{align*} ! Undefined control sequence. ...ma V^*(s')}, \\ Q^*(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1625 \end{align*} [36] [37] [38] Chapter 8. LaTeX Warning: Hyper reference `spinningup/rl_intro2:part-2-kinds-of-rl-algorit hms' on page 39 undefined on input line 1678. LaTeX Warning: Hyper reference `spinningup/rl_intro2:a-taxonomy-of-rl-algorithm s' on page 39 undefined on input line 1681. LaTeX Warning: Hyper reference `spinningup/rl_intro2:links-to-algorithms-in-tax onomy' on page 39 undefined on input line 1684. ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.1699 ...ncludegraphics{{rl_algorithms_9_15}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.1699 ...ncludegraphics{{rl_algorithms_9_15}.svg} LaTeX Warning: Hyper reference `spinningup/rl_intro2:citations-below' on page 3 9 undefined on input line 1700. [39] [40] [41] [42] Chapter 9. LaTeX Warning: Hyper reference `spinningup/rl_intro3:part-3-intro-to-policy-opt imization' on page 43 undefined on input line 1841. 
LaTeX Warning: Hyper reference `spinningup/rl_intro3:deriving-the-simplest-poli cy-gradient' on page 43 undefined on input line 1844. LaTeX Warning: Hyper reference `spinningup/rl_intro3:implementing-the-simplest- policy-gradient' on page 43 undefined on input line 1847. LaTeX Warning: Hyper reference `spinningup/rl_intro3:expected-grad-log-prob-lem ma' on page 43 undefined on input line 1850. LaTeX Warning: Hyper reference `spinningup/rl_intro3:don-t-let-the-past-distrac t-you' on page 43 undefined on input line 1853. LaTeX Warning: Hyper reference `spinningup/rl_intro3:implementing-reward-to-go- policy-gradient' on page 43 undefined on input line 1856. LaTeX Warning: Hyper reference `spinningup/rl_intro3:baselines-in-policy-gradie nts' on page 43 undefined on input line 1859. LaTeX Warning: Hyper reference `spinningup/rl_intro3:other-forms-of-the-policy- gradient' on page 43 undefined on input line 1862. LaTeX Warning: Hyper reference `spinningup/rl_intro3:recap' on page 43 undefine d on input line 1865. ! Undefined control sequence. l.1890 ...ected return \(J(\pi_{\theta}) = \underE {\tau \sim \pi_{\theta}}{R... [43] ! Undefined control sequence. ...} \log P(\tau | \theta ) &= \cancel {\nabla _{\theta } \log \r... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...} + \sum _{t=0}^{T} \bigg ( \cancel {\nabla _{\theta } \log P(... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...} \log P(\tau | \theta ) &= \cancel {\nabla _{\theta } \log \r... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...} + \sum _{t=0}^{T} \bigg ( \cancel {\nabla _{\theta } \log P(... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...eta }) &= \nabla _{\theta } \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...Log-derivative trick} \\ &= \underE {\tau \sim \pi _{\theta }}... 
l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...heta } J(\pi _{\theta }) &= \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...eta }) &= \nabla _{\theta } \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...Log-derivative trick} \\ &= \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...heta } J(\pi _{\theta }) &= \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} [44] [45] [46] [47] ! Undefined control sequence. \split@tag \begin {split}\underE {x \sim P_{\theta }}{\nabla _{\t... l.2082 ...eta} \log P_{\theta}(x)} = 0.\end{split} ! Undefined control sequence. \split@tag \begin {split}\underE {x \sim P_{\theta }}{\nabla _{\t... l.2082 ...eta} \log P_{\theta}(x)} = 0.\end{split} ! Undefined control sequence. ...eta }(x) \\ \therefore 0 &= \underE {x \sim P_{\theta }}{\nabl... l.2099 ...{\theta} \log P_{\theta}(x)}.\end{split} ! Undefined control sequence. ...eta }(x) \\ \therefore 0 &= \underE {x \sim P_{\theta }}{\nabl... l.2099 ...{\theta} \log P_{\theta}(x)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2107 ..._{\theta}(a_t |s_t) R(\tau)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2107 ..._{\theta}(a_t |s_t) R(\tau)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2115 ...R(s_{t'}, a_{t'}, s_{t'+1})}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2115 ...R(s_{t'}, a_{t'}, s_{t'+1})}.\end{split} [48] ! Undefined control sequence. \split@tag \begin {split}\underE {a_t \sim \pi _{\theta }}{\nabla... l.2167 ...\theta}(a_t|s_t) b(s_t)} = 0.\end{split} ! 
Undefined control sequence. \split@tag \begin {split}\underE {a_t \sim \pi _{\theta }}{\nabla... l.2167 ...\theta}(a_t|s_t) b(s_t)} = 0.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2171 ..., s_{t'+1}) - b(s_t)\right)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2171 ..., s_{t'+1}) - b(s_t)\right)}.\end{split} [49] ! Undefined control sequence. ...phi _k = \arg \min _{\phi } \underE {s_t, \hat {R}_t \sim \pi ... l.2185 ...(s_t) - \hat{R}_t \right)^2},\end{split} ! Undefined control sequence. ...phi _k = \arg \min _{\phi } \underE {s_t, \hat {R}_t \sim \pi ... l.2185 ...(s_t) - \hat{R}_t \right)^2},\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2199 ...i_{\theta}(a_t |s_t) \Phi_t},\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2199 ...i_{\theta}(a_t |s_t) \Phi_t},\end{split} [50] [51] [52] Chapter 10. LaTeX Warning: Hyper reference `spinningup/spinningup:spinning-up-as-a-deep-rl- researcher' on page 53 undefined on input line 2253. LaTeX Warning: Hyper reference `spinningup/spinningup:the-right-background' on page 53 undefined on input line 2256. LaTeX Warning: Hyper reference `spinningup/spinningup:learn-by-doing' on page 5 3 undefined on input line 2259. LaTeX Warning: Hyper reference `spinningup/spinningup:developing-a-research-pro ject' on page 53 undefined on input line 2262. LaTeX Warning: Hyper reference `spinningup/spinningup:doing-rigorous-research-i n-rl' on page 53 undefined on input line 2265. LaTeX Warning: Hyper reference `spinningup/spinningup:closing-thoughts' on page 53 undefined on input line 2268. LaTeX Warning: Hyper reference `spinningup/spinningup:ps-other-resources' on pa ge 53 undefined on input line 2271. 
LaTeX Warning: Hyper reference `spinningup/spinningup:references' on page 53 undefined on input line 2274. [53] [54] [55] [56] [57] [58] Chapter 11. LaTeX Warning: Hyper reference `spinningup/keypapers:key-papers-in-deep-rl' on page 59 undefined on input line 2383. LaTeX Warning: Hyper reference `spinningup/keypapers:model-free-rl' on page 59 undefined on input line 2386. LaTeX Warning: Hyper reference `spinningup/keypapers:exploration' on page 59 undefined on input line 2389. LaTeX Warning: Hyper reference `spinningup/keypapers:transfer-and-multitask-rl' on page 59 undefined on input line 2392. LaTeX Warning: Hyper reference `spinningup/keypapers:hierarchy' on page 59 undefined on input line 2395. LaTeX Warning: Hyper reference `spinningup/keypapers:memory' on page 59 undefined on input line 2398. LaTeX Warning: Hyper reference `spinningup/keypapers:model-based-rl' on page 59 undefined on input line 2401. LaTeX Warning: Hyper reference `spinningup/keypapers:meta-rl' on page 59 undefined on input line 2404. LaTeX Warning: Hyper reference `spinningup/keypapers:scaling-rl' on page 59 undefined on input line 2407. LaTeX Warning: Hyper reference `spinningup/keypapers:rl-in-the-real-world' on page 59 undefined on input line 2410. LaTeX Warning: Hyper reference `spinningup/keypapers:safety' on page 59 undefined on input line 2413. LaTeX Warning: Hyper reference `spinningup/keypapers:imitation-learning-and-inverse-reinforcement-learning' on page 59 undefined on input line 2416. LaTeX Warning: Hyper reference `spinningup/keypapers:reproducibility-analysis-and-critique' on page 59 undefined on input line 2419. LaTeX Warning: Hyper reference `spinningup/keypapers:bonus-classic-papers-in-rl-theory-or-review' on page 59 undefined on input line 2422. [59] [60] Overfull \vbox (103.35579pt too high) has occurred while \output is active [61] [62] Chapter 12. LaTeX Warning: Hyper reference `spinningup/exercises:exercises' on page 63 undefined on input line 2511.
LaTeX Warning: Hyper reference `spinningup/exercises:problem-set-1-basics-of-im plementation' on page 63 undefined on input line 2514. LaTeX Warning: Hyper reference `spinningup/exercises:problem-set-2-algorithm-fa ilure-modes' on page 63 undefined on input line 2517. LaTeX Warning: Hyper reference `spinningup/exercises:challenges' on page 63 und efined on input line 2520. [63] [64] [65] [66] Chapter 13. LaTeX Warning: Hyper reference `spinningup/bench:benchmarks-for-spinning-up-imp lementations' on page 67 undefined on input line 2659. LaTeX Warning: Hyper reference `spinningup/bench:performance-in-each-environmen t' on page 67 undefined on input line 2662. LaTeX Warning: Hyper reference `spinningup/bench:halfcheetah' on page 67 undefi ned on input line 2665. LaTeX Warning: Hyper reference `spinningup/bench:hopper' on page 67 undefined o n input line 2668. LaTeX Warning: Hyper reference `spinningup/bench:walker' on page 67 undefined o n input line 2671. LaTeX Warning: Hyper reference `spinningup/bench:swimmer' on page 67 undefined on input line 2674. LaTeX Warning: Hyper reference `spinningup/bench:ant' on page 67 undefined on i nput line 2677. LaTeX Warning: Hyper reference `spinningup/bench:experiment-details' on page 67 undefined on input line 2682. ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2700 ...includegraphics{{bench_halfcheetah}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2700 ...includegraphics{{bench_halfcheetah}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2709 ...phinxincludegraphics{{bench_hopper}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... 
l.2709 ...phinxincludegraphics{{bench_hopper}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2718 ...phinxincludegraphics{{bench_walker}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2718 ...phinxincludegraphics{{bench_walker}.svg} [67] ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2727 ...\sphinxincludegraphics{{bench_swim}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2727 ...\sphinxincludegraphics{{bench_swim}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2736 ...t\sphinxincludegraphics{{bench_ant}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2736 ...t\sphinxincludegraphics{{bench_ant}.svg} [68] [69] [70] Chapter 14. LaTeX Warning: Hyper reference `algorithms/vpg:vanilla-policy-gradient' on page 71 undefined on input line 2759. LaTeX Warning: Hyper reference `algorithms/vpg:background' on page 71 undefined on input line 2762. LaTeX Warning: Hyper reference `algorithms/vpg:quick-facts' on page 71 undefine d on input line 2765. LaTeX Warning: Hyper reference `algorithms/vpg:key-equations' on page 71 undefi ned on input line 2768. LaTeX Warning: Hyper reference `algorithms/vpg:exploration-vs-exploitation' on page 71 undefined on input line 2771. LaTeX Warning: Hyper reference `algorithms/vpg:pseudocode' on page 71 undefined on input line 2774. LaTeX Warning: Hyper reference `algorithms/vpg:documentation' on page 71 undefi ned on input line 2779. 
LaTeX Warning: Hyper reference `algorithms/vpg:saved-model-contents' on page 71 undefined on input line 2782. LaTeX Warning: Hyper reference `algorithms/vpg:references' on page 71 undefined on input line 2787. LaTeX Warning: Hyper reference `algorithms/vpg:relevant-papers' on page 71 unde fined on input line 2790. LaTeX Warning: Hyper reference `algorithms/vpg:why-these-papers' on page 71 und efined on input line 2793. LaTeX Warning: Hyper reference `algorithms/vpg:other-public-implementations' on page 71 undefined on input line 2796. [71] ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2833 },\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2833 },\end{split} ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2850 ...rithms/vpg:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2851 \caption {Vanilla Policy Gradient Algorithm} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2853 \begin{algorithmic} [1] ! Undefined control sequence. l.2854 \STATE Input: initial policy parameters $\theta_0$, initial value... ! Undefined control sequence. l.2855 \FOR {$k = 0,1,2,...$} ! Undefined control sequence. l.2856 \STATE Collect set of trajectories ${\mathcal D}_k = \{\tau_i\}$ ... ! Undefined control sequence. l.2857 \STATE Compute rewards-to-go $\hat{R}_t$. ! Undefined control sequence. l.2858 \STATE Compute advantage estimates, $\hat{A}_t$ (using any method... ! Undefined control sequence. l.2859 \STATE Estimate policy gradient as ! Undefined control sequence. l.2863 \STATE Compute policy update, either using standard gradient ascent, ! Undefined control sequence. 
l.2868 \STATE Fit value function by regression on mean-squared error: ! Undefined control sequence. l.2873 \ENDFOR ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2874 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2875 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 2881--2881 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 2881--2881 \T1/ptm/m/it/10 steps_per_epoch=4000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 epochs=50\ T1/ptm/m/n/10 , \T1/ptm/m/it/10 gamma=0.99\T1/ptm/m/n/10 , \T1/ptm/m/it/10 pi_l r=0.0003\T1/ptm/m/n/10 , \T1/ptm/m/it/10 vf_lr=0.001\T1/ptm/m/n/10 , [72] [73] [74] [75] [76] Chapter 15. LaTeX Warning: Hyper reference `algorithms/trpo:trust-region-policy-optimizatio n' on page 77 undefined on input line 3081. LaTeX Warning: Hyper reference `algorithms/trpo:background' on page 77 undefine d on input line 3084. LaTeX Warning: Hyper reference `algorithms/trpo:quick-facts' on page 77 undefin ed on input line 3087. LaTeX Warning: Hyper reference `algorithms/trpo:key-equations' on page 77 undef ined on input line 3090. LaTeX Warning: Hyper reference `algorithms/trpo:exploration-vs-exploitation' on page 77 undefined on input line 3093. LaTeX Warning: Hyper reference `algorithms/trpo:pseudocode' on page 77 undefine d on input line 3096. LaTeX Warning: Hyper reference `algorithms/trpo:documentation' on page 77 undef ined on input line 3101. LaTeX Warning: Hyper reference `algorithms/trpo:saved-model-contents' on page 7 7 undefined on input line 3104. LaTeX Warning: Hyper reference `algorithms/trpo:references' on page 77 undefine d on input line 3109. 
LaTeX Warning: Hyper reference `algorithms/trpo:relevant-papers' on page 77 und efined on input line 3112. LaTeX Warning: Hyper reference `algorithms/trpo:why-these-papers' on page 77 un defined on input line 3115. LaTeX Warning: Hyper reference `algorithms/trpo:other-public-implementations' o n page 77 undefined on input line 3118. [77] ! Undefined control sequence. ...al L}(\theta _k, \theta ) = \underE {s,a \sim \pi _{\theta _k}... l.3162 },\end{split} ! Undefined control sequence. ...al L}(\theta _k, \theta ) = \underE {s,a \sim \pi _{\theta _k}... l.3162 },\end{split} ! Undefined control sequence. ...{KL}(\theta || \theta _k) = \underE {s \sim \pi _{\theta _k}}{... l.3168 }.\end{split} ! Undefined control sequence. ...{KL}(\theta || \theta _k) = \underE {s \sim \pi _{\theta _k}}{... l.3168 }.\end{split} [78] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3217 ...ithms/trpo:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3218 \caption {Trust Region Policy Optimization} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3220 \begin{algorithmic} [1] ! Undefined control sequence. l.3221 \STATE Input: initial policy parameters $\theta_0$, initial value... ! Undefined control sequence. l.3222 \STATE Hyperparameters: KL-divergence limit $\delta$, backtrackin... ! Undefined control sequence. l.3223 \FOR {$k = 0,1,2,...$} ! Undefined control sequence. l.3224 \STATE Collect set of trajectories ${\mathcal D}_k = \{\tau_i\}$ ... ! Undefined control sequence. l.3225 \STATE Compute rewards-to-go $\hat{R}_t$. ! Undefined control sequence. l.3226 \STATE Compute advantage estimates, $\hat{A}_t$ (using any method... ! Undefined control sequence. l.3227 \STATE Estimate policy gradient as ! 
Undefined control sequence. l.3231 \STATE Use the conjugate gradient algorithm to compute ! Undefined control sequence. l.3236 \STATE Update the policy by backtracking line search with [79] ! Undefined control sequence. l.3241 \STATE Fit value function by regression on mean-squared error: ! Undefined control sequence. l.3246 \ENDFOR ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3247 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3248 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 \T1/ptm/m/it/10 steps_per_epoch=4000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 epochs=50\ T1/ptm/m/n/10 , \T1/ptm/m/it/10 gamma=0.99\T1/ptm/m/n/10 , \T1/ptm/m/it/10 delt a=0.01\T1/ptm/m/n/10 , \T1/ptm/m/it/10 vf_lr=0.001\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 \T1/ptm/m/it/10 train_v_iters=80\T1/ptm/m/n/10 , \T1/ptm/m/it/10 damp-ing_coeff =0.1\T1/ptm/m/n/10 , \T1/ptm/m/it/10 cg_iters=10\T1/ptm/m/n/10 , \T1/ptm/m/it/1 0 back-track_iters=10\T1/ptm/m/n/10 , \T1/ptm/m/it/10 back- Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 \T1/ptm/m/it/10 track_coeff=0.8\T1/ptm/m/n/10 , \T1/ptm/m/it/10 lam=0.97\T1/ptm /m/n/10 , \T1/ptm/m/it/10 max_ep_len=1000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 log-g er_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 save_freq=10\T1/ptm/m/n/10 , [80] Overfull \vbox (100.02797pt too high) has occurred while \output is active [81] [82] [83] [84] Chapter 16. 
LaTeX Warning: Hyper reference `algorithms/ppo:proximal-policy-optimization' on page 85 undefined on input line 3528. LaTeX Warning: Hyper reference `algorithms/ppo:background' on page 85 undefined on input line 3531. LaTeX Warning: Hyper reference `algorithms/ppo:quick-facts' on page 85 undefine d on input line 3534. LaTeX Warning: Hyper reference `algorithms/ppo:key-equations' on page 85 undefi ned on input line 3537. LaTeX Warning: Hyper reference `algorithms/ppo:exploration-vs-exploitation' on page 85 undefined on input line 3540. LaTeX Warning: Hyper reference `algorithms/ppo:pseudocode' on page 85 undefined on input line 3543. LaTeX Warning: Hyper reference `algorithms/ppo:documentation' on page 85 undefi ned on input line 3548. LaTeX Warning: Hyper reference `algorithms/ppo:saved-model-contents' on page 85 undefined on input line 3551. LaTeX Warning: Hyper reference `algorithms/ppo:references' on page 85 undefined on input line 3556. LaTeX Warning: Hyper reference `algorithms/ppo:relevant-papers' on page 85 unde fined on input line 3559. LaTeX Warning: Hyper reference `algorithms/ppo:why-these-papers' on page 85 und efined on input line 3562. LaTeX Warning: Hyper reference `algorithms/ppo:other-public-implementations' on page 85 undefined on input line 3565. [85] [86] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3674 ...rithms/ppo:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3675 \caption {PPO-Clip} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3677 \begin{algorithmic} [1] ! Undefined control sequence. l.3678 \STATE Input: initial policy parameters $\theta_0$, initial value... ! Undefined control sequence. l.3679 \FOR {$k = 0,1,2,...$} ! 
Undefined control sequence. l.3680 \STATE Collect set of trajectories ${\mathcal D}_k = \{\tau_i\}$ ... ! Undefined control sequence. l.3681 \STATE Compute rewards-to-go $\hat{R}_t$. ! Undefined control sequence. l.3682 \STATE Compute advantage estimates, $\hat{A}_t$ (using any method... ! Undefined control sequence. l.3683 \STATE Update the policy by maximizing the PPO-Clip objective: ! Undefined control sequence. l.3691 \STATE Fit value function by regression on mean-squared error: ! Undefined control sequence. l.3696 \ENDFOR ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3697 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3698 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 3704--3704 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , [87] [88] [89] [90] Chapter 17. LaTeX Warning: Hyper reference `algorithms/ddpg:deep-deterministic-policy-gradi ent' on page 91 undefined on input line 3924. LaTeX Warning: Hyper reference `algorithms/ddpg:background' on page 91 undefine d on input line 3927. LaTeX Warning: Hyper reference `algorithms/ddpg:quick-facts' on page 91 undefin ed on input line 3930. LaTeX Warning: Hyper reference `algorithms/ddpg:key-equations' on page 91 undef ined on input line 3933. LaTeX Warning: Hyper reference `algorithms/ddpg:the-q-learning-side-of-ddpg' on page 91 undefined on input line 3936. LaTeX Warning: Hyper reference `algorithms/ddpg:the-policy-learning-side-of-ddp g' on page 91 undefined on input line 3939. LaTeX Warning: Hyper reference `algorithms/ddpg:exploration-vs-exploitation' on page 91 undefined on input line 3944. 
LaTeX Warning: Hyper reference `algorithms/ddpg:pseudocode' on page 91 undefine d on input line 3947. LaTeX Warning: Hyper reference `algorithms/ddpg:documentation' on page 91 undef ined on input line 3952. LaTeX Warning: Hyper reference `algorithms/ddpg:saved-model-contents' on page 9 1 undefined on input line 3955. LaTeX Warning: Hyper reference `algorithms/ddpg:references' on page 91 undefine d on input line 3960. LaTeX Warning: Hyper reference `algorithms/ddpg:relevant-papers' on page 91 und efined on input line 3963. LaTeX Warning: Hyper reference `algorithms/ddpg:why-these-papers' on page 91 un defined on input line 3966. LaTeX Warning: Hyper reference `algorithms/ddpg:other-public-implementations' o n page 91 undefined on input line 3969. [91] [92] [93] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4088 ...ithms/ddpg:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4089 \caption {Deep Deterministic Policy Gradient} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4091 \begin{algorithmic} [1] ! Undefined control sequence. l.4092 \STATE Input: initial policy parameters $\theta$, Q-function para... ! Undefined control sequence. l.4093 \STATE Set target parameters equal to main parameters $\theta_{\t... ! Undefined control sequence. l.4094 \REPEAT ! Undefined control sequence. l.4095 \STATE Observe state $s$ and select action $a = \text{clip}(\... ! Undefined control sequence. l.4096 \STATE Execute $a$ in the environment ! Undefined control sequence. l.4097 \STATE Observe next state $s'$, reward $r$, and done signal $... ! Undefined control sequence. l.4098 \STATE Store $(s,a,r,s',d)$ in replay buffer $\mathcal{D}$ ! Undefined control sequence. 
l.4099 \STATE If $s'$ is terminal, reset environment state. ! Undefined control sequence. l.4100 \IF {it's time to update} ! Undefined control sequence. l.4101 \FOR {however many updates} ! Undefined control sequence. l.4102 \STATE Randomly sample a batch of transitions, $B = \... ! Undefined control sequence. l.4103 \STATE Compute targets ! Undefined control sequence. l.4107 \STATE Update Q-function by one step of gradient desc... ! Undefined control sequence. l.4111 \STATE Update policy by one step of gradient ascent u... ! Undefined control sequence. l.4115 \STATE Update target networks with ! Undefined control sequence. l.4120 \ENDFOR ! Undefined control sequence. l.4121 \ENDIF ! Undefined control sequence. l.4122 \UNTIL {convergence} ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4123 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4124 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 4130--4130 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 4130--4130 \T1/ptm/m/it/10 pi_lr=0.001\T1/ptm/m/n/10 , \T1/ptm/m/it/10 q_lr=0.001\T1/ptm/m /n/10 , \T1/ptm/m/it/10 batch_size=100\T1/ptm/m/n/10 , \T1/ptm/m/it/10 start_st eps=10000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 act_noise=0.1\T1/ptm/m/n/10 , [94] [95] [96] Chapter 18. LaTeX Warning: Hyper reference `algorithms/td3:twin-delayed-ddpg' on page 97 un defined on input line 4345. LaTeX Warning: Hyper reference `algorithms/td3:background' on page 97 undefined on input line 4348. LaTeX Warning: Hyper reference `algorithms/td3:quick-facts' on page 97 undefine d on input line 4351. 
LaTeX Warning: Hyper reference `algorithms/td3:key-equations' on page 97 undefi ned on input line 4354. LaTeX Warning: Hyper reference `algorithms/td3:exploration-vs-exploitation' on page 97 undefined on input line 4357. LaTeX Warning: Hyper reference `algorithms/td3:pseudocode' on page 97 undefined on input line 4360. LaTeX Warning: Hyper reference `algorithms/td3:documentation' on page 97 undefi ned on input line 4365. LaTeX Warning: Hyper reference `algorithms/td3:saved-model-contents' on page 97 undefined on input line 4368. LaTeX Warning: Hyper reference `algorithms/td3:references' on page 97 undefined on input line 4373. LaTeX Warning: Hyper reference `algorithms/td3:relevant-papers' on page 97 unde fined on input line 4376. LaTeX Warning: Hyper reference `algorithms/td3:other-public-implementations' on page 97 undefined on input line 4379. [97] ! Undefined control sequence. ...}L(\phi _1, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4436 },\end{split} ! Undefined control sequence. ...}L(\phi _1, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4436 },\end{split} ! Undefined control sequence. ...}L(\phi _2, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4440 }.\end{split} ! Undefined control sequence. ...}L(\phi _2, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4440 }.\end{split} [98] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4464 ...rithms/td3:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4465 \caption {Twin Delayed DDPG} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4467 \begin{algorithmic} [1] ! Undefined control sequence. l.4468 \STATE Input: initial policy parameters $\theta$, Q-function para... ! 
Undefined control sequence. l.4469 \STATE Set target parameters equal to main parameters $\theta_{\t... ! Undefined control sequence. l.4470 \REPEAT ! Undefined control sequence. l.4471 \STATE Observe state $s$ and select action $a = \text{clip}(\... ! Undefined control sequence. l.4472 \STATE Execute $a$ in the environment ! Undefined control sequence. l.4473 \STATE Observe next state $s'$, reward $r$, and done signal $... ! Undefined control sequence. l.4474 \STATE Store $(s,a,r,s',d)$ in replay buffer $\mathcal{D}$ ! Undefined control sequence. l.4475 \STATE If $s'$ is terminal, reset environment state. ! Undefined control sequence. l.4476 \IF {it's time to update} ! Undefined control sequence. l.4477 \FOR {$j$ in range(however many updates)} ! Undefined control sequence. l.4478 \STATE Randomly sample a batch of transitions, $B = \... ! Undefined control sequence. l.4479 \STATE Compute target actions ! Undefined control sequence. l.4483 \STATE Compute targets ! Undefined control sequence. l.4487 \STATE Update Q-functions by one step of gradient des... ! Undefined control sequence. l.4491 \IF { $j \mod$ \texttt{policy\_delay} $ = 0$} ! Undefined control sequence. l.4492 \STATE Update policy by one step of gradient asce... ! Undefined control sequence. l.4496 \STATE Update target networks with ! Undefined control sequence. l.4501 \ENDIF ! Undefined control sequence. l.4502 \ENDFOR ! Undefined control sequence. l.4503 \ENDIF ! Undefined control sequence. l.4504 \UNTIL {convergence} ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4505 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... 
l.4506 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 4512--4512 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 4512--4512 \T1/ptm/m/it/10 pi_lr=0.001\T1/ptm/m/n/10 , \T1/ptm/m/it/10 q_lr=0.001\T1/ptm/m /n/10 , \T1/ptm/m/it/10 batch_size=100\T1/ptm/m/n/10 , \T1/ptm/m/it/10 start_st eps=10000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 act_noise=0.1\T1/ptm/m/n/10 , \T1/ptm /m/it/10 tar- [99] [100] [101] [102] Chapter 19. LaTeX Warning: Hyper reference `algorithms/sac:soft-actor-critic' on page 103 u ndefined on input line 4738. LaTeX Warning: Hyper reference `algorithms/sac:background' on page 103 undefine d on input line 4741. LaTeX Warning: Hyper reference `algorithms/sac:quick-facts' on page 103 undefin ed on input line 4744. LaTeX Warning: Hyper reference `algorithms/sac:key-equations' on page 103 undef ined on input line 4747. LaTeX Warning: Hyper reference `algorithms/sac:entropy-regularized-reinforcemen t-learning' on page 103 undefined on input line 4750. LaTeX Warning: Hyper reference `algorithms/sac:id1' on page 103 undefined on in put line 4753. LaTeX Warning: Hyper reference `algorithms/sac:exploration-vs-exploitation' on page 103 undefined on input line 4758. LaTeX Warning: Hyper reference `algorithms/sac:pseudocode' on page 103 undefine d on input line 4761. LaTeX Warning: Hyper reference `algorithms/sac:documentation' on page 103 undef ined on input line 4766. LaTeX Warning: Hyper reference `algorithms/sac:saved-model-contents' on page 10 3 undefined on input line 4769. LaTeX Warning: Hyper reference `algorithms/sac:references' on page 103 undefine d on input line 4774. LaTeX Warning: Hyper reference `algorithms/sac:relevant-papers' on page 103 und efined on input line 4777. 
LaTeX Warning: Hyper reference `algorithms/sac:other-public-implementations' on page 103 undefined on input line 4780. [103] ! Undefined control sequence. ...it@tag \begin {split}H(P) = \underE {x \sim P}{-\log P(x)}.\en... l.4827 ...underE{x \sim P}{-\log P(x)}.\end{split} ! Undefined control sequence. ...it@tag \begin {split}H(P) = \underE {x \sim P}{-\log P(x)}.\en... l.4827 ...underE{x \sim P}{-\log P(x)}.\end{split} ! Undefined control sequence. ...}\pi ^* = \arg \max _{\pi } \underE {\tau \sim \pi }{ \sum _{t... l.4831 ...pi(\cdot|s_t)\right) \bigg)},\end{split} ! Undefined control sequence. ...}\pi ^* = \arg \max _{\pi } \underE {\tau \sim \pi }{ \sum _{t... l.4831 ...pi(\cdot|s_t)\right) \bigg)},\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{ \left . ... l.4835 ...ight) \bigg) \right| s_0 = s}\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{ \left . ... l.4835 ...ight) \bigg) \right| s_0 = s}\end{split} ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{ \left . ... l.4839 ...ght)\right| s_0 = s, a_0 = a}\end{split} ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{ \left . ... l.4839 ...ght)\right| s_0 = s, a_0 = a}\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {a \sim \pi }{Q^{\pi }(s,a... l.4843 ...ha H\left(\pi(\cdot|s)\right)\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {a \sim \pi }{Q^{\pi }(s,a... l.4843 ...ha H\left(\pi(\cdot|s)\right)\end{split} ! Undefined control sequence. ...gin {split}Q^{\pi }(s,a) &= \underE {s' \sim P \\ a' \sim \pi ... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing } inserted. } l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing { inserted. { l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Undefined control sequence. 
...s')\right ) \right )} \\ &= \underE {s' \sim P}{R(s,a,s') + \g... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Undefined control sequence. ...gin {split}Q^{\pi }(s,a) &= \underE {s' \sim P \\ a' \sim \pi ... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing } inserted. } l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing { inserted. { l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Undefined control sequence. ...s')\right ) \right )} \\ &= \underE {s' \sim P}{R(s,a,s') + \g... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} [104] ! Undefined control sequence. ...begin {split}V^{\pi }(s) &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...pi (\cdot |s)\right ) \\ &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...begin {split}V^{\pi }(s) &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...pi (\cdot |s)\right ) \\ &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...it}L(\psi , {\mathcal D}) = \underE {s \sim \mathcal {D} \\ \t... l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing } inserted. } l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing { inserted. { l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Undefined control sequence. ...it}L(\psi , {\mathcal D}) = \underE {s \sim \mathcal {D} \\ \t... l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing } inserted. } l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing { inserted. { l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Undefined control sequence. \split@tag \begin {split}\underE {a \sim \pi }{Q^{\pi }(s,a) - \a... l.4885 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. 
\split@tag \begin {split}\underE {a \sim \pi }{Q^{\pi }(s,a) - \a... l.4885 ...s,a) - \alpha \log \pi(a|s)}.\end{split} [105] ! Undefined control sequence. \split@tag \begin {split}\underE {a \sim \pi _{\theta }}{Q^{\pi _... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. ...\log \pi _{\theta }(a|s)} = \underE {\xi \sim \mathcal {N}}{Q^... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. \split@tag \begin {split}\underE {a \sim \pi _{\theta }}{Q^{\pi _... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. ...\log \pi _{\theta }(a|s)} = \underE {\xi \sim \mathcal {N}}{Q^... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. ...egin {split}\max _{\theta } \underE {s \sim \mathcal {D} \\ \x... l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing } inserted. } l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing { inserted. { l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Undefined control sequence. ...egin {split}\max _{\theta } \underE {s \sim \mathcal {D} \\ \x... l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing } inserted. } l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing { inserted. { l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4924 ...rithms/sac:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4925 \caption {Soft Actor-Critic} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4927 \begin{algorithmic} [1] ! Undefined control sequence. l.4928 \STATE Input: initial policy parameters $\theta$, Q-function para... ! Undefined control sequence. 
l.4929 \STATE Set target parameters equal to main parameters $\psi_{\tex... ! Undefined control sequence. l.4930 \REPEAT ! Undefined control sequence. l.4931 \STATE Observe state $s$ and select action $a \sim \pi_{\thet... ! Undefined control sequence. l.4932 \STATE Execute $a$ in the environment ! Undefined control sequence. l.4933 \STATE Observe next state $s'$, reward $r$, and done signal $... ! Undefined control sequence. l.4934 \STATE Store $(s,a,r,s',d)$ in replay buffer $\mathcal{D}$ ! Undefined control sequence. l.4935 \STATE If $s'$ is terminal, reset environment state. ! Undefined control sequence. l.4936 \IF {it's time to update} ! Undefined control sequence. l.4937 \FOR {$j$ in range(however many updates)} ! Undefined control sequence. l.4938 \STATE Randomly sample a batch of transitions, $B = \... ! Undefined control sequence. l.4939 \STATE Compute targets for Q and V functions: [106] ! Undefined control sequence. l.4944 \STATE Update Q-functions by one step of gradient des... ! Undefined control sequence. l.4948 \STATE Update V-function by one step of gradient desc... ! Undefined control sequence. l.4952 \STATE Update policy by one step of gradient ascent u... ! Undefined control sequence. l.4957 \STATE Update target value network with ! Undefined control sequence. l.4961 \ENDFOR ! Undefined control sequence. l.4962 \ENDIF ! Undefined control sequence. l.4963 \UNTIL {convergence} ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4964 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... 
l.4965 \end{algorithm}
Underfull \hbox (badness 10000) in paragraph at lines 4971--4971
[]\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 ,
Underfull \hbox (badness 10000) in paragraph at lines 4971--4971
\T1/ptm/m/it/10 lr=0.001\T1/ptm/m/n/10 , \T1/ptm/m/it/10 al-pha=0.2\T1/ptm/m/n/10 , \T1/ptm/m/it/10 batch_size=100\T1/ptm/m/n/10 , \T1/ptm/m/it/10 start_steps=10000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 max_ep_len=1000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 log-
[107]
Overfull \vbox (77.80809pt too high) has occurred while \output is active
[108] [109] [110]
Chapter 20.
LaTeX Warning: Hyper reference `utils/logger:logger' on page 111 undefined on input line 5235.
LaTeX Warning: Hyper reference `utils/logger:using-a-logger' on page 111 undefined on input line 5238.
LaTeX Warning: Hyper reference `utils/logger:examples' on page 111 undefined on input line 5241.
LaTeX Warning: Hyper reference `utils/logger:logging-and-mpi' on page 111 undefined on input line 5244.
LaTeX Warning: Hyper reference `utils/logger:logger-classes' on page 111 undefined on input line 5249.
LaTeX Warning: Hyper reference `utils/logger:loading-saved-graphs' on page 111 undefined on input line 5252.
[111] [112] [113] [114]
LaTeX Warning: Hyper reference `utils/logger:spinup.utils.logx.Logger' on page 115 undefined on input line 5550.
[115] [116]
Chapter 21.
[117] [118]
Chapter 22.
LaTeX Warning: Hyper reference `utils/mpi:mpi-tools' on page 119 undefined on input line 5695.
LaTeX Warning: Hyper reference `utils/mpi:module-spinup.utils.mpi_tools' on page 119 undefined on input line 5698.
LaTeX Warning: Hyper reference `utils/mpi:mpi-tensorflow-utilities' on page 119 undefined on input line 5701.
[119] [120]
Chapter 23.
LaTeX Warning: Hyper reference `utils/run_utils:run-utils' on page 121 undefined on input line 5828.
LaTeX Warning: Hyper reference `utils/run_utils:experimentgrid' on page 121 undefined on input line 5831.
LaTeX Warning: Hyper reference `utils/run_utils:calling-experiments' on page 121 undefined on input line 5834.
[121]
Underfull \hbox (badness 10000) in paragraph at lines 5986--5986
[]\T1/ptm/m/it/10 exp_name\T1/ptm/m/n/10 , \T1/ptm/m/it/10 thunk\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , \T1/ptm/m/it/10 num_cpu=1\T1/ptm/m/n/10 ,
[122] [123] [124]
Chapter 24.
[125] [126]
Chapter 25.
[127] [128]
Chapter 26.
[129] [130]
LaTeX Warning: Reference `utils/mpi:module-spinup.utils.mpi_tf' on page 131 undefined on input line 6119.
LaTeX Warning: Reference `utils/mpi:module-spinup.utils.mpi_tools' on page 131 undefined on input line 6120.
[131]
No file SpinningUp.ind.
(./SpinningUp.aux)
Package rerunfilecheck Warning: File `SpinningUp.out' has changed.
(rerunfilecheck) Rerun to get outlines right
(rerunfilecheck) or use package `bookmark'.
LaTeX Warning: There were undefined references.
LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right.
)
(see the transcript file for additional information){/usr/share/texlive/texmf-dist/fonts/enc/dvips/base/8r.enc}</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr5.pfb>
Output written on SpinningUp.pdf (135 pages, 1117443 bytes).
Transcript written on SpinningUp.log.
[rtd-command-info] start-time: 2019-06-28T21:47:09.264321Z, end-time: 2019-06-28T21:47:09.626052Z, duration: 0, exit-code: 0
makeindex -s python.ist SpinningUp.idx
This is makeindex, version 2.15 [TeX Live 2017] (kpathsea + Thai support).
Scanning style file ./python.ist.......done (7 attributes redefined, 0 ignored).
Scanning input file SpinningUp.idx....done (78 entries accepted, 0 rejected).
Sorting entries....done (506 comparisons).
Generating output file SpinningUp.ind....done (144 lines written, 0 warnings).
Output written in SpinningUp.ind.
Transcript written in SpinningUp.ilg.
[rtd-command-info] start-time: 2019-06-28T21:47:09.715433Z, end-time: 2019-06-28T21:47:11.135944Z, duration: 1, exit-code: 0 pdflatex -interaction=nonstopmode /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/_build/latex/SpinningUp.tex This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=pdflatex) restricted \write18 enabled. entering extended mode (/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/c heckouts/latest/docs/_build/latex/SpinningUp.tex LaTeX2e <2017-04-15> Babel <3.18> and hyphenation patterns for 84 language(s) loaded. (./sphinxmanual.cls Document Class: sphinxmanual 2017/03/26 v1.5.4 Document class (Sphinx manual) (/usr/share/texlive/texmf-dist/tex/latex/base/report.cls Document Class: report 2014/09/29 v1.4h Standard LaTeX document class (/usr/share/texlive/texmf-dist/tex/latex/base/size10.clo))) (/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty (/usr/share/texlive/texmf-dist/tex/latex/base/utf8.def (/usr/share/texlive/texmf-dist/tex/latex/base/t1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/ot1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/omsenc.dfu))) (/usr/share/texlive/texmf-dist/tex/latex/cmap/cmap.sty) (/usr/share/texlive/texmf-dist/tex/latex/base/fontenc.sty (/usr/share/texlive/texmf-dist/tex/latex/base/t1enc.def)<>) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. 
(/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amssymb.sty (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty)) (/usr/share/texlive/texmf-dist/tex/generic/babel/babel.sty (/usr/share/texlive/texmf-dist/tex/generic/babel/switch.def) (/usr/share/texlive/texmf-dist/tex/generic/babel-english/english.ldf (/usr/share/texlive/texmf-dist/tex/generic/babel/babel.def (/usr/share/texlive/texmf-dist/tex/generic/babel/txtbabel.def)))) (/usr/share/texlive/texmf-dist/tex/latex/psnfss/times.sty) (/usr/share/texlive/texmf-dist/tex/latex/fncychap/fncychap.sty) (/usr/share/texlive/texmf-dist/tex/latex/tools/longtable.sty) (./sphinx.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics/graphicx.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics/keyval.sty) (/usr/share/texlive/texmf-dist/tex/latex/graphics/graphics.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics/trig.sty) (/usr/share/texlive/texmf-dist/tex/latex/graphics-cfg/graphics.cfg) (/usr/share/texlive/texmf-dist/tex/latex/graphics-def/pdftex.def))) (/usr/share/texlive/texmf-dist/tex/latex/fancyhdr/fancyhdr.sty) (/usr/share/texlive/texmf-dist/tex/latex/base/textcomp.sty (/usr/share/texlive/texmf-dist/tex/latex/base/ts1enc.def (/usr/share/texlive/texmf-dist/tex/latex/base/ts1enc.dfu))) (/usr/share/texlive/texmf-dist/tex/latex/titlesec/titlesec.sty) (/usr/share/texlive/texmf-dist/tex/latex/tabulary/tabulary.sty (/usr/share/texlive/texmf-dist/tex/latex/tools/array.sty)) (/usr/share/texlive/texmf-dist/tex/latex/base/makeidx.sty) (/usr/share/texlive/texmf-dist/tex/latex/framed/framed.sty) (/usr/share/texlive/texmf-dist/tex/latex/xcolor/xcolor.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics-cfg/color.cfg)) 
(/usr/share/texlive/texmf-dist/tex/latex/fancyvrb/fancyvrb.sty Style option: `fancyvrb' v2.7a, with DG/SPQR fixes, and firstline=lastline fix <2008/02/07> (tvz)) (/usr/share/texlive/texmf-dist/tex/latex/threeparttable/threeparttable.sty) (./footnotehyper-sphinx.sty (/usr/share/texlive/texmf-dist/tex/latex/mdwtools/footnote.sty)) (/usr/share/texlive/texmf-dist/tex/latex/float/float.sty) (/usr/share/texlive/texmf-dist/tex/latex/wrapfig/wrapfig.sty) (/usr/share/texlive/texmf-dist/tex/latex/parskip/parskip.sty) (/usr/share/texlive/texmf-dist/tex/latex/base/alltt.sty) (/usr/share/texlive/texmf-dist/tex/latex/upquote/upquote.sty) (/usr/share/texlive/texmf-dist/tex/latex/capt-of/capt-of.sty) (./needspace.sty) (./sphinxhighlight.sty) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/kvoptions.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ltxcmds.sty) (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/kvsetkeys.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/infwarerr.sty) (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/etexcmds.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ifluatex.sty)))) (/usr/share/texlive/texmf-dist/tex/generic/pdftex/pdfcolor.tex) ** (sphinx) defining (legacy) text style macros without \sphinx prefix ** if clashes with packages, set latex_keep_old_macro_names=False in conf.py ) (/usr/share/texlive/texmf-dist/tex/latex/geometry/geometry.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ifpdf.sty) (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/ifvtex.sty) (/usr/share/texlive/texmf-dist/tex/generic/ifxetex/ifxetex.sty)) (/usr/share/texlive/texmf-dist/tex/latex/multirow/multirow.sty) (/usr/share/texlive/texmf-dist/tex/latex/eqparbox/eqparbox.sty (/usr/share/texlive/texmf-dist/tex/latex/environ/environ.sty (/usr/share/texlive/texmf-dist/tex/latex/trimspaces/trimspaces.sty))) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/hyperref.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/hobsub-hyperref.sty 
(/usr/share/texlive/texmf-dist/tex/generic/oberdiek/hobsub-generic.sty)) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/auxhook.sty) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/pd1enc.def) (/usr/share/texlive/texmf-dist/tex/latex/latexconfig/hyperref.cfg) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/puenc.def) (/usr/share/texlive/texmf-dist/tex/latex/url/url.sty)) (/usr/share/texlive/texmf-dist/tex/latex/hyperref/hpdftex.def (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/rerunfilecheck.sty)) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/hypcap.sty) Writing index file SpinningUp.idx (./SpinningUp.aux LaTeX Warning: Label `alg1' multiply defined. LaTeX Warning: Label `alg1' multiply defined. LaTeX Warning: Label `alg1' multiply defined. LaTeX Warning: Label `alg1' multiply defined. LaTeX Warning: Label `alg1' multiply defined. ) (/usr/share/texlive/texmf-dist/tex/latex/base/ts1cmr.fd) (/usr/share/texlive/texmf-dist/tex/latex/psnfss/t1ptm.fd) (/usr/share/texlive/texmf-dist/tex/context/base/mkii/supp-pdf.mkii [Loading MPS to PDF converter (version 2006.09.02).] ) (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/epstopdf-base.sty (/usr/share/texlive/texmf-dist/tex/latex/oberdiek/grfext.sty) (/usr/share/texlive/texmf-dist/tex/latex/latexconfig/epstopdf-sys.cfg)) *geometry* driver: auto-detecting *geometry* detected driver: pdftex (/usr/share/texlive/texmf-dist/tex/latex/hyperref/nameref.sty (/usr/share/texlive/texmf-dist/tex/generic/oberdiek/gettitlestring.sty)) (./SpinningUp.out) (./SpinningUp.out) (/usr/share/texlive/texmf-dist/tex/latex/psnfss/t1phv.fd)<><><><> (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsb.fd) [1{/var/lib/texmf/fo nts/map/pdftex/updmap/pdftex.map}] [2] (./SpinningUp.toc [1] [2]) [3] [4] (/usr/share/texlive/texmf-dist/tex/latex/psnfss/t1pcr.fd) [1 <./spinning-up-in- rl.png>] [2] Chapter 1. 
(/usr/share/texlive/texmf-dist/tex/latex/psnfss/ts1ptm.fd) [3] [4] [5] [6] Chapter 2. [7] [8] [9] [10] Chapter 3. [11] [12] [13] [14] Chapter 4. [15] [16] [17] [18] (/usr/share/texlive/texmf-dist/tex/latex/psnfss/ts1pcr.fd) [19] [20] Chapter 5. [21] [22] [23] [24] [25] [26] Chapter 6. [27] [28] Chapter 7. [29] [30 <./rl_diagram_transparent_bg.png>] [31] [32] [33] ! Undefined control sequence. ... } P(\tau |\pi ) R(\tau ) = \underE {\tau \sim \pi }{R(\tau )}... l.1531 ...nderE{\tau\sim \pi}{R(\tau)}.\end{split} ! Undefined control sequence. ... } P(\tau |\pi ) R(\tau ) = \underE {\tau \sim \pi }{R(\tau )}... l.1531 ...nderE{\tau\sim \pi}{R(\tau)}.\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{R(\tau )\... l.1550 ...R(\tau)\left| s_0 = s\right.}\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{R(\tau )\... l.1550 ...R(\tau)\left| s_0 = s\right.}\end{split} [34] ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{R(\tau )\... l.1557 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{R(\tau )\... l.1557 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...split}V^*(s) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1564 ...R(\tau)\left| s_0 = s\right.}\end{split} ! Undefined control sequence. ...split}V^*(s) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1564 ...R(\tau)\left| s_0 = s\right.}\end{split} ! Undefined control sequence. ...lit}Q^*(s,a) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1571 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...lit}Q^*(s,a) = \max _{\pi } \underE {\tau \sim \pi }{R(\tau )\... l.1571 ...eft| s_0 = s, a_0 = a\right.}\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {a\sim \pi }{Q^{\pi }(s,a)... 
l.1585 ...erE{a\sim \pi}{Q^{\pi}(s,a)},\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {a\sim \pi }{Q^{\pi }(s,a)... l.1585 ...erE{a\sim \pi}{Q^{\pi}(s,a)},\end{split} [35] ! Undefined control sequence. V^{\pi }(s) &= \underE {a \sim \pi \\ s'\sim P}{r(s,a) + \gamma ... l.1618 \end{align*} ! Missing } inserted. } l.1618 \end{align*} ! Missing { inserted. { l.1618 \end{align*} ! Undefined control sequence. ...}(s')}, \\ Q^{\pi }(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1618 \end{align*} ! Undefined control sequence. ... {s'\sim P}{r(s,a) + \gamma \underE {a'\sim \pi }{Q^{\pi }(s',... l.1618 \end{align*} ! Undefined control sequence. V^{\pi }(s) &= \underE {a \sim \pi \\ s'\sim P}{r(s,a) + \gamma ... l.1618 \end{align*} ! Missing } inserted. } l.1618 \end{align*} ! Missing \endgroup inserted. \endgroup l.1618 \end{align*} ! Misplaced \omit. \math@cr@@@ ...@ \@ne \add@amps \maxfields@ \omit \kern -\alignsep@ \iftag@ ... l.1618 \end{align*} ! Missing { inserted. { l.1618 \end{align*} ! Undefined control sequence. ...}(s')}, \\ Q^{\pi }(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1618 \end{align*} ! Undefined control sequence. ... {s'\sim P}{r(s,a) + \gamma \underE {a'\sim \pi }{Q^{\pi }(s',... l.1618 \end{align*} ! Undefined control sequence. V^*(s) &= \max _a \underE {s'\sim P}{r(s,a) + \gamma V^*(s')}, \... l.1625 \end{align*} ! Undefined control sequence. ...ma V^*(s')}, \\ Q^*(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1625 \end{align*} ! Undefined control sequence. V^*(s) &= \max _a \underE {s'\sim P}{r(s,a) + \gamma V^*(s')}, \... l.1625 \end{align*} ! Undefined control sequence. ...ma V^*(s')}, \\ Q^*(s,a) &= \underE {s'\sim P}{r(s,a) + \gamma... l.1625 \end{align*} [36] [37] [38] Chapter 8. ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.1699 ...ncludegraphics{{rl_algorithms_9_15}.svg} ! 
LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.1699 ...ncludegraphics{{rl_algorithms_9_15}.svg} [39] [40] [41] [42] Chapter 9. ! Undefined control sequence. l.1890 ...ected return \(J(\pi_{\theta}) = \underE {\tau \sim \pi_{\theta}}{R... [43] ! Undefined control sequence. ...} \log P(\tau | \theta ) &= \cancel {\nabla _{\theta } \log \r... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...} + \sum _{t=0}^{T} \bigg ( \cancel {\nabla _{\theta } \log P(... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...} \log P(\tau | \theta ) &= \cancel {\nabla _{\theta } \log \r... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...} + \sum _{t=0}^{T} \bigg ( \cancel {\nabla _{\theta } \log P(... l.1921 ... \log \pi_{\theta}(a_t |s_t).\end{split} ! Undefined control sequence. ...eta }) &= \nabla _{\theta } \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...Log-derivative trick} \\ &= \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...heta } J(\pi _{\theta }) &= \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...eta }) &= \nabla _{\theta } \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...Log-derivative trick} \\ &= \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} ! Undefined control sequence. ...heta } J(\pi _{\theta }) &= \underE {\tau \sim \pi _{\theta }}... l.1933 \end{align*} \end{sphinxadmonition} [44] [45] [46] [47] ! Undefined control sequence. \split@tag \begin {split}\underE {x \sim P_{\theta }}{\nabla _{\t... l.2082 ...eta} \log P_{\theta}(x)} = 0.\end{split} ! Undefined control sequence. 
\split@tag \begin {split}\underE {x \sim P_{\theta }}{\nabla _{\t... l.2082 ...eta} \log P_{\theta}(x)} = 0.\end{split} ! Undefined control sequence. ...eta }(x) \\ \therefore 0 &= \underE {x \sim P_{\theta }}{\nabl... l.2099 ...{\theta} \log P_{\theta}(x)}.\end{split} ! Undefined control sequence. ...eta }(x) \\ \therefore 0 &= \underE {x \sim P_{\theta }}{\nabl... l.2099 ...{\theta} \log P_{\theta}(x)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2107 ..._{\theta}(a_t |s_t) R(\tau)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2107 ..._{\theta}(a_t |s_t) R(\tau)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2115 ...R(s_{t'}, a_{t'}, s_{t'+1})}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2115 ...R(s_{t'}, a_{t'}, s_{t'+1})}.\end{split} [48] ! Undefined control sequence. \split@tag \begin {split}\underE {a_t \sim \pi _{\theta }}{\nabla... l.2167 ...\theta}(a_t|s_t) b(s_t)} = 0.\end{split} ! Undefined control sequence. \split@tag \begin {split}\underE {a_t \sim \pi _{\theta }}{\nabla... l.2167 ...\theta}(a_t|s_t) b(s_t)} = 0.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2171 ..., s_{t'+1}) - b(s_t)\right)}.\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2171 ..., s_{t'+1}) - b(s_t)\right)}.\end{split} [49] ! Undefined control sequence. ...phi _k = \arg \min _{\phi } \underE {s_t, \hat {R}_t \sim \pi ... l.2185 ...(s_t) - \hat{R}_t \right)^2},\end{split} ! Undefined control sequence. ...phi _k = \arg \min _{\phi } \underE {s_t, \hat {R}_t \sim \pi ... l.2185 ...(s_t) - \hat{R}_t \right)^2},\end{split} ! Undefined control sequence. 
...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2199 ...i_{\theta}(a_t |s_t) \Phi_t},\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2199 ...i_{\theta}(a_t |s_t) \Phi_t},\end{split} [50] [51] [52] Chapter 10. [53] [54] [55] [56] [57] [58] Chapter 11. [59] [60] Overfull \vbox (103.35579pt too high) has occurred while \output is active [61] [62] Chapter 12. [63] [64] [65] [66] Chapter 13. ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2700 ...includegraphics{{bench_halfcheetah}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2700 ...includegraphics{{bench_halfcheetah}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2709 ...phinxincludegraphics{{bench_hopper}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2709 ...phinxincludegraphics{{bench_hopper}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2718 ...phinxincludegraphics{{bench_walker}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2718 ...phinxincludegraphics{{bench_walker}.svg} [67] ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2727 ...\sphinxincludegraphics{{bench_swim}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2727 ...\sphinxincludegraphics{{bench_swim}.svg} ! 
LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2736 ...t\sphinxincludegraphics{{bench_ant}.svg} ! LaTeX Error: Unknown graphics extension: .svg. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2736 ...t\sphinxincludegraphics{{bench_ant}.svg} [68] [69] [70] Chapter 14. [71] ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2833 },\end{split} ! Undefined control sequence. ...theta } J(\pi _{\theta }) = \underE {\tau \sim \pi _{\theta }}... l.2833 },\end{split} ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2850 ...rithms/vpg:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2851 \caption {Vanilla Policy Gradient Algorithm} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2853 \begin{algorithmic} [1] ! Undefined control sequence. l.2854 \STATE Input: initial policy parameters $\theta_0$, initial value... ! Undefined control sequence. l.2855 \FOR {$k = 0,1,2,...$} ! Undefined control sequence. l.2856 \STATE Collect set of trajectories ${\mathcal D}_k = \{\tau_i\}$ ... ! Undefined control sequence. l.2857 \STATE Compute rewards-to-go $\hat{R}_t$. ! Undefined control sequence. l.2858 \STATE Compute advantage estimates, $\hat{A}_t$ (using any method... ! Undefined control sequence. l.2859 \STATE Estimate policy gradient as ! Undefined control sequence. l.2863 \STATE Compute policy update, either using standard gradient ascent, ! Undefined control sequence. l.2868 \STATE Fit value function by regression on mean-squared error: ! Undefined control sequence. l.2873 \ENDFOR ! 
LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2874 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.2875 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 2881--2881 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 2881--2881 \T1/ptm/m/it/10 steps_per_epoch=4000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 epochs=50\ T1/ptm/m/n/10 , \T1/ptm/m/it/10 gamma=0.99\T1/ptm/m/n/10 , \T1/ptm/m/it/10 pi_l r=0.0003\T1/ptm/m/n/10 , \T1/ptm/m/it/10 vf_lr=0.001\T1/ptm/m/n/10 , [72] [73] [74] [75] [76] Chapter 15. [77] ! Undefined control sequence. ...al L}(\theta _k, \theta ) = \underE {s,a \sim \pi _{\theta _k}... l.3162 },\end{split} ! Undefined control sequence. ...al L}(\theta _k, \theta ) = \underE {s,a \sim \pi _{\theta _k}... l.3162 },\end{split} ! Undefined control sequence. ...{KL}(\theta || \theta _k) = \underE {s \sim \pi _{\theta _k}}{... l.3168 }.\end{split} ! Undefined control sequence. ...{KL}(\theta || \theta _k) = \underE {s \sim \pi _{\theta _k}}{... l.3168 }.\end{split} [78] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3217 ...ithms/trpo:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3218 \caption {Trust Region Policy Optimization} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3220 \begin{algorithmic} [1] ! Undefined control sequence. 
l.3221 \STATE Input: initial policy parameters $\theta_0$, initial value... ! Undefined control sequence. l.3222 \STATE Hyperparameters: KL-divergence limit $\delta$, backtrackin... ! Undefined control sequence. l.3223 \FOR {$k = 0,1,2,...$} ! Undefined control sequence. l.3224 \STATE Collect set of trajectories ${\mathcal D}_k = \{\tau_i\}$ ... ! Undefined control sequence. l.3225 \STATE Compute rewards-to-go $\hat{R}_t$. ! Undefined control sequence. l.3226 \STATE Compute advantage estimates, $\hat{A}_t$ (using any method... ! Undefined control sequence. l.3227 \STATE Estimate policy gradient as ! Undefined control sequence. l.3231 \STATE Use the conjugate gradient algorithm to compute ! Undefined control sequence. l.3236 \STATE Update the policy by backtracking line search with [79] ! Undefined control sequence. l.3241 \STATE Fit value function by regression on mean-squared error: ! Undefined control sequence. l.3246 \ENDFOR ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3247 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... 
l.3248 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 \T1/ptm/m/it/10 steps_per_epoch=4000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 epochs=50\ T1/ptm/m/n/10 , \T1/ptm/m/it/10 gamma=0.99\T1/ptm/m/n/10 , \T1/ptm/m/it/10 delt a=0.01\T1/ptm/m/n/10 , \T1/ptm/m/it/10 vf_lr=0.001\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 \T1/ptm/m/it/10 train_v_iters=80\T1/ptm/m/n/10 , \T1/ptm/m/it/10 damp-ing_coeff =0.1\T1/ptm/m/n/10 , \T1/ptm/m/it/10 cg_iters=10\T1/ptm/m/n/10 , \T1/ptm/m/it/1 0 back-track_iters=10\T1/ptm/m/n/10 , \T1/ptm/m/it/10 back- Underfull \hbox (badness 10000) in paragraph at lines 3254--3254 \T1/ptm/m/it/10 track_coeff=0.8\T1/ptm/m/n/10 , \T1/ptm/m/it/10 lam=0.97\T1/ptm /m/n/10 , \T1/ptm/m/it/10 max_ep_len=1000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 log-g er_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 save_freq=10\T1/ptm/m/n/10 , [80] Overfull \vbox (100.02797pt too high) has occurred while \output is active [81] [82] [83] [84] Chapter 16. [85] [86] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3674 ...rithms/ppo:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3675 \caption {PPO-Clip} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3677 \begin{algorithmic} [1] ! Undefined control sequence. l.3678 \STATE Input: initial policy parameters $\theta_0$, initial value... ! Undefined control sequence. l.3679 \FOR {$k = 0,1,2,...$} ! Undefined control sequence. 
l.3680 \STATE Collect set of trajectories ${\mathcal D}_k = \{\tau_i\}$ ... ! Undefined control sequence. l.3681 \STATE Compute rewards-to-go $\hat{R}_t$. ! Undefined control sequence. l.3682 \STATE Compute advantage estimates, $\hat{A}_t$ (using any method... ! Undefined control sequence. l.3683 \STATE Update the policy by maximizing the PPO-Clip objective: ! Undefined control sequence. l.3691 \STATE Fit value function by regression on mean-squared error: ! Undefined control sequence. l.3696 \ENDFOR ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3697 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.3698 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 3704--3704 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , [87] [88] [89] [90] Chapter 17. [91] [92] [93] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4088 ...ithms/ddpg:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4089 \caption {Deep Deterministic Policy Gradient} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4091 \begin{algorithmic} [1] ! Undefined control sequence. l.4092 \STATE Input: initial policy parameters $\theta$, Q-function para... ! Undefined control sequence. l.4093 \STATE Set target parameters equal to main parameters $\theta_{\t... ! Undefined control sequence. l.4094 \REPEAT ! Undefined control sequence. 
l.4095 \STATE Observe state $s$ and select action $a = \text{clip}(\... ! Undefined control sequence. l.4096 \STATE Execute $a$ in the environment ! Undefined control sequence. l.4097 \STATE Observe next state $s'$, reward $r$, and done signal $... ! Undefined control sequence. l.4098 \STATE Store $(s,a,r,s',d)$ in replay buffer $\mathcal{D}$ ! Undefined control sequence. l.4099 \STATE If $s'$ is terminal, reset environment state. ! Undefined control sequence. l.4100 \IF {it's time to update} ! Undefined control sequence. l.4101 \FOR {however many updates} ! Undefined control sequence. l.4102 \STATE Randomly sample a batch of transitions, $B = \... ! Undefined control sequence. l.4103 \STATE Compute targets ! Undefined control sequence. l.4107 \STATE Update Q-function by one step of gradient desc... ! Undefined control sequence. l.4111 \STATE Update policy by one step of gradient ascent u... ! Undefined control sequence. l.4115 \STATE Update target networks with ! Undefined control sequence. l.4120 \ENDFOR ! Undefined control sequence. l.4121 \ENDIF ! Undefined control sequence. l.4122 \UNTIL {convergence} ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4123 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... 
l.4124 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 4130--4130 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 4130--4130 \T1/ptm/m/it/10 pi_lr=0.001\T1/ptm/m/n/10 , \T1/ptm/m/it/10 q_lr=0.001\T1/ptm/m /n/10 , \T1/ptm/m/it/10 batch_size=100\T1/ptm/m/n/10 , \T1/ptm/m/it/10 start_st eps=10000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 act_noise=0.1\T1/ptm/m/n/10 , [94] [95] [96] Chapter 18. [97] ! Undefined control sequence. ...}L(\phi _1, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4436 },\end{split} ! Undefined control sequence. ...}L(\phi _1, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4436 },\end{split} ! Undefined control sequence. ...}L(\phi _2, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4440 }.\end{split} ! Undefined control sequence. ...}L(\phi _2, {\mathcal D}) = \underE {(s,a,r,s',d) \sim {\mathc... l.4440 }.\end{split} [98] ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4464 ...rithms/td3:pseudocode}}\begin{algorithm} [H] ! LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4465 \caption {Twin Delayed DDPG} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4467 \begin{algorithmic} [1] ! Undefined control sequence. l.4468 \STATE Input: initial policy parameters $\theta$, Q-function para... ! Undefined control sequence. l.4469 \STATE Set target parameters equal to main parameters $\theta_{\t... ! Undefined control sequence. l.4470 \REPEAT ! Undefined control sequence. l.4471 \STATE Observe state $s$ and select action $a = \text{clip}(\... ! Undefined control sequence. 
l.4472 \STATE Execute $a$ in the environment ! Undefined control sequence. l.4473 \STATE Observe next state $s'$, reward $r$, and done signal $... ! Undefined control sequence. l.4474 \STATE Store $(s,a,r,s',d)$ in replay buffer $\mathcal{D}$ ! Undefined control sequence. l.4475 \STATE If $s'$ is terminal, reset environment state. ! Undefined control sequence. l.4476 \IF {it's time to update} ! Undefined control sequence. l.4477 \FOR {$j$ in range(however many updates)} ! Undefined control sequence. l.4478 \STATE Randomly sample a batch of transitions, $B = \... ! Undefined control sequence. l.4479 \STATE Compute target actions ! Undefined control sequence. l.4483 \STATE Compute targets ! Undefined control sequence. l.4487 \STATE Update Q-functions by one step of gradient des... ! Undefined control sequence. l.4491 \IF { $j \mod$ \texttt{policy\_delay} $ = 0$} ! Undefined control sequence. l.4492 \STATE Update policy by one step of gradient asce... ! Undefined control sequence. l.4496 \STATE Update target networks with ! Undefined control sequence. l.4501 \ENDIF ! Undefined control sequence. l.4502 \ENDFOR ! Undefined control sequence. l.4503 \ENDIF ! Undefined control sequence. l.4504 \UNTIL {convergence} ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4505 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... 
l.4506 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 4512--4512 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 4512--4512 \T1/ptm/m/it/10 pi_lr=0.001\T1/ptm/m/n/10 , \T1/ptm/m/it/10 q_lr=0.001\T1/ptm/m /n/10 , \T1/ptm/m/it/10 batch_size=100\T1/ptm/m/n/10 , \T1/ptm/m/it/10 start_st eps=10000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 act_noise=0.1\T1/ptm/m/n/10 , \T1/ptm /m/it/10 tar- [99] [100] [101] [102] Chapter 19. [103] ! Undefined control sequence. ...it@tag \begin {split}H(P) = \underE {x \sim P}{-\log P(x)}.\en... l.4827 ...underE{x \sim P}{-\log P(x)}.\end{split} ! Undefined control sequence. ...it@tag \begin {split}H(P) = \underE {x \sim P}{-\log P(x)}.\en... l.4827 ...underE{x \sim P}{-\log P(x)}.\end{split} ! Undefined control sequence. ...}\pi ^* = \arg \max _{\pi } \underE {\tau \sim \pi }{ \sum _{t... l.4831 ...pi(\cdot|s_t)\right) \bigg)},\end{split} ! Undefined control sequence. ...}\pi ^* = \arg \max _{\pi } \underE {\tau \sim \pi }{ \sum _{t... l.4831 ...pi(\cdot|s_t)\right) \bigg)},\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{ \left . ... l.4835 ...ight) \bigg) \right| s_0 = s}\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {\tau \sim \pi }{ \left . ... l.4835 ...ight) \bigg) \right| s_0 = s}\end{split} ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{ \left . ... l.4839 ...ght)\right| s_0 = s, a_0 = a}\end{split} ! Undefined control sequence. ...egin {split}Q^{\pi }(s,a) = \underE {\tau \sim \pi }{ \left . ... l.4839 ...ght)\right| s_0 = s, a_0 = a}\end{split} ! Undefined control sequence. ...\begin {split}V^{\pi }(s) = \underE {a \sim \pi }{Q^{\pi }(s,a... l.4843 ...ha H\left(\pi(\cdot|s)\right)\end{split} ! Undefined control sequence. 
...\begin {split}V^{\pi }(s) = \underE {a \sim \pi }{Q^{\pi }(s,a... l.4843 ...ha H\left(\pi(\cdot|s)\right)\end{split} ! Undefined control sequence. ...gin {split}Q^{\pi }(s,a) &= \underE {s' \sim P \\ a' \sim \pi ... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing } inserted. } l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing { inserted. { l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Undefined control sequence. ...s')\right ) \right )} \\ &= \underE {s' \sim P}{R(s,a,s') + \g... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Undefined control sequence. ...gin {split}Q^{\pi }(s,a) &= \underE {s' \sim P \\ a' \sim \pi ... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing } inserted. } l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Missing { inserted. { l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} ! Undefined control sequence. ...s')\right ) \right )} \\ &= \underE {s' \sim P}{R(s,a,s') + \g... l.4848 ...,a,s') + \gamma V^{\pi}(s')}.\end{split} [104] ! Undefined control sequence. ...begin {split}V^{\pi }(s) &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...pi (\cdot |s)\right ) \\ &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...begin {split}V^{\pi }(s) &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...pi (\cdot |s)\right ) \\ &= \underE {a \sim \pi }{Q^{\pi }(s,a... l.4871 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. ...it}L(\psi , {\mathcal D}) = \underE {s \sim \mathcal {D} \\ \t... l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing } inserted. } l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing { inserted. { l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Undefined control sequence. 
...it}L(\psi , {\mathcal D}) = \underE {s \sim \mathcal {D} \\ \t... l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing } inserted. } l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Missing { inserted. { l.4879 ...tilde{a}|s) \right)\Bigg)^2}.\end{split} ! Undefined control sequence. \split@tag \begin {split}\underE {a \sim \pi }{Q^{\pi }(s,a) - \a... l.4885 ...s,a) - \alpha \log \pi(a|s)}.\end{split} ! Undefined control sequence. \split@tag \begin {split}\underE {a \sim \pi }{Q^{\pi }(s,a) - \a... l.4885 ...s,a) - \alpha \log \pi(a|s)}.\end{split} [105] ! Undefined control sequence. \split@tag \begin {split}\underE {a \sim \pi _{\theta }}{Q^{\pi _... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. ...\log \pi _{\theta }(a|s)} = \underE {\xi \sim \mathcal {N}}{Q^... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. \split@tag \begin {split}\underE {a \sim \pi _{\theta }}{Q^{\pi _... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. ...\log \pi _{\theta }(a|s)} = \underE {\xi \sim \mathcal {N}}{Q^... l.4902 ...\tilde{a}_{\theta}(s,\xi)|s)}\end{split} ! Undefined control sequence. ...egin {split}\max _{\theta } \underE {s \sim \mathcal {D} \\ \x... l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing } inserted. } l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing { inserted. { l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Undefined control sequence. ...egin {split}\max _{\theta } \underE {s \sim \mathcal {D} \\ \x... l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing } inserted. } l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! Missing { inserted. { l.4906 ...tilde{a}_{\theta}(s,\xi)|s)},\end{split} ! LaTeX Error: Environment algorithm undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4924 ...rithms/sac:pseudocode}}\begin{algorithm} [H] ! 
LaTeX Error: \caption outside float. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4925 \caption {Soft Actor-Critic} ! LaTeX Error: Environment algorithmic undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4927 \begin{algorithmic} [1] ! Undefined control sequence. l.4928 \STATE Input: initial policy parameters $\theta$, Q-function para... ! Undefined control sequence. l.4929 \STATE Set target parameters equal to main parameters $\psi_{\tex... ! Undefined control sequence. l.4930 \REPEAT ! Undefined control sequence. l.4931 \STATE Observe state $s$ and select action $a \sim \pi_{\thet... ! Undefined control sequence. l.4932 \STATE Execute $a$ in the environment ! Undefined control sequence. l.4933 \STATE Observe next state $s'$, reward $r$, and done signal $... ! Undefined control sequence. l.4934 \STATE Store $(s,a,r,s',d)$ in replay buffer $\mathcal{D}$ ! Undefined control sequence. l.4935 \STATE If $s'$ is terminal, reset environment state. ! Undefined control sequence. l.4936 \IF {it's time to update} ! Undefined control sequence. l.4937 \FOR {$j$ in range(however many updates)} ! Undefined control sequence. l.4938 \STATE Randomly sample a batch of transitions, $B = \... ! Undefined control sequence. l.4939 \STATE Compute targets for Q and V functions: [106] ! Undefined control sequence. l.4944 \STATE Update Q-functions by one step of gradient des... ! Undefined control sequence. l.4948 \STATE Update V-function by one step of gradient desc... ! Undefined control sequence. l.4952 \STATE Update policy by one step of gradient ascent u... ! Undefined control sequence. l.4957 \STATE Update target value network with ! Undefined control sequence. l.4961 \ENDFOR ! Undefined control sequence. l.4962 \ENDIF ! Undefined control sequence. l.4963 \UNTIL {convergence} ! LaTeX Error: \begin{document} ended by \end{algorithmic}. See the LaTeX manual or LaTeX Companion for explanation. 
Type H for immediate help. ... l.4964 \end{algorithmic} ! LaTeX Error: \begin{document} ended by \end{algorithm}. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.4965 \end{algorithm} Underfull \hbox (badness 10000) in paragraph at lines 4971--4971 []\T1/ptm/m/it/10 env_fn\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac-tor_critic=\T1/ptm/m/n/10 , \T1/ptm/m/it/10 ac_kwargs={}\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , Underfull \hbox (badness 10000) in paragraph at lines 4971--4971 \T1/ptm/m/it/10 lr=0.001\T1/ptm/m/n/10 , \T1/ptm/m/it/10 al-pha=0.2\T1/ptm/m/n/ 10 , \T1/ptm/m/it/10 batch_size=100\T1/ptm/m/n/10 , \T1/ptm/m/it/10 start_steps =10000\T1/ptm/m/n/10 , \T1/ptm/m/it/10 max_ep_len=1000\T1/ptm/m/n/10 , \T1/ptm/ m/it/10 log- [107] Overfull \vbox (77.80809pt too high) has occurred while \output is active [108] [109] [110] Chapter 20. [111] [112] [113] [114] [115] [116] Chapter 21. [117] [118] Chapter 22. [119] [120] Chapter 23. [121] Underfull \hbox (badness 10000) in paragraph at lines 5986--5986 []\T1/ptm/m/it/10 exp_name\T1/ptm/m/n/10 , \T1/ptm/m/it/10 thunk\T1/ptm/m/n/10 , \T1/ptm/m/it/10 seed=0\T1/ptm/m/n/10 , \T1/ptm/m/it/10 num_cpu=1\T1/ptm/m/n/1 0 , [122] [123] [124] Chapter 24. [125] [126] Chapter 25. [127] [128] Chapter 26. 
[129] [130] [131] (./SpinningUp.ind [132] Underfull \hbox (badness 7522) in paragraph at lines 47--48 []\T1/ptm/m/n/10 add() (spinup.utils.run_utils.ExperimentGrid method), Overfull \hbox (5.61969pt too wide) in paragraph at lines 48--49 []\T1/ptm/m/n/10 apply_gradients() (spinup.utils.mpi_tf.MpiAdamOptimizer Overfull \hbox (17.83952pt too wide) in paragraph at lines 74--75 []\T1/ptm/m/n/10 compute_gradients() (spinup.utils.mpi_tf.MpiAdamOptimizer [133] Underfull \hbox (badness 10000) in paragraph at lines 103--104 []\T1/ptm/m/n/10 mpi_statistics_scalar() (in mod-ule Underfull \hbox (badness 10000) in paragraph at lines 119--120 []\T1/ptm/m/n/10 run() (spinup.utils.run_utils.ExperimentGrid method), Underfull \hbox (badness 10000) in paragraph at lines 140--141 []\T1/ptm/m/n/10 variant_name() (spinup.utils.run_utils.ExperimentGrid Underfull \hbox (badness 10000) in paragraph at lines 141--142 []\T1/ptm/m/n/10 variants() (spinup.utils.run_utils.ExperimentGrid [134]) (./SpinningUp.aux) Package rerunfilecheck Warning: File `SpinningUp.out' has changed. (rerunfilecheck) Rerun to get outlines right (rerunfilecheck) or use package `bookmark'. LaTeX Warning: There were multiply-defined labels. ) (see the transcript file for additional information){/usr/share/texlive/texmf-d ist/fonts/enc/dvips/base/8r.enc}< /usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr5.pfb> Output written on SpinningUp.pdf (140 pages, 1145556 bytes). Transcript written on SpinningUp.log. 
[rtd-command-info] start-time: 2019-06-28T21:47:11.242677Z, end-time: 2019-06-28T21:47:11.552006Z, duration: 0, exit-code: 0
mv -f /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/_build/latex/SpinningUp.pdf /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/artifacts/latest/sphinx_pdf/openai-education-spinningup.pdf
[rtd-command-info] start-time: 2019-06-28T21:47:11.639811Z, end-time: 2019-06-28T21:48:34.720138Z, duration: 83, exit-code: 0
python sphinx-build -T -b epub -d _build/doctrees-epub -D language=en . _build/epub
Running Sphinx v1.5.6
making output directory...
WARNING: Logging before flag parsing goes to stderr.
W0628 21:47:14.180595 139981803151488 deprecation_wrapper.py:119] From /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/spinup/utils/mpi_tf.py:29: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
loading translations [en]... done
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [epub]: targets for 31 source files that are out of date
updating environment: 0 added, 2 changed, 0 removed
reading sources... [ 50%] user/algorithms
reading sources... [100%] user/installation
looking for now-outdated files... none found
pickling environment... done
checking consistency...
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_1_soln.rst:: WARNING: document isn't included in any toctree
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/exercise2_2_soln.rst:: WARNING: document isn't included in any toctree
done
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof1.rst:: WARNING: document isn't included in any toctree
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/extra_pg_proof2.rst:: WARNING: document isn't included in any toctree
/home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/spinningup/rl_intro4.rst:: WARNING: document isn't included in any toctree
preparing documents... done
writing output... [ 3%] algorithms/ddpg
writing output... [ 6%] algorithms/ppo
writing output... [ 9%] algorithms/sac
writing output... [ 12%] algorithms/td3
writing output... [ 16%] algorithms/trpo
writing output... [ 19%] algorithms/vpg
writing output... [ 22%] etc/acknowledgements
writing output... [ 25%] etc/author
writing output... [ 29%] index
writing output... [ 32%] spinningup/bench
writing output... [ 35%] spinningup/exercise2_1_soln
writing output... [ 38%] spinningup/exercise2_2_soln
writing output... [ 41%] spinningup/exercises
writing output... [ 45%] spinningup/extra_pg_proof1
writing output... [ 48%] spinningup/extra_pg_proof2
writing output... [ 51%] spinningup/keypapers
writing output... [ 54%] spinningup/rl_intro
writing output... [ 58%] spinningup/rl_intro2
writing output... [ 61%] spinningup/rl_intro3
writing output... [ 64%] spinningup/rl_intro4
writing output... [ 67%] spinningup/spinningup
writing output... [ 70%] user/algorithms
writing output... [ 74%] user/installation
writing output... [ 77%] user/introduction
writing output... [ 80%] user/plotting
writing output... [ 83%] user/running
writing output... [ 87%] user/saving_and_loading
writing output... [ 90%] utils/logger
writing output... [ 93%] utils/mpi
writing output... [ 96%] utils/plotter
writing output... [100%] utils/run_utils
generating indices... genindex py-modindex
writing additional pages...
copying images... [ 10%] images/spinning-up-in-rl.png
copying images... [ 20%] spinningup/../images/bench/bench_halfcheetah.svg
copying images... [ 30%] spinningup/../images/bench/bench_hopper.svg
copying images... [ 40%] spinningup/../images/bench/bench_walker.svg
copying images... [ 50%] spinningup/../images/bench/bench_swim.svg
copying images... [ 60%] spinningup/../images/bench/bench_ant.svg
copying images... [ 70%] spinningup/../images/ex2-1_trpo_hopper.png
copying images... [ 80%] spinningup/../images/ex2-2_ddpg_bug.svg
copying images... [ 90%] spinningup/../images/rl_diagram_transparent_bg.png
copying images... [100%] spinningup/../images/rl_algorithms_9_15.svg
copying static files...
WARNING: favicon file 'openai_icon.ico' does not exist
done
copying extra files... done
writing mimetype file...
writing META-INF/container.xml file...
writing content.opf file...
WARNING: unknown mimetype for _static/openai-favicon2_32x32.ico, ignoring
WARNING: unknown mimetype for _static/openai_icon.ico, ignoring
writing nav.xhtml file...
writing toc.ncx file...
writing SpinningUp.epub file...
build succeeded, 8 warnings.
[rtd-command-info] start-time: 2019-06-28T21:48:34.849679Z, end-time: 2019-06-28T21:48:35.158933Z, duration: 0, exit-code: 0
mv -f /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/checkouts/latest/docs/_build/epub/SpinningUp.epub /home/docs/checkouts/readthedocs.org/user_builds/openai-education-spinningup/artifacts/latest/sphinx_epub/openai-education-spinningup.epub