Stable-Baselines3 v1.7.0: non-shared features extractor, bug fixes and quality-of-life improvements

@araffin araffin released this 10 Jan 16:52
6b8905a

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Warning
Shared layers in the MLP policy (mlp_extractor) are now deprecated for PPO, A2C and TRPO.
This feature will be removed in SB3 v1.8.0, after which net_arch=[64, 64]
will create separate actor and critic networks with the same architecture, to be consistent with the off-policy algorithms.

Note
A2C and PPO models saved with SB3 < 1.7.0 will show a warning about
missing keys in the state dict when loaded with SB3 >= 1.7.0.
To suppress the warning, simply save the model again.
You can find more info in issue #1233

Breaking Changes:

  • Removed the deprecated create_eval_env, eval_env, eval_log_path, n_eval_episodes and eval_freq parameters;
    use an EvalCallback instead
  • Removed deprecated sde_net_arch parameter
  • Removed the ret attribute in VecNormalize; use returns instead
  • VecNormalize now updates the observation space when normalizing images
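
For the removed evaluation parameters, a minimal migration sketch is shown below. It assumes SB3 1.7.0 with gym installed; the helper names make_eval_callback and train are hypothetical, and the defaults are illustrative. The SB3 imports sit inside the functions only so the snippet can be loaded without SB3 present.

```python
def make_eval_callback(eval_env, eval_freq=10_000, n_eval_episodes=5, eval_log_path=None):
    """Build an EvalCallback from the values formerly passed as evaluation parameters.

    Hypothetical helper: it only maps the old parameter names onto EvalCallback.
    """
    # Imported lazily so this helper can be defined without SB3 installed.
    from stable_baselines3.common.callbacks import EvalCallback

    return EvalCallback(
        eval_env,
        n_eval_episodes=n_eval_episodes,
        eval_freq=eval_freq,
        log_path=eval_log_path,  # EvalCallback calls this argument log_path
    )


def train(total_timesteps=20_000):
    """Train PPO on CartPole-v1 and evaluate periodically through the callback."""
    import gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.monitor import Monitor

    model = PPO("MlpPolicy", "CartPole-v1")
    # A separate, Monitor-wrapped environment is used for evaluation only.
    eval_env = Monitor(gym.make("CartPole-v1"))
    model.learn(total_timesteps, callback=make_eval_callback(eval_env))
    return model
```

Calling train() runs training with an evaluation every eval_freq steps; pass eval_log_path to make_eval_callback if you want the evaluation results written to disk.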

New Features:

  • Introduced mypy type checking
  • Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
  • Added with_bias argument to create_mlp
  • Added support for multidimensional spaces.MultiBinary observations
  • Features extractors now properly support unnormalized image-like observations (3D tensor)
    when passing normalize_images=False
  • Added normalized_image parameter to NatureCNN and CombinedExtractor
  • Added support for Python 3.10
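
The non-shared features extractor can be requested through policy_kwargs. The sketch below assumes the keyword is share_features_extractor (the name is not given in these notes, so check the documentation for your version); the function name and environment id are hypothetical.

```python
# Keyword arguments forwarded to the policy constructor.
# Assumption: share_features_extractor=False gives the actor and the critic
# their own features extractor instead of a shared one.
POLICY_KWARGS = dict(share_features_extractor=False)


def build_ppo_with_separate_extractors(env_id="CartPole-v1"):
    """Create a PPO model whose actor and critic do not share a features extractor."""
    # Imported lazily so this snippet can be loaded without SB3 installed.
    from stable_baselines3 import PPO

    return PPO("MlpPolicy", env_id, policy_kwargs=POLICY_KWARGS)
```

Separate extractors roughly double the number of extractor parameters, which matters most for image observations where the extractor is a CNN.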

SB3-Contrib

  • Fixed a bug in RecurrentPPO where the LSTM states were incorrectly reshaped for n_lstm_layers > 1 (thanks @kolbytn)
  • Fixed RuntimeError: rnn: hx is not contiguous while predicting terminal values for RecurrentPPO when n_lstm_layers > 1

RL Zoo

  • Added support for Python files as configuration
  • Added monitor_kwargs parameter

Bug Fixes:

  • Fixed ProgressBarCallback under-reporting (@dominicgkerr)
  • Fixed return type of evaluate_actions in ActorCriticPolicy to reflect that entropy is an optional tensor (@Rocamonde)
  • Fixed type annotation of policy in BaseAlgorithm and OffPolicyAlgorithm
  • Allowed models trained with Python 3.7 to be loaded with Python 3.8+ without the custom_objects workaround
  • Raised an error when the same gym environment instance is passed as separate environments when creating a vectorized environment with more than one environment (@Rocamonde)
  • Fixed type annotation of model in evaluate_policy
  • Fixed Self return type using TypeVar
  • Fixed the env checker, the key was not passed when checking images from Dict observation space
  • Fixed normalize_images which was not passed to parent class in some cases
  • Fixed load_from_vector that was broken with newer PyTorch versions when passing a PyTorch tensor
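
The duplicate-instance error typically shows up when a list of factories all return the same object. The following is a plain-Python sketch of that kind of check, not SB3's actual code; instantiate_envs and ToyEnv are hypothetical names.

```python
from typing import Callable, List, Sequence


class ToyEnv:
    """Stand-in for a gym environment (hypothetical, for illustration only)."""


def instantiate_envs(env_fns: Sequence[Callable[[], object]]) -> List[object]:
    """Call each factory and refuse to continue if two of them return the same object."""
    envs = [fn() for fn in env_fns]
    # Distinct live objects have distinct ids, so fewer ids than envs means sharing.
    if len({id(env) for env in envs}) != len(envs):
        raise ValueError(
            "The same environment instance was returned more than once; "
            "each factory must create a new environment."
        )
    return envs


shared = ToyEnv()
try:
    # Common mistake: every lambda closes over the same instance.
    instantiate_envs([lambda: shared] * 2)
except ValueError as err:
    print(f"rejected: {err}")

# Correct usage: each factory builds its own instance.
envs = instantiate_envs([ToyEnv for _ in range(2)])
```

Sharing one instance would make the sub-environments step and reset each other's state, which is why failing early is preferable to silently training on corrupted rollouts.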

Deprecations:

  • You should now explicitly pass a features_extractor parameter when calling extract_features()
  • Deprecated shared layers in MlpExtractor (@AlexPasqua)

Others:

  • Used issue forms instead of issue templates
  • Updated the PR template to associate each PR with its peer in RL-Zoo3 and SB3-Contrib
  • Fixed flake8 config to be compatible with flake8 6+
  • Goal-conditioned environments are now characterized by the availability of the compute_reward method, rather than by their inheritance from gym.GoalEnv
  • Replaced CartPole-v0 by CartPole-v1 in tests
  • Fixed tests/test_distributions.py type hints
  • Fixed stable_baselines3/common/type_aliases.py type hints
  • Fixed stable_baselines3/common/torch_layers.py type hints
  • Fixed stable_baselines3/common/env_util.py type hints
  • Fixed stable_baselines3/common/preprocessing.py type hints
  • Fixed stable_baselines3/common/atari_wrappers.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_check_nan.py type hints
  • Exposed modules in __init__.py with the __all__ attribute (@ZikangXiong)
  • Upgraded GitHub CI/setup-python to v4 and checkout to v3
  • Constructed tensors directly on the device (~8% speed boost on GPU)
  • Monkey-patched np.bool = bool so gym 0.21 is compatible with NumPy 1.24+
  • Standardized the use of from gym import spaces
  • Modified get_system_info to avoid an issue linked to copy-pasting its output into GitHub issues
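
The change to goal-conditioned environment detection amounts to duck typing: an environment qualifies because it provides compute_reward, not because of its base class. Below is a plain-Python sketch of the idea; is_goal_conditioned, ReachTarget and PlainEnv are hypothetical names, and the exact check used inside SB3 may differ.

```python
class ReachTarget:
    """Toy goal-conditioned environment; it does not inherit from gym.GoalEnv."""

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse reward: 0 when the goal is reached, -1 otherwise.
        return 0.0 if achieved_goal == desired_goal else -1.0


class PlainEnv:
    """Toy environment without a compute_reward method."""


def is_goal_conditioned(env) -> bool:
    """Return True if env exposes a callable compute_reward attribute."""
    return callable(getattr(env, "compute_reward", None))


print(is_goal_conditioned(ReachTarget()))  # has compute_reward
print(is_goal_conditioned(PlainEnv()))     # lacks compute_reward
```

A capability check like this keeps working with environments written for gym versions that no longer ship a GoalEnv base class.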

Documentation:

  • Updated Hugging Face Integration page (@simoninithomas)
  • Changed env to vec_env when environment is vectorized
  • Updated custom policy docs to better explain the mlp_extractor's dimensions (@AlexPasqua)
  • Updated custom policy documentation (@athatheo)
  • Improved tensorboard callback doc
  • Clarified doc when using image-like input
  • Added RLeXplore to the project page (@yuanmingqi)