Stable-Baselines3 v1.7.0: non-shared features extractor, bug fixes and quality of life improvements
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (RL Zoo depends on SB3 and SB3 Contrib):
pip install rl_zoo3 --upgrade
Warning
Shared layers in MLP policy (mlp_extractor) are now deprecated for PPO, A2C and TRPO.
This feature will be removed in SB3 v1.8.0 and the behavior of net_arch=[64, 64]
will create separate networks with the same architecture, to be consistent with the off-policy algorithms.
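To illustrate what the planned behavior means for existing configurations, here is a minimal pure-Python sketch that rewrites an old-style on-policy net_arch (leading integers are shared layers, an optional trailing dict holds the actor "pi" and critic "vf" layers) into the dict form with separate networks. The helper name to_separate_net_arch is hypothetical and not part of SB3; the resulting networks have the same layer sizes but no longer share weights.

```python
from typing import Dict, List, Union

# Old on-policy format: shared layer sizes, optionally followed by one dict
# with separate "pi" (actor) and "vf" (critic) layer sizes.
OldNetArch = List[Union[int, Dict[str, List[int]]]]


def to_separate_net_arch(net_arch: OldNetArch) -> Dict[str, List[int]]:
    """Hypothetical helper: copy the shared layers into both pi and vf.

    Layer sizes are preserved, but actor and critic no longer share weights.
    """
    shared: List[int] = []
    pi: List[int] = []
    vf: List[int] = []
    for i, layer in enumerate(net_arch):
        if isinstance(layer, int):
            shared.append(layer)
        elif isinstance(layer, dict) and i == len(net_arch) - 1:
            # The pi/vf dict is only valid as the last element.
            pi = list(layer.get("pi", []))
            vf = list(layer.get("vf", []))
        else:
            raise ValueError(f"unexpected net_arch element at index {i}: {layer!r}")
    return {"pi": shared + pi, "vf": shared + vf}


print(to_separate_net_arch([64, 64]))
# {'pi': [64, 64], 'vf': [64, 64]}
print(to_separate_net_arch([128, dict(pi=[16], vf=[256])]))
# {'pi': [128, 16], 'vf': [128, 256]}
```

The output is meant for the dict form of the argument, e.g. policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])).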
Note
A2C and PPO models saved with SB3 < 1.7.0 will show a warning about
missing keys in the state dict when loaded with SB3 >= 1.7.0.
To suppress the warning, simply save the model again.
You can find more info in issue #1233.
Breaking Changes:
- Removed deprecated create_eval_env, eval_env, eval_log_path, n_eval_episodes and eval_freq parameters; please use an EvalCallback instead
- Removed deprecated sde_net_arch parameter
- Removed the ret attribute in VecNormalize; please use returns instead
- VecNormalize now updates the observation space when normalizing images
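When migrating code that still passes the removed evaluation arguments, the keyword arguments can be split into the ones that stay on learn() and the ones that move to EvalCallback. The sketch below is a hypothetical migration helper (split_legacy_eval_kwargs is not an SB3 function); mapping eval_log_path to both log_path and best_model_save_path is an assumption about how the removed code path configured evaluation, so check it against your expectations.

```python
from typing import Any, Dict, Tuple

# Removed learn() keyword arguments that keep their name in EvalCallback.
_SAME_NAME = ("eval_env", "eval_freq", "n_eval_episodes")


def split_legacy_eval_kwargs(
    learn_kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical migration helper.

    Returns (remaining learn() kwargs, EvalCallback kwargs). eval_log_path is
    mapped to both log_path and best_model_save_path (assumption).
    """
    remaining = dict(learn_kwargs)  # do not mutate the caller's dict
    eval_kwargs: Dict[str, Any] = {}
    for name in _SAME_NAME:
        if name in remaining:
            eval_kwargs[name] = remaining.pop(name)
    if "eval_log_path" in remaining:
        path = remaining.pop("eval_log_path")
        eval_kwargs["log_path"] = path
        eval_kwargs["best_model_save_path"] = path
    return remaining, eval_kwargs


legacy = {
    "total_timesteps": 10_000,
    "eval_env": "EVAL_ENV_PLACEHOLDER",  # stands in for a real env object
    "eval_freq": 1_000,
    "eval_log_path": "./logs/",
}
learn_kwargs, eval_kwargs = split_legacy_eval_kwargs(legacy)
print(learn_kwargs)  # {'total_timesteps': 10000}
print(eval_kwargs)
```

Assuming SB3 >= 1.7.0 is installed, the two dicts would then be used as model.learn(**learn_kwargs, callback=EvalCallback(**eval_kwargs)), with EvalCallback imported from stable_baselines3.common.callbacks.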
New Features:
- Introduced mypy type checking
- Added option to have non-shared features extractors between actor and critic in on-policy algorithms (@AlexPasqua)
- Added with_bias argument to create_mlp
- Added support for multidimensional spaces.MultiBinary observations
- Features extractors now properly support unnormalized image-like observations (3D tensor) when passing normalize_images=False
- Added normalized_image parameter to NatureCNN and CombinedExtractor
- Added support for Python 3.10
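As a configuration sketch of the new on-policy options (assuming SB3 >= 1.7.0, where share_features_extractor, net_arch and normalize_images are policy arguments; the layer sizes are arbitrary), the policy keyword arguments could look like this:

```python
# Policy keyword arguments for an on-policy algorithm such as PPO or A2C.
policy_kwargs = dict(
    # New in 1.7.0: actor and critic each get their own features extractor.
    share_features_extractor=False,
    # Dict form: separate actor ("pi") and critic ("vf") MLPs of equal size.
    net_arch=dict(pi=[64, 64], vf=[64, 64]),
)

# For image-like observations that are already scaled (e.g. floats in [0, 1]),
# disable the default division of pixel values by 255.
image_policy_kwargs = dict(normalize_images=False)
```

These would be passed as, for example, PPO("MlpPolicy", env, policy_kwargs=policy_kwargs) or PPO("CnnPolicy", env, policy_kwargs=image_policy_kwargs).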
SB3-Contrib
- Fixed a bug in RecurrentPPO where the LSTM states were incorrectly reshaped for n_lstm_layers > 1 (thanks @kolbytn)
- Fixed RuntimeError: rnn: hx is not contiguous while predicting terminal values for RecurrentPPO when n_lstm_layers > 1
RL Zoo
- Added support for Python files for configuration
- Added monitor_kwargs parameter
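A Python configuration file lets hyperparameters be written as regular Python objects instead of YAML. The sketch below assumes the zoo reads a module-level dict named hyperparams, keyed by environment ID and using the same keys as the YAML files; the file name my_ppo_config.py and all values are illustrative, and the command-line option used to select the file should be checked in the RL Zoo documentation.

```python
"""Hypothetical RL Zoo configuration file (my_ppo_config.py)."""

# Assumption: the zoo looks for a module-level dict named `hyperparams`,
# keyed by environment ID, with the same keys as the YAML files.
hyperparams = {
    "CartPole-v1": dict(
        policy="MlpPolicy",
        n_envs=8,
        n_timesteps=int(1e5),
        n_steps=32,
        batch_size=256,
        learning_rate=1e-3,
        # Nested Python objects can be written directly, no YAML escaping.
        policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),
    )
}
```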
Bug Fixes:
- Fixed ProgressBarCallback under-reporting (@dominicgkerr)
- Fixed return type of evaluate_actions in ActorCriticPolicy to reflect that entropy is an optional tensor (@Rocamonde)
- Fixed type annotation of policy in BaseAlgorithm and OffPolicyAlgorithm
- Allowed models trained with Python 3.7 to be loaded with Python 3.8+ without the custom_objects workaround
- Raised an error when the same gym environment instance is passed as separate environments when creating a vectorized environment with more than one environment (@Rocamonde)
- Fixed type annotation of model in evaluate_policy
- Fixed Self return type using TypeVar
- Fixed the env checker: the key was not passed when checking images from Dict observation space
- Fixed normalize_images, which was not passed to the parent class in some cases
- Fixed load_from_vector, which was broken with newer PyTorch versions when passing a PyTorch tensor
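The new error guards against a subtle mistake: if every factory given to a vectorized environment returns the same object, the "separate" sub-environments all step and reset one shared instance. A minimal pure-Python sketch of such a check based on object identity follows; check_unique_envs and FakeEnv are hypothetical names, not the SB3 implementation.

```python
from typing import Any, Callable, List


class FakeEnv:
    """Stand-in for a gym environment; only its identity matters here."""


def check_unique_envs(env_fns: List[Callable[[], Any]]) -> List[Any]:
    """Hypothetical check: build the environments and reject duplicates."""
    envs = [fn() for fn in env_fns]
    # Every env is still referenced by `envs`, so id() values are unique per object.
    if len({id(env) for env in envs}) != len(envs):
        raise ValueError(
            "The same environment instance was passed more than once; "
            "each factory must return a new instance."
        )
    return envs


# Correct: each factory creates its own environment.
ok = check_unique_envs([lambda: FakeEnv() for _ in range(4)])

# Wrong: every factory returns the same object, so this raises ValueError.
shared = FakeEnv()
try:
    check_unique_envs([lambda: shared for _ in range(4)])
except ValueError as err:
    print(f"ValueError: {err}")
```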
Deprecations:
- You should now explicitly pass a features_extractor parameter when calling extract_features()
- Deprecated shared layers in MlpExtractor (@AlexPasqua)
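With non-shared features extractors, a policy holds one extractor for the actor and one for the critic, so a bare extract_features(obs) call would be ambiguous. The classes below are a hypothetical, framework-free sketch of the explicit calling pattern; SB3's real policies operate on PyTorch modules and tensors, and the attribute names pi_features_extractor and vf_features_extractor are used here for illustration only.

```python
from typing import Callable, List, Tuple

Extractor = Callable[[List[float]], List[float]]


class ToyActorCriticPolicy:
    """Hypothetical policy with optionally separate features extractors."""

    def __init__(self, share_features_extractor: bool = True) -> None:
        self.pi_features_extractor: Extractor = lambda obs: [2 * x for x in obs]
        # Either reuse the actor's extractor or create a distinct one.
        self.vf_features_extractor: Extractor = (
            self.pi_features_extractor
            if share_features_extractor
            else (lambda obs: [x + 1 for x in obs])
        )

    def extract_features(self, obs: List[float], features_extractor: Extractor) -> List[float]:
        # The caller states which extractor to use instead of relying on a default.
        return features_extractor(obs)

    def forward_features(self, obs: List[float]) -> Tuple[List[float], List[float]]:
        pi_features = self.extract_features(obs, self.pi_features_extractor)
        vf_features = self.extract_features(obs, self.vf_features_extractor)
        return pi_features, vf_features


policy = ToyActorCriticPolicy(share_features_extractor=False)
print(policy.forward_features([1.0, 2.0]))  # ([2.0, 4.0], [2.0, 3.0])
```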
Others:
- Used issue forms instead of issue templates
- Updated the PR template to associate each PR with its peer in RL-Zoo3 and SB3-Contrib
- Fixed flake8 config to be compatible with flake8 6+
- Goal-conditioned environments are now characterized by the availability of the compute_reward method, rather than by their inheritance from gym.GoalEnv
- Replaced CartPole-v0 by CartPole-v1 in tests
- Fixed tests/test_distributions.py type hints
- Fixed stable_baselines3/common/type_aliases.py type hints
- Fixed stable_baselines3/common/torch_layers.py type hints
- Fixed stable_baselines3/common/env_util.py type hints
- Fixed stable_baselines3/common/preprocessing.py type hints
- Fixed stable_baselines3/common/atari_wrappers.py type hints
- Fixed stable_baselines3/common/vec_env/vec_check_nan.py type hints
- Exposed modules in __init__.py with the __all__ attribute (@ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Tensors are now constructed directly on the device (~8% speed boost on GPU)
- Monkey-patched np.bool = bool so gym 0.21 is compatible with NumPy 1.24+
- Standardized the use of from gym import spaces
- Modified get_system_info to avoid issues linked to copy-pasting into GitHub issues
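Detecting goal-conditioned environments by their compute_reward method means they no longer need to inherit from gym.GoalEnv. A minimal duck-typing sketch follows; is_goal_conditioned, ReachEnv and PlainEnv are hypothetical, SB3's own check may differ in detail (for example on wrapped environments), and the reward here works on scalars for simplicity, whereas real goal environments usually compute it on arrays.

```python
from typing import Any, Dict


class ReachEnv:
    """Toy goal-conditioned env: no gym.GoalEnv base class needed."""

    def compute_reward(self, achieved_goal: float, desired_goal: float, info: Dict[str, Any]) -> float:
        # Sparse reward: 0 when close enough to the goal, -1 otherwise.
        return 0.0 if abs(achieved_goal - desired_goal) < 0.05 else -1.0


class PlainEnv:
    """Toy env without a goal-based reward function."""


def is_goal_conditioned(env: Any) -> bool:
    # Hypothetical check mirroring the new criterion: a callable compute_reward.
    return callable(getattr(env, "compute_reward", None))


print(is_goal_conditioned(ReachEnv()))  # True
print(is_goal_conditioned(PlainEnv()))  # False
print(ReachEnv().compute_reward(1.0, 1.02, {}))  # 0.0
```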
Documentation:
- Updated Hugging Face Integration page (@simoninithomas)
- Changed env to vec_env when environment is vectorized
- Updated custom policy docs to better explain the mlp_extractor's dimensions (@AlexPasqua)
- Updated custom policy documentation (@athatheo)
- Improved tensorboard callback doc
- Clarified doc when using image-like input
- Added RLeXplore to the project page (@yuanmingqi)