Create an I2A agent which collects all necessary information from the environment
- Figure out how to pass the current PPO agent into the I2A model
- Use the PPO actor network as the rollout policy
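A minimal sketch of reusing the PPO actor as the rollout policy during imagination; the `actor` module here is a stand-in for the real PPO actor network, and the latent/action dimensions are assumptions:

```python
import torch
import torch.nn as nn

# Stand-in for the trained PPO actor (assumed shapes: latent_dim=32, 4 actions);
# in practice this would be the actor passed in from the PPO agent.
actor = nn.Linear(32, 4)
latent = torch.zeros(8, 32)       # batch of 8 imagined states

with torch.no_grad():             # the rollout policy is not updated during imagination
    dist = torch.distributions.Categorical(logits=actor(latent))
    actions = dist.sample()       # (8,) actions used to step the environment model
```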
- Create an interface that receives an observation (and action) and outputs the next observation + a predicted reward
- Implement this functionality as a recurrent NN (which loss function: MSE? cross-entropy?). Prediction happens in pixel space
- Implement a conditioned $\beta$-VAE (C-$\beta$-VAE) to move away from pixel space
- Figure out how to train the C-$\beta$-VAE online while training policies (how to fit its training into our RL loop)
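A sketch of the environment-model interface described above, assuming prediction in a learned latent space (e.g. the C-$\beta$-VAE code) rather than pixels; all names and dimensions are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnvModel(nn.Module):
    """Hypothetical recurrent environment model:
    (latent, action, hidden) -> (next latent, predicted reward, new hidden)."""
    def __init__(self, latent_dim=32, n_actions=4, hidden_dim=128):
        super().__init__()
        self.n_actions = n_actions
        self.rnn = nn.GRUCell(latent_dim + n_actions, hidden_dim)
        self.next_latent = nn.Linear(hidden_dim, latent_dim)
        self.reward = nn.Linear(hidden_dim, 1)

    def forward(self, latent, action, hidden):
        a = F.one_hot(action, self.n_actions).float()      # condition on the action taken
        hidden = self.rnn(torch.cat([latent, a], dim=-1), hidden)
        return self.next_latent(hidden), self.reward(hidden).squeeze(-1), hidden
```

For latent-space targets an MSE loss is the natural choice; cross-entropy fits better when predicting discretized pixels directly.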
- The rollout encoding is carried out by a recurrent convolutional network
- Create an LSTM which encodes the imagined trajectories backwards into rollout embeddings
- Many model architectures can be used here (cf. the PlaNet algorithm and related papers)
- Implement a vanilla approach first, taking great care with data-representation manipulation, i.e. switching between batched and sequence representations
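The backwards-encoding LSTM above can be sketched as follows; the time-major `(T, B, F)` sequence layout is the one `nn.LSTM` expects by default, which is the batched-vs-sequence representation issue mentioned above. Names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class RolloutEncoder(nn.Module):
    """Encodes one imagined trajectory into a fixed-size rollout embedding.
    Inputs are time-major: features (T, B, feat_dim), rewards (T, B)."""
    def __init__(self, feat_dim=32, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + 1, hidden_dim)  # state features + predicted reward

    def forward(self, features, rewards):
        x = torch.cat([features, rewards.unsqueeze(-1)], dim=-1)
        x = x.flip(0)                 # reverse time: the LSTM reads the rollout backwards
        _, (h, _) = self.lstm(x)
        return h.squeeze(0)           # (B, hidden_dim) rollout embedding
```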
- Create an aggregator which simply concatenates the rollout embeddings into an imagination code.
- Alternatively, use an attention mechanism over the rollout embeddings
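The plain concatenation aggregator is a one-liner; a sketch under the assumption that one embedding is produced per rollout (an attention-based aggregator would replace the concatenation with a weighted sum):

```python
import torch

def aggregate(embeddings):
    """Concatenate per-rollout embeddings (n_rollouts, B, D) into one
    imagination code of shape (B, n_rollouts * D)."""
    n, b, d = embeddings.shape
    return embeddings.permute(1, 0, 2).reshape(b, n * d)
```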
- Concatenate the imagination code with the last layer of the model-free path (PPO) and feed the result into a fully connected layer which outputs an action distribution and a value function
- Compute the loss and backpropagate gradients (FIGURE OUT: which components receive gradients?)
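A sketch of the output head joining both paths; dimensions are assumptions. On the loss question: in the original I2A paper the whole network is trained end-to-end with the standard policy-gradient loss (here that would be the PPO objective), plus an auxiliary distillation cross-entropy keeping the rollout policy close to the full I2A policy, while the environment model is trained separately on its prediction loss:

```python
import torch
import torch.nn as nn

class I2AHead(nn.Module):
    """Joins the imagination code with the model-free (PPO) features and
    outputs policy logits and a state value. Dimensions are assumptions."""
    def __init__(self, code_dim=320, mf_dim=256, n_actions=4, hidden=256):
        super().__init__()
        self.fc = nn.Linear(code_dim + mf_dim, hidden)
        self.policy = nn.Linear(hidden, n_actions)
        self.value = nn.Linear(hidden, 1)

    def forward(self, code, mf_feat):
        h = torch.relu(self.fc(torch.cat([code, mf_feat], dim=-1)))
        return self.policy(h), self.value(h).squeeze(-1)
```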