- Add a dataset of 100,000 < user, weight, clicks > triples for supervised learning.
- Update the SL baseline and the user action model.
- Add RL and SL baselines. Run `python ./virtualTB/ReinforcementLearning/main.py` to see more.
- Add this changelog file
- Add ./model/LeaveModel.py, which predicts when a virtual user will leave the vTaobao platform
- Fix the range of env.observation_space
- Virtual users can no longer click an item more than once
- The user action model no longer includes information about whether the user continues browsing. Instead, a new model simulates when the user will leave