I built a Logistic Regression model with scikit-learn, pandas and NumPy to predict the results of the FIFA World Cup 2018.
- The goal is to use Machine Learning to predict who is going to win the FIFA World Cup 2018.
- Predict the outcome of individual matches for the entire competition.
- Run a simulation of the subsequent matches, i.e. quarter-finals, semi-finals and final.
These goals present a unique real-world Machine Learning prediction problem and involve solving various Machine Learning tasks: data integration, feature modelling and outcome prediction.
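The knockout simulation listed in the goals can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `simulate_knockout`, `rating_based_prob`, the team names and the `RATINGS` values are all hypothetical, and in practice `win_prob` would wrap the trained model's predicted probability.

```python
def simulate_knockout(teams, win_prob):
    """Advance winners round by round until one team remains.

    teams    -- list in bracket order; its length must be a power of two
    win_prob -- callable (a, b) -> probability that team a beats team b
    Returns a list with the winners of each round.
    """
    rounds = []
    current = list(teams)
    while len(current) > 1:
        winners = []
        # Pair neighbours in the bracket: (0, 1), (2, 3), ...
        for a, b in zip(current[0::2], current[1::2]):
            winners.append(a if win_prob(a, b) >= 0.5 else b)
        rounds.append(winners)
        current = winners
    return rounds


# Hypothetical strength ratings standing in for a trained model.
RATINGS = {"Team A": 5, "Team B": 3, "Team C": 2, "Team D": 6,
           "Team E": 7, "Team F": 1, "Team G": 4, "Team H": 8}


def rating_based_prob(a, b):
    """Placeholder probability: share of the combined rating held by team a."""
    return RATINGS[a] / (RATINGS[a] + RATINGS[b])


bracket = ["Team A", "Team B", "Team C", "Team D",
           "Team E", "Team F", "Team G", "Team H"]
rounds = simulate_knockout(bracket, rating_based_prob)
for name, winners in zip(["Quarter-finals", "Semi-finals", "Final"], rounds):
    print(name, "winners:", winners)
```

Picking the more likely winner makes the run deterministic; sampling from the probability instead would give a different bracket on each run, which is useful for estimating each team's chance of winning the tournament.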
You can see the actual results of this World Cup here: World Cup Results
I used two datasets from Kaggle: the results of matches since 1930 and the World Cup 2018 Dataset. From the first, I used the results of historical matches since the beginning of the championship (1930) for all participating teams.
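As a rough illustration of this data preparation step, the sketch below filters a results table down to matches involving participating teams and derives an outcome label. The column names (`home_team`, `away_team`, `home_score`, `away_score`), the outcome encoding and the sample rows are assumptions made for the example, not the actual Kaggle schema or data.

```python
import pandas as pd


def prepare_matches(results, teams):
    """Keep matches with at least one participating team and label the outcome."""
    mask = results["home_team"].isin(teams) | results["away_team"].isin(teams)
    matches = results.loc[mask].copy()
    diff = matches["home_score"] - matches["away_score"]
    # Assumed encoding: 2 = home win, 1 = draw, 0 = away win
    matches["outcome"] = diff.apply(lambda d: 2 if d > 0 else (1 if d == 0 else 0))
    return matches.reset_index(drop=True)


# Tiny made-up sample; the real data would come from pd.read_csv(...)
sample = pd.DataFrame({
    "home_team": ["Team A", "Team B", "Team C", "Team D"],
    "away_team": ["Team B", "Team C", "Team D", "Team E"],
    "home_score": [2, 1, 0, 3],
    "away_score": [0, 1, 1, 1],
})
prepared = prepare_matches(sample, teams=["Team A", "Team B", "Team C"])
print(prepared)
```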
- PyCharm
- NumPy
- Pandas
- Scikit-learn
- string-color
I chose Logistic Regression for my model and got an accuracy of 57% on the training set and 55% on the test set. I also used a dataset of the FIFA rankings as of April 2018 and a dataset containing the fixtures of the group stage of the tournament.
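A minimal sketch of how such a model might be trained and scored with scikit-learn is shown below. It assumes one-hot encoded team names as the only features and a three-class outcome label; the data is randomly generated for illustration, so the printed accuracies are not meaningful and will not match the figures above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for the historical match table (assumption).
rng = np.random.default_rng(42)
teams = ["Team A", "Team B", "Team C", "Team D"]
n_matches = 200
matches = pd.DataFrame({
    "home_team": rng.choice(teams, size=n_matches),
    "away_team": rng.choice(teams, size=n_matches),
    "outcome": rng.integers(0, 3, size=n_matches),  # 0, 1 or 2
})

# One-hot encode the team names so the model receives numeric features.
X = pd.get_dummies(matches[["home_team", "away_team"]],
                   prefix=["home_team", "away_team"], dtype=float)
y = matches["outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Training accuracy: {train_acc:.2f}")
print(f"Test accuracy: {test_acc:.2f}")
```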
- Dataset: to improve the dataset, you could use FIFA (the video game, not the organisation) to assess the quality of each team's players.
- A confusion matrix would help analyse which games the model got wrong.
- We could use an ensemble, that is, stack several models together to improve the accuracy.
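The last two ideas could be explored along the following lines. This is only a sketch on synthetic data from `make_classification`: the choice of a random forest as the second base model, the estimator names and all parameters are assumptions, not something the project has implemented.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for win / draw / loss.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Stack two base models; a logistic regression combines their predictions.
stack = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, stack.predict(X_test), labels=[0, 1, 2])
print(cm)
```

Off-diagonal cells show which outcomes get confused with each other, for example draws predicted as wins.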