
Simple Machine Learning model that uses regression to predict which team wins World Cup 2018


FerasBarahmeh/World-Cup-Exception


Predictions World Cup 2018

I used Machine Learning to build a Logistic Regression model with scikit-learn, pandas, and NumPy to predict the results of the 2018 FIFA World Cup.

Goals

  1. Use Machine Learning to predict who is going to win the FIFA World Cup 2018.

  2. Predict the outcome of individual matches for the entire competition.

  3. Run a simulation of the knockout matches, i.e. the quarter-finals, semi-finals and final.

These goals present a unique real-world Machine Learning prediction problem and involve solving various Machine Learning tasks: data integration, feature modelling and outcome prediction.
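The knockout simulation in goal 3 could be sketched roughly as follows. This is only an illustration, not the repository's code: the team names `A` to `H`, the `strength` values and the `stronger` helper are made-up placeholders, and in practice `pick_winner` would wrap the trained model's prediction for a pairing.

```python
def simulate_knockout(teams, pick_winner):
    """Play successive knockout rounds and return the winners of each round.

    `teams` is an ordered bracket (its length must be a power of two) and
    `pick_winner(a, b)` is any callable that returns the advancing team.
    """
    rounds = []
    current = list(teams)
    while len(current) > 1:
        # Pair neighbouring teams in the bracket and keep the winner of each tie.
        winners = [
            pick_winner(current[i], current[i + 1])
            for i in range(0, len(current), 2)
        ]
        rounds.append(winners)
        current = winners
    return rounds


# Hypothetical placeholder strengths, used here instead of a trained model.
strength = {"A": 0.9, "B": 0.4, "C": 0.6, "D": 0.7,
            "E": 0.3, "F": 0.8, "G": 0.5, "H": 0.2}


def stronger(a, b):
    """Stand-in predictor: the team with the higher placeholder strength wins."""
    return a if strength[a] >= strength[b] else b


rounds = simulate_knockout(list("ABCDEFGH"), stronger)
champion = rounds[-1][0]
print(rounds, champion)
```

Passing the predictor in as a function keeps the bracket logic independent of whichever model produces the match outcome.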

How do I make sure that goals are achieved?

You can compare the predictions with the actual results of the tournament at this link: World Cup Results

Data

I used two data sets from Kaggle: the results of matches since 1930 and the World Cup 2018 Dataset. The historical results cover all participating teams from the first championship (1930) onward.
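A minimal sketch of how the historical results might be prepared with pandas is shown below. The column names (`home_team`, `away_team`, `home_score`, `away_score`), the sample rows and the `participants` set are assumptions for illustration only; the actual Kaggle files may be laid out differently.

```python
import pandas as pd

# Hypothetical sample rows standing in for the historical results file.
results = pd.DataFrame({
    "home_team": ["Brazil", "Germany", "France", "Italy"],
    "away_team": ["Germany", "France", "Brazil", "Brazil"],
    "home_score": [1, 2, 0, 1],
    "away_score": [7, 2, 1, 1],
})

# Assumed subset of teams taking part in the tournament.
participants = {"Brazil", "Germany", "France"}


def label_outcome(row):
    """Return 2 for a home win, 1 for a draw and 0 for an away win."""
    if row["home_score"] > row["away_score"]:
        return 2
    if row["home_score"] == row["away_score"]:
        return 1
    return 0


# Keep only matches in which both sides are tournament participants.
mask = (results["home_team"].isin(participants)
        & results["away_team"].isin(participants))
matches = results[mask].copy()

# Derive the target label the classifier will learn to predict.
matches["outcome"] = matches.apply(label_outcome, axis=1)
print(matches)
```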

Environment and tools

  1. PyCharm
  2. NumPy
  3. pandas
  4. Scikit-learn
  5. string-color

I chose Logistic Regression for my model, which reached 57% accuracy on the training set and 55% on the test set. I also used a dataset of FIFA rankings as of April 2018 and a dataset containing the group-stage fixtures of the tournament.
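The training and evaluation step might look something like the sketch below. It runs on randomly generated placeholder matches, so its scores will sit near chance level and are unrelated to the 57% and 55% figures above. One-hot encoding of the team names and the 70/30 split are assumptions, not confirmed details of the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: random pairings with random outcomes (0, 1 or 2).
rng = np.random.default_rng(42)
teams = ["Brazil", "Germany", "France", "Spain", "Argentina", "England"]
n_matches = 300
data = pd.DataFrame({
    "home_team": rng.choice(teams, size=n_matches),
    "away_team": rng.choice(teams, size=n_matches),
    "outcome": rng.integers(0, 3, size=n_matches),
})

# One-hot encode the team names so the model receives numeric features.
X = pd.get_dummies(data[["home_team", "away_team"]])
y = data["outcome"]

# Hold out 30% of the matches for testing (an assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on each split, comparable in kind to the figures reported above.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A small gap between training and test accuracy, as reported above, suggests the model is not overfitting heavily.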

Areas for Further Research and Improvement

  1. Dataset: player ratings from FIFA (the video game, not the organisation) could be used to assess the quality of each team's players.

  2. A confusion matrix would help analyse which games the model got wrong.

  3. Ensembling: stacking several models together could improve the accuracy.
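Items 2 and 3 could be explored along the lines of the sketch below, which uses scikit-learn's `confusion_matrix` and `StackingClassifier` on synthetic three-class data from `make_classification`. The random forest as second base model, and all the parameter values, are illustrative assumptions rather than choices made in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for match features with three outcome classes.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Confusion matrix for a plain logistic regression:
# rows are true outcomes, columns are predicted outcomes.
base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
cm = confusion_matrix(y_test, base.predict(X_test), labels=[0, 1, 2])
print(cm)

# Stacking: base models feed their predictions to a final estimator.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_acc = stack.score(X_test, y_test)
print(f"stacked test accuracy: {stack_acc:.2f}")
```

Off-diagonal cells in the confusion matrix show which outcomes get mistaken for one another, for example draws predicted as home wins.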
