Predicting football match winner based only on the outcome of previous matches between the two teams

I’m a huge football (soccer) fan and I’m also interested in machine learning. As a project for my ML course, I’m trying to build a model that predicts the home team’s chance of winning, given only the names of the home and away teams. (I query my dataset and build data points from the previous matches between those two teams.)

I have data for several seasons for all teams, but I have the following issue that I would like some advice on. The EPL (English Premier League) has 20 teams, each of which plays every other team at home and away (380 games in a season). Thus, in each season, any two teams play each other only twice.

I have data for the past 10+ years, which gives 2*10=20 data points for any two teams. However, I do not want to go back further than 3 years, since I believe teams change considerably over time (e.g. ManCity, Liverpool) and older matches would only introduce more error into the system.
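The counts above can be checked with a few lines of arithmetic. This is only a sketch; the function names are made up for illustration, and it assumes both teams stay in the league for every season counted.

```python
def matches_per_season(n_teams: int) -> int:
    # Every team hosts every other team exactly once.
    return n_teams * (n_teams - 1)


def head_to_head_points(n_seasons: int) -> int:
    # One home and one away fixture per season for any given pair.
    return 2 * n_seasons


print(matches_per_season(20))   # 380
print(head_to_head_points(10))  # 20
print(head_to_head_points(3))   # 6
```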

So this leaves only around 6-8 data points for each pair of teams. However, I do have several features (20+) for each data point, such as full-time goals, half-time goals, passes, shots, yellow cards, red cards, etc. for both teams, so I can also include features like recent form, recent home form, recent away form, etc.
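As a rough illustration of the "recent form" features mentioned above, here is one possible way to compute them: average points per game over a team's last k matches before a given date, optionally restricted to home or away games. The `Match` record, the function names, the 3/1/0 points scheme and the sample results are all assumptions made for this sketch, not part of my actual dataset.

```python
from typing import List, NamedTuple, Optional


class Match(NamedTuple):
    date: str        # ISO date (YYYY-MM-DD), so string order equals date order
    home: str
    away: str
    home_goals: int
    away_goals: int


def points_for(team: str, m: Match) -> int:
    # League points earned by `team` in match `m` (3 win, 1 draw, 0 loss).
    if team == m.home:
        gf, ga = m.home_goals, m.away_goals
    else:
        gf, ga = m.away_goals, m.home_goals
    return 3 if gf > ga else 1 if gf == ga else 0


def recent_form(matches: List[Match], team: str, before: str,
                k: int = 5, venue: Optional[str] = None) -> Optional[float]:
    # Average points per game over the team's last k matches played
    # strictly before `before`. venue may be None (all), "home" or "away".
    played = [
        m for m in matches
        if m.date < before and (
            (venue in (None, "home") and m.home == team)
            or (venue in (None, "away") and m.away == team)
        )
    ]
    played.sort(key=lambda m: m.date)
    last = played[-k:]
    if not last:
        return None  # no history available for this team and venue
    return sum(points_for(team, m) for m in last) / len(last)


# Made-up results, for illustration only.
matches = [
    Match("2012-08-18", "TeamA", "TeamC", 1, 0),
    Match("2012-08-25", "TeamB", "TeamA", 2, 2),
    Match("2012-09-01", "TeamA", "TeamB", 0, 1),
    Match("2012-09-15", "TeamC", "TeamA", 0, 3),
]

print(recent_form(matches, "TeamA", "2012-10-01"))                # 1.75
print(recent_form(matches, "TeamA", "2012-10-01", venue="home"))  # 1.5
print(recent_form(matches, "TeamA", "2012-10-01", venue="away"))  # 2.0
```

Because these features use a team's matches against all opponents, they are not limited to the handful of head-to-head games.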

However, training on only 6-8 data points seems wrong to me. Any thoughts on how I could counter this problem (if it is a problem in the first place)?


What about enlarging your dataset by also taking into account the two teams' matches against a common opponent? For example:


TeamA vs TeamC: 1-0
TeamB vs TeamC: 2-0
=> "infer" the fake outcome: TeamA vs TeamB: 1-2

Furthermore, in my opinion this kind of data is better than the data you proposed, because last year's teams are often very different from this season's teams.
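The common-opponent idea could be sketched roughly as follows. This is only one possible reading of it: the inferred score simply pairs the goals each team scored against the shared opponent (averaged if they met more than once), and home/away is ignored. The function name and the result tuples are hypothetical.

```python
from collections import defaultdict


def infer_pseudo_results(results, team_a, team_b):
    # results: iterable of (team1, team2, goals1, goals2).
    # Returns one (common_opponent, inferred_goals_a, inferred_goals_b)
    # tuple per opponent that both team_a and team_b have played.
    scored = defaultdict(lambda: defaultdict(list))  # team -> opponent -> goals scored
    for t1, t2, g1, g2 in results:
        scored[t1][t2].append(g1)
        scored[t2][t1].append(g2)

    common = (set(scored[team_a]) & set(scored[team_b])) - {team_a, team_b}
    pseudo = []
    for opp in sorted(common):
        goals_a = sum(scored[team_a][opp]) / len(scored[team_a][opp])
        goals_b = sum(scored[team_b][opp]) / len(scored[team_b][opp])
        pseudo.append((opp, goals_a, goals_b))
    return pseudo


# The example from above: TeamA vs TeamC 1-0, TeamB vs TeamC 2-0.
results = [("TeamA", "TeamC", 1, 0), ("TeamB", "TeamC", 2, 0)]
print(infer_pseudo_results(results, "TeamA", "TeamB"))
# [('TeamC', 1.0, 2.0)]  i.e. a "fake" TeamA vs TeamB result of 1-2
```

With 18 other teams in the league, this can yield many more pseudo data points per pair from the current season alone, though they are noisier than real head-to-head results.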

Source: Link, Question Author: keithxm23, Answer Author: Gala
