How To Create a Football Betting Model

Sports betting has quite the allure for a lot of people. By simply watching a lot of sports, following the teams' every move and watching all of their games, you can supposedly use this knowledge to make a lot of money by betting on the outcomes of these games. And all of this is something we would have done anyway!

If only it were that simple. The problem with watching games and making predictions based on what you feel or think is that getting them accurate seems close to impossible. Biases will always come into play, whether you think you can ignore them or not, and mistakes are easy to make.

But one thing that does not include bias is numbers. 'The numbers don't lie' you will hear some say, and that is true. If you are able to collect data about a football league, come up with a hypothesis on how previous results can predict future results, test this and end up with a profitable strategy, chances are you will have a working model that removes these biases and carves out the profit.

In this article I will show you how you can get started creating your own betting models, free from all the biases and other second-guesses that are prevalent in a simple human brain.

Note that copying what I am doing here is unlikely to make you an outright winner. This is merely an example created to teach you how to get started modelling football for betting purposes. It probably will need more work before it can turn a profit.

Structure for this Guide

To avoid talking too generally and over the heads of those of you who might be new to this, whether that is betting or coding, I figured the best approach would be to work through examples that show each step along the way.

We will start out with simple examples of how to manipulate data using Python, then build on those by checking a simple betting angle of the kind you might come up with yourself, and finally take a crack at creating a full-blown betting model whose aim is to predict future results based on past results.

The model we will try our hand at predicts football results by using the Poisson distribution to estimate how many goals each team will score on average, and from this derives the win probabilities for each outcome.

This is not a unique idea I have and it has already been talked about a lot. It all probably started with the excellent paper written by Dixon and Coles, Modelling Association Football Scores and Inefficiencies in the Football Betting Market, put out back in 1997. This paper shows how you can get a pretty accurate prediction for outcomes in future football matches based on previous results.

Topics

Skills, programs and software needed

Python
Statistical Inference
Data
Sports Betting

First Example: Betting on Home Underdogs in the Premier League

Coding the Model
Full Code for our Home Underdogs Model

Second Example: Using the Poisson Distribution to Predict Football Matches

Theory on Creating the Betting Model
Coding the Model
Results
Discussion
Full Code for our Poisson Betting Model

I have split this article into different categories as it ended up being a lot longer than I initially thought it would be. This also makes it possible for you to skip topics that you feel comfortable with and already know and get straight to the stuff you are looking to learn.

Still there should be something interesting in every article here for those looking to learn how to make a model to bet on sports with, even if you are an expert at this, but feel free to jump around the topics to get a feel for them.

I am also going to assume that those of you reading this article are already familiar with betting and sports betting in general. If not, I would recommend reading our beginner's guide to online betting to learn the basics before proceeding. We will use both betting and coding lingo after this, so consider yourself warned!

Skills, programs and software needed

Before you dive into this guide looking to make a boatload of money, I suggest that you take a few moments to reflect on a couple of things and also think about what kind of skill set you currently possess.

You see, as I have already mentioned, simply copying this model is unlikely to turn you into a winner, so you need to understand what is going on and how to change up the model for it to become a profitable venture for you.

To get the most out of this guide, there are a few skills worth acquiring and some articles worth reading first. Note that none of these are strictly required, but they are good to have if you want to understand what we are doing here.

Python

We will use Python to develop and use our model, so it would be wise to at least have a basic understanding of coding and scripting. Note that my skills are not very good either, so if you are very skilled at Python (or any other programming language), chances are you can vastly improve this model.

A couple of resources I can recommend if you want to get started learning Python are CodeCademy.com's Python Course, a solid course that teaches you all the basics needed to get started creating programs, and Learn Python the Hard Way, which is quite a bit more tedious, but I am partial to it as it hammers concepts (albeit only a few) into your head and forces you to learn them.

You also need Python installed on your computer to use it, so go to the Python webpage to check out how to do that.

Statistical Inference

Although this is a HUGE topic, you still should have a basic understanding of statistics if you want to get anything out of this. Without it, you would be hard pressed to be able to put your theories into your model without making crucial mistakes.

Since we will be focusing on the Poisson distribution in this guide, you should at least read the Wikipedia entry on that topic to see if you are able to understand what is being proposed and operated on here.

Data

If you are going to model sports, you need some data to base your work on. At times this can be hard to get hold of, but luckily it is a bit easier for football, thanks to the great website www.football-data.co.uk, which provides basic data for the bigger football leagues in Europe.

We are going to use these data sets throughout this tutorial.

Sports Betting

Most likely, if you have stumbled upon this page, you already know at least the basics about betting online, and I guess more than that isn't exactly needed. But it is definitely recommended!

Knowing the terms and bets available, what markets are the most popular, where to look for decent value and so on are all good things to be aware of before you start tailing your own model.

Betting on Home Underdogs in the Premier League

We are going to start this off easy by doing some basic data manipulation with Python on some data from last year's season.

Let us assume we want to take a closer look at how home underdogs did during the Premier League 2015/2016 season.

We start out by creating a folder where we drop the data file, which we have renamed to '20152016.csv', and create a new Python script named 'home-underdogs.py'.

Coding the Model

The first thing we need to do is import a module called csv. This way we are able to tell Python that we are manipulating comma separated values, which is the format our data is written in.

import csv

Then we need to load our file using this module, and also skip the first row as that only contains the headers for each column and what their meaning is.

csv_file = csv.reader(open('20152016.csv'))
next(csv_file)

Now we have a variable in Python that is reading our data. The next step is actually reading it and doing something with it. But first we need to set up some other variables that we will use when iterating over the lines. I will explain what these are for and how to use them later.

First we should set up counters for how many games ended with the home underdog winning, and how many did not.

upsets = 0
non_upsets = 0

Since we want to see how betting on these events would do in real life, we also need to set up some variables about our bankroll and bet sizing.

starting_bankroll = 100	
wagering_size = 5

bankroll = starting_bankroll

Now we get to the core of things as we introduce a for loop to our code. Basically, it runs the same block of code once for every item you give it. If you wanted to run some code 10 times, for example, you could use a for loop with range(10), which counts from 0 up to 9.
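
To see how the counting in range works, here is a tiny sketch (the variable names runs and short_runs are my own, made up for this illustration). Note that the end value is never included, so range(0, 9) stops one short of ten runs:

```python
# Count how many times each loop body runs.
runs = 0
for i in range(10):      # i takes the values 0, 1, ..., 9
    runs += 1

short_runs = 0
for i in range(0, 9):    # i takes the values 0, 1, ..., 8
    short_runs += 1

print(runs, short_runs)
```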

What we want though is to read over every game (or line of data) and do something to every game. Also, for every iteration of games we want to store parts of the data in different variables. Team names, number of goals scored, betting odds and other things. This is done by using the variable name we set when looping over the csv_file, which is game, and grab the column we want for that specific line.

So if you want the home team name, you simply write game[2]. The reason you use 2 and not 3 is because the numbering starts at 0 in Python.

So to set this up we create a for loop, with variables needed, like this:

for game in csv_file:
	home_team = game[2]
	away_team = game[3]

	home_goals = int(game[4])
	away_goals = int(game[5])

	home_odds = float(game[23])
	draw_odds = float(game[24])
	away_odds = float(game[25])

Note that all the variables inside the loop are there mainly to help us out. You don't need to write these out, as you can just use game[2] and so on directly in your calculations, but trust me, it is better to write them out. When you come back to the code after not using it for some time, you will otherwise have no idea what you have done.
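
If counting columns feels fragile, one alternative is to look columns up by their header names with csv.DictReader. This is only a sketch and not what the rest of this guide uses. I am assuming the header names HomeTeam, AwayTeam, FTHG, FTAG, B365H and B365A, so check the first line of your own file, and the two rows below are invented purely so the example runs on its own:

```python
import csv
import io

# Two invented rows in the assumed column layout (not real results).
sample = io.StringIO(
    "HomeTeam,AwayTeam,FTHG,FTAG,B365H,B365D,B365A\n"
    "Team A,Team B,2,1,3.4,3.3,2.2\n"
    "Team C,Team D,0,0,1.5,4.0,7.0\n"
)

games = []
for game in csv.DictReader(sample):   # the header row is consumed automatically
    games.append({
        "home_team": game["HomeTeam"],
        "away_team": game["AwayTeam"],
        "home_goals": int(game["FTHG"]),
        "away_goals": int(game["FTAG"]),
        "home_odds": float(game["B365H"]),
        "away_odds": float(game["B365A"]),
    })

print(games[0]["home_team"], games[0]["home_odds"])
```

With a real file you would pass open('20152016.csv') to csv.DictReader instead of the sample text.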

I hope you guys are still with me here, we don't have much left before we can run our code.

Now we need to use the current game data and see if it meets our criteria for a bet or not. Remember that we are looking for home underdogs. To simplify things I have set the cutoff for a home underdog to be when the odds of a home win are higher than the odds of an away win, which I guess makes sense.

To do this, we use what is called an if statement. Basically, it takes a condition, and if that condition is true it runs some code; otherwise it does something else, which can be nothing.

We want to check if the home odds are higher than the away odds. If they are not, we don't want to do anything.

	if home_odds > away_odds:

Now if we find that a game has a home underdog, we want to check whether it ended in a win for the home team or not. Again we use an if statement for this, this time with an else clause at the end for what the program should do when the home team did NOT win.

	if home_odds > away_odds:
		if home_goals > away_goals:
			upsets += 1
			bankroll += wagering_size * (home_odds - 1)
		else:
			non_upsets += 1
			bankroll -= wagering_size

So what happens here is that we look at a game and compare the odds of the home team winning with the odds of the away team winning. If the odds are higher for a home win, we proceed; if not, we ignore that game.

Then we check to see if the game ended in a home win. This is done by testing if there were more home goals than away goals. If there were, we add 1 to our counter for home underdog wins (which we have called upsets). We also update our bankroll as if we had bet on that game. Here we multiply our bet size by the odds minus one, since with decimal odds the stake is included in the payout and only the rest is profit.

We also have an else clause for those times the home team does NOT win. Here we add to the counter for home underdogs that failed to win (named non_upsets) and update the bankroll as if we had wagered on the game and lost our stake.
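
To make the bankroll arithmetic concrete, here is a small sketch of settling a single bet. The function name settle_bet and the numbers are my own, chosen just for illustration:

```python
def settle_bet(bankroll, stake, decimal_odds, won):
    """Return the bankroll after one bet at decimal odds."""
    if won:
        # A winning bet pays stake * odds, but the stake was ours to begin with,
        # so the profit is stake * (odds - 1).
        return bankroll + stake * (decimal_odds - 1)
    # A losing bet simply costs us the stake.
    return bankroll - stake

after_win = settle_bet(100, 5, 3.5, won=True)
after_loss = settle_bet(100, 5, 3.5, won=False)
print(after_win, after_loss)
```

So a 5 unit bet at odds of 3.5 adds 12.5 to the bankroll when it wins and removes 5 when it loses.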

Now we are done with the loop and can run the program. But we also want to be able to see what our program is calculating for us, so we should print our variables to the interpreter.

The way to get this info is to use the print command to write whatever we want. You simply write print("whatever you want to write") and that is it.

We also want to add some of our variables to the printouts, and that requires a bit of extra work. You add %s to the text where the value should go, and end the line with % followed by the variable names, like so:

print ("Starting bankroll = '%s'" % (starting_bankroll))

Now the variable starting_bankroll will be filled in for the %s when it is printed out.
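
Here is a short sketch of the same idea with more than one variable, plus %.2f, which rounds a number to two decimals when printing. The value of roi is made up just to show the rounding:

```python
starting_bankroll = 100
roi = 1.3894  # made-up number, only to show the rounding

line_one = "Starting bankroll = '%s'" % (starting_bankroll)
# Several values go in a tuple; %% prints a literal percent sign.
line_two = "Bankroll = %s | ROI = %.2f%%" % (starting_bankroll, roi)

print(line_one)
print(line_two)
```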

What you want to look at can vary, and how you want it presented can be up to you, but here are some things that I wanted to peek at and made the program print for us:

ROI = ((bankroll - starting_bankroll) / (wagering_size * (upsets + non_upsets))) * 100		

print ("There were '%s' upsets out of '%s' total matches" % (upsets, upsets + non_upsets))
print ("Starting bankroll = '%s'" % (starting_bankroll))
print ("Finishing bankroll = '%s' | ROI = '%s'" % (bankroll, ROI))

First I calculated the ROI (return on investment) for this experiment over the season, which is the profit divided by the total amount wagered.

Second I wanted to print out how many games included a home underdog, and how many of those ended with the underdog winning.

Next was how big our starting bankroll was, and then finally what our ending bankroll was after the season ended.
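
The ROI calculation can also be wrapped in a small function. This is a sketch with a name of my own (roi_percent), and the check for zero bets is my addition, since the one-line version would crash with a division by zero if no game met the criteria:

```python
def roi_percent(starting_bankroll, final_bankroll, stake, bets_placed):
    """Profit divided by total amount staked, as a percentage."""
    if bets_placed == 0:
        return 0.0  # nothing staked, so there is no return to measure
    profit = final_bankroll - starting_bankroll
    return profit / (stake * bets_placed) * 100

# Invented example: 20 bets of 5 units each (100 staked) and 10 units of profit.
example = roi_percent(100, 110, 5, 20)
print(example)
```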

Full Code for our Home Underdogs Model

And here is the finished code in all its glory:

import csv

csv_file = csv.reader(open('20152016.csv'))
next(csv_file)

upsets = 0
non_upsets = 0	

starting_bankroll = 100	
wagering_size = 5

bankroll = starting_bankroll

for game in csv_file:
	home_team = game[2]
	away_team = game[3]

	home_goals = int(game[4])
	away_goals = int(game[5])

	home_odds = float(game[23])
	draw_odds = float(game[24])
	away_odds = float(game[25])

	if home_odds > away_odds:
		if home_goals > away_goals:
			upsets += 1
			bankroll += wagering_size * (home_odds - 1)
		else:
			non_upsets += 1
			bankroll -= wagering_size

ROI = ((bankroll - starting_bankroll) / (wagering_size * (upsets + non_upsets))) * 100		

print ("There were '%s' upsets out of '%s' total matches" % (upsets, upsets + non_upsets))
print ("Starting bankroll = '%s'" % (starting_bankroll))
print ("Finishing bankroll = '%s' | ROI = '%s'" % (bankroll, ROI))

Here is the output when I run the code:

We can see that we end up with a positive result after the whole of the 2015/2016 Premier League season, which isn't bad. This came after 114 bets placed, of which 30 won.

The profit wasn't all that much to write home about at 7.9 units, and neither was the ROI which was a measly 1.39%. This is likely to be a good result based purely on luck, but it could be the basis for further analysis.

Maybe you want to check some of the other seasons as well to get a bigger sample size. All you need to do then is add the other csv files to your folder and input their name in the csv_file variable.
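
If you would rather run several seasons in one go than change the file name by hand, one possible approach is to loop over a list of file names. This is only a sketch: the function name home_underdog_record is my own, it assumes every file uses the same column positions as above, and the demo writes two tiny invented files so the code runs anywhere:

```python
import csv
import os
import tempfile

def home_underdog_record(file_names):
    """Count home-underdog wins and bets across several season files."""
    upsets = 0
    bets = 0
    for file_name in file_names:
        with open(file_name, newline="") as handle:
            rows = csv.reader(handle)
            next(rows)  # skip the header row of each file
            for game in rows:
                home_odds, away_odds = float(game[23]), float(game[25])
                if home_odds > away_odds:
                    bets += 1
                    if int(game[4]) > int(game[5]):
                        upsets += 1
    return upsets, bets

def _fake_row(home_goals, away_goals, home_odds, away_odds):
    """Build an invented 26-column row with only the fields we read filled in."""
    row = [""] * 26
    row[4], row[5] = str(home_goals), str(away_goals)
    row[23], row[25] = str(home_odds), str(away_odds)
    return row

folder = tempfile.mkdtemp()
paths = []
for name, row in [("a.csv", _fake_row(2, 1, 3.0, 2.0)),
                  ("b.csv", _fake_row(0, 1, 4.0, 1.8))]:
    path = os.path.join(folder, name)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["col%d" % i for i in range(26)])  # dummy header
        writer.writerow(row)
    paths.append(path)

result = home_underdog_record(paths)
print(result)
```

With real data you would call it with your own list, for example the '20152016.csv' file plus whatever you have named the other seasons.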

Another thing could be to check how it would have fared if you bet on draws where the odds were 3.8 or higher, or maybe see how betting on teams that come off a loss does. There are plenty of things you can explore once you know the basics of coding, so if you have any interest in betting on sports I highly recommend getting into coding.
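
As a rough sketch of the draw idea, the same bookkeeping can be reused with a different condition. The function name draw_backtest and the sample games are invented for illustration. With real data you would build the tuples from game[4], game[5] and game[24]:

```python
def draw_backtest(games, min_odds=3.8, stake=1):
    """Bet on the draw whenever the draw odds are at least min_odds."""
    profit = 0.0
    bets = 0
    for home_goals, away_goals, draw_odds in games:
        if draw_odds >= min_odds:
            bets += 1
            if home_goals == away_goals:
                profit += stake * (draw_odds - 1)
            else:
                profit -= stake
    return bets, profit

# Invented (home_goals, away_goals, draw_odds) tuples, not real matches.
sample_games = [(1, 1, 4.0), (2, 0, 3.9), (0, 0, 3.2)]
bets, profit = draw_backtest(sample_games)
print(bets, profit)
```

Here the third game is skipped because its draw odds are below 3.8, even though it did end in a draw.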

Using the Poisson Distribution to Predict Football Matches

Now that we have looked at some basic coding and simple betting angles, we can move on to some bigger and potentially better things.

Simple angles like the one I have shown above are usually not good for profitable betting on future games. They are not predictive features so much as patterns that happen to emerge in the data. This is not to say they never work, but often they will work for a time and then suddenly stop, and it is hard to tell when that will happen if your angle is not grounded in a hypothesis.

That is what we are going to try our hand at now when we move over to cover the Poisson distribution.

Theory on Creating the Betting Model

The way that we get started creating a model is to first identify what we are looking to predict. And for most sports this is simply determining which teams will score the most goals or points, and concede the least of them.

To do this we can look at the different factors that correlate with high goal scoring, like possession of the ball, shots on goal and other relevant ones. In this example, however, we will go with a much simpler approach: we will look at each team's previous scoring rate (and conceding rate) and compare it to the league average.

A model that is often referenced when people are looking for ways to start predicting football matches is the excellent paper by Dixon and Coles, Modelling Association Football Scores and Inefficiencies in the Football Betting Market, mentioned earlier on this page.

They propose that you can look at past results and scores between different teams within the same league system and from these past results be able to predict future scores and results.

We are now going to build a (very basic) version of this model to use for predicting future soccer results.

Coding the Model

Let us get right into the bits and pieces of the code. You can scroll down and find the full code if you want to read it in one go. From here I will simply explain what I have done and what the different pieces of the code do.

import csv, math, ast, numpy as np

First we import the different modules that we need in this script. For now these are csv, math, ast and numpy. The 'numpy as np' part simply shortens the module name when we write out the code and isn't necessary. If you leave it out, you must write numpy instead of np wherever the code refers to it. (The steps walked through below never actually call numpy, so that import can be dropped if you cannot install it.)

Most of these are already included with your Python installation, but numpy you probably need to fetch yourself. There are plenty of guides out there showing how to do it, so a simple Google search should be helpful.

def poisson(actual, mean):
    return math.pow(mean, actual) * math.exp(-mean) / math.factorial(actual)

Here we create our first function that we can reuse. Since we are going to use Poisson a couple of times throughout the script, we can write out the code once and then use it as often as we like by writing a single short line instead of the whole calculation.

def means we define a new function, which we name poisson. The variables in the parentheses are the arguments you must provide when you use it.

Then you write the code you want it to run. In this case it is simply a calculation of the probability of exactly 'actual' goals being scored when the average is equal to 'mean'. The function returns this probability.

We are going to use this function later in the script.
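
To get a feel for what the function gives back, here is a small sketch that calls it for a team assumed to score 1.5 goals per game on average. The 1.5 is a number I picked for illustration, not something taken from the data:

```python
import math

def poisson(actual, mean):
    return math.pow(mean, actual) * math.exp(-mean) / math.factorial(actual)

mean_goals = 1.5  # assumed scoring rate, purely for illustration

# Probability of scoring exactly 0, 1, 2, ..., 9 goals.
probs = [poisson(goals, mean_goals) for goals in range(10)]

for goals, prob in enumerate(probs[:4]):
    print("P(%s goals) = %.3f" % (goals, prob))

# The ten probabilities should add up to very nearly 1.
print("Sum over 0-9 goals = %.6f" % sum(probs))
```

The sum lands just under 1 because scores of 10 goals or more are left out, which is also why stopping at 9 goals later in the script loses very little.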

csvFile = '20152016.csv'

team_list = []

k = open('team_list.txt', 'w')
k.write("""{
""")

Here is a longer piece of code, but a simpler one, where we are merely setting up the variables, data and other inputs we are going to use.

We start off by defining the data we want to use, namely the 2015-2016 Premier League season.

Then we create an empty list called team_list. This is done by using brackets with nothing in them.

Next we open a file named 'team_list.txt'. You do not need to create this yourself, as Python will create it if it does not exist. If it does exist, the 'w' mode after the name ensures that everything in the file is deleted and we start fresh.

Then we write the first line in the file, which is the start of a dictionary. This is needed to hold and update the variables for each team.

csvRead = csv.reader(open(csvFile))
next(csvRead)

for row in csvRead:
	if row[2] not in team_list:
		team_list.append(row[2])
	if row[3] not in team_list:
		team_list.append(row[3])

team_list.sort()

for team in team_list:
	k.write("""	'%s': {'home_goals': 0, 'away_goals': 0, 'home_conceded': 0, 'away_conceded': 0, 'home_games': 0, 'away_games': 0, 'alpha_h': 0, 'beta_h': 0, 'alpha_a': 0, 'beta_a': 0},
""" % (team))

k.write("}")
k.close()

s = open('team_list.txt', 'r').read()
dict = ast.literal_eval(s)

Next we want to iterate over our data file and find all the team names that are going to be used. This is done like in the previous example, where we open the csv file, skip the first line and then create a for loop.

The for loop simply reads both team names and then checks if the names are in our newly created team_list. If they are not, they get added to the list.

After it has checked all the team names, it sorts the new list alphabetically.

Then comes another for loop that iterates over all the teams that have been found. For each team, a line is written to the text file with the different variables that we are going to use. In this example we record all the home goals and away goals they score, as well as how many goals they concede both at home and away. We also track the number of home and away games.

Last we have some variables whose purpose might not be obvious at first glance. Remember what we have set out to do here: calculate the scoring rate and conceding rate for every team, and from this calculate the probabilities of winning a game. The alpha values hold a team's attacking rating and the beta values its defensive rating, each tracked separately for home (_h) and away (_a) games.

We then write the end of the file and close it. This saves what we have written so that we can open it again.

The next two lines create the dictionary where we will hold and update the data we read. This is done by using the ast module, which turns the contents of the team_list.txt file into a dictionary that we call 'dict'. Note that this name shadows Python's built-in dict type, which is harmless in this script but worth avoiding in your own code.
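
If you do not need the text file on disk, the same structure can be built directly in memory. This is an alternative sketch, not what the script in this guide does. The name team_stats is my own (chosen so it does not clash with Python's built-in dict), and the two teams are just placeholders for the list read from the csv file:

```python
team_list = ["Arsenal", "Chelsea"]  # in the script this comes from the csv file

stat_names = ["home_goals", "away_goals", "home_conceded", "away_conceded",
              "home_games", "away_games", "alpha_h", "beta_h", "alpha_a", "beta_a"]

# One fresh inner dictionary per team, with every statistic starting at 0.
team_stats = {team: {name: 0 for name in stat_names} for team in team_list}

# Updating one team does not touch the others.
team_stats["Arsenal"]["home_goals"] += 2
print(team_stats["Arsenal"]["home_goals"], team_stats["Chelsea"]["home_goals"])
```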

GAMES_PLAYED = 0
WEEKS_WAIT = 4
TOTAL_VALUE = 0

csvRead = csv.reader(open(csvFile))
next(csvRead)

Then we set a few variables we will use throughout the script. GAMES_PLAYED is simply the running tally of games played, WEEKS_WAIT is the number of weeks we want to wait before we start placing bets with our model, and TOTAL_VALUE will be used to keep track of how our bets are doing.

Next we open another iteration of the data and skip the first line as usual, before we start with our main loop.

for game in csvRead:
	home_team = game[2]
	away_team = game[3]

	home_goals = int(game[4])
	away_goals = int(game[5])

	home_win_prob = 0
	draw_win_prob = 0
	away_win_prob = 0
	
	curr_home_goals = 0
	curr_away_goals = 0
	avg_home_goals = 1
	avg_away_goals = 1
	
	team_bet = ''
	ev_bet = ''

We start our for loop where we look at each individual game in our data file and do some work on each of them.

Here we pull out the data we need, namely the team names and the goals scored by each of them. We then create some more variables that we are going to use as well, most of which should be self-explanatory.

Now you may ask: why not place them outside the loop like some of the other ones? The reason is that we want to reset these after every game we have analyzed.

	# GETTING UPDATED VARIABLES
	for key, value in dict.items():
		curr_home_goals += dict[key]['home_goals']
		curr_away_goals += dict[key]['away_goals']
		
		if GAMES_PLAYED > (WEEKS_WAIT * 10):
			avg_home_goals = curr_home_goals / (GAMES_PLAYED)
			avg_away_goals = curr_away_goals / (GAMES_PLAYED)

The first "work" we do is update variables based on what the previous iterations did to our data. We iterate over all of the teams in our dictionary called 'dict' and add the number of goals each has scored at home to the home goal variable, and the goals scored away to the away goal variable.

Then we calculate the average number of goals scored per game (both home and away). You might notice that we have put an if statement before this calculation. That is because we imposed a limit on how many data points we would like before we start calculating variables and placing bets. With 10 games per week in the Premier League, WEEKS_WAIT * 10 comes to 40 games.

This is because the model can be quite erratic when it has few data points. Let us say Arsenal win their first two matches 4-0 or something. The model may then think Arsenal are crazy good, with a high scoring rate and no goals conceded, and would probably bet on Arsenal at any odds. This might not be wrong, but you can adjust this value yourself depending on how you feel. I think a waiting time of 4 weeks is a good number.

	# CALCULATING FACTORS
	if GAMES_PLAYED > (WEEKS_WAIT * 10):
		home_team_a = (dict[home_team]['alpha_h'] + dict[home_team]['alpha_a']) / 2
		away_team_a = (dict[away_team]['alpha_h'] + dict[away_team]['alpha_a']) / 2
		
		home_team_d = (dict[home_team]['beta_h'] + dict[home_team]['beta_a']) / 2
		away_team_d = (dict[away_team]['beta_h'] + dict[away_team]['beta_a']) / 2
		
		home_team_exp = avg_home_goals * home_team_a * away_team_d
		away_team_exp = avg_away_goals * away_team_a * home_team_d

After getting our average values calculated, we then do the same for each individual team. Again we wait a set amount of weeks to start this calculation.

We are looking to calculate the attacking rating of the home team and the away team, as well as both teams' defensive ratings. Each one is the average of that team's home and away rating.

These values will then be used to calculate an expected value for how many goals will be scored by both the home and the away team.
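
A quick worked example may help here. The ratings below are invented: a rating of 1.0 means exactly league average, an attack rating above 1 means the team scores more than average, and a defence rating below 1 means it concedes less than average. The function name expected_goals is my own:

```python
def expected_goals(league_avg, attack, opponent_defence):
    """League average scaled by one team's attack and the other's defence."""
    return league_avg * attack * opponent_defence

# Home side: league averages 1.5 home goals, attack 1.2, facing a defence of 0.9.
home_team_exp = expected_goals(1.5, 1.2, 0.9)
# Away side: league averages 1.2 away goals, attack 0.8, facing a defence of 1.1.
away_team_exp = expected_goals(1.2, 0.8, 1.1)

print(round(home_team_exp, 3), round(away_team_exp, 3))
```

So in this made-up match we would expect about 1.62 goals from the home team and about 1.06 from the away team.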

	# RUNNING POISSON	
		l = open('poisson.txt', 'w')
		
		for i in range(10):
			for j in range(10):
				prob = poisson(i, home_team_exp) * poisson(j, away_team_exp)
				l.write("Prob%s%s = %s\n" % (i, j, prob))
		
		l.close()

Now we are getting to the part where we will use Poisson to calculate some probabilities, based on previous calculated variables.

We start off by opening a new text file we call 'poisson.txt'. This is where we store the different probabilities we are about to calculate, so that we can use them later.

Then we start a for loop where the variable i runs from 0 to 9, and inside it another one where j does the same. These represent the number of goals the home team and the away team score in each scenario, respectively.

So we start out with i = 0 and j = 0, which means the match ending 0-0. We then calculate the probability of that happening, which is simply the chance of the home team scoring 0 goals multiplied by the chance of the away team scoring 0 goals.

The probability of this score is then written to our text file, and the next iteration is run, until we have calculated every score up to 9-9.

		with open('poisson.txt') as f:
			for line in f:
				
				home_goals_m = int(line.split(' = ')[0][4])
				away_goals_m = int(line.split(' = ')[0][5])
				
				prob = float(line.split(' = ')[1])
				
				if home_goals_m > away_goals_m:
					home_win_prob += prob
				elif home_goals_m == away_goals_m:
					draw_win_prob += prob
				elif home_goals_m < away_goals_m:
					away_win_prob += prob

Next we read the file we just wrote and sum up all the probabilities of the home team winning, all the probabilities of a draw and all the probabilities of an away win. The goal counts are read from single characters in each line, which works because range(10) keeps every score to one digit. Now we have our predictions for each team's chances of winning.
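
The same sums can also be kept in memory, without going through a text file. This is a sketch of that alternative rather than the script's own approach. The function name outcome_probabilities and the expected goal values are assumptions of mine for illustration:

```python
import math

def poisson(actual, mean):
    return math.pow(mean, actual) * math.exp(-mean) / math.factorial(actual)

def outcome_probabilities(home_exp, away_exp, max_goals=10):
    """Sum score probabilities into home win, draw and away win."""
    home_win = draw = away_win = 0.0
    for i in range(max_goals):          # home goals 0 .. max_goals - 1
        for j in range(max_goals):      # away goals 0 .. max_goals - 1
            prob = poisson(i, home_exp) * poisson(j, away_exp)
            if i > j:
                home_win += prob
            elif i == j:
                draw += prob
            else:
                away_win += prob
    return home_win, draw, away_win

# Invented expected goals for the two sides.
home_win, draw, away_win = outcome_probabilities(1.62, 1.056)
print("Home %.3f | Draw %.3f | Away %.3f" % (home_win, draw, away_win))
```

The three numbers add up to almost exactly 1, with the tiny remainder being the scores of 10 goals or more that we ignore.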

	#CALCULATE VALUE
		bet365odds_h, bet365odds_d, bet365odds_a = float(game[23]), float(game[24]), float(game[25])
		
		ev_h = (home_win_prob * (bet365odds_h - 1)) - (1 - home_win_prob)
		ev_d = (draw_win_prob * (bet365odds_d - 1)) - (1 - draw_win_prob)
		ev_a = (away_win_prob * (bet365odds_a - 1)) - (1 - away_win_prob)
		
		highestEV = max(ev_h, ev_d, ev_a)

With these probabilities we can start looking for good bets, and to do that we need to compare them to the price we can get on the different outcomes.

Thus we fetch the odds from Bet365 (I will use Bet365 in this example as it seems to be a popular bookmaker) and use these odds to calculate the expected value (EV) of the three different outcomes.

Out of these we will get the one with the highest EV, as we are not going to bet on two or more different outcomes here.
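
Here is the EV calculation on its own with made-up numbers, so you can see when a bet has value. The function name expected_value is mine, and the formula is the same one used in the script:

```python
def expected_value(win_prob, decimal_odds):
    """Expected profit per unit staked."""
    # Win: we gain (odds - 1) with probability win_prob.
    # Lose: we lose 1 unit with probability (1 - win_prob).
    return win_prob * (decimal_odds - 1) - (1 - win_prob)

ev = expected_value(0.5, 2.2)     # we think 50%, bookmaker offers 2.2
fair = expected_value(0.5, 2.0)   # odds that exactly match a 50% chance
print(round(ev, 3), round(fair, 3))
```

At odds of 2.2 the EV is about 0.1, meaning we expect to win roughly 10 cents per unit staked if our 50% estimate is right. At odds of 2.0 the EV is zero and there is no value in the bet.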

		if (ev_h == highestEV) and (ev_h > 0):
			team_bet = home_team
			ev_bet = ev_h
			if home_goals > away_goals:
				TOTAL_VALUE += (bet365odds_h - 1)
			else:
				TOTAL_VALUE -= 1
				
		elif (ev_d == highestEV) and (ev_d > 0):
			team_bet = 'Draw'
			ev_bet = ev_d
			if home_goals == away_goals:
				TOTAL_VALUE += (bet365odds_d - 1)
			else:
				TOTAL_VALUE -= 1
		elif (ev_a == highestEV) and (ev_a > 0):
			team_bet = away_team
			ev_bet = ev_a
			if home_goals < away_goals:
				TOTAL_VALUE += (bet365odds_a - 1)
			else:
				TOTAL_VALUE -= 1
		
		if (team_bet != '') and (ev_bet != ''):
			print ("Bet on '%s' (EV = %s)" % (team_bet, ev_bet))	
			print (TOTAL_VALUE)

After this we write an if statement to determine which bet we are placing. We compare each EV to the highest one to find which outcome our model likes the most, and also make sure that the EV is positive.

Then we calculate how our bet would have fared, as we already have the result for each game in our data. If the bet is a win, we add the odds minus one (the profit on a one-unit stake) to our running total (the variable TOTAL_VALUE); if not, we deduct one unit.

The last if statement is simply one I have written to have the Python interpreter print out all the bets that are placed. It is usually good practice to have the program print different things here and there, so it is easier to find where problems might arise or changes could be made.

	# UPDATE VARIABLES AFTER MATCH HAS BEEN PLAYED
	dict[home_team]['home_goals'] += home_goals
	dict[home_team]['home_conceded'] += away_goals
	dict[home_team]['home_games'] += 1
	
	dict[away_team]['away_goals'] += away_goals
	dict[away_team]['away_conceded'] += home_goals
	dict[away_team]['away_games'] += 1
	
	GAMES_PLAYED += 1
	
	# CREATE FACTORS
	if GAMES_PLAYED > (WEEKS_WAIT * 10):
		for key, value in dict.items():
			alpha_h = (dict[key]['home_goals'] / dict[key]['home_games']) / avg_home_goals
			beta_h = (dict[key]['home_conceded'] / dict[key]['home_games']) / avg_away_goals

			alpha_a = (dict[key]['away_goals'] / dict[key]['away_games']) / avg_away_goals
			beta_a = (dict[key]['away_conceded'] / dict[key]['away_games']) / avg_home_goals

			dict[key]['alpha_h'] = alpha_h
			dict[key]['beta_h'] = beta_h
			dict[key]['alpha_a'] = alpha_a
			dict[key]['beta_a'] = beta_a

The final step of our script is to update the variables that we use to calculate the different rates for the different teams.

We start out with adding the goals scored and conceded in this game we are in right now, and also the total tally of games played.

Next we update the attack and defense rate for each team with the updated data. This is then written to our dictionary.
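
To show what one of these ratings looks like in numbers, here is a sketch with invented totals. The function name rating is my own, and so is the guard for a team with no games yet, which I have chosen to treat as league average. The script itself has no such guard and would raise a division by zero error in that situation:

```python
def rating(team_total, team_games, league_avg_per_game):
    """A team's per-game rate relative to the league average."""
    if team_games == 0 or league_avg_per_game == 0:
        return 1.0  # assumption: with no data, treat the team as league average
    return (team_total / team_games) / league_avg_per_game

# 12 goals in 5 home games (2.4 per game) against a league average of 1.5.
alpha_h = rating(12, 5, 1.5)
# 4 conceded in 5 home games (0.8 per game) against a league average of 1.2.
beta_h = rating(4, 5, 1.2)

print(round(alpha_h, 3), round(beta_h, 3))
```

An alpha_h of about 1.6 says this team scores 60% more than average at home, and a beta_h of about 0.67 says it concedes a third less than average there.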

Results

When we run the script, we get the following results:

We see that we end up with about 17.5 units of profit after a season of betting. Out of the 380 games that were played, we found bets on 334 of them. We know that we tell the model to hold off for roughly the first 40 games, so this means that it finds a value bet in almost all of the remaining games.

These results net us an ROI of 5.24%.

Discussion

That is pretty decent, isn't it? 5% ROI is not massive by any means, but profit is profit, right?

Well, the idea behind this article wasn't to spoon-feed you how to be a winner at sports betting; it takes more than a simple model to do that. Our intent was rather to show you how you can get started doing it yourself.

I also want to note that even though the model looks like a winner here, I have run it on some other seasons as well, and some came out as big losers and some as winners. So it could all be down to random luck that the season we looked at turned out to be profitable.

Maybe you can take a look at some other season to test it out?

It actually works! - You now actually have a working betting model. It might not win money (yet!), but with some work it might.
It beats the house edge - We saw varying results when running this model over different seasons, but we believe we can conclude one thing: we can beat the house edge. That should mean we are onto something with this model, and we should continue our work to get closer to our holy grail.
Plenty of possible improvements - Again, we have barely scratched the surface here, and we are only working with the bare basics of a betting model. If we add some more features to it, we could improve things by quite a margin.

Things we could do to improve the model would be to account for things like different states of the game and to weight more recent results higher. These are a bit more complicated to add and are outside the scope of this guide, but we can confirm that when we have tinkered with them it has improved our results.

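Note, though, that you should be careful about adding too much information to a model, as you might run into a problem we call overfitting: the model ends up fitting the quirks of past results rather than anything that carries over to future games.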

Full Code for the Poisson Betting Model

Here is the full code for you to copy if you feel like it, but as I have learned from Zed Shaw, you should probably type it out yourself to practice actually writing code ;)

import csv, math, ast

def poisson(actual, mean):
    return math.pow(mean, actual) * math.exp(-mean) / math.factorial(actual)

csvFile = '20152016.csv'

team_list = []

k = open('team_list.txt', 'w')
k.write("""{
""")

csvRead = csv.reader(open(csvFile))
next(csvRead)

for row in csvRead:
	if row[2] not in team_list:
		team_list.append(row[2])
	if row[3] not in team_list:
		team_list.append(row[3])

team_list.sort()

for team in team_list:
	k.write("""	'%s': {'home_goals': 0, 'away_goals': 0, 'home_conceded': 0, 'away_conceded': 0, 'home_games': 0, 'away_games': 0, 'alpha_h': 0, 'beta_h': 0, 'alpha_a': 0, 'beta_a': 0},
""" % (team))

k.write("}")
k.close()

s = open('team_list.txt', 'r').read()
dict = ast.literal_eval(s)

GAMES_PLAYED = 0
WEEKS_WAIT = 4
TOTAL_VALUE = 0

csvRead = csv.reader(open(csvFile))
next(csvRead)

for game in csvRead:
	home_team = game[2]
	away_team = game[3]

	home_goals = int(game[4])
	away_goals = int(game[5])

	home_win_prob = 0
	draw_win_prob = 0
	away_win_prob = 0
	
	curr_home_goals = 0
	curr_away_goals = 0
	avg_home_goals = 1
	avg_away_goals = 1
	
	team_bet = ''
	ev_bet = ''
	
	# GETTING UPDATED VARIABLES
	for key, value in dict.items():
		curr_home_goals += dict[key]['home_goals']
		curr_away_goals += dict[key]['away_goals']
		
		if GAMES_PLAYED > (WEEKS_WAIT * 10):
			avg_home_goals = curr_home_goals / (GAMES_PLAYED)
			avg_away_goals = curr_away_goals / (GAMES_PLAYED)
	
	
	# CALCULATING FACTORS
	if GAMES_PLAYED > (WEEKS_WAIT * 10):
		home_team_a = (dict[home_team]['alpha_h'] + dict[home_team]['alpha_a']) / 2
		away_team_a = (dict[away_team]['alpha_h'] + dict[away_team]['alpha_a']) / 2
		
		home_team_d = (dict[home_team]['beta_h'] + dict[home_team]['beta_a']) / 2
		away_team_d = (dict[away_team]['beta_h'] + dict[away_team]['beta_a']) / 2
		
		home_team_exp = avg_home_goals * home_team_a * away_team_d
		away_team_exp = avg_away_goals * away_team_a * home_team_d
	
	
	# RUNNING POISSON	
		l = open('poisson.txt', 'w')
		
		for i in range(10):
			for j in range(10):
				prob = poisson(i, home_team_exp) * poisson(j, away_team_exp)
				l.write("Prob%s%s = %s\n" % (i, j, prob))
		
		l.close()
		
		with open('poisson.txt') as f:
			for line in f:
				
				home_goals_m = int(line.split(' = ')[0][4])
				away_goals_m = int(line.split(' = ')[0][5])
				
				prob = float(line.split(' = ')[1])
				
				if home_goals_m > away_goals_m:
					home_win_prob += prob
				elif home_goals_m == away_goals_m:
					draw_win_prob += prob
				elif home_goals_m < away_goals_m:
					away_win_prob += prob

	#CALCULATE VALUE
		bet365odds_h, bet365odds_d, bet365odds_a = float(game[23]), float(game[24]), float(game[25])
		
		ev_h = (home_win_prob * (bet365odds_h - 1)) - (1 - home_win_prob)
		ev_d = (draw_win_prob * (bet365odds_d - 1)) - (1 - draw_win_prob)
		ev_a = (away_win_prob * (bet365odds_a - 1)) - (1 - away_win_prob)
		
		highestEV = max(ev_h, ev_d, ev_a)
		
		if (ev_h == highestEV) and (ev_h > 0):
			team_bet = home_team
			ev_bet = ev_h
			if home_goals > away_goals:
				TOTAL_VALUE += (bet365odds_h - 1)
			else:
				TOTAL_VALUE -= 1
				
		elif (ev_d == highestEV) and (ev_d > 0):
			team_bet = 'Draw'
			ev_bet = ev_d
			if home_goals == away_goals:
				TOTAL_VALUE += (bet365odds_d - 1)
			else:
				TOTAL_VALUE -= 1
		elif (ev_a == highestEV) and (ev_a > 0):
			team_bet = away_team
			ev_bet = ev_a
			if home_goals < away_goals:
				TOTAL_VALUE += (bet365odds_a - 1)
			else:
				TOTAL_VALUE -= 1
		
		if (team_bet != '') and (ev_bet != ''):
			print ("Bet on '%s' (EV = %s)" % (team_bet, ev_bet))	
			print (TOTAL_VALUE)
		
	# UPDATE VARIABLES AFTER MATCH HAS BEEN PLAYED
	dict[home_team]['home_goals'] += home_goals
	dict[home_team]['home_conceded'] += away_goals
	dict[home_team]['home_games'] += 1
	
	dict[away_team]['away_goals'] += away_goals
	dict[away_team]['away_conceded'] += home_goals
	dict[away_team]['away_games'] += 1
	
	GAMES_PLAYED += 1
	
	# CREATE FACTORS
	if GAMES_PLAYED > (WEEKS_WAIT * 10):
		for key, value in dict.items():
			alpha_h = (dict[key]['home_goals'] / dict[key]['home_games']) / avg_home_goals
			beta_h = (dict[key]['home_conceded'] / dict[key]['home_games']) / avg_away_goals

			alpha_a = (dict[key]['away_goals'] / dict[key]['away_games']) / avg_away_goals
			beta_a = (dict[key]['away_conceded'] / dict[key]['away_games']) / avg_home_goals

			dict[key]['alpha_h'] = alpha_h
			dict[key]['beta_h'] = beta_h
			dict[key]['alpha_a'] = alpha_a
			dict[key]['beta_a'] = beta_a
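
The script above writes the score grid to poisson.txt and reads it back in to add up the outcome probabilities. As a hedged alternative, those two steps (outcome probabilities and expected value) can be kept in memory. The function names `outcome_probs` and `expected_value` and the `max_goals` parameter are introduced here for illustration, and the sketch treats home and away goals as independent Poisson variables.

```python
import math

def poisson(actual, mean):
    return math.pow(mean, actual) * math.exp(-mean) / math.factorial(actual)

def outcome_probs(home_exp, away_exp, max_goals=10):
    """Sum the score grid into (home win, draw, away win) probabilities."""
    home_win = draw = away_win = 0.0
    for i in range(max_goals):        # home goals 0 .. max_goals - 1
        for j in range(max_goals):    # away goals 0 .. max_goals - 1
            prob = poisson(i, home_exp) * poisson(j, away_exp)
            if i > j:
                home_win += prob
            elif i == j:
                draw += prob
            else:
                away_win += prob
    return home_win, draw, away_win

def expected_value(win_prob, decimal_odds):
    """Expected profit of a one-unit bet, same formula as ev_h, ev_d and ev_a."""
    return win_prob * (decimal_odds - 1) - (1 - win_prob)

# Hypothetical expected goals and odds, not taken from the data set
h, d, a = outcome_probs(1.6, 1.1)
print("home %.3f, draw %.3f, away %.3f" % (h, d, a))
print("EV of a home bet at odds 2.40: %.3f" % expected_value(h, 2.40))
```

Skipping the text file avoids parsing the score out of a string, which in the original only works while both goal counts are single digits.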

Frequently Asked Questions

Will copying this betting model win me money?

The short answer is no. In its current state the betting model is very simple, and it is highly likely that the bookmakers have already taken into account the factors we calculate here.

However, when we ran tests on different data sets and different leagues, we found it to be a slight winner even at this simple stage. When we added some more features, like adjusting for different states of the game and valuing more recent scores higher, we saw an increase in the return we would have gotten from the model.

Why use Python?

The simple answer is that this is what I know. I have heard that languages like R might be better suited, but I already know and feel comfortable with Python, so that is what I use. You can probably translate this easily to whatever language you prefer; this is not a complicated or heavy script by any means.

If you have any questions or comments, or find any errors in this tutorial, please do not hesitate to contact us. We would be happy to hear from our readers and help those who are looking to take this a step further.
