You can find a detailed description of the system here, or simply implement it by following the three steps below. The data contains 640 rows of game results from October 2, 2019 to January 3, 2020. It has five variables: date, visitor, visitor_goals, home, and home_goals. For example, one row records the game on December 6, 2019, in which the Montreal Canadiens (away team) played the New York Rangers (home team), with a final score of 2-1.
First, we create the variable target_difference as the difference between home_goals and visitor_goals.
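As a minimal sketch of this step, the snippet below computes target_difference with pandas. The games are made up for illustration, and the column names visitor and home (and the placeholder team names) are assumptions; only home_goals, visitor_goals, and target_difference are named in the text.

```python
import pandas as pd

# Made-up games for illustration only; these are not rows from the real data.
# The column names "visitor" and "home" are assumed.
df = pd.DataFrame({
    "date": ["2019-10-02", "2019-10-03", "2019-10-04"],
    "visitor": ["Team A", "Team B", "Team C"],
    "visitor_goals": [1, 4, 2],
    "home": ["Team B", "Team C", "Team A"],
    "home_goals": [3, 2, 2],
})

# Home goals minus visitor goals, computed row by row.
df["target_difference"] = df["home_goals"] - df["visitor_goals"]

print(df[["visitor", "home", "target_difference"]])
```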
The value of target_difference is greater than 0 if the home team wins, less than 0 if the home team loses, and 0 if the two teams tie. We also add two flags, home_win and home_loss, to account for the effect of home advantage. Then we create two matrices of dummy variables, df_visitor and df_home, that record the visiting and home teams. In df_visitor, each column is a team name and each row holds the visitor dummy variables for one game.
In row 0, the Vancouver Canucks column has a value of 1 and all other columns have a value of 0.
This indicates that the away team for this particular game is the Vancouver Canucks. The df_home matrix has the same structure, but marks the home team of each game. We then combine these two matrices into the final data set by subtracting df_visitor from df_home; the result is called df_model.
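The flags and the two dummy matrices described above could be built roughly as follows. This is a sketch under assumptions, not necessarily the original code: the text does not say how home_win and home_loss are defined, so they are assumed here to be 0/1 indicators derived from the sign of target_difference, and the games and team names are placeholders. pd.get_dummies is one common way to create such dummy columns.

```python
import pandas as pd

# Made-up games; team names and the "visitor"/"home" column names are placeholders.
df = pd.DataFrame({
    "visitor": ["Team A", "Team B", "Team C"],
    "visitor_goals": [1, 4, 2],
    "home": ["Team B", "Team C", "Team A"],
    "home_goals": [3, 2, 2],
})
df["target_difference"] = df["home_goals"] - df["visitor_goals"]

# ASSUMED definition of the flags: 1 when the home team won (or lost), else 0.
df["home_win"] = (df["target_difference"] > 0).astype(int)
df["home_loss"] = (df["target_difference"] < 0).astype(int)

# One column per team, one row per game. Reindexing guarantees that both
# matrices share the same columns even if a team never appears on one side.
teams = sorted(set(df["visitor"]) | set(df["home"]))
df_visitor = pd.get_dummies(df["visitor"], dtype=int).reindex(columns=teams, fill_value=0)
df_home = pd.get_dummies(df["home"], dtype=int).reindex(columns=teams, fill_value=0)

# Home team becomes +1, visiting team becomes -1, every other team stays 0.
df_model = df_home - df_visitor

print(df_model)
```

Reindexing both matrices to one shared team list is a defensive choice: the subtraction aligns on column labels, so a team missing from one side would otherwise produce missing values.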
Each row of df_model shows the visiting team with a value of -1 and the home team with a value of 1. We also add the target_difference variable from the original data set df. For example, in row 4 the Anaheim Ducks (home team) play the Arizona Coyotes (away team), and the Anaheim Ducks won by one goal. The final data set therefore contains information on both goal difference and home advantage. Now we can feed the data into the model! We use a ridge regression model as a demonstration.
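To illustrate what fitting a ridge regression to df_model involves, here is a small sketch that uses the closed-form ridge solution in NumPy rather than a library estimator (in practice something like scikit-learn's Ridge would be the usual choice). The df_model rows, the value of alpha, and the decision to fit without an intercept are all assumptions made for this example, not details taken from the text.

```python
import numpy as np
import pandas as pd

# Made-up df_model: +1 marks the home team, -1 the visiting team.
df_model = pd.DataFrame({
    "Team A": [-1, 0, 1, 1],
    "Team B": [1, -1, 0, -1],
    "Team C": [0, 1, -1, 0],
    "target_difference": [2, -2, 0, 1],
})

X = df_model.drop(columns="target_difference").to_numpy(dtype=float)
y = df_model["target_difference"].to_numpy(dtype=float)

alpha = 1.0  # hypothetical regularization strength, not taken from the text

# Closed-form ridge solution without an intercept:
#   w = (X^T X + alpha * I)^(-1) X^T y
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# One coefficient per team; a larger value suggests a stronger team.
ratings = pd.Series(w, index=df_model.columns.drop("target_difference"))
print(ratings.sort_values(ascending=False))
```

Each coefficient can be read as a team rating: the predicted goal difference for a game is the home team's rating minus the visiting team's rating. If an intercept were added, it would estimate the average home-team margin, which is one way to capture home advantage.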