Predictive Analytics with R in Power BI February 7, 2019 Joseph Yeates Leave a comment https://feathersanalytics.com/wp-content/uploads/2019/02/Data-Connector-Paste-Code-e1549562374192-300x72.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/02/Data-Connector-Paste-Code-e1549562374192-768x186.png 768w" sizes="(max-width: 989px) 100vw, 989px" /> Power BI reports are typically used for descriptive analysis: a historical look of what has happened. But what if your report could visualize what is likely to happen and recommend what you can do about it now? This post will cover how to begin the shift from descriptive to predictive analysis by demonstrating three ways to integrate predictive models from R into Power BI. What is being analyzed? The business case and data set I use to demonstrate this integration is based on Premier League Fantasy Football. The goal of fantasy football is to score as many points as possible over the season. Points are scored by Premier League players based on their performance in real life matches. Fantasy owners must pick a team of fifteen players while being subject to two mains constraints: Price – The total cost of all fifteen players must not exceed £100 at the start of the season Position – Each team must have two goalkeepers, five defenders, five midfielders and three forwards Player price is set by the market within the fantasy game: as more team owners buy a player their price will increase and as owners sell a player their price will decrease. This is an important dynamic to consider, as player price appreciation can result in a higher budget to spend on your team. Each owner is permitted one free transfer a week to update their team throughout the season. Descriptive Analysis To begin the analysis, I created a Power BI report. Building the visuals allowed for a quick exploration and understanding of the data. I decided to focus on three main features of the data set: Points scored – what I am trying to maximize to win the game Player price – one of the main constraints when selecting players Player position – another constraint when selecting players My report contains four main visuals depicting the three features of the data outlined above. I also included some summary statistics to reference when interpreting these visuals. Predictive Analysis & Integration I created three different predictive models in R to gain a better understanding of the relationships and patterns in the data. I used the following techniques: K-Means Clustering Linear Regression Classification Tree Each model will be used to illustrate a method to integrate predictive analytics into Power BI: R Script Data Connector Run R Script – Power Query Editor Custom R Script Visual R Script Data Connector The first model I integrated into the report was the K-Means clustering model. This technique created three clusters and assigned each player to one of these clusters based on the characteristics they exhibit. To connect to the results of this analysis, I navigated to the Get Data button on the ribbon. The R Script Data Connector is available under the Other section of the Get Data window. https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Get-Data-285x300.png 285w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Get-Data-768x807.png 768w" sizes="(max-width: 860px) 100vw, 860px" /> After selecting the R script option, I pasted the following script in the R Script window. ##Load Packages needed to run the R code library(readxl) library(tidyverse) ##Import Data file_path <- ##"file path of the source Excel file/Predictive Analytics with R in Power BI.xlsx" data_set <- read_excel(file_path) %>% select(id, total_points, now_cost, selected_by_percent, minutes, goals_scored, assists, clean_sheets, pos) data_set$pos <- factor(data_set$pos, levels = c("GKP", "DEF", "MID", "FWD"), ordered = TRUE) ##Manipulate Data model_data <- data_set %>% filter(minutes > 180) %>% mutate(points_per_90 = total_points / minutes * 90) %>% select(-minutes, -total_points) ##Create Clusters set.seed(237) fantasy_clusters <- kmeans(model_data[,2:6,8], 3) model_data$cluster <- as.factor(fantasy_clusters$cluster) view raw K-Means Clustering hosted withby GitHub https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Paste-Code-2-300x172.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Paste-Code-2-768x440.png 768w" sizes="(max-width: 989px) 100vw, 989px" /> After pressing OK, the Navigator window loaded. I had the option to connect to the two objects I created in my script. I connected to the model_data table. https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Results-of-Code-300x241.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Results-of-Code-768x617.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Results-of-Code.png 1160w" sizes="(max-width: 1024px) 100vw, 1024px" /> I loaded this table to the report after removing some of the columns. I related my new model_data query to the existing Game_Stats query using the id column from both queries. https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Loaded-Data-300x134.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Loaded-Data-768x344.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Loaded-Data-1600x716.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /> I now had a column containing the cluster that each player belonged to. Even though this column was created in R, it is available to include in my visuals along with data from the other sources already in the report. I modified the scatter plot visual by using the K-Means clusters as the legend, replacing player position. It is now easy to view and interpret the results of this model. https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Changed-Visual-300x134.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Changed-Visual-768x342.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Changed-Visual-1600x712.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /> Run R Script The second model I integrated into the Power BI report is the Linear Regression model. This technique creates a linear equation to explain how the other variables impact points scored. I went to the Power Query editor and created a new query by referencing Game_Stats. I manipulated the data in preparation for running the model. To create the Linear Regression model in the Power Query editor, I navigated to the Scripts section on the Transform tab and selected the Run R Script button. https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Run-R-Script-300x160.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Run-R-Script-768x410.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Run-R-Script-1600x854.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /> This opened the Run R script window. The query defined in the previous Applied Steps was maintained and stored in an object called dataset. I used this as the starting point to run the R script. I pasted the following script into the pane. model_data <- dataset fantasy_lm <- lm(points_per_90 ~ now_cost + goals_scored + assists + clean_sheets, data = model_data) model_data$score_lm <- predict(fantasy_lm, model_data) view raw Linear Regression Model hosted withby GitHub https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Paste-Code-300x188.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Paste-Code-768x481.png 768w" sizes="(max-width: 960px) 100vw, 960px" /> This simple script did two things: Created a linear model Created a new column containing the predicted points scored for each player based on the model https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-R-Applied-Step-300x157.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-R-Applied-Step-768x402.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-R-Applied-Step-1600x837.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /> I loaded the query to my report and once again related it to Game_Stats using the id column. To integrate the results into the report, I modified the column chart to identify the ten players who outperformed their predicted points scored. This analysis is helpful in both identifying potential players to transfer into my team and identifying possible ways to improve the performance of the model. https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Changed-Visual-300x131.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Changed-Visual-768x336.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Changed-Visual-1600x701.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /> Custom R Script Visual The third (and final) model I integrated into the Power BI report is the Classification Tree model. This technique splits players into different categories based on an optimal set of yes or no questions. I started by removing the table visual from my report and adding an R script visual. This created a blank visual and opened the R script editor window. https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Create-Visual-300x133.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Create-Visual-768x342.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Create-Visual-1600x712.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /> The first step before running a script is to add variables to the Values section in the Visualizations pane. This is important to remember, as the visual will not work unless it has data to visualize! I dragged the columns I needed into the visual from the Predictive Analytics query. My next step was to paste the following script into the R script editor. ##Load Packages library(tidyverse) library(rpart) library(rpart.plot) ##Manipulate model_data <- dataset ##Create Categorical pp90_bin <- cut(model_data$points_per_90, breaks = c(0, 2.5, 5.5, 100), labels = c("Low", "Medium", "High"), right = FALSE, include.lowest = TRUE ) model_data$pp90_bin <- pp90_bin ##Decision Tree fantasy_tree <- rpart(pp90_bin ~ now_cost + selected_by_percent + goals_scored + assists + clean_sheets + pos, data = model_data, method = "class") ##visualize rpart.plot(fantasy_tree) view raw Classification Tree Model hosted withby GitHub https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Pasted-Code-300x136.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Pasted-Code-768x348.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Pasted-Code-1600x724.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /> Running the script generated the Classification Tree visual. This visual allows the report user to quickly identify the important splitting points in the data: which variables are important to identify high or low scoring players. Those are three that I integrate predictive analytics from R into Power BI! My session Predictive Analytics with R in Power BI goes into more detail around the creation of the models and the key concepts and best practices to keep in mind when doing so.