Power BI For Data Science

Please login or click SIGN UP FOR FREE to create your PowerBIUG account to join this user group.

Predictive Analytics with R in Power BI (K-Mean + Linear Regression + Decision Tree, in the same analysis)

  • 1.  Predictive Analytics with R in Power BI (K-Mean + Linear Regression + Decision Tree, in the same analysis)

    Top Contributor
    Posted Feb 08, 2019 10:13 AM
    <g class="gr_ gr_63 gr-alert gr_gramm gr_inline_cards gr_disable_anim_appear Punctuation only-ins replaceWithoutSep" id="63" data-gr-id="63">Hello</g> community,

    Let me share with you all a great blog developed by 'Joseph Yeates' where he describes how he used R script within Power BI, combining 3 algorithms to perform Predictive Analytics.

    I hope you enjoy


    <main class="site-main" id="main" role="main">

    Predictive Analytics with <g class="gr_ gr_310 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del" id="310" data-gr-id="310">R in</g> Power <g class="gr_ gr_311 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del multiReplace" id="311" data-gr-id="311">BI</g>

    Power BI reports are typically used for descriptive analysis: a historical look of what has happened. But what if your report could visualize what is likely to happen and recommend what you can do about it now? This post will cover how to begin the shift from descriptive to <g class="gr_ gr_338 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins doubleReplace replaceWithoutSep" id="338" data-gr-id="338">predictive</g> analysis by demonstrating three ways to integrate predictive models from R into Power BI.

    What is being analyzed?

    The business case and data set I use to demonstrate this integration is based on Premier League Fantasy Football.

    The goal of fantasy football is to score as many points as possible over the season. Points are scored by Premier League players based on their performance in real life matches.

    Fantasy owners must pick a team of fifteen players while being subject to two mains constraints:

    • Price – The total cost of all fifteen players must not exceed £100 at the start of the season
    • Position – Each team must have two goalkeepers, five defenders, five midfielders and three forwards

    Player price is set by the market within the fantasy game: as more team owners buy a player their price will increase and as owners sell a player their price will decrease.

    This is an important dynamic to consider, as player price appreciation can result in a higher budget to spend on your team.

    Each owner is permitted one free transfer a week to update their team throughout the season.

    Descriptive Analysis

    To begin the analysis, I created a Power BI report. Building the visuals allowed for a quick exploration and understanding of the data. I decided to focus on three main features of the data set:

    • Points scored – what I am trying to maximize to win the game
    • Player price – one of the main constraints when selecting players
    • Player position – another constraint when selecting players

    My report contains four main visuals depicting the three features of the data outlined above. I also included some summary statistics to reference when interpreting these visuals.

    Predictive Analysis & Integration

    I created three different predictive models in R to gain a better understanding of the relationships and patterns in the data. I used the following techniques:

    • K-Means Clustering
    • Linear Regression
    • Classification Tree

    Each model will be used to illustrate a method to integrate predictive analytics into Power BI:

    1. R Script Data Connector
    2. Run R Script – Power Query Editor
    3. Custom R Script Visual

    R Script Data Connector

    The first model I integrated into the report was the K-Means clustering model. This technique created three clusters and assigned each player to one of these clusters based on the characteristics they exhibit.

    To connect to the results of this analysis, I navigated to the Get Data button on the ribbon. The R Script Data Connector is available under the Other section of the Get Data window.

    https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Get-Data-285x300.png 285w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Get-Data-768x807.png 768w" sizes="(max-width: 860px) 100vw, 860px" />

    After selecting the R script option, I pasted the following script in the R Script window.

    ##Load Packages needed to run the R code
    ##Import Data
    file_path <- ##"file path of the source Excel file/Predictive Analytics with <g class="gr_ gr_308 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del" id="308" data-gr-id="308">R in</g> Power BI.xlsx"
    data_set <- read_excel(file_path) %>%
    select(id, total_points, now_cost, selected_by_percent, minutes, goals_scored, assists, clean_sheets, pos)
    data_set$pos <- factor(data_set$pos, levels = c("GKP", "DEF", "MID", "FWD"), ordered = TRUE)
    ##Manipulate Data
    model_data <- data_set %>%
    filter(minutes > 180) %>%
    mutate(points_per_90 = total_points / minutes * 90) %>%
    select(-minutes, -total_points)
    ##Create Clusters
    fantasy_clusters <- kmeans(model_data[,2:6,8], 3)
    model_data$cluster <- as.factor(fantasy_clusters$cluster)

    https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Paste-Code-2-300x172.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Paste-Code-2-768x440.png 768w" sizes="(max-width: 989px) 100vw, 989px" />

    After pressing OK, the Navigator window loaded. I had the option to connect to the two objects I created in my script. I connected to the model_data table.

    https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Results-of-Code-300x241.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Results-of-Code-768x617.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Results-of-Code.png 1160w" sizes="(max-width: 1024px) 100vw, 1024px" />

    I loaded this table to the report after removing some of the columns. I related my new model_data query to the existing Game_Stats query using the id column from both queries.

    https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Loaded-Data-300x134.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Loaded-Data-768x344.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Loaded-Data-1600x716.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" />

    I now had a column containing the cluster that each player belonged to. Even though this column was created in R, it is available to include in my visuals along with data from the other sources already in the report.

    I modified the scatter plot visual by using the K-Means clusters as the legend, replacing <g class="gr_ gr_324 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins replaceWithoutSep" id="324" data-gr-id="324">player</g> position. It is now easy to view and interpret the results of this model.

    https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Changed-Visual-300x134.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Changed-Visual-768x342.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Data-Connector-Changed-Visual-1600x712.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" />

    Run R Script

    The second model I integrated into the Power BI report is the Linear Regression model. This technique creates a linear equation to explain how the other variables impact points scored.

    I went to the Power Query editor and created a new query by referencing Game_Stats. I manipulated the data in preparation for running the model.

    To create the Linear Regression model in the Power Query editor, I navigated to the Scripts section on the Transform tab and selected the Run R Script button.

    https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Run-R-Script-300x160.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Run-R-Script-768x410.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Run-R-Script-1600x854.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" />

    This opened the Run R script window. The query defined in the previous Applied Steps was maintained and stored in an object called dataset. I used this as the starting point to run the R script.

    I pasted the following script into the pane.

    model_data <- dataset
    fantasy_lm <- lm(points_per_90 ~ now_cost + goals_scored + assists + clean_sheets, data = model_data)
    model_data$score_lm <- predict(fantasy_lm, model_data)

    https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Paste-Code-300x188.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Paste-Code-768x481.png 768w" sizes="(max-width: 960px) 100vw, 960px" />

    This simple script did two things:

    1. Created a linear model
    2. Created a new column containing the predicted points scored for each player based on the model
    https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-R-Applied-Step-300x157.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-R-Applied-Step-768x402.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-R-Applied-Step-1600x837.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" />

    I loaded the query to my report and once again related it to Game_Stats using the id column.

    To integrate the results into the report, I modified the column chart to identify the ten players who outperformed their predicted points scored.

    This analysis is helpful in both identifying potential players to transfer into my team and identifying possible ways to improve the performance of the model.

    https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Changed-Visual-300x131.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Changed-Visual-768x336.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/Power-Query-Editor-Changed-Visual-1600x701.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" />

    Custom R Script Visual

    The third (and final) model I integrated into the Power BI report is the Classification Tree model. This technique splits players into different categories based on an optimal set of yes or no questions.

    I started by removing the table visual from my report and adding an R script visual. This created a blank visual and opened the R script editor window.

    https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Create-Visual-300x133.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Create-Visual-768x342.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Create-Visual-1600x712.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" />

    The first step before running a script is to add variables to the Values section in the Visualizations pane. This is important to remember, as the visual will not work unless it has data to visualize!

    I dragged the columns I needed into the visual from the Predictive Analytics query.

    My next step was to paste the following script into the R script editor.

    ##Load Packages
    model_data <- dataset
    ##Create <g class="gr_ gr_319 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" id="319" data-gr-id="319">Categorical</g>
    pp90_bin <- cut(model_data$points_per_90,
    breaks = c(0, 2.5, 5.5, 100),
    labels = c("Low", "Medium", "High"),
    right = FALSE,
    include.lowest = TRUE
    model_data$pp90_bin <- pp90_bin
    ##Decision Tree
    fantasy_tree <- rpart(pp90_bin ~ now_cost + selected_by_percent + goals_scored + assists + clean_sheets + pos, data = model_data, method = "class")
    view raw Classification Tree Model hosted <g class="gr_ gr_357 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del multiReplace" id="357" data-gr-id="357">withby</g> GitHub

    https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Pasted-Code-300x136.png 300w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Pasted-Code-768x348.png 768w, https://feathersanalytics.com/wp-content/uploads/2019/01/R-Visual-Pasted-Code-1600x724.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" />

    Running the script generated the Classification Tree visual. This visual allows the report user to quickly identify the important splitting points in the data: which variables are important to identify high or low scoring players.

    Those are three that I integrate predictive analytics from R into Power BI! My session Predictive Analytics with <g class="gr_ gr_309 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del" id="309" data-gr-id="309">R in</g> Power BI goes into more detail around the creation of the models and the key concepts and best practices to keep in mind when doing so.


    Pablo Moreno
    Data Intelligence Engineer