Power BI For Data Science

Please login or click SIGN UP FOR FREE to create your PowerBIUG account to join this user group.
  • 1.  Strategic decisions support through business driver analysis

    Top Contributor
    Posted Feb 02, 2018 03:34 PM

    Original blog posted by Christian_Berg 

    Having interviewed and worked with hundreds of business analysts and P&L owners, I am impressed by the capacity of human intuition. People who have been in <g class="gr_ gr_182 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins doubleReplace replaceWithoutSep" id="182" data-gr-id="182">role</g> for more than a couple of years can often balance multiple complex considerations and quickly make a good decision. This frees up time for the employee. Unfortunately, this spare time is more often used to increase the frequency at which they look at the standard reports, rather than deep analysis to try and challenge their already impressive understanding of the business. A common problem with this is confirmation bias and existing misunderstandings of true drivers can persist or a change in underlying fundamental is overlooked. In the next couple of posts, I will share techniques that I have seen successfully employed to counter common heuristics and to increase the chance of continuous improvement.

     

    Below is an example of a dataset similar to that of a retail chain which a regional store manager shared with me. The initial goal of that project was to change the main business report to update hourly instead of weekly. The store manager had made revenue growth and inventory management the top priorities a couple of years earlier<g class="gr_ gr_163 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation multiReplace" id="163" data-gr-id="163">; and</g> believed it had been very successful. Hourly reports on what the stores were ordering <g class="gr_ gr_164 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" id="164" data-gr-id="164">was</g> expected to lead to more accurate revenue predictions. The regional manager also wanted to be able to intervene in time if the implicit forecast was too low to meet all-up targets.

    An initial look at the business revealed healthy revenue growth and low variations in weeks on hand. A basic analysis however also showed that discounts had increased in size, eroding profits. For descriptive analytics like this, I find it helpful to create a couple of views with one KPI at a time, e.g. marginal profit or revenue and display it for every relevant business hierarchy. For instance, product hierarchy, geography and time. While that is an effective way to show how the business is doing it does little to explain why. For <g class="gr_ gr_178 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation only-ins replaceWithoutSep" id="178" data-gr-id="178">that</g> it tends to be more effective to show correlated KPIs for the same dimension, e.g. revenue vs. profit by <g class="gr_ gr_176 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins replaceWithoutSep" id="176" data-gr-id="176">store</g> over time.

    As a second step, I therefore frequently add a correlation plot. In addition to being simple to understand correlation plots are quick and easy to do. <g class="gr_ gr_156 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins doubleReplace replaceWithoutSep" id="156" data-gr-id="156">High</g> correlation between two variables means that they move in the same direction, at the same rate. Because it is a relative measure there is rarely a need for normalization. Below is an R correlation plot in Power BI, showing a couple of months' worth of data from the stores.

    cb-1-1.png

    The blue to <g class="gr_ gr_148 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins doubleReplace replaceWithoutSep" id="148" data-gr-id="148">red</g> column to the far right is the legend indicating whether two variables are strongly correlated (dark blue), have no correlation (white), or are inversely correlated (dark red). Large circles in a dark color are thus strongly correlated while small, lightly colored circles indicate weak correlation. The circles appear in the upper-right triangle in the matrix, and their corresponding correlation coefficient <g class="gr_ gr_147 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" id="147" data-gr-id="147">appear</g> in the lower-left. For instance, price and discount have a perfect negative correlation.

    cb-1-2-1.PNG

    The visual representation of this can be found on the row "Price" (second row) in the "Discount" column (fourth column). The numeric correlation coefficient is in the "Price" column (2nd) on the "Discount" row (4th).

     

    When we analyze the data using this visual a couple of things pop-out. Revenue and price have <g class="gr_ gr_190 gr-alert gr_gramm gr_hide gr_inline_cards gr_run_anim Grammar only-ins multiReplace replaceWithoutSep replaceWithoutSep" id="190" data-gr-id="190">low</g> negative correlation, i.e. a decrease in price will typically coincide with an increase in revenue. Revenue and inventory are strongly correlated so days that the stores carry a lot of inventory <g class="gr_ gr_194 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" id="194" data-gr-id="194">tend</g> to coincide with high revenue. The key insight for the regional manager <g class="gr_ gr_197 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation replaceWithoutSep" id="197" data-gr-id="197">however</g> was the -0.51 correlation between price and inventory. Before seeing this correlation plot the regional manager was unaware that the stores had, in response to stronger focus on inventory management, begun managing inventory primarily by changing the price. This was an unintended consequence of the directive which had been issued to improve store managers focus on accurate forecasting. Instead, the store managers had found an inventive way to achieve the same goal using price, thus hurting contribution margins and decreasing customer satisfaction. When the store managers found themselves carrying too much inventory they would simply run a large discount. <g class="gr_ gr_205 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation only-ins replaceWithoutSep" id="205" data-gr-id="205">Conversely</g> when they were running low they would increase prices. By only comparing revenue and inventory this behavior had gone unnoticed by the leadership team.

     

    Using correlation plots for business analytics

     

    You have probably heard that correlation does not equal causation. That is misleading since two things happening at the same time often have a cause and effect relationship but it is important to remember alternative explanations. Let us say that for your business daily revenue for product A and B shows high correlation, e.g. 0.75. So 75% of the changes in revenue for A is captured in the changes for B. That does not necessarily mean that people who buy A <g class="gr_ gr_168 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" id="168" data-gr-id="168">has</g> more value from B. Alternative explanations could be:

    • Buyers of B have more value from A,
    • Products A and B have similar seasonality (maybe they are both more popular on warm days),
    • Promotions for A has by chance coincided with promotions for B, etc.

    In other words, high correlation might be a coincidence (unlikely with a large enough set of data), might be because one causes the other, or because there is something else which affects both, e.g. product C. Typically the direction of the interaction is known or someone familiar with the business has a strong hypothesis. If not, a common second step is to investigate a difference in timing. If for <g class="gr_ gr_154 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation only-ins replaceWithoutSep" id="154" data-gr-id="154">instance</g> marketing investments happen on average 3 days before you observe an increase in revenue you can be sure that revenue does not drive marketing, but that it is the other way around.

    So, while correlation does not in itself prove causation, running a correlation analysis helps you validate your understanding of a business and tells you where to focus next. I typically let the correlations that might be interesting determine which different KPIs I show together for only a few business dimensions to let the data tell the story. In this case inventory by store and product was paired with <g class="gr_ gr_141 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins doubleReplace replaceWithoutSep" id="141" data-gr-id="141">average</g> price to help the regional manager drive a change in behavior.

     

    How to recreate the report

     

    R visuals will automatically work in Power BI when the report is published to the service but to get them to work in Power BI Desktop you need to have R installed on your computer. R, a third-party program, can be downloaded from CRAN. Once installed, in Power BI Desktop go to File -> Options and settings -> Options -> R Scripting and validate that the R home directory path is the same as what you used to install the base. Please use this link for detailed instructions. You may also want an R dedicated script editor, e.g. RStudio (the Windows installation file is the first link under "Installers for Supported Platforms".

     

    Once you have R installed you can use R visuals just as you would use any other native visual in Power BI with the addition of an R script that you paste in the editor. An example file is attached to this post.

     

    To create the correlation plot R visual:

    1. Open a <g class="gr_ gr_117 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del multiReplace" id="117" data-gr-id="117">pbix</g> file with the relevant data, e.g. my attached example file and click on the + sign in the bottom row to create a new page.
    2. Because an R visual will not display anything until you have added a script, I start by adding a simple standard visual like a table that I add the columns that I want to analyze <g class="gr_ gr_134 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling multiReplace" id="134" data-gr-id="134">to</g>. This is achieved by clicking on the desired visual under "Visualizations" and then selecting the columns of interest from the far-right menu.
      cb1.png
    3. Correlations are calculated row by row so getting the granularity right is important. In this <g class="gr_ gr_137 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation only-ins replaceWithoutSep" id="137" data-gr-id="137">case</g> I want to understand if a change in inventory for a particular day, product and store is correlated with the average price. To make this level of detail explicit I first added the three columns [Day], [Product] and [StoreId] to the table and for each verified that the aggregation option "Don't summarize" was used.

      cb3.pngcb2.png
    4. Similarly, I selected "Sum" for all the other columns except Price where I used "Average".
    5. Once I'm happy with the data input (this will be the raw data that R receives) I convert the visual to an R visual by selecting the visual in the canvas and then clicking on the R visualization icon
      cb4.pngcb5.png
    6. Add the script using the R script* editor (visible when you select the R visual directly <g class="gr_ gr_123 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" id="123" data-gr-id="123">in</g> the canvas). Note, you may have to expand it by clicking on the far-right arrow in the R script editor menu below the canvas.
      cb6.png
    7. Once expanded you should see the below, with space for the R script.
      cb7.png
      Copy/paste the script below into the script editor (any line that starts with # or ## is a comment only and is not executed):
      ## Use the corrplot library for the cor() function which calculates correlations and has the corrplot() function to visualize the output 
      library(corrplot)
      ## Get rid of the first three columns in dataset since they, in this example, contain categorical values
      dataset <- dataset[,-1:-3]
      ## Calculate correlations between the remaining columns
      m <- cor(dataset)
      ## Plot the result with a light gray background color
      corrplot.mixed(m, bg="light gray")
    8. Use the first arrow in the R script editor to manually execute the script (any change/interaction with the report will also cause the script to be rerun.
      cb8.png

    To recreate this visual for your own data the steps will be almost identical. It is important however to remember to get the granularity right. If you don't need to do any aggregation you would skip in this case the first three columns [Day], [Product] and [StoreId], set all of the other fields to "Don't summarize" and delete the line of code that excludes the first three columns from the calculation, namely "dataset <- dataset[,-1:-3]" and its preceding comment, i.e. so that the script looks like:

    ## Use the corrplot library for the cor() function which calculates correlations and has the corrplot() function to visualize the output 
    library(corrplot)
    ## Calculate correlations between the remaining columns
    m <- cor(dataset)
    ## Plot the result with a light gray background color
    corrplot.mixed(m, bg="light gray")

    Links and downloads

     

    Understanding the correlation coefficient: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

    Example of correlation plot in the R showcase: http://community.powerbi.com/t5/R-Script-Showcase/Correlation-Plot/m-p/58462

    Additional R installation instructions for Power BI: http://powerbi.tips/2016/09/using-r-visuals-in-power-bi/

     

    * Third-party programs. This software enables you to obtain software applications from other sources. Those applications are offered and distributed by third parties under their own license terms. Microsoft is not developing, distributing or licensing those applications to you, but instead, as a <g class="gr_ gr_131 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation only-del replaceWithoutSep" id="131" data-gr-id="131">convenience,</g> enables you to use this software to obtain those applications directly from the application providers.

    By using the software, you acknowledge and agree that you are obtaining the applications directly from the third-party providers and under separate license <g class="gr_ gr_119 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation only-del replaceWithoutSep" id="119" data-gr-id="119">terms,</g> and that it is your responsibility to locate, understand and comply with those license terms. Microsoft grants you no license rights for third-party software or applications that is obtained using this software.



    ------------------------------
    Pablo Moreno
    Business Analyst Snr Advisor
    Quest
    Panama
    69444206
    ------------------------------


  • 2.  RE: Strategic decisions support through business driver analysis

    Posted Apr 01, 2019 05:48 PM
    Buenas tardes Pablo, un gusto! Tendrás a la mano el archivo que utilizaste como prueba. SalesByDay.
    Favor adjuntarlo, gracias!

    ------------------------------
    Jose Wirtz
    52089818
    ------------------------------



  • 3.  RE: Strategic decisions support through business driver analysis

    Top Contributor
    Posted Apr 02, 2019 09:03 PM
    Hola @Jose Wirtz

    En el link: https://community.powerbi.com/t5/R-Script-Showcase/Correlation-Plot/m-p/58462 puedes encontrar el archivo PBI correspondiente.

    Saludos​

    ------------------------------
    Pablo Moreno
    Data Intelligence Engineer
    Panama
    pmorenogonzalez@gmail.com
    ------------------------------