Dublin Power BI User Group


topic for research

  • 1.  topic for research

    Posted May 02, 2018 04:26 AM
    Hello all,

    A question for the group -

    Is there an area of BI/ML/AI you think would be worth doing deeper research into or a topic you think has been overlooked or neglected?

    I'm interested in discussing such topics with a view towards carrying out deeper research. It would be interesting to know if people have a topic of interest.

    Regards

    G

    ------------------------------
    Ger McDonald
    BI Manager
    Ireland
    https://www.sita.aero/
    ------------------------------


  • 2.  RE: topic for research

    Silver Contributor
    Posted May 02, 2018 08:55 AM
    Ger, that's a great topic.
    Surely there is interest in exploring these areas in depth.
    Actually, @Ben Watt gave a really interesting session last week at #DataandBISummit where he presented an IoT project that, to a good extent, integrates BI->ML->AI.

    #PUGDUBLIN Anyone want to add?

    ------------------------------
    Jose Almeida
    Data Analytics & Reporting Consultant
    Dublin
    bordalos.com
    ------------------------------



  • 3.  RE: topic for research

    Posted May 03, 2018 05:21 AM
    Data Quality.
    The quality of the data that underlies whatever we do determines the result. This is key in AI and ML, but also in any report or data model that we present for self-service to our business users, so that they can effectively run the business.
    I used to work for Musgrave, and one Sunday a till transaction from a store came through with a line of 500,000+ tins of dog food (a barcode in the Qty field) at a cost of over 2.5M euro. This naturally flowed through into all reports, such that all volume and profit reports across the company were totally skewed for that week. It was a genuine fact from a trusted source, but we should still have cleaned the data before presenting it to the business.
    In my current company, we scan the infrastructure in data centres to identify what the company has and what it needs to licence. The number of cores reported for the different instances of Oracle and SQL Server can have a huge impact on the annual IT budget. I would very much like to hear any thoughts on how to go about ensuring that the figure reported is sane and accurate. It usually is, but it is hard for humans to reliably check hundreds if not thousands of data points.
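    One lightweight way to check hundreds or thousands of data points automatically (a sketch, not something from this thread) is to flag only the readings that deviate sharply from the typical value, using a robust statistic such as the median absolute deviation, so a human reviews just the suspects. The field names and sample figures below are hypothetical.

    ```python
    # Sketch: flag suspicious core counts so a human only reviews outliers.
    # Instance names and core counts below are made-up examples.
    from statistics import median

    def flag_outliers(readings, threshold=3.5):
        """Return (name, cores) pairs whose core count deviates sharply from
        the group median, using a modified z-score based on the median
        absolute deviation (robust to the outliers themselves)."""
        values = [cores for _, cores in readings]
        med = median(values)
        mad = median(abs(v - med) for v in values) or 1  # avoid divide-by-zero
        suspects = []
        for name, cores in readings:
            score = 0.6745 * (cores - med) / mad
            if abs(score) > threshold:
                suspects.append((name, cores))
        return suspects

    readings = [("ora-01", 16), ("ora-02", 24), ("sql-01", 8),
                ("sql-02", 16), ("ora-03", 512)]  # 512 looks like a data error
    print(flag_outliers(readings))  # only the extreme reading is flagged
    ```

    The point of the robust statistic is that one bad value (like the 512) cannot drag the baseline with it, which a plain mean-and-standard-deviation check would suffer from.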
    JK

    ------------------------------
    Joe Kelly
    Data Team Lead
    Sallins
    868384978
    ------------------------------



  • 4.  RE: topic for research

    Posted Jun 19, 2018 08:59 AM

    Hi Joe,

    You could build validity rules on the back-end database (example: Column A in Table X should not have a value greater than 10,000) that run on a schedule and check the quality of the data. This can then be presented to the analysts in the form of a report before it goes to senior management, so that any issues with loads are corrected beforehand. Happy to illustrate further if needed.
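    The scheduled-rules idea above can be sketched in a few lines: define each rule as data, run every rule against the loaded rows, and emit one report line per violation for the analysts. This is a minimal illustration, not Sumit's actual implementation; the table, column, and threshold names are hypothetical.

    ```python
    # Sketch of scheduled validity rules producing an exceptions report.
    # Rules are plain data: table, column, and a predicate the value must pass.
    rules = [
        {"table": "sales", "column": "qty",  "check": lambda v: v <= 10_000},
        {"table": "sales", "column": "cost", "check": lambda v: v >= 0},
    ]

    def run_rules(data, rules):
        """Return one report line per rule violation found in `data`,
        where `data` maps table name -> list of row dicts."""
        report = []
        for rule in rules:
            for i, row in enumerate(data.get(rule["table"], [])):
                value = row[rule["column"]]
                if not rule["check"](value):
                    report.append(f"{rule['table']}.{rule['column']} row {i}: {value}")
        return report

    data = {"sales": [{"qty": 3, "cost": 9.5},
                      {"qty": 500_000, "cost": 2_500_000}]}
    print(run_rules(data, rules))  # the dog-food-style quantity is reported
    ```

    In a real deployment the equivalent checks would live as scheduled SQL against the warehouse, with the output surfaced as a report; the structure is the same.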

    Thanks,

    Sumit



    ------------------------------
    Sumit Sharma
    Business Intelligence and Analytics Consultant
    Version 1
    Dublin
    ------------------------------



  • 5.  RE: topic for research

    Posted Jun 20, 2018 03:54 AM
    Hi Joe,

    With some of the visualizations (e.g. gauge), you can set thresholds and trigger alerts if a threshold is breached. In your example, this could be based on the number of cores reported for a specific type of server. Obviously, it means that the data is already in the report/dashboard, but at least you'd get an alert that something wasn't right and remediate quickly.

    As already mentioned, if you wanted some kind of sanity check before the data was consumed by Power BI, that would need to be done before loading the data into the model; the best place would be the source application or the data prep process. Alternatively, you could set some filters in Power Query to exclude obvious outlier values, but the risk is that you'd never see them at all, since the data would always be excluded, which could be just as bad.
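    The filter-vs-flag trade-off above can be made concrete: instead of dropping outlier rows (and never seeing them again), add a flag column so suspect rows stay visible in the report and can trigger the kind of alert mentioned earlier. This is a generic sketch, not Power Query M; the threshold and field names are hypothetical.

    ```python
    # Sketch: tag suspect rows rather than filtering them out, so anomalies
    # remain visible in the dashboard instead of silently vanishing.
    MAX_CORES = 256  # hypothetical sanity ceiling for one instance

    def tag_rows(rows):
        """Return copies of the rows with a 'suspect' flag added.
        Nothing is dropped; the report can highlight or alert on the flag."""
        return [dict(row, suspect=row["cores"] > MAX_CORES) for row in rows]

    rows = [{"host": "sql-01", "cores": 8}, {"host": "ora-03", "cores": 512}]
    for row in tag_rows(rows):
        print(row["host"], "SUSPECT" if row["suspect"] else "ok")
    ```

    The same pattern is easy to express as a Power Query custom column, with the advantage Gavin describes: the bad value still reaches the model, but it arrives labelled.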

    ------------------------------
    Gavin Clark
    Process Manager Data Visualisation
    Pramerica
    Letterkenny
    749197911
    ------------------------------



  • 6.  RE: topic for research

    Posted Jun 20, 2018 06:24 AM
    Hi Sumit, Gavin,

    Tks for the comments. They are defo in the right area and reflect what we tend to do currently in the different ETL processes. The problem is that this is very much dev work: we are hard-coding business rules into SSIS or Power Query, it takes dev work to change them, and visibility and understanding get lost over time until the rules become encrusted in the system and can no longer be safely changed!! Anyone else inherit code or processes like this?
    What we need is some way to allow the business to continue to own and maintain these rules about the data, as they may change, and often do, e.g. rules around foodstuffs when legislation changes, or HiTi or packaging requirements for warehouse goods, etc.
    SQL Server DQS makes a stab at this area but is still very basic, and Informatica is waaaay too expensive. Anyone know of anything in the middle ground here?
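    One common pattern for the "business owns the rules" requirement described above: store the rules as data (a database table or spreadsheet the business edits) and have the ETL interpret them at load time, so changing a rule needs no dev work or redeploy. The sketch below is a hypothetical illustration of that pattern, not a product recommendation; the column names and operators are made up.

    ```python
    # Sketch: validation rules kept as data and interpreted at load time,
    # instead of being hard-coded into SSIS or Power Query.
    import operator

    OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}

    # In practice this would be read from a table the business maintains,
    # so a legislation change is a row edit, not a code change.
    rule_table = [
        ("qty",  "<=", 10_000),
        ("cost", ">=", 0),
    ]

    def validate(row, rule_table):
        """Return the rules a row breaks, evaluated from the rule table."""
        broken = []
        for column, op, limit in rule_table:
            if not OPS[op](row[column], limit):
                broken.append((column, op, limit))
        return broken

    print(validate({"qty": 500_000, "cost": 2_500_000}, rule_table))
    ```

    The trade-off is that an interpreted rule language is weaker than arbitrary code, so the middle ground Joe asks about is usually a rule vocabulary rich enough for the business but still safe to edit.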
    JK

    ------------------------------
    Joe Kelly
    Data Team Lead
    iQuate
    ------------------------------