Power BI Exchange

Please login or click SIGN UP FOR FREE to create your PowerBIUG account to join this user group.
Expand all | Collapse all

Remove HTML Tags in Power Query

  • 1.  Remove HTML Tags in Power Query

    Posted Aug 15, 2019 10:20 AM
    Hi! Extremely new to the Power world.

    I have an ODBC connection to my ERP system. One of my columns is coming in with the HTML tags in the cells. I have looked everywhere to understand how to remove the unwanted tags but have not found a solution. See below example, I only need the highlighted text.

    <span CreatedFromTXTextControl="1" style="font-family:Times New Roman;font-size:8pt;">

    <p lang="en-US" style="text-indent:0pt;margin-left:0pt;margin-top:0pt;margin-bottom:6pt;margin-right:0pt;line-height:100%;" xmlns="http://www.w3.org/1999/xhtml"><span style="font-family:'Times New Roman';font-size:8pt;">Trusses Updated KME</span></p>

    <p lang="en-US" style="text-indent:0pt;margin-left:0pt;margin-top:0pt;margin-bottom:6pt;margin-right:0pt;line-height:100%;" xmlns="http://www.w3.org/1999/xhtml"><span style="font-family:'Times New Roman';font-size:8pt;">Converting the QT master to Metal. 032019</span></p>

    <p lang="en-US" style="text-indent:0pt;margin-left:0pt;margin-top:0pt;margin-bottom:6pt;margin-right:0pt;line-height:100%;" xmlns="http://www.w3.org/1999/xhtml"><span style="font-family:'Times New Roman';font-size:8pt;">Waiting on Truss update KME 4-8-19</span></p>

    </span>


    I appreciate the help in advance! Thank you!!



    ------------------------------
    Maja Jusupovic
    Deltek System Administrator & Analyst
    Jacksonville FL
    9048215200
    ------------------------------


  • 2.  RE: Remove HTML Tags in Power Query

    Bronze Contributor
    Posted 30 days ago
    ​Open Query editor. In the Queries list, on the left, do a right mouse button and then new query and then blank query.
    Then paste the code below and then click the function button, the "fx" at the left side of the function bar.


    = let
       TextFromHtml = (HTML as any) =>
    let
        Source = if HTML = null then
            ""
        else
            Text.From(HTML),
        SplitAny = Text.SplitAny(Source,"<>"),
        ListAlternate = List.Alternate(SplitAny,1,1,1),
        ListSelect = List.Select(ListAlternate, each _<>""),
        TextCombine = Text.Combine(ListSelect, "")
    in
        TextCombine
    in
        TextFromHtml
    Then in the data set that the HTML tags are go to add column and Invoke Custom function.  Put in a name, this will be the name of the new column with the tags removed, and select the Function that you just created. Then select the column with the text that has the tags and then you are done.
    This may leave some tags so just take a look at the new column. I replace any I find with a blank and that is that.

    ------------------------------
    Niall Ginnity
    Project Manager
    Kells
    046 925 1297
    ------------------------------