View My Drafts
My primary challenge: data transformation/transpose all data stored in a single column, need to create table/matrix: specifically transposing a certain number of rows to columns, then doing the same for the next iterative "x" rows transposed to columns (in this case x=16), all stacked to create a single table.
Using the Index column as a reference, I would eventually like to have the column of "Cell.Data.Element:Text" transformed to a table:
Row 1 (header) : [0:15]
Row 2 : [16:31]
Row 3 : [32:47]
Row 4 ….and so on
I hope this makes sense…
For consideration/background: Ultimately, I will use this process to loop through a folder of similar structured files (same column headings) all to create a singular table for all the data content in the folder. Part of the wrinkle with this is that while the files created by the source have a ".xls" extension, the underlying data is actually XML, so performing a typical folder read in/input process, PBI/power query gets hung up and claims the excel file is corrupted/etc …because it's not truly Excel or CSV. My thought process is that if I can figure out one of these XML transformations, possibly I could figure out a custom function to cycle through this process for all files in the specified folder.
When opening a source file in Excel, the headers start on row 4 and the last 8 rows are eliminated, however when using the XML data source connector in PBI, the only place I can find the content I'm looking for is in the "Row" Table: all the data I'm interested is in a single column called "Cell.Data.Element:Text". Per my screenshots and example file, I eliminated the top unnecessary rows and introduced an index column. I thought the index might be useful, plus if I can't figure this out directly in PBI I can at least take it out of the environment keeping the order with the index column and use VBA or PY or something else to do my transpose transformations until I figure out the Power Query method.
Also I'm grateful for any advice to my process, I'm willing to think outside the box, so any suggestions to research another method is greatly appreciated, or other resources to understanding XML.
There are so many good nuggets I took away from this. First off, I wasn't aware you can do a modulo transform within M/Power Query. Normally I would have extracted out of PBI and performed this in Excel/Py, then re-imported the transformed data. In addition - the fill-up step, and the grouping trick with concatenation; I wish I knew about that sooner too. I will likely be using all of these in the future, as my goal going forward is to try to contain as much repeatable process within PBI as I know how.
Part of my initial steps which led to these transform processes was because I could not get my VBA batch file/type conversion to work in a way that PBI could read-in a group of files with the original layout or a close approximation. I was having trouble with my batch conversion snippet I had used in the past, and was looking for a modification, or alternate way. I found this forum thread to be particularly useful:
of course, I had to change a few things, this link will help others that need to transform other file types, just swap out 'target' and 'save as' file type references:
While not technically part of the original thread title, the final step in the process I previously outlined; I wanted to batch read-in multiple files of a similar layout and stack in a single table. I have tried this with the .xlsx converted files and .csv converted files, the .csv files seem to work better, or at least at first try there were less steps involved. With regard to the 'buffer' rows, I was able to clean/eliminate the top (3) and bottom (8) rows in the 'transform sample file' steps in the query editor (created when selecting the folder input) so that those steps are applied to all files in the folder I'm reading-in.
I can not thank you enough for educating me, and helping me educate myself.