What's your type?
For a beginner in Power BI and in business intelligence in general, it is not easy to understand data types and differentiate them from data format. And then comes data category... So, what's all that about, really?
It's true that in spreadsheet applications like Excel, data type is not really of a bother to the user, as there is no functionality to define the type of the data present in cells. We just enter values and it's all fine (somehow). In fact, I only started to bother about data types when starting to use Power Query, Power BI and Power Pivot (data modelling tool in Excel).
Power Query is less friendly with typeless (or rather untyped) columns. Try creating a dataflow in the Power BI Service without specifying a data type for each column, and you will get the following message.
Not cool, right? What's wrong with my data? It all looks good to me after all.
So, the Power Query team tried to make it easy by automatically detecting column types from the imported data.
Though, sometimes it is more disturbing than anything else and you end up deleting that step Power Query creates for you. After all, it's my data and I'll give it the type I want.
Back in Excel you can only disguise your data, playing with formatting options. Formatting is just a way of altering your data appearance, without changing anything of its value.
How does Excel represent data types?
Notice, here I have altered the format of numbers, forcing them into text format, and thus naturally aligning to the left. Though I haven't altered its type. Beneath the disguise, they remain numbers, as we can see at the bottom that aggregation like SUM() and AVERAGE() are well performed.
Again, in Excel, data type is not well emphasized and enforced.
So, ok, in Excel we didn't really care about types, and Power Query takes care about it (in its own way), so, really, why bother?
Well, because you want control over your data, yep, total control. So, you get to know about data types.
And because you don't want to end up with errors like this in your data set. Your analysis would be affected.
And just in case you try to load your data into the data model of Excel...
And, actually, when looking at the data model in Power Pivot, well Excel (or Power Pivot, who knows..) got rid of your errors.
In Excel, adding a number to a date would have worked without any problem. So, when used to Excel and its way of handling data, you don't really care about types, but with Power Query, you have to very much care about what type it is.
So then what are data types?
There are several types of data, but let's say the most common ones are:
- Text (or sometimes referred to as string)
- Number (numeric values)
- Date and time
- Boolean, or logical (also referred to as true/false types)
Data type is very important, and most often you are required to declare the data type you are working with.
- Text data type is very simple and straight forward. It's just plain text. Nothing much to say.
- With number data types, it becomes more interesting because with numbers you can use it for aggregation and extract meaningful information. You can sum it, add or subtract a value, calculate averages, round it up or down, and many more things can be done.
- With date and time data type, it's a wealth of information, because you can extract the time (hours and minutes), the date alone, the month, the year, the quarter, the day, the week number, the day of the week... And, whether in Excel, in Power BI or in any other tool you always find date and time specific functions (MONTH(), YEAR(), WEEKDAY(), WEEKNUM()...)
You can add or substract days, months, years, hours to a date.
In fact a date is just a number disguised into a date and time. If you change the format of a date in Excel you will get a bunch of meaningless digits.
The numbers to the right of the decimal represent the time. The numbers to the left of the decimal represent the number of days starting from midnight of the 1 January of 1900. And today is the 43,969 after the start of the 1st day of the first month of 1900.
A lot of information can be extracted from a date an time data type.
NOTE: Dates can be formatted differently depending on your regional settings, American standards are different from European standards for instance. => And this is often source of errors and maybe source of misinterpretation.
- For Boolean values, well, those are pretty simple, it's either TRUE, either FALSE. 👌🏼
Be my type!
In, Power Query, one of the very first functions (and step) you see appearing in the formula bar is
(though you never really write it manually)
For that, Power Query offers you a variety of data types to choose from.
Notice that here we have more than the 4 different data types we discussed earlier.
Numbers are sub-divided into:
- Whole number
- Decimal Number
- Currency (Fixed decimal number in Power Query for Power BI => Yep! Go wonder why 😕)
- and Percentage
Date and Time format is also sub-divided into:
- and Duration
We can also notice a new data type => "Binary" -> 😳 What on earth is that -> Let's keep that for another time maybe. 😋
And, also Using Locale, as if we didn't have enough data types, right? We can review that one when we discuss date and time data types
What about data category?
Wait! What? Data what again? You're kidding right?
Nope, not kidding. It's a real thing 😁
Yep, that's right, the guys of the Excel team decided to name that group "Data Types" => Really confusing, right? Anyway, the majority of users don't even know those exist 😝
That is it for now. That's enough for a good start. Hope you find it useful. Let me know if you have any question, comment, or suggestions.
Thanks for reading