Power BI Exchange

Please login or click SIGN UP FOR FREE to create your PowerBIUG account to join this user group.

Data types, data formats, and data category

By David Gerard posted May 18, 2020 08:24 PM

  

What's your type?


For a beginner in Power BI and in business intelligence in general, it is not easy to understand data types and differentiate them from data format. And then comes data category... So, what's all that about, really?

It's true that in spreadsheet applications like Excel, data type is not really of a bother to the user, as there is no functionality to define the type of the data present in cells. We just enter values and it's all fine (somehow). In fact, I only started to bother about data types when starting to use Power Query, Power BI and Power Pivot (data modelling tool in Excel).

Power Query is less friendly with typeless (or rather untyped) columns. Try creating a dataflow in the Power BI Service without specifying a data type for each column, and you will get the following message.
Warning message in PBI Online dataflows for untyped columns
Not cool, right? What's wrong with my data? It all looks good to me after all.

So, the Power Query team tried to make it easy by automatically detecting column types from the imported data.
Automatic detection of data types in Power Query
Though, sometimes it is more disturbing than anything else and you end up deleting that step Power Query creates for you. After all, it's my data and I'll give it the type I want.
Data type step automatically generated by Power Query for your data.


Back in Excel you can only disguise your data, playing with formatting options. Formatting is just a way of altering your data appearance, without changing anything of its value.
Formatting options in Excel

How does Excel represent data types?
Representation of data types in Excel

In Excel, data are naturally aligned based on their type. It gives a visual clue of the data type we are dealing with.

- Left aligned => text
- Right aligned => numbers and date and time
- Center aligned => Boolean (logical, true/false)

NOTE: It is a good practice to NOT alter the natural alignment of data in Excel.

A number has been formatted as text and appears aligned to the left

Notice, here I have altered the format of numbers, forcing them into text format, and thus naturally aligning to the left. Though I haven't altered its type. Beneath the disguise, they remain numbers, as we can see at the bottom that aggregation like SUM() and AVERAGE() are well performed.

Again, in Excel, data type is not well emphasized and enforced.

Why bother?

So, ok, in Excel we didn't really care about types, and Power Query takes care about it (in its own way), so, really, why bother?
Well, because you want control over your data, yep, total control. So, you get to know about data types.

And because you don't want to end up with errors like this in your data set. Your analysis would be affected.
Error message due to unpropped data type

And just in case you try to load your data into the data model of Excel...
Error detection from Excel data model

And, actually, when looking at the data model in Power Pivot, well Excel (or Power Pivot, who knows..) got rid of your errors.

Power Pivot ignores errors in the data model.
Furthermore, data type errors are common, and you will often find an error message similar to this one

Data type error message in Power Query

In Excel, adding a number to a date would have worked without any problem. So, when used to Excel and its way of handling data, you don't really care about types, but with Power Query, you have to very much care about what type it is.

So then what are data types?

There are several types of data, but let's say the most common ones are:
  • Text (or sometimes referred to as string)
  • Number (numeric values)
  • Date and time
  • Boolean, or logical (also referred to as true/false types)
Data type is very important, and most often you are required to declare the data type you are working with.
  • Text data type is very simple and straight forward. It's just plain text. Nothing much to say.
  • With number data types, it becomes more interesting because with numbers you can use it for aggregation and extract meaningful information. You can sum it, add or subtract a value, calculate averages, round it up or down, and many more things can be done.
  • With date and time data type, it's a wealth of information, because you can extract the time (hours and minutes), the date alone, the month, the year, the quarter, the day, the week number, the day of the week... And, whether in Excel, in Power BI or in any other tool you always find date and time specific functions (MONTH(), YEAR(), WEEKDAY(), WEEKNUM()...)
    You can add or substract days, months, years, hours to a date.
    In fact a date is just a number disguised into a date and time. If you change the format of a date in Excel you will get a bunch of meaningless digits.
    The numbers to the right of the decimal represent the time. The numbers to the left of the decimal represent the number of days starting from midnight of the 1 January of 1900. And today is the 43,969 after the start of the 1st day of the first month of 1900.
    A lot of information can be extracted from a date an time data type.
    NOTE: Dates can be formatted differently depending on your regional settings, American standards are different from European standards for instance. => And this is often source of errors and maybe source of misinterpretation.
  • For Boolean values, well, those are pretty simple, it's either TRUE, either FALSE. 👌🏼

Be my type!

In, Power Query, one of the very first functions (and step) you see appearing in the formula bar is Table.TransformColumnTypes() (though you never really write it manually)

Using a Table.() function to assign your columns a data type.And, whether in Excel, or in Power BI (Power Query, Power Pivot), it is a good practice to NOT mix data types. Don't even dare messing up with types! Kind warning 😋

List of data types available in Power Query

For that, Power Query offers you a variety of data types to choose from.

Notice that here we have more than the 4 different data types we discussed earlier.

Numbers are sub-divided into:

  • Whole number
  • Decimal Number
  • Currency (Fixed decimal number in Power Query for Power BI => Yep! Go wonder why 😕)
  • and Percentage

Date and Time format is also sub-divided into:

  • Date/Time
  • Date
  • Time
  • Date/Time/Timezone
  • and Duration

We can also notice a new data type => "Binary" -> 😳 What on earth is that -> Let's keep that for another time maybe. 😋

And, also Using Locale, as if we didn't have enough data types, right? We can review that one when we discuss date and time data types

What about data category?

Wait! What? Data what again? You're kidding right?

Nope, not kidding. It's a real thing 😁

Data category is found in the Properties group of the Column tool tab.

The list of categories available actually depends on the column data type selected.

The list of data categories available in Power BI


Alright, cool! That's in Power BI. What about Power Pivot, or Excel?
Power Pivot doesn't offer the option to categorize data, but Excel... hmmmm... does try something similar, though it is just bringing more to the confusion.

Categorization of data in Excel
Yep, that's right, the guys of the Excel team decided to name that group "Data Types" => Really confusing, right? Anyway, the majority of users don't even know those exist 😝

Enough!

That is it for now. That's enough for a good start. Hope you find it useful. Let me know if you have any question, comment, or suggestions.
Thanks for reading
2 comments
23 views

Permalink

Comments

Nov 24, 2020 02:36 PM

Thanks Ian for appreciating. Now data types have become a real thing with the latest updates in Excel

Nov 24, 2020 02:12 PM

This is great--Thank you