Everything You Need To Know About DAX Coming From Excel Background
Looking for a guide to help you understand DAX from an Excel background?
While DAX might seem similar to Excel at first, it works in a completely different way.
At its best, DAX is like a crystal ball 🔮 that has the answers to all of your trickiest business problems. But at its worst, it can be really frustrating and confusing. I know, because I've seen both sides of it during my 10 years of writing DAX. With this guide, you will have all the tools you need to effectively use the crystal ball and avoid the frustration. I will first cover some basic concepts you need to understand, followed by multiple examples of those concepts at work, along with a visual representation of what is really happening under the surface.
The first concept you need to understand is the data model.
You cannot write DAX if you don't understand the data model behind it
In DAX, everything is built upon a data model, which is a collection of connected tables that reflect your business.
For example if you buy, store and sell products you have tables like ‘Sales’, ‘Purchase’, ‘Product’, ‘Stock’, ‘Customer’, ‘Supplier’ and ‘Date’ . This is how a data model usually looks like (expand the pictures to see them better):
Tables in a data model are divided into fact tables and dimension tables.
Fact tables contain the measurable metrics (=how many products did we sell), whereas dimension tables contain descriptive data (=what is the name of the customer we sold it to). Each column in a table has a datatype like text, whole number, decimal number or date.
Dimension tables are short and fat 🍕. For example ‘Customer’ is a dimension table, where you have one row for each customer but many columns describing that one customer. ‘Customer’ table might contain columns for customer key (=unique identifier for the customer), customer name, industry, contact person, salesperson, address, country, continent, payment term and many more..
Fact tables on the contrary are tall and fit 💪. For example ‘Sales’ is a fact table, and might contain columns such as invoice date, product key, customer key, sales quantity, sales amount and sales margin. But because each customer has many sales transactions (hopefully), fact tables are longer than dimension tables.
Dimension and fact tables are connected using relationships. These relationships should 99% of times flow only into one direction. If you’re taking one sided directions from your spouse at home like I am, you should have no problem in wrapping your head around this concept.
The lines and arrows ⬇️ you see in the data model represent the flow direction of the relationship (one ‘1’ customer → many ‘*’ sales). Tables are usually connected using “…Key” columns which is just a nerd term for a column that is used for connecting two tables.
For now, this is everything you need to understand about data models.
Stop creating calculated columns before you even start it, seriously
Next, there’s two categories of DAX calculations, calculated columns (which are like statues 🗽) and measures (which are like chameleons 🦎).
That is because calculated columns don’t change after you’ve created them, but measures dynamically change their result depending where you use them. Since calculated columns are physically stored as columns in your data model, they are only updated when you refresh the data in your model. Measures on the other hand, are not physically stored anywhere and are always updated on the fly when you for example change a filter selection.
The common mistake people coming from Excel do is to start adding calculated columns🗽 left and right, but that is not what you want to do ❌. The only times you should use a calculated column is if:
The measure takes forever to calculate dynamically (which is basically never).
Or you need a column for grouping or filtering data and cannot create it using Power Query (which is basically never).
So if it’s still unclear, 99% of times you want to create a measure🦎 and not a calculated column.
DAX functions work in two main ways to produce tables or values
Most DAX functions return a value (scalar) or a table as a result, and get to that result either by iterating or aggregating.
What that awful data jargon 🤓 really means, is that you can perform the calculation row by row (iterate) or by summarizing (aggregate). Coming from Excel mindset, SUM ( A1:A10 ) is an aggregator, A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 is an iterator.
This results into 4 buckets of functions:
Table computed iterating (FILTER, ADDCOLUMNS, TOPN)
Value computed iterating (SUMX, MINX, MAXX, AVERAGEX, RANKX)
Table computed aggregating (VALUES, DISTINCT, SUMMARIZE, ALL)
Value computed aggregating (SUM, MIN, MAX, COUNTROWS, DISTINCTCOUNT)
Out of these buckets, the thing that usually trips new people coming into DAX is table functions. The reason it can be hard to grasp is because you cannot easily display the result of a table function in front of your eyes. That means you need to start thinking in tables and “see” 👀 the result in your mind.
The icing on the cake 🍰 is the CALCULATE & CALCULATETABLE functions. You can use these for example to add, remove, or modify the filters of your calculation.
Measures in DAX are not calculated the way you think they are
Take a look at the highlighted calculation below and guess how it is calculated?
Simple sum of the values you see above it?
Nope.
DAX is blind 🧑🦯 and it’s not aware of the surrounding cells. Instead of cell references like in Excel, in DAX, each data point is calculated independently of each other. The correct way to think about DAX is to first identify which filters affect this specific data point, then apply those filters to the data model, and only in the last step perform the calculation.
Lets try to deconstruct this example in three steps:
Which filters affect the data point highlighted in red? — The filter might come from rows and columns inside the table, but also from the outside like from the filter pane, or other visuals that are cross filtering our data point. In this example, the only filter affecting the highlighted datapoint is the filter on Year = 2022, which comes from the filter pane.
Filter the the data model following relationships — Now your knowledge of the data model becomes useful since you know when you place a filter on ‘Date’[Year] it will travel from ‘Date’ table to ‘Sales’ table following the relationship.
Compute the result — Final step is to aggregate the sum of the remaining rows in column ‘Sales’[SalesAmount]
This might be easier to understand looking at the data model:
Time to start seeing tables in your mind
Next, let’s compute how many unique products have we sold. Again, focus on the highlighted data point on the total row.
The easiest way to solve this problem would be to use the DISTINCTCOUNT function but I will instead use the table function VALUES as I would like to get you thinking in tables 📅.
Let’s break it down following the same steps as with the first example:
Which filters affect the data point highlighted in red? — Same filters, only Year = 2022
Filter the the data model following relationships — ‘Date’ table filters ‘Sales’ table exactly the same way as in first example
Compute the result — To get to the result you need to know how many unique values are in the column ‘Sales’[ProductKey]. VALUES is a function that does just that. The result of VALUES will be a one column table that contains three values (2001, 2002, 2003). But because it is a table, you cannot display that result in a visualization. Now’s the time to “see” that table in your mind 🧐. To get to the final result you need to wrap the result of VALUES into COUNTROWS function that will compute how many rows are in this mystical table you cannot see .
Again, here is an illustration what is happening in the data model:
Manipulate the filters and meet your new best friend, variables
To continue our customer analysis, let’s compute what is the share of sales %, but this time, focusing on a specific customer, Acme Corp.
To get to that result we need to divide the sales of Acme Corp. with the sales for all customers. This is where variables come in handy. Variables are placeholders for intermediate calculation results that are computed once but can be reused many times. Variables can store both tables and scalar values.
Again, follow the same steps, but this time, instead of one result, we have 3 variables to compute:
Variable _TotalSales — The sales for Acme Corp.:
Which filters affect the data point highlighted in red? — You already know there is a filter Year = 2022, but this time there is also a filter coming from the row of the table CustomerName = “Acme Corp.”
Filter the the data model following relationships — Both ‘Date’ and ‘Customer’ tables will filter the ‘Sales’ table following the relationship.
Compute the result — Sum column ‘Sales’[SalesAmount] which will result into 6 105,00. This is value that will be stored in variable _TotalSales
The data model for variable _TotalSales looks like this:
Variable _SalesAllCustomers — The sales for all customers:
Which filters affect the data point highlighted in red? — First filter Year = 2022 and CustomerName = “Acme Corp.” BUT then remove any filter from customer table using CALCULATE with REMOVEFILTERS ( Customer ). This means the only filter left will be the filter on Year. The filter on Customer goes in and out. This is how you manipulate the filters like magic 🪄.
Filter the the data model following relationships — Only ‘Date’ table will filter ‘Sales’ table.
Compute the result — Sum column ‘Sales’[SalesAmount] which will result into 20 226,90. This is value that will be stored in variable _SalesAllCustomers.
The data model diagram for variable _SalesAllCustomers is exactly the same as the picture "Example 1: Total Sales data model. In case you need to refresh your memory, you can go back to that picture.
Varible _ShareOfSales — The share of sales % for Acme Corp.:
Variables are set in stone after calculation 🪨. For that reason there is no need to repeat steps 1, 2 and 3 for variable _ShareOfSales. You can simply reuse already computed variables and divide the _TotalSales variable with the _SalesAllCustomer variable to get to the end result of 30,18 %.
When aggregation does not work, it’s time to go row by row
In the final example of this article the aim is to compute the last stock value in the selected time period.
Since I’ve done a terrible job in transforming the data in Power Query, there is no column for stock value in the ‘Stock’ table 😧. There are only stock balance and stock price columns that you have to multiply to get the actual stock value. You could add a calculated column here, but resist the urge! Just don’t do it ⛔. This can be fixed with a measure!
If you try write a measure like ‘Stock’[StockBalance] * ‘Stock’[StockPrice], DAX is kind enough to give you an error:
“Sir, that is not measure, you need to either aggregate or iterate⚠️”
If you try instead SUM ( ‘Stock’[StockBalance] ) * SUM ( ‘Stock’[StockPrice] ), you will get a result, but it’s complete nonsense 💩. You need to go row by row, and SUMX function allows just that.
Here’s a breakdown of the solution using 2 variables:
Variable _MaxDate — First find what is the last date that has stock values:
Which filters affect the data point highlighted in red? — Filter from filter pane Year = 2022 AND remove any filters from product table using CALCULATE and REMOVEFILTERS ( Product ).
Filter the the data model following relationships — Only ‘Date’ table will filter ‘Stock’ table. (If there would be a filter on Product, CALCULATE would remove it.)
Compute the result — Max of column ‘Stock’[DateKey] which will result into 31.1.2022.
This is how the data model is filtered for variable _MaxDate:
Varible _StockValue - Compute sum of stock value, row by row, for the last date that has stock:
Which filters affect the data point highlighted in red? — Year = 2022 from filter pane AND ‘Date’[DateKey] = _MaxDate using CALCULATE.
Filter the the data model following relationships — Only ‘Date’ table will filter ‘Stock’ table.
Compute the result — Two rows remain in the Stock table, compute ‘Stock’[StockBalance] * ‘Stock’[StockPrice] row by row 🪜and then sum the result
Row 1: 284 * 0,80 = 227,20
Row 2: 5905 * 0,11 = 649,55
Total sum = 227,20 + 649,55 = 876,75
The model behind the _StockValue variable looks like this:
Summary
I hope this guide proved useful in understanding how DAX works.
If there's nothing else you take away from this article it has to be this: always remember DAX is a blind chameleon that has a crystal ball to answer all of your business problems 🧑🦯🦎🔮.