Deep dive with SQL Window Functions

In data and analytics, window functions have become essential tools for handling complex calculations across groups of rows. Unlike traditional aggregate functions, which well.. aggregate multiple rows into a single result, window functions allow you to perform detailed computations while still retaining each individual row. This feature makes them incredibly valuable for diving deep into business data and gaining nuanced insights.

What are Window Functions?

Window functions, sometimes called analytic functions, work over a set of rows that you define using the OVER clause. The key difference between window functions and aggregate functions is that window functions calculate results across a range of rows without collapsing them into a single output. This means you get to see the detailed results for each row while still performing broader calculations.

Window functions come in a few different groups:

  1. Ranking Functions: These functions assign a rank or position to each row within your group.
    • ROW_NUMBER(): Gives each row a unique sequential number within the group.
    • RANK(): Similar to ROW_NUMBER(), but if two rows are tied, they get the same rank, which might leave gaps in the sequence.
    • DENSE_RANK(): Like RANK(), but without the gaps—you get a continuous sequence.
  2. Aggregate Functions: These functions calculate a single result from a set of rows.
    • Common ones like SUM(), AVG(), and COUNT() can be used as window functions to create running totals, moving averages, and more.
  3. Value Functions: These return a value from another row in the set.
    • LAG(): Lets you look at data from a previous row.
    • LEAD(): Lets you look at data from a subsequent row.
    • FIRST_VALUE() and LAST_VALUE(): Give you the first or last value in your specified window.
  4. Analytical Functions: These are for more advanced statistical calculations.
    • CUME_DIST(): Calculates how a value ranks cumulatively within the group.
    • PERCENT_RANK(): Computes a value’s relative rank within the group.

Window Functions with Matillion

Setting up a window function in Matillion is straightforward, thanks to its dedicated components including RankLead/Lag, and Window Calculation, which make handling analytical tasks simple and efficient.

Let’s dive into a practical example. In the screenshot below, we start with a table named ‘profit_by_account’ that tracks profit data by account, date, and item. Our objective is to generate a monthly profit ranking for each account. This ranking will be a valuable resource for our data analysts, offering insights they can leverage in their BI tools to better understand account performance and analyze historical trends across accounts.

Analytic data pipeline with a Window Function in Matillion

We start by aggregating the data to the appropriate level for our rank calculation—removing item-level details and grouping the data by account and order month. After that, we bring in a Rank component (named "Account Rank by Month" in the screenshot).

Here’s where it gets easy: you just need to select the column(s) to partition by (in our case, ‘order_month’), decide how to order those partitions, and choose the rank function to apply. For this example, we’ll rank the ‘sum_profit’ within each monthly partition, giving us a clear view of account performance month by month. The screenshot below shows the configuration settings:

Window Function configuration in Matillion

Within Matillion's designer interface, you can take a data sample at any point to check how the transformation is working. Here's a screenshot of the data sample:

Window Function data sample in Matillion

Now it's very easy to find the top performing accounts, by month, in the downstream reporting and analytics.

Conclusion

Window functions are powerful tools in data analytics. They allow you to do complex calculations across sets of rows while retaining individual row details. In Matillion, these functions are made accessible through user-friendly components like Rank, Lead/Lag, and Window Calculation.

In our walkthrough, we applied the Rank component to generate a monthly profit ranking for each account. By aggregating the data and strategically using the Rank function within our partitions, we gained valuable insights into account performance over time. This process not only illustrates the ease of setting up window functions in Matillion but also highlights how these functions can provide deeper, more nuanced insights into your business data. Whether you're ranking, aggregating, or analyzing trends, window functions offer a versatile way to enhance your data analysis capabilities.

Try out Matillion for yourself with a free trial!

Zachary Ennis
Zachary Ennis

Sales Engineer Manager

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.