Key Takeaways:

Hi, I’m Steve with GrowthLoop. Over the last 2 years, I’ve used dbt daily to build data models on Redshift, Snowflake, and BigQuery.

This is the start of a series of articles on tips, best practices, and general knowledge about dbt and how to use it to build complex data models in your Cloud Data Warehouse. This series will focus on use in BigQuery but is generally applicable to your data warehouse of choice.

‍

First, what is dbt?

dbt (data build tool) is an open-source command-line tool that uses compilable SQL select statements to allow data analysts to transform the data in their data warehouses more effectively and with the best practices of software development. It the middle layer between your raw data, in whatever form you get it in, and your transformed data in the format you need.

‍

‍

There are all kinds of useful features that ship with dbt including an automatic DAG generator, automatic documentation platform, and many macros. For more on dbt, how to set it up, and what it is exactly, check out the official dbt website, or our upcoming posts.

‍

Let's Start with the Basics

Before diving into dbt and advanced SQL tricks, let’s get our SQL basics down. One important but easily overlooked aspect of SQL is the order in which a statement actually executes.

Like the PEMDAS monstrosity above, the way you write SQL can cause confusion for readers including yourself. To reduce confusion, keep the following logical processing order in mind.

FROM & JOIN: What tables to get the base data
WHERE: Filters on the base data
GROUP BY: Aggregates the base data
HAVING: Filters on the aggregate data
WINDOW Functions : Row-level aggregation
SELECT: Fetch data by column name and alias
DISTINCT: Deduplicate the fetched data
UNION: Append additional query's results
ORDER BY: Sort data
LIMIT & OFFSET: Discard all rows but X rows starting at Y

‍

So what does that mean?

There are several implications to the execution order and knowing them will make you a better SQL user.

Data reductions are best done at the top of the funnel ie JOIN and WHERE

Nine times out of ten if your query is inefficient, you have an issue in your JOINs or you have not filtered as much as you should in your WHEREs. JOINs, in particular, can be dangerous because, if you are not careful, you can inadvertently have multiple records satisfy your ON clauses that will cause duplicates.

Combat this by choosing the most appropriate JOIN type, vigorously reviewing your ON statements and defining exactly what information you need to include in your WHERE filters.

For example, if you wished to analyze recent customer payment trends, limit your query to be the smallest subset of data you can use to answer your question.

‍

👎 DON'T

select
c.name
, o.amount_paid
from customers as c
outer join orders as o
on c.id = o.customer_id

‍

👍 DO

select
c.name
, o.amount_paid
from customers as c
inner join orders as o
on c.id = o.customer_id
and c.start_date
where c.start_date > '2021-01-01'

‍

Aliased columns are not addressable in WHERE clauses

Because aliasing does not occur until step 6, you cannot use the new names in your WHERE statements. This has been the cause of many frustrations and long repeated lines in queries.

Combat this by aliasing in an upstream CTE or dbt model (more on this later).

Aliased columns are addressable in ORDER BY clauses

On the other hand, aliased columns are available in ORDER BY clauses because they are the penultimate operation to run.

A final note.

This order is generally the order in which queries will be run by your database engine however the engine’s optimizer will change this order if it detects an improvement in possible efficiency. CAST functions or WHERE clauses on columns used in joins often will be run first to save on compute resources.

There are many other optimizations that database engines use in production but, I am honestly not an expert in the deeper layers of optimization.

Look out for the next article where we’ll be diving into using dbt. Thanks for reading!

✌️ Steve

‍

Published On:

March 18, 2021

Updated On:

August 14, 2024

Read Time:

5 min

Want to learn more?

Book a Demo

Looking for guidance on your Data Warehouse?

Supercharge your favorite marketing and sales tools with intelligent customer audiences built in BigQuery, Snowflake, or Redshift.

Get Demo

Unlock the full value of your customer data

Get in touch with our team to learn how you can use GrowthLoop to activate data from your data warehouse to drive more revenue.

Schedule a free demo

Back to Blog

August 14, 2024

March 18, 2021

Cloud Data Warehouse

Coffee with Steve - dbt

Steve Pisani

Hi, I’m Steve with GrowthLoop. Over the last 2 years, I’ve used dbt daily to build data models on Redshift, Snowflake, and BigQuery.

‍

First, what is dbt?

‍

Let's Start with the Basics

Before diving into dbt and advanced SQL tricks, let’s get our SQL basics down. One important but easily overlooked aspect of SQL is the order in which a statement actually executes.

Like the PEMDAS monstrosity above, the way you write SQL can cause confusion for readers including yourself. To reduce confusion, keep the following logical processing order in mind.

FROM & JOIN: What tables to get the base data
WHERE: Filters on the base data
GROUP BY: Aggregates the base data
HAVING: Filters on the aggregate data
WINDOW Functions : Row-level aggregation
SELECT: Fetch data by column name and alias
DISTINCT: Deduplicate the fetched data
UNION: Append additional query's results
ORDER BY: Sort data
LIMIT & OFFSET: Discard all rows but X rows starting at Y

‍

So what does that mean?

There are several implications to the execution order and knowing them will make you a better SQL user.

Data reductions are best done at the top of the funnel ie JOIN and WHERE

Combat this by choosing the most appropriate JOIN type, vigorously reviewing your ON statements and defining exactly what information you need to include in your WHERE filters.

For example, if you wished to analyze recent customer payment trends, limit your query to be the smallest subset of data you can use to answer your question.

‍

👎 DON'T

select
c.name
, o.amount_paid
from customers as c
outer join orders as o
on c.id = o.customer_id

‍

👍 DO

select
c.name
, o.amount_paid
from customers as c
inner join orders as o
on c.id = o.customer_id
and c.start_date
where c.start_date > '2021-01-01'

‍

Aliased columns are not addressable in WHERE clauses

Because aliasing does not occur until step 6, you cannot use the new names in your WHERE statements. This has been the cause of many frustrations and long repeated lines in queries.

Combat this by aliasing in an upstream CTE or dbt model (more on this later).

Aliased columns are addressable in ORDER BY clauses

On the other hand, aliased columns are available in ORDER BY clauses because they are the penultimate operation to run.

A final note.

There are many other optimizations that database engines use in production but, I am honestly not an expert in the deeper layers of optimization.

Look out for the next article where we’ll be diving into using dbt. Thanks for reading!

✌️ Steve

‍

Looking for guidance on your Data Warehouse?

Supercharge your favorite marketing and sales tools with intelligent customer audiences built in BigQuery, Snowflake, or Redshift.

Get Demo

Coffee with Steve - dbt

Key Takeaways:

Table of Contents

First, what is dbt?

Let's Start with the Basics

So what does that mean?

A final note.

‍

More from the Blog

The 5 habits of highly effective marketing teams

4 steps marketers can take to embrace AI and ease the pressure

How marketers use autonomous AI (and where they still want control)

Looking for guidance on your Data Warehouse?

Unlock the full value of your customer data

Coffee with Steve - dbt

First, what is dbt?

Let's Start with the Basics

So what does that mean?

A final note.

‍

More from the Blog

The 5 habits of highly effective marketing teams

4 steps marketers can take to embrace AI and ease the pressure

How marketers use autonomous AI (and where they still want control)

Looking for guidance on your Data Warehouse?