Table of Contents

Hi, I’m Steve with GrowthLoop. Over the last 2 years, I’ve used dbt daily to build data models on Redshift, Snowflake, and BigQuery.

This is the start of a series of articles on tips, best practices, and general knowledge about dbt and how to use it to build complex data models in your Cloud Data Warehouse. This series will focus on use in BigQuery but is generally applicable to your data warehouse of choice.

First, what is dbt?

dbt (data build tool) is an open-source command-line tool that uses compilable SQL select statements to allow data analysts to transform the data in their data warehouses more effectively and with the best practices of software development. It the middle layer between your raw data, in whatever form you get it in, and your transformed data in the format you need.

The "T" in ELT

There are all kinds of useful features that ship with dbt including an automatic DAG generator, automatic documentation platform, and many macros. For more on dbt, how to set it up, and what it is exactly, check out the official dbt website, or our upcoming posts.

Let's Start with the Basics

Before diving into dbt and advanced SQL tricks, let’s get our SQL basics down. One important but easily overlooked aspect of SQL is the order in which a statement actually executes.


things can get confusing

Like the PEMDAS monstrosity above, the way you write SQL can cause confusion for readers including yourself. To reduce confusion, keep the following logical processing order in mind.

  1. FROM & JOIN: What tables to get the base data
  2. WHERE: Filters on the base data
  3. GROUP BY: Aggregates the base data
  4. HAVING: Filters on the aggregate data
  5. WINDOW Functions : Row-level aggregation
  6. SELECT: Fetch data by column name and alias
  7. DISTINCT: Deduplicate the fetched data
  8. UNION: Append additional query's results
  9. ORDER BY: Sort data
  10. LIMIT & OFFSET: Discard all rows but X rows starting at Y

So what does that mean?

There are several implications to the execution order and knowing them will make you a better SQL user.

Data reductions are best done at the top of the funnel ie JOIN and WHERE


Nine times out of ten if your query is inefficient, you have an issue in your JOINs or you have not filtered as much as you should in your WHEREs. JOINs, in particular, can be dangerous because, if you are not careful, you can inadvertently have multiple records satisfy your ON clauses that will cause duplicates.

Combat this by choosing the most appropriate JOIN type, vigorously reviewing your ON statements and defining exactly what information you need to include in your WHERE filters.

For example, if you wished to analyze recent customer payment trends, limit your query to be the smallest subset of data you can use to answer your question.


👎 DON'T

select
 c.name
 , o.amount_paid
from customers as c
outer join orders as o
 on c.id = o.customer_id

👍 DO

select
 c.name
 , o.amount_paid
from customers as c
inner join orders as o
 on c.id = o.customer_id
 and c.start_date
where c.start_date > '2021-01-01'

Aliased columns are not addressable in WHERE clauses

Because aliasing does not occur until step 6, you cannot use the new names in your WHERE statements. This has been the cause of many frustrations and long repeated lines in queries.

Combat this by aliasing in an upstream CTE or dbt model (more on this later).

Aliased columns are addressable in ORDER BY clauses

On the other hand, aliased columns are available in ORDER BY clauses because they are the penultimate operation to run.


A final note.

This order is generally the order in which queries will be run by your database engine however the engine’s optimizer will change this order if it detects an improvement in possible efficiency. CAST functions or WHERE clauses on columns used in joins often will be run first to save on compute resources.

There are many other optimizations that database engines use in production but, I am honestly not an expert in the deeper layers of optimization.

Look out for the next article where we’ll be diving into using dbt. Thanks for reading!

✌️ Steve

Published On:
March 18, 2021
Updated On:
August 14, 2024
Read Time:
5 min
Want to learn more?
Book a Demo
You May also like

More from the Blog

Customer stories
How Priceline democratized its data to power more personalized customer experiences

How Priceline democratized its data to power more personalized customer experiences

Learn how Priceline unified its data pipelines to democratize customer data and empower teams to deliver personalized experiences across channels.

Customer stories
Fast cars, fast marketers: How NASCAR builds rapid, targeted campaigns

Fast cars, fast marketers: How NASCAR builds rapid, targeted campaigns

Learn how NASCAR has built more than 1,900 targeted audiences (many in under a minute) while improving fan engagement across campaigns.

Looking for guidance on your Data Warehouse?

Supercharge your favorite marketing and sales tools with intelligent customer audiences built in BigQuery, Snowflake, or Redshift.

Get Demo

Unlock the full value of your customer data

Get in touch with our team to learn how you can use GrowthLoop to activate data from your data warehouse to drive more revenue.

Schedule a free demo
Back to Blog
Cloud Data Warehouse

Coffee with Steve - dbt

Steve Pisani

Steve Pisani

Hi, I’m Steve with GrowthLoop. Over the last 2 years, I’ve used dbt daily to build data models on Redshift, Snowflake, and BigQuery.

This is the start of a series of articles on tips, best practices, and general knowledge about dbt and how to use it to build complex data models in your Cloud Data Warehouse. This series will focus on use in BigQuery but is generally applicable to your data warehouse of choice.

First, what is dbt?

dbt (data build tool) is an open-source command-line tool that uses compilable SQL select statements to allow data analysts to transform the data in their data warehouses more effectively and with the best practices of software development. It the middle layer between your raw data, in whatever form you get it in, and your transformed data in the format you need.

The "T" in ELT

There are all kinds of useful features that ship with dbt including an automatic DAG generator, automatic documentation platform, and many macros. For more on dbt, how to set it up, and what it is exactly, check out the official dbt website, or our upcoming posts.

Let's Start with the Basics

Before diving into dbt and advanced SQL tricks, let’s get our SQL basics down. One important but easily overlooked aspect of SQL is the order in which a statement actually executes.


things can get confusing

Like the PEMDAS monstrosity above, the way you write SQL can cause confusion for readers including yourself. To reduce confusion, keep the following logical processing order in mind.

  1. FROM & JOIN: What tables to get the base data
  2. WHERE: Filters on the base data
  3. GROUP BY: Aggregates the base data
  4. HAVING: Filters on the aggregate data
  5. WINDOW Functions : Row-level aggregation
  6. SELECT: Fetch data by column name and alias
  7. DISTINCT: Deduplicate the fetched data
  8. UNION: Append additional query's results
  9. ORDER BY: Sort data
  10. LIMIT & OFFSET: Discard all rows but X rows starting at Y

So what does that mean?

There are several implications to the execution order and knowing them will make you a better SQL user.

Data reductions are best done at the top of the funnel ie JOIN and WHERE


Nine times out of ten if your query is inefficient, you have an issue in your JOINs or you have not filtered as much as you should in your WHEREs. JOINs, in particular, can be dangerous because, if you are not careful, you can inadvertently have multiple records satisfy your ON clauses that will cause duplicates.

Combat this by choosing the most appropriate JOIN type, vigorously reviewing your ON statements and defining exactly what information you need to include in your WHERE filters.

For example, if you wished to analyze recent customer payment trends, limit your query to be the smallest subset of data you can use to answer your question.


👎 DON'T

select
 c.name
 , o.amount_paid
from customers as c
outer join orders as o
 on c.id = o.customer_id

👍 DO

select
 c.name
 , o.amount_paid
from customers as c
inner join orders as o
 on c.id = o.customer_id
 and c.start_date
where c.start_date > '2021-01-01'

Aliased columns are not addressable in WHERE clauses

Because aliasing does not occur until step 6, you cannot use the new names in your WHERE statements. This has been the cause of many frustrations and long repeated lines in queries.

Combat this by aliasing in an upstream CTE or dbt model (more on this later).

Aliased columns are addressable in ORDER BY clauses

On the other hand, aliased columns are available in ORDER BY clauses because they are the penultimate operation to run.


A final note.

This order is generally the order in which queries will be run by your database engine however the engine’s optimizer will change this order if it detects an improvement in possible efficiency. CAST functions or WHERE clauses on columns used in joins often will be run first to save on compute resources.

There are many other optimizations that database engines use in production but, I am honestly not an expert in the deeper layers of optimization.

Look out for the next article where we’ll be diving into using dbt. Thanks for reading!

✌️ Steve

Share on social media: 

More from the Blog

Customer stories
How Priceline democratized its data to power more personalized customer experiences

How Priceline democratized its data to power more personalized customer experiences

Learn how Priceline unified its data pipelines to democratize customer data and empower teams to deliver personalized experiences across channels.

Customer stories
Fast cars, fast marketers: How NASCAR builds rapid, targeted campaigns

Fast cars, fast marketers: How NASCAR builds rapid, targeted campaigns

Learn how NASCAR has built more than 1,900 targeted audiences (many in under a minute) while improving fan engagement across campaigns.

Looking for guidance on your Data Warehouse?

Supercharge your favorite marketing and sales tools with intelligent customer audiences built in BigQuery, Snowflake, or Redshift.

Get Demo