FrameworkStyle Guide

Style Guide

A comprehensive guide covering SQL best practices, from formatting and naming conventions to query structure and performance optimization. Focuses on readability, consistency, and maintainability in data engineering.

WIP
This article is a work in progress

Core Principles

  • Readability: Write clear, understandable SQL that's easy to maintain
  • Consistency: Follow uniform formatting and naming patterns
  • Simplicity: Choose straightforward solutions over clever ones
  • Performance: Optimize when it matters, without sacrificing clarity

Code Formatting

SQLFluff Configuration

SQL style guides became less important after SQLFluff was created. SQLFluff is an opinionated linter that makes it easy to format and improve your SQL. This style guide assumes that you use SQLFluff and only discusses things that complement its rules.

We recommend ignoring these rules:

  1. layout.long_lines (LT05)
    Long lines can improve readability, especially when longer names are necessary.

  2. structure.column_order (ST06)
    Sometimes custom column ordering can improve readability.

These rules can be disabled in your .sqlfluffignore file.

Naming Conventions

Use snake_case

Use snake_cas for all identifiers as it provides optimal readability and aligns with analytical community standards.

-- Do
customer_account_id
order_status_type
transaction_date
 
-- Don't
CustomerAccountId    -- Pascal case
customer_account_ID  -- Mixed case
customeraccountid    -- No word separation
CUSTOMER_ACCOUNT_ID  -- Screaming snake case
CUSTOMERACCOUNTID    -- All caps

Snowflake Naming Considerations

Snowflake converts unquoted identifiers to uppercase by default.

Recommendations:

  • Write SQL in lowercase
  • Accept Snowflake's uppercase conversion for internal objects
  • Only use double quotes for lowercase when external systems require it
-- Do
select
   amount,
   created_at
from transactions
 
-- Do: Only when external systems need lowercase
select
   amount as "amount",
   created_at as "created_at"
from transactions

Query Structure

Join Patterns

Only use left join and cross join (the latter only in specific cases).

If you find yourself needing other join types, it's usually due to a Level of Detail issue.

Even when other joins might produce correct results, they often:

  • Mask underlying data problems
  • Silently remove duplicates without addressing their root cause
-- Do: Standard left join pattern
select
   orders.order_id,
   customers.customer_name
from orders
left join customers
   on orders.customer_id = customers.customer_id

cross join is useful when you need to expand a table's level of detail. For example, when creating one row for each combination of city and month:

-- Do: Cross join for data expansion
select
   city.city_name,
   month.month_name
from city
cross join month

Query Organization

CTEs Over Subqueries

Use CTEs (WITH clauses) instead of subqueries. CTEs provide:

  • Better readability
  • Easier debugging
  • Step-by-step query logic

Building queries with CTEs makes debugging easier since you can verify the level of detail at each step.

-- Do
with monthly_sales as (
   select
       date_trunc('month', order_date) as month,
       sum(amount) as total_sales
   from orders
   group by date_trunc('month', order_date)
)
select
   month,
   total_sales,
   lag(total_sales) over (order by month) as previous_month_sales
from monthly_sales
 
-- Don't
select
   month,
   total_sales,
   lag(total_sales) over (order by month) as previous_month_sales
from (
   select
       date_trunc('month', order_date) as month,
       sum(amount) as total_sales
   from orders
   group by date_trunc('month', order_date)
) monthly_sales

Exception: Simple in clauses can use subqueries:

-- Do: Simple IN clause is fine
with filtered_customers as (
  select customer_id
  from customers
  where customer_type = 'Demo'
)
select
  invoice_id,
  invoice_number
from invoices
where customer_id not in (select customer_id from filtered_customers)

Logical Operators

Use NOT for Negation

Prefer NOT for its clarity:

-- Do
where not user_id = 1
where not status in ('deleted', 'archived')
 
-- Don't
where user_id != 1
where user_id <> 1

Documentation

Code Comments

Always comment to explain:

  • Hardcoded values magic numbers
  • Business logic
  • Complex calculations
  • Performance decisions
  1. Never leave commented-out code
  • Use version control for code history
  • Keeps files clean and maintainable
  • Prevents confusion about current logic
-- Do
select
   user_id,
   status
from users
where not user_id in (4, 7)  -- Exclude internal test accounts
   and created_at >= '2023-01-01'  -- System migration date
 
-- Don't
select
   user_id,
   status
from users
where not user_id in (4, 7)
   -- and created_at >= '2023-01-01'

Generated Tables

Instead of using generators inline, create dedicated tables for commonly needed data:

create table all_numbers as (
    select row_number() over (order by seq4()) as number
    from
        table(generator(rowcount => 1000))
)
create table all_dates as
select dateadd('day', seq4(), '2022-01-01')::date as all_date
from
    table(generator(rowcount => 10000)) -- using a large number to cover the date range
where
    dateadd('day', seq4(), '2022-01-01')::date between '2022-01-01' and '2025-12-31';
create table all_first_of_month as
select date_trunc('month', dateadd('month', seq4(), date '2022-01-01'))::date as first_of_month
from
    table(generator(rowcount => 1000)) -- using a large number to cover the date range
where
    date_trunc('month', dateadd('month', seq4(), date '2022-01-01'))::date <= date '2025-12-31';

Snowflake Performance

  1. Keep queries simple
  2. Consider splitting large (many lines of SQL) models in to seperate models
  3. Pre-generate commonly needed lookup tables
  4. Consider materializating tables, itstead of using views

DBT considerations

DBT excels at transformation and modeling. Use it for its core strengths:

  • Dependency management
  • Testing
  • Documentation

Key guidelines:

  1. Add not_null and unique tests on nearly all models
  2. If possible, prefer generated tables over seeds
  3. Avoid macros
    • They make debugging difficult
    • Jinja macro code is hard to read, write, understand, and debug
    • Write plain SQL instead - it's easier to maintain
  4. Don't use DBT for
    • Extract and load, or other data engineering tasks
    • Database permission management