Style Guide
A comprehensive guide covering SQL best practices, from formatting and naming conventions to query structure and performance optimization. Focuses on readability, consistency, and maintainability in data engineering.
Core Principles
- Readability: Write clear, understandable SQL that's easy to maintain
- Consistency: Follow uniform formatting and naming patterns
- Simplicity: Choose straightforward solutions over clever ones
- Performance: Optimize when it matters, without sacrificing clarity
Code Formatting
SQLFluff Configuration
SQL style guides became less important after SQLFluff was created. SQLFluff is an opinionated linter that makes it easy to format and improve your SQL. This style guide assumes that you use SQLFluff and only discusses things that complement its rules.
We recommend ignoring these rules:
-
layout.long_lines
(LT05)
Long lines can improve readability, especially when longer names are necessary. -
structure.column_order
(ST06)
Sometimes custom column ordering can improve readability.
These rules can be disabled in your .sqlfluffignore file.
Naming Conventions
Use snake_case
Use snake_cas
for all identifiers as it provides optimal readability and aligns with analytical community standards.
Snowflake Naming Considerations
Snowflake converts unquoted identifiers to uppercase by default.
Recommendations:
- Write SQL in lowercase
- Accept Snowflake's uppercase conversion for internal objects
- Only use double quotes for lowercase when external systems require it
Query Structure
Join Patterns
Only use left join
and cross join
(the latter only in specific cases).
If you find yourself needing other join types, it's usually due to a Level of Detail issue.
Even when other joins might produce correct results, they often:
- Mask underlying data problems
- Silently remove duplicates without addressing their root cause
cross join
is useful when you need to expand a table's level of detail. For example, when creating one row for each combination of city and month:
Query Organization
CTEs Over Subqueries
Use CTEs (WITH clauses) instead of subqueries. CTEs provide:
- Better readability
- Easier debugging
- Step-by-step query logic
Building queries with CTEs makes debugging easier since you can verify the level of detail at each step.
Exception: Simple in
clauses can use subqueries:
Logical Operators
Use NOT for Negation
Prefer NOT for its clarity:
Documentation
Code Comments
Always comment to explain:
- Hardcoded values magic numbers
- Business logic
- Complex calculations
- Performance decisions
- Never leave commented-out code
- Use version control for code history
- Keeps files clean and maintainable
- Prevents confusion about current logic
Generated Tables
Instead of using generators inline, create dedicated tables for commonly needed data:
Snowflake Performance
- Keep queries simple
- Consider splitting large (many lines of SQL) models in to seperate models
- Pre-generate commonly needed lookup tables
- Consider materializating tables, itstead of using views
DBT considerations
DBT excels at transformation and modeling. Use it for its core strengths:
- Dependency management
- Testing
- Documentation
Key guidelines:
- Add
not_null
andunique
tests on nearly all models - If possible, prefer generated tables over seeds
- Avoid macros
- They make debugging difficult
- Jinja macro code is hard to read, write, understand, and debug
- Write plain SQL instead - it's easier to maintain
- Don't use DBT for
- Extract and load, or other data engineering tasks
- Database permission management