Roadmap / SQL Deep Dive
A comprehensive SQL roadmap covering SELECT basics, filtering, aggregation, JOINs, subqueries, CTEs, window functions, schema design, indexing, transactions, views, stored procedures, JSON support, and query performance — the foundational data skill every developer needs.
Step 1 • Setup
SQL is a standard — the syntax is 90% the same across PostgreSQL, MySQL, SQLite, and SQL Server. PostgreSQL is the best choice for learning and production (open source, most feature-rich). Install options: Postgres.app (macOS), Docker (cross-platform — docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres), or use a free Neon/Supabase cloud database. GUI tools: pgAdmin (official, feature-rich), TablePlus (cleaner UI, recommended), DBeaver (supports all databases). Connect with psql (command line — \l to list databases, \c to connect, \dt to list tables). SQLite is excellent for learning SQL offline — no server needed (sqlite3 CLI or DB Browser for SQLite).
Step 2 • fundamentals
Use Data Definition Language to create and modify database objects. CREATE TABLE with column definitions and inline constraints. ALTER TABLE to ADD COLUMN, DROP COLUMN, RENAME COLUMN, and modify data types. DROP TABLE and TRUNCATE TABLE with their differences explained. CREATE/DROP SCHEMA. Use IF EXISTS / IF NOT EXISTS for idempotent migration scripts.
Step 3 • Foundations
SELECT is the most important SQL statement. Anatomy: SELECT columns FROM table WHERE condition. SELECT * (all columns — avoid in production, fetches unnecessary data), SELECT specific columns, column aliases (AS), computed columns (price * quantity AS total). WHERE clause filters rows — evaluated before SELECT. DISTINCT removes duplicate rows from results. LIMIT (PostgreSQL/MySQL) or TOP (SQL Server) restricts the number of rows returned. Execution order matters for understanding SQL — FROM is processed first, then WHERE, then SELECT, then ORDER BY, then LIMIT. Write SQL in the evaluation order, not the syntax order, to reason about it correctly.
Step 4 • Foundations
WHERE filters rows with comparison operators (=, !=, <, >, <=, >=), logical operators (AND, OR, NOT — AND has higher precedence than OR, use parentheses), BETWEEN x AND y (inclusive), IN (value1, value2, ...) for membership, LIKE 'pattern%' for string matching (% any sequence, _ any single char), ILIKE for case-insensitive LIKE in PostgreSQL, IS NULL / IS NOT NULL (never use = NULL — always IS NULL), and SIMILAR TO / ~ for regex. ORDER BY col ASC/DESC, NULLS FIRST/NULLS LAST (NULLs sort last in ASC by default in PostgreSQL). Multiple sort columns: ORDER BY last_name, first_name. Filtering and sorting are the foundation of all queries.
Step 5 • Foundations
Built-in functions transform column values. String: UPPER/LOWER, LENGTH, TRIM/LTRIM/RTRIM, SUBSTRING(str FROM pos FOR len), POSITION, REPLACE, CONCAT or || operator, SPLIT_PART (PostgreSQL). Numeric: ROUND(val, decimals), CEIL, FLOOR, ABS, MOD, RANDOM(). Date/time: NOW(), CURRENT_DATE, EXTRACT(year FROM date), AGE(date), DATE_TRUNC('month', timestamp), INTERVAL arithmetic (created_at + INTERVAL '7 days'). Conditional: COALESCE(val1, val2) returns first non-NULL — use for default values, NULLIF(a, b) returns NULL if a=b (useful for avoiding divide-by-zero), CASE WHEN condition THEN result ELSE default END (SQL's if/else — can be used anywhere an expression is valid).
Step 6 • Foundations
Aggregate functions collapse multiple rows into a single value. COUNT(*) counts all rows, COUNT(column) counts non-NULL values, COUNT(DISTINCT column) counts unique values. SUM, AVG, MIN, MAX work on numeric (and date) columns. GROUP BY groups rows by one or more columns — every non-aggregate column in SELECT must appear in GROUP BY. HAVING filters groups after aggregation (WHERE filters rows before aggregation). Common pattern: SELECT category, COUNT(*) as n, AVG(price) as avg_price FROM products GROUP BY category HAVING COUNT(*) > 10 ORDER BY avg_price DESC. Aggregate functions ignore NULLs (except COUNT(*)).
Step 7 • Core Skills
JOINs combine rows from two or more tables based on a related column. INNER JOIN returns only rows with matches in both tables (most common). LEFT OUTER JOIN returns all rows from the left table, with NULLs for unmatched rows on the right (critical for 'find customers who never rented'). RIGHT JOIN is the same as LEFT JOIN with tables swapped (rarely used — just swap the tables instead). FULL OUTER JOIN returns all rows from both tables. CROSS JOIN is a cartesian product — every combination (use carefully — 1000x1000 rows = 1M rows). Self join — joining a table to itself (managers and their employees in the same table). Multiple joins: chain them, each ON clause matches one relationship.
Step 8 • Core Skills
A subquery is a query inside another query. Scalar subqueries return a single value — usable anywhere an expression is valid (SELECT, WHERE, HAVING). List subqueries with IN: WHERE id IN (SELECT customer_id FROM ...). Correlated subqueries reference the outer query — they run once per outer row (can be slow — often rewritten as JOINs). EXISTS/NOT EXISTS check if a subquery returns any rows (often faster than IN for large sets). Derived tables in FROM: SELECT * FROM (SELECT ...) AS alias. Lateral joins (LATERAL or CROSS JOIN LATERAL) allow a subquery to reference columns from the preceding table — like a correlated subquery in FROM. Subqueries make complex logic readable but can be slow — always check EXPLAIN.
Step 9 • Core Skills
Common Table Expressions (CTEs) are named temporary result sets defined with WITH — they make complex queries readable by breaking them into named steps. WITH cte_name AS (SELECT ...) SELECT * FROM cte_name. Multiple CTEs: WITH cte1 AS (...), cte2 AS (...) SELECT ... — each CTE can reference previous ones. CTEs are not materialized by default in PostgreSQL (they're inlined — use WITH MATERIALIZED to force caching). Recursive CTEs (WITH RECURSIVE) enable traversing hierarchical data — org charts, category trees, path finding. Recursive structure: base case UNION ALL recursive case (referencing the CTE itself). Use a depth counter to prevent infinite loops. CTEs replace complex nested subqueries.
Step 10 • Advanced Queries
Window functions compute a value across a set of rows related to the current row — without collapsing them like GROUP BY. OVER (PARTITION BY ... ORDER BY ...) defines the window. Ranking: ROW_NUMBER() (unique ranks), RANK() (gaps for ties), DENSE_RANK() (no gaps), NTILE(n) (divide into n buckets). Offset: LAG(col, n) (value from n rows before), LEAD(col, n) (value from n rows after) — perfect for period-over-period comparison. Aggregate: SUM(col) OVER (...) for running totals, AVG with a frame (ROWS BETWEEN 7 PRECEDING AND CURRENT ROW for a 7-day moving average). Window functions execute after WHERE and GROUP BY — they see the final filtered/aggregated rows.
Step 11 • Database Design
Schema design determines how maintainable and performant a database is. Data types: choose carefully — INTEGER vs BIGINT (2B vs 9 quintillion), TEXT vs VARCHAR(n) (PostgreSQL treats them identically internally — TEXT is fine), TIMESTAMP WITH TIME ZONE vs WITHOUT (always use WITH TIME ZONE), DECIMAL(p,s) for money (never FLOAT — rounding errors), UUID for distributed primary keys. Constraints enforce data integrity: PRIMARY KEY, FOREIGN KEY with ON DELETE CASCADE/SET NULL/RESTRICT, UNIQUE, NOT NULL, CHECK (age > 0), DEFAULT. Normalization — 1NF (atomic values), 2NF (no partial dependencies), 3NF (no transitive dependencies). Denormalization for read performance. Junction tables for many-to-many relationships.
Step 12 • Schema
Constraints enforce data integrity at the database level. PRIMARY KEY — unique, not null. FOREIGN KEY — referential integrity with ON DELETE CASCADE, SET NULL, or RESTRICT. UNIQUE — one or more columns must be unique (multiple NULLs are allowed). NOT NULL. CHECK — arbitrary boolean expression (CHECK (age >= 0), CHECK (status IN ('active','inactive'))). DEFAULT. Name your constraints: CONSTRAINT fk_orders_user FOREIGN KEY (user_id) REFERENCES users(id) — named constraints give better error messages. Deferrable constraints (DEFERRABLE INITIALLY DEFERRED) — useful for circular foreign key references during bulk inserts. Understand column-level vs table-level constraint syntax for composite keys.
Step 13 • Performance
An index is a separate data structure that lets the database find rows without a full table scan. B-tree index (default) — supports =, <, >, BETWEEN, LIKE 'prefix%', ORDER BY. Hash index — equality only (=), faster than B-tree for that case. Composite index (a, b) — can satisfy queries on (a), (a, b), but NOT (b) alone. Partial index — index only rows matching a condition (CREATE INDEX ON orders(user_id) WHERE status = 'pending') — smaller, faster. Covering index (INCLUDE clause) — index includes extra columns to satisfy a query without touching the table. CREATE INDEX CONCURRENTLY to add an index without locking the table. EXPLAIN ANALYZE shows whether an index is used (Index Scan vs Seq Scan). Every index slows writes — don't over-index.
Step 14 • Reliability
A transaction groups SQL statements so they all succeed or all fail — atomically. BEGIN; ... COMMIT; or ROLLBACK; if something goes wrong. ACID: Atomicity (all or nothing), Consistency (constraints always satisfied), Isolation (concurrent transactions don't see each other's intermediate state), Durability (committed data survives crashes). Isolation levels trade correctness for performance — READ UNCOMMITTED (dirty reads possible), READ COMMITTED (default in PostgreSQL — no dirty reads), REPEATABLE READ (snapshot at transaction start), SERIALIZABLE (strongest — full isolation, may get serialization errors requiring retry). SELECT ... FOR UPDATE locks selected rows for the transaction duration — prevents concurrent modification. SAVEPOINT for nested rollbacks.
Step 15 • Database Objects
A view is a saved SELECT query — CREATE VIEW customer_summary AS SELECT ... — that looks like a table to the application. Views simplify complex joins, enforce row/column security (grant access to a view but not the underlying tables), and provide a stable API even when underlying tables change. Simple views are often updatable (INSERT/UPDATE/DELETE on the view modifies the base table). Materialized views store the query result physically — CREATE MATERIALIZED VIEW ... AS SELECT — and must be refreshed (REFRESH MATERIALIZED VIEW CONCURRENTLY to refresh without locking reads). Use materialized views for expensive aggregations that are queried frequently but don't need real-time data (e.g., dashboard stats refreshed every 5 minutes).
Step 16 • Advanced Features
PostgreSQL has first-class JSON support. JSON stores text and validates structure on insert. JSONB stores parsed binary — faster queries, can be indexed, loses whitespace and key order. Operators: -> (get field as JSON), ->> (get field as text — most useful), #> (path access — '{a,b}'), #>>. Functions: jsonb_array_elements() to expand a JSON array into rows, jsonb_each() to expand an object into key-value rows, jsonb_set() to update a nested field, json_agg() to aggregate rows into a JSON array, jsonb_build_object() to construct JSON in SELECT. Index JSONB with GIN for contains (@>) and exists (?) operators. Use JSONB for truly dynamic attributes — not as a replacement for proper relational schema.
Step 17 • Performance
EXPLAIN shows the query plan, EXPLAIN ANALYZE executes and shows actual timing and row counts (compare estimated vs actual rows — large discrepancies indicate stale statistics). Read plans from inside out: Seq Scan (bad for large tables), Index Scan (uses an index), Index Only Scan (covering index — best), Bitmap Heap Scan (multiple index lookups combined), Hash Join/Nested Loop/Merge Join (join strategies). VACUUM reclaims dead tuple space (PostgreSQL MVCC leaves dead rows after UPDATEs/DELETEs — autovacuum handles this automatically but tune its aggressiveness). ANALYZE updates table statistics used by the query planner. pg_stat_user_tables shows table bloat. pg_stat_statements extension for tracking slow queries across all executions.
Privacy choices
We use optional analytical tools only if you accept. You can change this later from "Privacy settings" in the footer.