As databases grow in size and complexity, performance optimization becomes a critical skill for database developers, administrators, and analysts. Poorly written queries and inefficient schema designs can lead to slow response times, high resource usage, and poor user experience. In this chapter, we’ll explore how to analyze, optimize, and tune SQL performance across various scenarios.
Table of Contents
We will cover:
- Understanding performance bottlenecks
- Indexing strategies
- Query optimization techniques
- Schema design considerations
- Tools for profiling queries
- Real-world examples and best practices
Identifying Performance Bottlenecks
Before optimizing, we must first detect performance issues.
Symptoms:
- Slow-running queries
- High CPU or memory usage
- Lock contention and deadlocks
- Delayed report generation or API responses
Tools to Analyze Performance:
EXPLAIN
,EXPLAIN ANALYZE
(PostgreSQL, MySQL)- Query Store (SQL Server)
- Execution Plans (graphical in SQL Server, Oracle)
pg_stat_statements
in PostgreSQL- MySQL’s
slow_query_log
Example:
EXPLAIN ANALYZE
SELECT * FROM Orders WHERE CustomerID = 1001;
This provides insight into the query plan and estimated cost.
Indexing – The Foundation of Speed
What Are Indexes?
Indexes are data structures (typically B-trees) that store references to table rows for fast lookup.
When to Use Indexes:
- Columns used in WHERE, JOIN, or ORDER BY clauses
- Foreign keys
- Frequently filtered or sorted columns
Types of Indexes:
Type | Description |
---|---|
Single-column | Index on one column |
Composite | Index on multiple columns (order matters) |
Unique | Ensures no duplicate values |
Full-text | Used for text search |
Partial | Indexes only rows that meet a condition |
Covering Index | Includes all columns needed by a query |
Creating Indexes:
CREATE INDEX idx_customer_id ON Orders(CustomerID);
Composite Index Example:
CREATE INDEX idx_order_customer_date ON Orders(CustomerID, OrderDate);
Avoiding Over-Indexing:
Too many indexes can:
- Slow down INSERT/UPDATE/DELETE operations
- Increase storage size
- Cause maintenance overhead
Use pg_stat_user_indexes
or sys.dm_db_index_usage_stats
to analyze usage.
Query Optimization Techniques
1. Avoid SELECT *
-- Avoid
SELECT * FROM Employees;
-- Better
SELECT Name, HireDate FROM Employees;
Retrieving unnecessary columns adds I/O and memory overhead.
2. Use EXISTS Instead of IN (for subqueries)
-- Avoid
SELECT * FROM Customers WHERE ID IN (SELECT CustomerID FROM Orders);
-- Better
SELECT * FROM Customers WHERE EXISTS (
SELECT 1 FROM Orders WHERE Orders.CustomerID = Customers.ID
);
3. Use JOINs Wisely
Avoid joining large tables unnecessarily. Always join on indexed columns.
4. Filter Early (Push Down Predicates)
Apply WHERE filters as early as possible in nested queries.
5. Use LIMIT to Control Output
SELECT * FROM Logs ORDER BY Timestamp DESC LIMIT 100;
This improves responsiveness for web UIs.
6. Use UNION ALL Instead of UNION
-- UNION removes duplicates (expensive)
-- Use UNION ALL if duplicates aren't a concern
SELECT Name FROM Employees
UNION ALL
SELECT Name FROM Contractors;
7. Avoid Function Calls in WHERE
-- Avoid
WHERE UPPER(Name) = 'JOHN'
-- Better
WHERE Name = 'John'
Using functions disables index usage.
8. Optimize GROUP BY and ORDER BY
- Use indexed columns
- Avoid grouping or sorting large datasets if not required
Schema Design Considerations
Schema design has a profound impact on performance.
Normalization vs Denormalization:
- Normalize to reduce redundancy and improve consistency
- Denormalize for reporting or performance-critical joins
Use Appropriate Data Types:
- Use
INT
instead ofBIGINT
if possible - Use
VARCHAR(100)
instead ofTEXT
if size is predictable
Use Constraints:
CHECK
,NOT NULL
,FOREIGN KEY
enforce consistency and allow better query planning
Partitioning:
Split large tables by range, list, or hash to reduce I/O and improve performance.
CREATE TABLE Orders_2024 PARTITION OF Orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
Caching and Materialized Views
Materialized Views:
Store query results and refresh periodically.
CREATE MATERIALIZED VIEW MonthlySales AS
SELECT ProductID, SUM(Amount) FROM Sales
GROUP BY ProductID;
Benefits:
- Ideal for dashboards and reporting
- Reduce repeated heavy computation
Use REFRESH MATERIALIZED VIEW MonthlySales;
to update.
Monitoring and Profiling Tools
PostgreSQL:
pg_stat_statements
auto_explain
pgBadger
(log analyzer)
MySQL:
EXPLAIN
,SHOW PROFILE
- Performance Schema
- Slow query log
SQL Server:
- Query Store
- SQL Profiler
- Execution Plans (graphical)
Real-World Case Study: Optimizing Order Analytics
A reporting query was running slowly:
SELECT CustomerID, COUNT(*) FROM Orders
WHERE OrderDate >= '2023-01-01'
GROUP BY CustomerID;
Step 1: Analyze
EXPLAIN ANALYZE SELECT ...
Showed sequential scan on Orders.
Step 2: Add Index
CREATE INDEX idx_order_date ON Orders(OrderDate);
Step 3: Materialize Result for Frequent Use
CREATE MATERIALIZED VIEW OrderCountByCustomer AS
SELECT CustomerID, COUNT(*) FROM Orders
WHERE OrderDate >= '2023-01-01'
GROUP BY CustomerID;
This improved runtime from 4.5s to 100ms.
Best Practices
- Use
EXPLAIN
to diagnose slow queries - Keep transactions short to avoid blocking
- Prefer
EXISTS
overIN
for large datasets - Avoid
SELECT *
and fetch only necessary columns - Normalize for OLTP; denormalize for reporting
- Profile regularly and monitor usage stats
- Schedule refreshes for materialized views
- Archive historical data into separate tables or partitions
Common Mistakes to Avoid
Mistake | Impact |
---|---|
No indexes on WHERE/JOIN columns | Full table scans, slow performance |
SELECT * | Fetches unnecessary data |
Not analyzing query plans | Misses obvious inefficiencies |
Over-normalization in analytics schema | Slows down reporting |
Ignoring statistics and outliers | Poor cost estimation by query planner |
Lack of caching | Repeats heavy computations unnecessarily |
Summary
Performance optimization in SQL involves:
- Understanding query execution through tools like
EXPLAIN
- Smart use of indexes and query rewriting
- Careful schema design and thoughtful use of materialized views
- Monitoring and periodic tuning
A well-optimized SQL system delivers faster insights, better scalability, and reduced infrastructure costs.
In the next chapter, we’ll explore SQL in Real-World Projects, showcasing how SQL powers business intelligence, reporting, data pipelines, and integrations in real enterprise systems.