When you need to summarize data and find patterns in your database, the SQL GROUP BY clause is your most powerful tool. Whether you’re counting customers, calculating sales totals, or analyzing trends across different categories, GROUP BY transforms rows of detailed data into meaningful summaries.
Throughout this guide, you’ll find interactive SQL exercises that let you practice GROUP BY queries in your browser—no database setup required. Write queries, get instant feedback, and see if your results match the expected output.
The GROUP BY clause is fundamental to data analysis and reporting. It lets you organize rows with similar values into groups, then apply aggregate functions like COUNT, SUM, and AVG to calculate statistics for each group. Think of it as the foundation of business intelligence and data summarization.
What is GROUP BY?
GROUP BY is a SQL clause that groups rows based on one or more columns that have the same values. Once grouped, you can use aggregate functions to perform calculations on each group. The basic syntax is:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;
The key concept is that GROUP BY creates separate “buckets” for each unique value, and then aggregate functions operate on each bucket independently.
Basic GROUP BY with COUNT
Let’s start with the most common use case: counting how many records exist for each category. Imagine you have a sales table and want to know how many products were sold in each category:
SELECT
category,
COUNT(*) as product_count
FROM products
GROUP BY category;
This query does three things:
-
Groups rows by the
categorycolumn -
Counts rows in each group using
COUNT(*) - Returns one row for each unique category with its count
The COUNT(*) function counts all rows in each group, regardless of NULL values. You could also use COUNT(column_name) to count only non-NULL values in a specific column.
Try it yourself:
[[ testData.title ]]
Query the sales table to count how many sales were made for each category. Return the category and the sales_count.
[[ col ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Your Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Expected Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ error ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ testData.solution ]]
Available Tables
[[ table.name ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Common Aggregate Functions with GROUP BY
GROUP BY becomes truly powerful when combined with aggregate functions. Here are the most commonly used ones:
- COUNT() - Counts rows or non-NULL values
- SUM() - Adds up numeric values
- AVG() - Calculates the average
- MIN() - Finds the minimum value
- MAX() - Finds the maximum value
You can use multiple aggregate functions in the same query to get comprehensive statistics:
SELECT
department,
COUNT(*) as total_employees,
AVG(salary) as avg_salary,
MIN(salary) as min_salary,
MAX(salary) as max_salary
FROM employees
GROUP BY department;
This single query gives you a complete salary breakdown for each department. It’s much more efficient than running separate queries for each statistic.
Try it yourself:
[[ testData.title ]]
Query the orders table to get order statistics for each customer_id. Return customer_id, total_orders (count), total_spent (sum of amount), and avg_order_value (average amount), rounded to 2 decimal places.
[[ col ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Your Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Expected Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ error ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ testData.solution ]]
Available Tables
[[ table.name ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Grouping by Multiple Columns
Sometimes you need to group by more than one column to get the right level of detail. For example, you might want to see sales figures broken down by both region and month:
SELECT
region,
EXTRACT(MONTH FROM sale_date) as month,
SUM(amount) as total_sales
FROM sales
GROUP BY region, EXTRACT(MONTH FROM sale_date);
When you group by multiple columns, SQL creates a group for each unique combination of values. If you have 3 regions and 12 months, you could potentially have up to 36 groups (though some combinations might not exist in your data).
Important note: Any column you select that isn’t inside an aggregate function must be included in the GROUP BY clause. This is a fundamental rule of SQL - you can’t select ungrouped columns alongside grouped ones.
Try it yourself:
[[ testData.title ]]
Query the inventory table to get the total quantity for each combination of warehouse and product_type. Return warehouse, product_type, and total_quantity.
[[ col ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Your Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Expected Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ error ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ testData.solution ]]
Available Tables
[[ table.name ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Filtering Groups with HAVING
What if you want to filter your grouped results? You might think to use WHERE, but WHERE filters individual rows before grouping happens. To filter groups after they’ve been created, you use the HAVING clause.
Here’s the difference:
- WHERE filters rows before grouping
- HAVING filters groups after aggregation
SELECT
customer_id,
COUNT(*) as order_count,
SUM(amount) as total_spent
FROM orders
WHERE order_date >= '2024-01-01' -- Filter rows first
GROUP BY customer_id
HAVING COUNT(*) >= 5; -- Then filter groups
This query finds customers who placed at least 5 orders since January 1st, 2024. The WHERE clause eliminates old orders before grouping, and HAVING keeps only customers with 5+ orders.
Common HAVING use cases:
- Find top customers (HAVING SUM(amount) > 1000)
- Identify frequent buyers (HAVING COUNT(*) > 10)
- Locate outliers (HAVING AVG(score) > 95)
Try it yourself:
[[ testData.title ]]
Query the employees table to find departments with more than 2 employees. Return department and employee_count, but only for departments where the count is greater than 2.
[[ col ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Your Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
Expected Results:
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ error ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
[[ testData.solution ]]
Available Tables
[[ table.name ]]
| [[ col ]] |
|---|
| [[ formatCell(cell) ]] |
ORDER BY with GROUP BY
The order of grouped results can be important, especially for reports. You can use ORDER BY with both grouped columns and aggregate functions:
SELECT
category,
COUNT(*) as product_count,
AVG(price) as avg_price
FROM products
GROUP BY category
ORDER BY avg_price DESC, product_count DESC;
This orders results by average price (highest first), then by product count for categories with the same average price.
Pro tip: Always include ORDER BY in production queries with GROUP BY. Different database engines might return grouped results in different orders, and explicit ordering ensures consistency.
Performance Considerations
GROUP BY operations can be resource-intensive, especially on large datasets. Here are key optimization strategies:
Indexing Strategy
Create indexes on columns you frequently group by:
-- If you often group by category
CREATE INDEX idx_products_category ON products(category);
-- For multi-column grouping
CREATE INDEX idx_sales_region_date ON sales(region, sale_date);
The order of columns in multi-column indexes matters. Put the most selective column first, or the one you most often filter by in WHERE clauses.
Limit Result Sets
Use WHERE clauses to reduce the data being processed:
-- Good: Filter first, then group
SELECT category, COUNT(*)
FROM products
WHERE created_date >= '2024-01-01'
GROUP BY category;
-- Less efficient: Group everything, then filter
SELECT category, COUNT(*)
FROM products
GROUP BY category
HAVING MIN(created_date) >= '2024-01-01';
Consider Partial Aggregation
For very large datasets, consider pre-aggregating data in summary tables:
-- Instead of grouping millions of transaction rows every time
-- Create a daily_sales summary table and query that instead
SELECT region, SUM(daily_total)
FROM daily_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY region;
Common GROUP BY Patterns
Ranking and Top N
Find the top categories by sales:
SELECT
category,
SUM(amount) as total_sales
FROM sales
GROUP BY category
ORDER BY total_sales DESC
LIMIT 5;
Percentage Calculations
Calculate what percentage each category represents:
SELECT
category,
COUNT(*) as category_count,
ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM products), 2) as percentage
FROM products
GROUP BY category
ORDER BY category_count DESC;
Time-Based Grouping
Group sales by month and year:
SELECT
EXTRACT(YEAR FROM sale_date) as year,
EXTRACT(MONTH FROM sale_date) as month,
COUNT(*) as sales_count,
SUM(amount) as total_sales
FROM sales
GROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
ORDER BY year, month;
GROUP BY vs Subqueries
Sometimes you can achieve similar results with subqueries, but GROUP BY is usually more efficient:
-- Using GROUP BY (efficient)
SELECT
department,
COUNT(*) as emp_count
FROM employees
GROUP BY department;
-- Using subqueries (less efficient)
SELECT DISTINCT
department,
(SELECT COUNT(*) FROM employees e2 WHERE e2.department = e1.department) as emp_count
FROM employees e1;
The GROUP BY version processes the data once, while the subquery version runs a separate query for each department.
Working with GROUP BY in Beekeeper Studio
When working with complex GROUP BY queries, having a good SQL editor makes development much easier. Beekeeper Studio provides excellent support for GROUP BY operations:
Features that help with GROUP BY:
- Query formatting: Automatically formats complex GROUP BY queries for readability
- Autocomplete: Suggests column names and aggregate functions as you type
- Result visualization: Clear display of aggregated results with proper formatting
- Query explain: Shows execution plans to help optimize performance
- Multiple database support: Consistent GROUP BY syntax across PostgreSQL, MySQL, SQLite, and more
The free version includes all the essential features for mastering GROUP BY, making it perfect for learning and everyday database analysis.
Common Pitfalls and How to Avoid Them
The SELECT Rule
Error: Selecting columns not in GROUP BY
-- This will fail in most databases
SELECT name, category, COUNT(*)
FROM products
GROUP BY category;
Fix: Only select grouped columns or aggregates
-- Correct version
SELECT category, COUNT(*)
FROM products
GROUP BY category;
NULL Values in Groups
NULL values are treated as a single group:
-- If some products have NULL category, they'll be grouped together
SELECT category, COUNT(*)
FROM products
GROUP BY category;
-- Result might include: NULL, 5
Handle NULLs explicitly if needed:
SELECT
COALESCE(category, 'Uncategorized') as category,
COUNT(*)
FROM products
GROUP BY category;
Empty Groups
GROUP BY only creates groups for data that exists. If you need to show all possible categories (even with zero counts), use a LEFT JOIN with a reference table:
-- Show all categories, even those with no products
SELECT
c.category_name,
COUNT(p.id) as product_count
FROM categories c
LEFT JOIN products p ON c.category_name = p.category
GROUP BY c.category_name;
Advanced GROUP BY Techniques
ROLLUP for Subtotals
Some databases support ROLLUP for creating subtotals:
-- PostgreSQL, SQL Server
SELECT
region,
category,
SUM(amount) as total_sales
FROM sales
GROUP BY ROLLUP(region, category);
This creates subtotals for each region and a grand total.
Conditional Aggregation
Use CASE statements inside aggregates for conditional counting:
SELECT
department,
COUNT(*) as total_employees,
COUNT(CASE WHEN salary > 50000 THEN 1 END) as high_earners,
COUNT(CASE WHEN salary <= 50000 THEN 1 END) as regular_earners
FROM employees
GROUP BY department;
This counts employees in different salary brackets within each department.
Key Takeaways
The SQL GROUP BY clause is essential for data analysis and reporting. Here’s what to remember:
- GROUP BY creates buckets of rows with matching values, then applies aggregates to each bucket
- Any selected column must be either in GROUP BY or inside an aggregate function
- Use HAVING to filter groups after aggregation (not WHERE)
- Multiple columns create groups for each unique combination
- Index your grouping columns for better performance on large datasets
- ORDER BY ensures consistent result ordering across different databases
- NULL values form their own group - handle them explicitly if needed
- WHERE filters before grouping, HAVING filters after grouping
Master these concepts, and you’ll be able to transform raw data into meaningful insights. GROUP BY is the bridge between detailed transaction data and high-level business reporting.
Ready to practice more? Try the interactive examples above, or explore more SQL tutorials to continue building your database skills.
Beekeeper Studio Is A Free & Open Source Database GUI
Best SQL query & editor tool I have ever used. It provides everything I need to manage my database. - ⭐⭐⭐⭐⭐ Mit
Beekeeper Studio is fast, intuitive, and easy to use. Beekeeper supports loads of databases, and works great on Windows, Mac and Linux.
What Users Say About Beekeeper Studio
"Beekeeper Studio completely replaced my old SQL workflow. It's fast, intuitive, and makes database work enjoyable again."
"I've tried many database GUIs, but Beekeeper strikes the perfect balance between features and simplicity. It just works."