🧚 Listen! Beekeeper Studio is a fast, modern, and open source database GUI Download
January 8, 2026 By Matthew Rathbone

When you need to summarize data and find patterns in your database, the SQL GROUP BY clause is your most powerful tool. Whether you’re counting customers, calculating sales totals, or analyzing trends across different categories, GROUP BY transforms rows of detailed data into meaningful summaries.

Throughout this guide, you’ll find interactive SQL exercises that let you practice GROUP BY queries in your browser—no database setup required. Write queries, get instant feedback, and see if your results match the expected output.

The GROUP BY clause is fundamental to data analysis and reporting. It lets you organize rows with similar values into groups, then apply aggregate functions like COUNT, SUM, and AVG to calculate statistics for each group. Think of it as the foundation of business intelligence and data summarization.

What is GROUP BY?

GROUP BY is a SQL clause that groups rows based on one or more columns that have the same values. Once grouped, you can use aggregate functions to perform calculations on each group. The basic syntax is:

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

The key concept is that GROUP BY creates separate “buckets” for each unique value, and then aggregate functions operate on each bucket independently.

Basic GROUP BY with COUNT

Let’s start with the most common use case: counting how many records exist for each category. Imagine you have a sales table and want to know how many products were sold in each category:

SELECT 
  category,
  COUNT(*) as product_count
FROM products
GROUP BY category;

This query does three things:

  1. Groups rows by the category column
  2. Counts rows in each group using COUNT(*)
  3. Returns one row for each unique category with its count

The COUNT(*) function counts all rows in each group, regardless of NULL values. You could also use COUNT(column_name) to count only non-NULL values in a specific column.

Try it yourself:

[[ expanded ? '▼' : '▶' ]]

[[ testData.title ]]

Query the sales table to count how many sales were made for each category. Return the category and the sales_count.

Interactive Example ✓ Completed
Expected columns: [[ col ]]
Correct! Your query produced the expected results.
[[ col ]]
[[ formatCell(cell) ]]
Not quite right. Your query ran but the results don't match.
Your Results:
[[ col ]]
[[ formatCell(cell) ]]
Expected Results:
[[ col ]]
[[ formatCell(cell) ]]
SQL Error:
[[ error ]]
💡 Hint: [[ testData.hint ]]
📊 Expected Result:
[[ col ]]
[[ formatCell(cell) ]]
✨ Solution:
[[ testData.solution ]]
Available Tables
[[ table.name ]]
[[ col ]]
[[ formatCell(cell) ]]

Common Aggregate Functions with GROUP BY

GROUP BY becomes truly powerful when combined with aggregate functions. Here are the most commonly used ones:

  • COUNT() - Counts rows or non-NULL values
  • SUM() - Adds up numeric values
  • AVG() - Calculates the average
  • MIN() - Finds the minimum value
  • MAX() - Finds the maximum value

You can use multiple aggregate functions in the same query to get comprehensive statistics:

SELECT 
  department,
  COUNT(*) as total_employees,
  AVG(salary) as avg_salary,
  MIN(salary) as min_salary,
  MAX(salary) as max_salary
FROM employees
GROUP BY department;

This single query gives you a complete salary breakdown for each department. It’s much more efficient than running separate queries for each statistic.

Try it yourself:

[[ expanded ? '▼' : '▶' ]]

[[ testData.title ]]

Query the orders table to get order statistics for each customer_id. Return customer_id, total_orders (count), total_spent (sum of amount), and avg_order_value (average amount), rounded to 2 decimal places.

Interactive Example ✓ Completed
Expected columns: [[ col ]]
Correct! Your query produced the expected results.
[[ col ]]
[[ formatCell(cell) ]]
Not quite right. Your query ran but the results don't match.
Your Results:
[[ col ]]
[[ formatCell(cell) ]]
Expected Results:
[[ col ]]
[[ formatCell(cell) ]]
SQL Error:
[[ error ]]
💡 Hint: [[ testData.hint ]]
📊 Expected Result:
[[ col ]]
[[ formatCell(cell) ]]
✨ Solution:
[[ testData.solution ]]
Available Tables
[[ table.name ]]
[[ col ]]
[[ formatCell(cell) ]]

Grouping by Multiple Columns

Sometimes you need to group by more than one column to get the right level of detail. For example, you might want to see sales figures broken down by both region and month:

SELECT 
  region,
  EXTRACT(MONTH FROM sale_date) as month,
  SUM(amount) as total_sales
FROM sales
GROUP BY region, EXTRACT(MONTH FROM sale_date);

When you group by multiple columns, SQL creates a group for each unique combination of values. If you have 3 regions and 12 months, you could potentially have up to 36 groups (though some combinations might not exist in your data).

Important note: Any column you select that isn’t inside an aggregate function must be included in the GROUP BY clause. This is a fundamental rule of SQL - you can’t select ungrouped columns alongside grouped ones.

Try it yourself:

[[ expanded ? '▼' : '▶' ]]

[[ testData.title ]]

Query the inventory table to get the total quantity for each combination of warehouse and product_type. Return warehouse, product_type, and total_quantity.

Interactive Example ✓ Completed
Expected columns: [[ col ]]
Correct! Your query produced the expected results.
[[ col ]]
[[ formatCell(cell) ]]
Not quite right. Your query ran but the results don't match.
Your Results:
[[ col ]]
[[ formatCell(cell) ]]
Expected Results:
[[ col ]]
[[ formatCell(cell) ]]
SQL Error:
[[ error ]]
💡 Hint: [[ testData.hint ]]
📊 Expected Result:
[[ col ]]
[[ formatCell(cell) ]]
✨ Solution:
[[ testData.solution ]]
Available Tables
[[ table.name ]]
[[ col ]]
[[ formatCell(cell) ]]

Filtering Groups with HAVING

What if you want to filter your grouped results? You might think to use WHERE, but WHERE filters individual rows before grouping happens. To filter groups after they’ve been created, you use the HAVING clause.

Here’s the difference:

  • WHERE filters rows before grouping
  • HAVING filters groups after aggregation
SELECT 
  customer_id,
  COUNT(*) as order_count,
  SUM(amount) as total_spent
FROM orders
WHERE order_date >= '2024-01-01'  -- Filter rows first
GROUP BY customer_id
HAVING COUNT(*) >= 5;  -- Then filter groups

This query finds customers who placed at least 5 orders since January 1st, 2024. The WHERE clause eliminates old orders before grouping, and HAVING keeps only customers with 5+ orders.

Common HAVING use cases:

  • Find top customers (HAVING SUM(amount) > 1000)
  • Identify frequent buyers (HAVING COUNT(*) > 10)
  • Locate outliers (HAVING AVG(score) > 95)

Try it yourself:

[[ expanded ? '▼' : '▶' ]]

[[ testData.title ]]

Query the employees table to find departments with more than 2 employees. Return department and employee_count, but only for departments where the count is greater than 2.

Interactive Example ✓ Completed
Expected columns: [[ col ]]
Correct! Your query produced the expected results.
[[ col ]]
[[ formatCell(cell) ]]
Not quite right. Your query ran but the results don't match.
Your Results:
[[ col ]]
[[ formatCell(cell) ]]
Expected Results:
[[ col ]]
[[ formatCell(cell) ]]
SQL Error:
[[ error ]]
💡 Hint: [[ testData.hint ]]
📊 Expected Result:
[[ col ]]
[[ formatCell(cell) ]]
✨ Solution:
[[ testData.solution ]]
Available Tables
[[ table.name ]]
[[ col ]]
[[ formatCell(cell) ]]

ORDER BY with GROUP BY

The order of grouped results can be important, especially for reports. You can use ORDER BY with both grouped columns and aggregate functions:

SELECT 
  category,
  COUNT(*) as product_count,
  AVG(price) as avg_price
FROM products
GROUP BY category
ORDER BY avg_price DESC, product_count DESC;

This orders results by average price (highest first), then by product count for categories with the same average price.

Pro tip: Always include ORDER BY in production queries with GROUP BY. Different database engines might return grouped results in different orders, and explicit ordering ensures consistency.

Performance Considerations

GROUP BY operations can be resource-intensive, especially on large datasets. Here are key optimization strategies:

Indexing Strategy

Create indexes on columns you frequently group by:

-- If you often group by category
CREATE INDEX idx_products_category ON products(category);

-- For multi-column grouping
CREATE INDEX idx_sales_region_date ON sales(region, sale_date);

The order of columns in multi-column indexes matters. Put the most selective column first, or the one you most often filter by in WHERE clauses.

Limit Result Sets

Use WHERE clauses to reduce the data being processed:

-- Good: Filter first, then group
SELECT category, COUNT(*)
FROM products
WHERE created_date >= '2024-01-01'
GROUP BY category;

-- Less efficient: Group everything, then filter
SELECT category, COUNT(*)
FROM products
GROUP BY category
HAVING MIN(created_date) >= '2024-01-01';

Consider Partial Aggregation

For very large datasets, consider pre-aggregating data in summary tables:

-- Instead of grouping millions of transaction rows every time
-- Create a daily_sales summary table and query that instead
SELECT region, SUM(daily_total)
FROM daily_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY region;

Common GROUP BY Patterns

Ranking and Top N

Find the top categories by sales:

SELECT 
  category,
  SUM(amount) as total_sales
FROM sales
GROUP BY category
ORDER BY total_sales DESC
LIMIT 5;

Percentage Calculations

Calculate what percentage each category represents:

SELECT 
  category,
  COUNT(*) as category_count,
  ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM products), 2) as percentage
FROM products
GROUP BY category
ORDER BY category_count DESC;

Time-Based Grouping

Group sales by month and year:

SELECT 
  EXTRACT(YEAR FROM sale_date) as year,
  EXTRACT(MONTH FROM sale_date) as month,
  COUNT(*) as sales_count,
  SUM(amount) as total_sales
FROM sales
GROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
ORDER BY year, month;

GROUP BY vs Subqueries

Sometimes you can achieve similar results with subqueries, but GROUP BY is usually more efficient:

-- Using GROUP BY (efficient)
SELECT 
  department,
  COUNT(*) as emp_count
FROM employees
GROUP BY department;

-- Using subqueries (less efficient)
SELECT DISTINCT
  department,
  (SELECT COUNT(*) FROM employees e2 WHERE e2.department = e1.department) as emp_count
FROM employees e1;

The GROUP BY version processes the data once, while the subquery version runs a separate query for each department.

Working with GROUP BY in Beekeeper Studio

When working with complex GROUP BY queries, having a good SQL editor makes development much easier. Beekeeper Studio provides excellent support for GROUP BY operations:

Features that help with GROUP BY:

  • Query formatting: Automatically formats complex GROUP BY queries for readability
  • Autocomplete: Suggests column names and aggregate functions as you type
  • Result visualization: Clear display of aggregated results with proper formatting
  • Query explain: Shows execution plans to help optimize performance
  • Multiple database support: Consistent GROUP BY syntax across PostgreSQL, MySQL, SQLite, and more

The free version includes all the essential features for mastering GROUP BY, making it perfect for learning and everyday database analysis.

Common Pitfalls and How to Avoid Them

The SELECT Rule

Error: Selecting columns not in GROUP BY

-- This will fail in most databases
SELECT name, category, COUNT(*)
FROM products
GROUP BY category;

Fix: Only select grouped columns or aggregates

-- Correct version
SELECT category, COUNT(*)
FROM products
GROUP BY category;

NULL Values in Groups

NULL values are treated as a single group:

-- If some products have NULL category, they'll be grouped together
SELECT category, COUNT(*)
FROM products
GROUP BY category;
-- Result might include: NULL, 5

Handle NULLs explicitly if needed:

SELECT 
  COALESCE(category, 'Uncategorized') as category,
  COUNT(*)
FROM products
GROUP BY category;

Empty Groups

GROUP BY only creates groups for data that exists. If you need to show all possible categories (even with zero counts), use a LEFT JOIN with a reference table:

-- Show all categories, even those with no products
SELECT 
  c.category_name,
  COUNT(p.id) as product_count
FROM categories c
LEFT JOIN products p ON c.category_name = p.category
GROUP BY c.category_name;

Advanced GROUP BY Techniques

ROLLUP for Subtotals

Some databases support ROLLUP for creating subtotals:

-- PostgreSQL, SQL Server
SELECT 
  region,
  category,
  SUM(amount) as total_sales
FROM sales
GROUP BY ROLLUP(region, category);

This creates subtotals for each region and a grand total.

Conditional Aggregation

Use CASE statements inside aggregates for conditional counting:

SELECT 
  department,
  COUNT(*) as total_employees,
  COUNT(CASE WHEN salary > 50000 THEN 1 END) as high_earners,
  COUNT(CASE WHEN salary <= 50000 THEN 1 END) as regular_earners
FROM employees
GROUP BY department;

This counts employees in different salary brackets within each department.

Key Takeaways

The SQL GROUP BY clause is essential for data analysis and reporting. Here’s what to remember:

  • GROUP BY creates buckets of rows with matching values, then applies aggregates to each bucket
  • Any selected column must be either in GROUP BY or inside an aggregate function
  • Use HAVING to filter groups after aggregation (not WHERE)
  • Multiple columns create groups for each unique combination
  • Index your grouping columns for better performance on large datasets
  • ORDER BY ensures consistent result ordering across different databases
  • NULL values form their own group - handle them explicitly if needed
  • WHERE filters before grouping, HAVING filters after grouping

Master these concepts, and you’ll be able to transform raw data into meaningful insights. GROUP BY is the bridge between detailed transaction data and high-level business reporting.

Ready to practice more? Try the interactive examples above, or explore more SQL tutorials to continue building your database skills.

Beekeeper Studio Is A Free & Open Source Database GUI

Best SQL query & editor tool I have ever used. It provides everything I need to manage my database. - ⭐⭐⭐⭐⭐ Mit

Beekeeper Studio is fast, intuitive, and easy to use. Beekeeper supports loads of databases, and works great on Windows, Mac and Linux.

Beekeeper's Linux version is 100% full-featured, no cut corners, no feature compromises.

What Users Say About Beekeeper Studio

★★★★★
"Beekeeper Studio completely replaced my old SQL workflow. It's fast, intuitive, and makes database work enjoyable again."
— Alex K., Database Developer
★★★★★
"I've tried many database GUIs, but Beekeeper strikes the perfect balance between features and simplicity. It just works."
— Sarah M., Full Stack Engineer

Ready to Improve Your SQL Workflow?

download Download Free