🧚 Atenção! Beekeeper Studio é uma GUI de banco de dados rápida, moderna e de código aberto Download
January 7, 2026 Por Matthew Rathbone

Let me share my experience with handling duplicate data in SQL databases. I’ve dealt with this issue many times, and I’ll show you some practical ways to clean up those pesky duplicates!

Throughout this guide, you’ll find interactive SQL exercises where you can practice identifying and handling duplicate records directly in your browser. These hands-on examples will help you master using DISTINCT, GROUP BY with HAVING, and ROW_NUMBER() to clean up your data effectively.

🔖 Quick Reference: Check out our SQL Language Cheat Sheet for quick lookups on SELECT DISTINCT, GROUP BY, HAVING, and other essential SQL commands for data cleaning.

If you’re looking for a user-friendly SQL editor to practice these techniques, Beekeeper Studio’s SQL editor provides helpful features like syntax highlighting and query execution.

Why Duplicates Are a Headache

Let me tell you why I take duplicate data seriously. Imagine having multiple copies of the same customer record in your database – it’s like having multiple copies of the same contact in your phone! Here’s why this drives me crazy:

  • It messes up your reports: When I’m trying to count unique customers, duplicates make my numbers incorrect
  • It slows everything down: My queries take longer to run because they’re processing the same data multiple times
  • It wastes storage space: Why pay for extra storage when you’re just storing the same information repeatedly?

How I Spot Duplicates

Before I clean up duplicates, I need to find them first. Here are my go-to methods:

The Simple Way: Using DISTINCT

This is the easiest method I use when I just want to see unique values:

SELECT DISTINCT name, email, phone
FROM customers;

Try it yourself:

[[ expanded ? '▼' : '▶' ]]

[[ testData.title ]]

Query the customers table to return only unique combinations of name and email. The table has duplicate entries that need to be identified.

Exemplo Interativo ✓ Concluído
Colunas esperadas: [[ col ]]
ℹ️ Este exercício reinicia o banco de dados a cada execução. Escreva sua solução completa em uma única submissão.
[[ i18n.correct ]] [[ validationResult ? i18n.all_checks_passed : i18n.query_success ]]
[[ detail.message ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.not_quite_right ]] [[ patternError ]] [[ validationResult.error ]] [[ i18n.results_dont_match ]]
[[ detail.passed ? '✓' : '✗' ]] [[ detail.message ]]
[[ i18n.your_results ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.expected_results ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.sql_error ]]
[[ error ]]
💡 [[ i18n.hint_label ]] [[ testData.hint ]]
📊 [[ i18n.expected_result_label ]]
[[ col ]]
[[ formatCell(cell) ]]
✨ [[ i18n.solution_label ]]
[[ testData.solution ]]
Tabelas Disponíveis
[[ table.name ]]
[[ col ]]
[[ formatCell(cell) ]]

The Detective Way: Finding Exact Duplicates

When I want to actually see which records are duplicated:

SELECT name, email, phone, COUNT(*) as duplicate_count
FROM customers
GROUP BY name, email, phone
HAVING COUNT(*) > 1;

Try it yourself:

[[ expanded ? '▼' : '▶' ]]

[[ testData.title ]]

Query the products table to find which product_name/category combinations appear more than once. Show the duplicated values and how many times they appear.

Exemplo Interativo ✓ Concluído
Colunas esperadas: [[ col ]]
ℹ️ Este exercício reinicia o banco de dados a cada execução. Escreva sua solução completa em uma única submissão.
[[ i18n.correct ]] [[ validationResult ? i18n.all_checks_passed : i18n.query_success ]]
[[ detail.message ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.not_quite_right ]] [[ patternError ]] [[ validationResult.error ]] [[ i18n.results_dont_match ]]
[[ detail.passed ? '✓' : '✗' ]] [[ detail.message ]]
[[ i18n.your_results ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.expected_results ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.sql_error ]]
[[ error ]]
💡 [[ i18n.hint_label ]] [[ testData.hint ]]
📊 [[ i18n.expected_result_label ]]
[[ col ]]
[[ formatCell(cell) ]]
✨ [[ i18n.solution_label ]]
[[ testData.solution ]]
Tabelas Disponíveis
[[ table.name ]]
[[ col ]]
[[ formatCell(cell) ]]

My Favorite Ways to Remove Duplicates

Let me share the methods I use most often to clean up duplicate data:

1. The Quick Way: Using ROW_NUMBER()

This is my favorite method because it’s clean and efficient:

WITH DuplicateCTE AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY name, email, phone
            ORDER BY id
        ) as row_num
    FROM customers
)
DELETE FROM DuplicateCTE
WHERE row_num > 1;

Try it yourself:

[[ expanded ? '▼' : '▶' ]]

[[ testData.title ]]

Query the transactions table using ROW_NUMBER() to identify which duplicate transactions should be kept (row_num = 1) and which should be removed (row_num > 1). Show all records with their row numbers.

Exemplo Interativo ✓ Concluído
Colunas esperadas: [[ col ]]
ℹ️ Este exercício reinicia o banco de dados a cada execução. Escreva sua solução completa em uma única submissão.
[[ i18n.correct ]] [[ validationResult ? i18n.all_checks_passed : i18n.query_success ]]
[[ detail.message ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.not_quite_right ]] [[ patternError ]] [[ validationResult.error ]] [[ i18n.results_dont_match ]]
[[ detail.passed ? '✓' : '✗' ]] [[ detail.message ]]
[[ i18n.your_results ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.expected_results ]]
[[ col ]]
[[ formatCell(cell) ]]
[[ i18n.sql_error ]]
[[ error ]]
💡 [[ i18n.hint_label ]] [[ testData.hint ]]
📊 [[ i18n.expected_result_label ]]
[[ col ]]
[[ formatCell(cell) ]]
✨ [[ i18n.solution_label ]]
[[ testData.solution ]]
Tabelas Disponíveis
[[ table.name ]]
[[ col ]]
[[ formatCell(cell) ]]

This interactive example shows how ROW_NUMBER() identifies which records to keep (row_num = 1) versus which would be removed (row_num > 1). In production, you’d then delete where row_num > 1.

2. The Safe Way: Creating a Clean Table

When I want to be extra careful with important data:

-- First, I create a new table with only unique records
SELECT DISTINCT name, email, phone
INTO customers_clean
FROM customers;

-- Then, if everything looks good, I can replace the old table
DROP TABLE customers;
RENAME TABLE customers_clean TO customers;

Tips From My Experience

Here’s what I’ve learned from cleaning up lots of duplicate data:

  • Always backup first: I can’t stress this enough – make a backup before deleting anything!
  • Start small: I always test my duplicate removal queries on a small subset of data first
  • Check your work: After removing duplicates, I double-check that I kept the right records
  • Prevent future duplicates: I add unique constraints to prevent duplicates from coming back

Keeping Your Data Clean

Here’s how I prevent duplicates from happening in the first place:

  1. Use primary keys: I always add a unique identifier for each record
  2. Add unique constraints: I put these on columns that should never have duplicates
  3. Regular checks: I run duplicate checks regularly to catch issues early

Remember, cleaning up duplicates might seem scary at first, but with these methods, it becomes much more manageable. The key is to be careful, test thoroughly, and always have a backup plan!

Beekeeper Studio É Uma GUI de Banco de Dados Gratuita e de Código Aberto

A melhor ferramenta de consultas SQL e editor que já usei. Fornece tudo que preciso para gerenciar meu banco de dados. - ⭐⭐⭐⭐⭐ Mit

Beekeeper Studio é rápido, intuitivo e fácil de usar. Beekeeper suporta muitos bancos de dados e funciona muito bem no Windows, Mac e Linux.

A versão Linux do Beekeeper é 100% completa, sem cortes e sem compromissos de recursos.

O Que Os Usuários Dizem Sobre o Beekeeper Studio

★★★★★
"O Beekeeper Studio substituiu completamente meu antigo fluxo de trabalho com SQL. É rápido, intuitivo e torna o trabalho com banco de dados agradável novamente."
— Alex K., Desenvolvedor de Banco de Dados
★★★★★
"Já experimentei muitas GUIs de banco de dados, mas o Beekeeper encontra o equilíbrio perfeito entre recursos e simplicidade. Simplesmente funciona."
— Sarah M., Engenheira Full Stack

Pronto para Melhorar seu Fluxo de Trabalho com SQL?

download Download Gratuito