Back up / test first. This deletes data. Run the inner
SELECTon its own to see exactly which rows will be removed before you run theDELETE.
Number duplicates with ROW_NUMBER and delete everything past the first. This works in MySQL 8+, PostgreSQL, and DuckDB (the table needs a unique row id):
DELETE FROM users
WHERE id IN (
SELECT id FROM (
SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
) t
WHERE rn > 1
);
-- SQLite: keep the lowest built-in rowid per key
DELETE FROM users
WHERE rowid NOT IN (SELECT MIN(rowid) FROM users GROUP BY email);
-- PostgreSQL: shortcut using the system ctid (no id column needed)
DELETE FROM users a
USING users b
WHERE a.email = b.email AND a.ctid > b.ctid;
-- BigQuery / Snowflake: rewrite the table with only the keepers
CREATE OR REPLACE TABLE users AS
SELECT * EXCEPT (rn) FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
)
WHERE rn = 1;
The MySQL “target table” trap. MySQL refuses to
DELETEfrom a table you also read in a subquery — “You can’t specify target table ‘users’ for update in FROM clause”. The double subquery above (SELECT id FROM (SELECT … ) t) forces MySQL to materialize the inner result first, which sidesteps it. (SELECT *is fine in BigQuery/Snowflake but BigQuery has noctid/rowid, hence the rewrite form.)