
# **196. Delete Duplicate Emails**

## **Problem Statement**
You are given a table called `Person`, which stores email addresses.

### **Person Table**
```
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
```
- `id` is the **primary key**.
- Each row contains an **email address**.
- All emails are in **lowercase**.

### **Task:**
Delete all **duplicate emails**, keeping only the row with the **smallest `id`** for each email.

---

## **Example 1:**
### **Input:**
#### **Person Table**
```
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
```
### **Output:**
```
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+
```
### **Explanation:**
- `john@example.com` appears **twice**.
- We keep the row with the **smallest `id`** (`id = 1`).
- The duplicate (`id = 3`) is **deleted**.

---

## **Solution Approaches**

### **SQL Solution (Using Self Join)**
```sql
DELETE p2 FROM Person p1
JOIN Person p2
ON p1.email = p2.email AND p1.id < p2.id;
```
**Explanation:**
- `p1` and `p2` are two aliases for the **same table** (`Person`).
- We **join** them on `email` to pair up rows that share an address.
- The condition `p1.id < p2.id` matches every row `p2` for which a same-email row with a smaller `id` exists. Those `p2` rows are deleted, so only the row with the **smallest `id`** per email survives.
- The multi-table `DELETE ... JOIN` form is MySQL syntax and is not part of standard SQL.
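
To see which rows the join condition selects, the same `JOIN` can be run as a `SELECT`. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module instead of MySQL, a hypothetical helper named `self_join_pairs`, and an extra fourth row (`id = 4`) that is not part of the original example, added to show an email that appears three times.

```python
import sqlite3

def self_join_pairs(rows):
    """Return (p1.id, p2.id) pairs matched by the self-join condition.

    Hypothetical helper for illustration; every p2.id in the result is a row
    the DELETE would remove.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
    pairs = conn.execute(
        "SELECT p1.id, p2.id FROM Person p1 "
        "JOIN Person p2 ON p1.email = p2.email AND p1.id < p2.id "
        "ORDER BY p1.id, p2.id"
    ).fetchall()
    conn.close()
    return pairs

# Assumed sample data: the example table plus one more duplicate (id = 4).
sample_rows = [
    (1, "john@example.com"),
    (2, "bob@example.com"),
    (3, "john@example.com"),
    (4, "john@example.com"),
]

pairs = self_join_pairs(sample_rows)
deleted_ids = sorted({p2_id for _, p2_id in pairs})
print(pairs)        # [(1, 3), (1, 4), (3, 4)]
print(deleted_ids)  # [3, 4]
```

Row `4` is matched twice (by rows `1` and `3`), which is harmless because a row can only be deleted once. Rows `1` and `2` never appear on the `p2` side, so they are kept.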
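
---

### **SQL Solution (Using Subquery)**
```sql
DELETE FROM Person
WHERE id NOT IN (
    -- Wrap the grouped query in a derived table: MySQL rejects a subquery
    -- that reads directly from the table being deleted from (error 1093).
    SELECT min_id
    FROM (SELECT MIN(id) AS min_id FROM Person GROUP BY email) AS kept_ids
);
```
**Explanation:**
- We **group** by `email` and **select the smallest `id`** for each email.
- The grouped query is wrapped in a derived table (`kept_ids`) because MySQL does not allow a `DELETE` to select directly from its own target table in a subquery.
- The `DELETE` statement removes every row whose `id` is **not in** this list.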
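
The grouped-minimum delete can be tried locally without a MySQL server. The following is a minimal sketch that assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in database. The helper name `delete_duplicates_sqlite` is hypothetical. The grouped query is wrapped in a derived table (named `kept_ids` here), which SQLite does not require but MySQL does.

```python
import sqlite3

def delete_duplicates_sqlite(rows):
    """Load rows into an in-memory Person table, delete duplicates, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
    # Keep the smallest id per email; delete every other row.
    conn.execute(
        "DELETE FROM Person WHERE id NOT IN ("
        "  SELECT min_id FROM ("
        "    SELECT MIN(id) AS min_id FROM Person GROUP BY email"
        "  ) AS kept_ids"
        ")"
    )
    remaining = conn.execute("SELECT id, email FROM Person ORDER BY id").fetchall()
    conn.close()
    return remaining

# The rows from Example 1.
example_rows = [
    (1, "john@example.com"),
    (2, "bob@example.com"),
    (3, "john@example.com"),
]

remaining = delete_duplicates_sqlite(example_rows)
print(remaining)  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```

Because the rows to keep are chosen with `MIN(id)`, the result does not depend on the order in which the rows were inserted.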

---

### **Pandas Solution**
```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Sort by id so the first occurrence of each email is the one with the smallest id
    person.sort_values(by='id', inplace=True)
    # Keep only the first occurrence of each email, modifying the DataFrame in place
    person.drop_duplicates(subset=['email'], keep='first', inplace=True)
```
**Explanation:**
- `sort_values(by='id', inplace=True)`:
  - Orders the rows by `id`, so the first occurrence of each email is the row with the **smallest `id`**. Without this step, `keep='first'` follows the existing row order, which is not guaranteed to be sorted by `id`.
- `drop_duplicates(subset=['email'], keep='first', inplace=True)`:
  - Keeps only **the first occurrence** of each email.
  - Ensures **modification happens in place**.
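
Note that `keep='first'` refers to row order, not to the value of `id`. The sketch below illustrates the difference on a frame that is deliberately not sorted by `id`. The input data and the helper name `keep_smallest_id_per_email` are assumptions made for this illustration.

```python
import pandas as pd

def keep_smallest_id_per_email(person: pd.DataFrame) -> None:
    """Hypothetical helper: sort by id first, then drop later duplicates in place."""
    person.sort_values(by="id", inplace=True)
    person.drop_duplicates(subset=["email"], keep="first", inplace=True)

# Assumed input: the rows of Example 1, listed in descending id order.
person = pd.DataFrame(
    {
        "id": [3, 2, 1],
        "email": ["john@example.com", "bob@example.com", "john@example.com"],
    }
)

# Without sorting, the first john@example.com row encountered has id 3.
without_sort = person.drop_duplicates(subset=["email"], keep="first")
print(without_sort["id"].tolist())  # [3, 2]

# With sorting, the row with id 1 is kept instead.
keep_smallest_id_per_email(person)
print(person["id"].tolist())  # [1, 2]
```

If the input is already ordered by `id`, both variants give the same result. The sort only matters when that ordering is not guaranteed.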

---

## **File Structure**
```
📂 LeetCode196
│── 📜 problem_statement.md
│── 📜 sql_self_join_solution.sql
│── 📜 sql_subquery_solution.sql
│── 📜 pandas_solution.py
│── 📜 README.md
```
- `problem_statement.md` → Contains the problem description.
- `sql_self_join_solution.sql` → Contains the SQL solution using **JOIN**.
- `sql_subquery_solution.sql` → Contains the SQL solution using **Subquery**.
- `pandas_solution.py` → Contains the Pandas solution for Python users.
- `README.md` → Provides an overview of the problem and solutions.

---

## **Useful Links**
- [LeetCode Problem 196](https://leetcode.com/problems/delete-duplicate-emails/)
- [SQL DELETE Statement](https://www.w3schools.com/sql/sql_delete.asp)
- [Pandas drop_duplicates()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html)

---