With the help of the Filter menu and a helper column, it’s quite easy to filter duplicates in Google Sheets. There is one clear advantage of using the Filter menu to filter duplicates. What’s that?
Filtering rows in a table that contains duplicates are the best option to delete duplicates in a lot.
Yes! Just select the filtered rows which are duplicates then right click and delete. All the duplicates have gone!
You should treat the word Duplicate in a much broader sense in Google Sheets. The reason it depends on the formatting of data in the table.
Different Types of Duplicates in Spreadsheets
In this tutorial, you can see different ways of duplicates handling. Let’s begin with duplicates in a single column.
Single Column Duplicates in Sheets
When a value repeats in the second time in a column, that second appearance is considered as a duplicate.
That means if a value occurs multiple times, all the occurrences other than the first one are duplicates.
In the screenshot above the rows marked with the text “duplicates” are the duplicate rows and I am going to filter out other rows in the example after few paras below.
Multi-Column Duplicates in Sheets
As per my example below, the data is spread across three columns. See the marked rows 2 and 6.
Row # 6 is a duplicate row as it’s exactly the copy of row # 2. The Title, Year and Author are the same in both the rows.
If the value in any of the columns, for example, the year in B6 is different, then there are no duplicates in this table.
Random Column Duplicates in Sheets
Here again, my dataset contains three columns. But I want to look into column A and C for duplicates.
In that sense all the rows leaving the first one are duplicates.
I am using a helper column to support the above three types of filtering in Google Sheets.
In the following examples, I am providing you the relevant formulas to enter in the helper column (a supporting column) to filter duplicate rows in Google Sheets.
Filtering Duplicates in Google Sheets and Delete Them
As I have said, I am going to filter duplicates in Google Sheets using the Filter menu.
Filtering Duplicates in a Single Column and Delete in Google Sheets
Steps:
1. Enter the below formula in cell B2.
=query(iferror(sort({row(A2:A)-row(A2)+2-match(sort(A2:A),sort(A2:A),0),SORT(ROW(A2:A)+2-row(A2)+2,A2:A,1)},2,1),),"Select Col1")
As an alternative, you can use the below Countifs formula that also follows the running count logic.
=ARRAYFORMULA(if(A2:A<>"",COUNTIFS(A2:A,A2:A,ROW(A2:A),"<="&ROW(A2:A)),))
It would populate some numbers as below.
2. Select the columns A and B. Then go to the menu Data and select Create a filter.
3. Click the drop-down in cell B1. Then click on Filter by condition and choose greater than and there enter the number 1.
Follow this method to filter duplicates in Google Sheets that in a single column.
You May Like: Google Sheets Function Guide.
Filtering Duplicates in Multiple Columns and Delete in Google Sheets
To filter duplicates in Google Sheets that involves multiple columns, the procedure is the same as the above except the formula.
Steps:
1. The formula is as follows.
=query({A2:A,iferror(sort({row(A2:A)-row(A2)+2-match(sort(A2:A&B2:B&C2:C),sort(A2:A&B2:B&C2:C),0),SORT(ROW(A2:A)+2-row(A2)+2,A2:A&B2:B&C2:C,1)},2,1),)},"Select Col2 where Col1 is not null")
Here is more clean and simple alternative formula (Countifs running count).
=ARRAYFORMULA(if(A2:A<>"",COUNTIFS(A2:A&B2:B&C2:C,A2:A&B2:B&C2:C,ROW(A2:A),"<="&ROW(A2:A)),))
Enter either of the above formulas in cell D2.
2. Select the columns A to D and filter. Then click the drop-down in cell D1. Same as the single column filter click on the Filter by condition and choose greater than and there enter # 1.
I want to modify this formula to add/remove columns. Can you explain to me how to do that?
The below example includes the answer to this question.
Filtering Duplicates in Google Sheets in Random Columns and Delete
See a different scenario.
Suppose, I have the just above formula in cell D2. If you apply Filter in this, it will only filter the last row.
That means other rows are unique. Because the value in the second column is different except for the first and last row.
The first row is not duplicate as it’s the first occurrence. Since the last row is filtered as the duplicate.
You can exclude the second column (column B) from this formula. See how to do that below.
Just remove the highlighted column references in the formula. The modified formula would be as follows.
=query({A2:A,iferror(sort({row(A2:A)-row(A2)+2-match(sort(A2:A&C2:C),sort(A2:A&C2:C),0),SORT(ROW(A2:A)+2-row(A2)+2,A2:A&C2:C,1)},2,1),)},"Select Col2 where Col1 is not null")
If you use the Countifs one, in that also remove the said column B as below.
=ARRAYFORMULA(if(A2:A<>"",COUNTIFS(A2:A&C2:C,A2:A&C2:C,ROW(A2:A),"<="&ROW(A2:A)),))
Apply this formula in cell D2. Then filter as per the above examples. For this formula all the rows are duplicates. So the filter will only filter out the first row. Now you can delete the visible rows which are duplicates.
This is because we are considering only column A and C and in those columns, all the values are exactly matching.
You can add or remove any number of columns in this formula based on your dataset. This way you can quickly filter duplicates in Google Sheets and delete them in a flash.
Update:
Google Sheets now has a new built-in tool (command) to eliminate duplicates. See that guide – How to Use Remove Duplicates Menu Command in Google Sheets.
Related Topics: