Here you can learn how to remove duplicate rows based on selected columns in Google Sheets. It’s a quick walkthrough.
I am sure that until now you were in the dark about this type of duplicate elimination in Google Sheets.
Let me make this concept clearer to you.
In a multi-column table, with the help of this tutorial, you can remove duplicates based on one column, two columns, etc.
You can use SORTN instead of the popular UNIQUE function in this case.
Yup! You have heard it right. I am talking about the SORTN formula, not SORT.
=UNIQUE(A2:C6)
The above UNIQUE would return all the rows as it’s.
When using UNIQUE, the highlighted rows are duplicate entries only if the published years are the same. But they are different.
So, one way is to omit the last column and use the formula below.
=UNIQUE(A2:B6)
But it’s not ideal if you don’t want to shrink the width of the table in the result.
I want a formula that treats the above two rows as duplicates based on the author’s name and book title.
I want to ignore the third column that contains the year of publication but must return all three columns in the result.
You can remove duplicate rows based on selected columns using the SORTN function in Google Sheets. Here is how.
Removing Duplicate Rows Based on Selected Columns
See the examples below to understand how to remove duplicates in selected columns in Google Sheets.
There are two examples below, and easy following even for newbies to Sheets.
As I mentioned, we will use the SORTN function for this.
The display_ties_mode # 2 (an argument) of this function helps us to achieve our goal.
1. Based on One Column
Here is a two-column data set where the first column (A) is for book titles, and the second column (B) is for the year of publication.
I want to find and remove duplicates based on the book title in column A.
The title “A Pocket Full of Rye” occurs twice. How to remove one of them?
The following Google Sheets formula will keep only the first occurrence of the titles from column A.
Single-Column Formula (First Occurrences):
=sortn(A2:B6,5,2,A2:A6,FALSE)
or
=sortn(A2:B6,5,2,1,FALSE)
Syntax: SORTN(range, [n], [display_ties_mode], [sort_column], [is_ascending], [sort_column2, …], [is_ascending2, …])
range
: A2:B6
n
: 5
It’s the total number of rows in the range. Usually, experts use 9^9, an arbitrarily large number, because of unsure about the number of rows after removing duplicates based on the selected column.
display_ties_mode
: 2
sort_column
: A2:A6 or 1
It’s the column that decides the duplicates. Here we can also use the column index number, which is #1 here.
is_ascending
: FALSE
What will we do if we want to keep the last occurrence of the row?
We require to flip the data before applying the SORTN.
We can use the CHOOSEROWS function for that.
So the SORTN range
will be chooserows(A2:B6,sequence(rows(A2:A6),1,-1,-1))
.
Single-Column Formula (Last Occurrences):
=sortn(chooserows(A2:B6,sequence(rows(A2:A6),1,-1,-1)),5,2,1,FALSE)
2. Removing Duplicate Rows Based on Selected Two Columns
Removing duplicate rows based on selected two or more columns is not rocket science in Google Sheets.
It’s simple if you have followed the above examples with patience.
We want to remove duplicates based on two selected columns, and they are A2:A6 (authors) and B2:B6 (book titles).
Earlier, we used A2:A6 as the sort_column. Since we have two columns here, join them using the ampersand.
Two-Column Formula (First Occurrences):
=sortn(A2:C6,5,2,A2:A6&B2:B6,FALSE)
It keeps the first occurrence.
To keep the last occurrence, we will flip the data as earlier and use it as the range
in SORTN.
Further, we may require to extract the ‘selected’ columns (A2:A6 and B2:B6) from flipped data using CHOOSECOLS to combine.
So the arguments will be as follows.
range
: chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1))
n
: 5 (or use 9^9)
display_ties_mode
: 2
sort_column
: ArrayFormula(CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),1)&CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),2))
is_ascending
: FALSE
Two-Column Formula (Last Occurrences):
=sortn(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),5,2,ArrayFormula(CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),1)&CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),2)),FALSE)
We can improve the performance of the above formula by avoiding repetitive calculations for flipping data. Let’s use the LET function for that.
Here is the formula to remove the first occurrences of the duplicates based on the selected two columns in Google Sheets.
=let(range,chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),sortn(range,5,2,ArrayFormula(CHOOSECOLS(range,1)&CHOOSECOLS(range,2)),FALSE))
“Function SORTN parameter 4 expects number values. But ‘sheetname!E2:E’ is a text and cannot be coerced to a number”.
Hi, Lee,
Please explain the data you are trying to manipulate using SORTN. Also, the formula in use.
Then only I may be able to find the problem.
Can I use the SORTN to wrap the Query function?
Hi, Hendry Teyanto,
Yes! You Can.