HomeGoogle DocsSpreadsheetRemove Duplicate Rows Based on Selected Columns in Google Sheets

Remove Duplicate Rows Based on Selected Columns in Google Sheets

Published on

Here you can learn how to remove duplicate rows based on selected columns in Google Sheets. It’s a quick walkthrough.

I am sure that until now you were in the dark about this type of duplicate elimination in Google Sheets.

Let me make this concept clearer to you.

In a multi-column table, with the help of this tutorial, you can remove duplicates based on one column, two columns, etc.

You can use SORTN instead of the popular UNIQUE function in this case. 

Yup! You have heard it right. I am talking about the SORTN formula, not SORT.

multi column duplicates and SORTN instead of Unique
=UNIQUE(A2:C6)

The above UNIQUE would return all the rows as it’s.

When using UNIQUE, the highlighted rows are duplicate entries only if the published years are the same. But they are different.

So, one way is to omit the last column and use the formula below.

=UNIQUE(A2:B6)

But it’s not ideal if you don’t want to shrink the width of the table in the result.

I want a formula that treats the above two rows as duplicates based on the author’s name and book title.

I want to ignore the third column that contains the year of publication but must return all three columns in the result.

You can remove duplicate rows based on selected columns using the SORTN function in Google Sheets. Here is how.

Removing Duplicate Rows Based on Selected Columns

See the examples below to understand how to remove duplicates in selected columns in Google Sheets.

There are two examples below, and easy following even for newbies to Sheets.

As I mentioned, we will use the SORTN function for this.

The display_ties_mode # 2 (an argument) of this function helps us to achieve our goal.

1. Based on One Column

Here is a two-column data set where the first column (A) is for book titles, and the second column (B) is for the year of publication.

I want to find and remove duplicates based on the book title in column A.

The title “A Pocket Full of Rye” occurs twice. How to remove one of them?

one column based duplicates in google sheets

The following Google Sheets formula will keep only the first occurrence of the titles from column A.

Single-Column Formula (First Occurrences):

=sortn(A2:B6,5,2,A2:A6,FALSE)
or
=sortn(A2:B6,5,2,1,FALSE)
Syntax: SORTN(range, [n], [display_ties_mode], [sort_column], [is_ascending], [sort_column2, …], [is_ascending2, …])

range: A2:B6

n: 5

It’s the total number of rows in the range. Usually, experts use 9^9, an arbitrarily large number, because of unsure about the number of rows after removing duplicates based on the selected column.

display_ties_mode: 2

sort_column: A2:A6 or 1

It’s the column that decides the duplicates. Here we can also use the column index number, which is #1 here.

is_ascending: FALSE

What will we do if we want to keep the last occurrence of the row?

We require to flip the data before applying the SORTN.

We can use the CHOOSEROWS function for that.

So the SORTN range will be chooserows(A2:B6,sequence(rows(A2:A6),1,-1,-1)).

Single-Column Formula (Last Occurrences):

=sortn(chooserows(A2:B6,sequence(rows(A2:A6),1,-1,-1)),5,2,1,FALSE)

2. Removing Duplicate Rows Based on Selected Two Columns

Removing duplicate rows based on selected two or more columns is not rocket science in Google Sheets.

It’s simple if you have followed the above examples with patience.

Remove Duplicates Based on Selected Columns in Google Sheets

We want to remove duplicates based on two selected columns, and they are A2:A6 (authors) and B2:B6 (book titles).

Earlier, we used A2:A6 as the sort_column. Since we have two columns here, join them using the ampersand.

Two-Column Formula (First Occurrences):

=sortn(A2:C6,5,2,A2:A6&B2:B6,FALSE)

It keeps the first occurrence.

To keep the last occurrence, we will flip the data as earlier and use it as the range in SORTN.

Further, we may require to extract the ‘selected’ columns (A2:A6 and B2:B6) from flipped data using CHOOSECOLS to combine.

So the arguments will be as follows.

range: chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1))

n: 5 (or use 9^9)

display_ties_mode: 2

sort_column: ArrayFormula(CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),1)&CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),2))

is_ascending: FALSE

Two-Column Formula (Last Occurrences):

=sortn(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),5,2,ArrayFormula(CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),1)&CHOOSECOLS(chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),2)),FALSE)

We can improve the performance of the above formula by avoiding repetitive calculations for flipping data. Let’s use the LET function for that.

Here is the formula to remove the first occurrences of the duplicates based on the selected two columns in Google Sheets.

=let(range,chooserows(A2:C6,sequence(rows(A2:A6),1,-1,-1)),sortn(range,5,2,ArrayFormula(CHOOSECOLS(range,1)&CHOOSECOLS(range,2)),FALSE))
Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

SUMPRODUCT Differences: Excel vs. Google Sheets

The core purpose of the SUMPRODUCT function remains consistent across both Excel and Google...

Sum Values by Month and Category in Excel

With the help of EOMONTH, UNIQUE, and SUMIFS functions in Excel, you can efficiently...

Sum Values by Categories in Excel

In Excel, the easiest way to sum values in a range by categories is...

Filter Data from the Previous Month Using a Formula in Excel

Filtering data from the previous month may be useful for comparative analysis, generating reports,...

More like this

Interactive Random Task Assigner in Google Sheets

You have multiple tasks and multiple people. Here's a fun way to randomly assign...

Google Sheets Bar and Column Chart with Target Coloring

To color each data point in the column or bar chart according to the...

Analyzing Column A Between Non-Adjacent Values in Column B: Google Sheets

This tutorial addresses a common scenario encountered in Google Sheets: how to sum, average,...

4 COMMENTS

  1. “Function SORTN parameter 4 expects number values. But ‘sheetname!E2:E’ is a text and cannot be coerced to a number”.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.