Removing Duplicates in Google Sheets: Built-In Tool & Formulas

The easiest way to remove duplicates in Google Sheets is to either use the built-in duplicate removal tool or rely on formulas.

Formulas have their advantages and disadvantages. The main advantage is that they don’t alter your source data. However, the disadvantage is that you need to know the correct formula to use, where we can assist.

In this tutorial, we will explore different formula options and also demonstrate how to use the built-in duplicate removal tool to remove duplicate rows in Google Sheets.

You don’t need to use Apps Script to remove duplicates in Google Sheets because there’s a straightforward built-in tool available.

Pivot tables are also not suitable for removing duplicates; they are designed for summarizing and analyzing data by grouping and aggregating.

Removing Duplicates Using the Built-In Tool

Google Sheets includes a powerful data removal tool found under the Data menu. Here’s how to use it:

Sample Data:

Student IDNameClassSubjectGrade
1001Jesse Hill10AMathsB
1001Jesse Hill10AMathsB
1001Jesse Hill10AScienceB
1002Melissa Powell10AScienceA
1003Teresa Russell10BHistoryA-
1003Teresa Russell10BHistoryA-
1004Bob Brown10CMathsB+
  1. Prepare Your Data:
    • Assume your student data is organized in columns A to E, with field labels in A1:E1.
  2. Select Data:
    • Select the range A1:E containing your data.
  3. Access the Tool:
    • Click on Data > Data clean-up > Remove duplicates.
  4. Configure Settings:
    • Check the option “Data has a header row” since your data includes headers.
    • Make sure all columns are selected. You can manage column selection using the “Select All” checkbox.
  5. Remove Duplicates:
    • Click the Remove duplicates button to initiate the process.
Built-in Remove Duplicates tool

Example Scenario:

In your sample data, there are three records for the student Jesse Hill:

1001Jesse Hill10AMathsB
1001Jesse Hill10AMathsB
1001Jesse Hill10AScienceB

After duplicate removal, the records for Jesse Hill will be:

1001Jesse Hill10AMathsB
1001Jesse Hill10AScienceB

This outcome occurs because the rows differ in the fourth column (Subject).

Note:

If you wish to remove duplicates regardless of the Subject column, you should uncheck the Subject column (column D) within the Remove Duplicates window.

In short, ensure that you only select the columns you want to analyze for duplicates in the Duplicate Removal window under the “Columns to analyze” label.

The Remove Duplicates tool only removes values in the duplicate rows and moves the rows up. It doesn’t delete any rows in your sheet.

Removing Duplicates Using Formulas

There are specifically two functions in Google Sheets that can remove duplicates: UNIQUE and SORTN. While most of you may be familiar with the UNIQUE function, SORTN might be less common in other spreadsheet applications. However, SORTN offers powerful duplicate removal capabilities with its tie-mode #2.

We’ll explore both methods, keeping in mind that UNIQUE is a single-purpose function known for its simplicity.

Handling Multiple Occurrences of Rows with UNIQUE

To remove duplicates in your student data across columns A to E in Sheet1, you can use the following formula:

=UNIQUE(Sheet1!A1:E)

I suggest using this formula in a new sheet within the same file, for example, in cell A1 in Sheet2.

Removing duplicate rows (entire rows) using the UNIQUE function

Similar to the built-in tool, it will remove duplicate records for Jesse Hill. Unlike the tool, you cannot specify to exclude the Subject column during formula execution.

In this context, the UNIQUE function simply removes multiple occurrences of rows based on all columns specified. It does not include a ‘Columns to analyze’ feature.

Handling Duplicates by Key Column Using SORTN

When using SORTN to remove duplicates, you should consider two key aspects:

  1. Exclude the header row when specifying the range and sort column ranges.
  2. Since SORTN is a sort function, the output will be sorted.

Similar to the built-in duplicate removal tool, to exclude the Subject column (D) while removing duplicates, you can use the following SORTN formula:

=SORTN(Sheet1!A2:E, 9^9, 2, Sheet1!A2:A&Sheet1!B2:B&Sheet1!C2:C&Sheet1!E2:E, 1)
Remove duplicates by analyzing specific columns using SORTN

Let’s break down the formula. The syntax is:

SORTN(range, [n], [display_ties_mode], [sort_column], [is_ascending])

Where:

  • range: A2:A (the data range excluding the header row)
  • n: 9^9 (an arbitrarily large number specifies the maximum number of rows in the result)
  • display_ties_mode: 2 (removes duplicate rows)
  • sort_column: Sheet1!A2:A&Sheet1!B2:B&Sheet1!C2:C&Sheet1!E2:E (sorts and removes duplicates based on these columns, excluding column D)
  • is_ascending: 1 (sorts in ascending order; specify 0 for descending order)

When applying this formula to a different dataset, you should only modify the range and sort_column parameters. The sort_column parameter acts as the key column, or what we can refer to as the ‘Columns to analyze’, for identifying and removing duplicates.

Removing Duplicate Rows Based on Minimum or Maximum Value

When using the built-in data removal tool or the SORTN function, which can analyze specific columns for duplicates, you might want to retain records with the minimum or maximum values.

To achieve this, you need to sort the data accordingly. For the Duplicate Removal tool, use the Data menu’s Sort range option to sort the data.

However, with SORTN, you can incorporate the SORT function within it. This poses a challenge because SORTN may not recognize ranges in the sort_column parameter.

Sample Data:

ItemVendor NamePrice
LaptopABC Electronics1000
LaptopXYZ Tech980
PrinterXYZ Tech200
PrinterABC Electronics210
PrinterPQR Inc.210

In this sample, column A (Item) is the ‘Column to analyze’ for duplicates. If you want to keep the least-priced items while removing duplicates, use the following solutions:

Using Duplicate Removal Tool:

  1. Select range A1:C and click Data > Sort range > Advanced range sorting options.
  2. Check “Data has header row”.
  3. Sort by Item (A to Z) and then Price (A to Z).
  4. Click Sort.
  5. Keep the data selected.
  6. Click on Data > Data cleanup > Remove duplicates.
  7. Check “Data has header row”.
  8. Uncheck all columns and check only “Item” under “Columns to analyze”.
  9. Click Remove duplicates.

This will give you the unique least-priced items.

ItemVendor NamePrice
LaptopXYZ Tech980
PrinterXYZ Tech200

Using SORTN:

=SORTN(SORT(Sheet1!A2:C, 1, 1, 3, 1), 9^9, 2, 1, 1)

Here, the SORT function sorts the range A2:C (excluding the header row) based on column 1 (Item) in ascending order and then by column 3 (Price) in ascending order.

Syntax:

SORT(range, sort_column, is_ascending, [sort_column2, …], [is_ascending2, …])
  • range: A2:C
  • sort_column: 1 (Item)
  • is_ascending: 1 (ascending order)
  • sort_column2: 3 (Price)
  • is_ascending2: 1 (ascending order)
  • The sorted result from SORT is then used as the range in SORTN.

In the SORTN sort_column parameter (Column to analyze for duplicates), instead of specifying A2:A (Item), we use column index 1. This is because the range is sorted, and specifying A2:A directly would reference the unsorted data.

What about Highest Priced Items?

To obtain the highest priced item:

  • When using the Duplicate Removal tool, sort by Item in ascending order and Price in descending order.
  • When using the SORT and SORTN combo, sort the data similarly within the SORT function.

Example using SORT and SORTN:

=SORTN(SORT(A2:C, 1, 1, 3, 0), 9^9, 2, 1, 1)

Removing Duplicate Values Without Deleting Rows

If you want to remove duplicate values within rows without deleting the rows themselves in your source data, follow these steps:

Step-by-Step Instructions:

  1. Enter the Generic Formula:

Use the following generic formula to identify duplicate values within rows:

=ArrayFormula(COUNTIFS(ROW(first_column_range), "<="&ROW(first_column_range), columns_to_analyze, columns_to_analyze))

Example Formula:

Assuming your student data is in A1:E where A1:E1 contains the header row, enter the formula in cell F2 (the column next to the last column and the row below the header row):

=ArrayFormula(COUNTIFS(ROW(A2:A), "<="&ROW(A2:A), A2:A&B2:B&C2:C&E2:E, A2:A&B2:B&C2:C&E2:E))
Remove duplicate values while keeping the rows
  • first_column_range: A2:A (the first column to analyze)
  • columns_to_analyze: A2:A&B2:B&C2:C&E2:E (concatenated columns to analyze for duplicates)

This formula counts occurrences where concatenated values across columns A to E are duplicated within each row.

  1. Filter and Remove Duplicates:
  • Select column F and click Data > Create a filter.
  • Open the filter drop-down in cell F1, uncheck “1” (indicating non-duplicate rows), and click OK.
  • Delete all visible values in columns A to E except the header row in the student data.
  • Click Data > Remove filter to revert to the original view.
  • Finally, remove the formula in cell F2.

This process effectively removes duplicate values within rows while preserving the row positions in your dataset.

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

Finding Most Frequent Text in Excel with Dynamic Array Formulas

Looking to identify the most frequently occurring text in Excel? You can do this...

Hierarchical Numbering Sequences in Excel

Creating hierarchical numbering sequences in an Excel spreadsheet can significantly improve the way you...

How to Easily Repeat a Sequence of Numbers in Excel

Excel offers multiple ways to accomplish tasks, and the simplicity of each approach depends...

Create a Sequence of Dates at Every Nth Row in Excel (Dynamic Array)

Would you like to create a sequence of dates in every nth row in...

More like this

XMATCH Row by Row: Finding Values Across a Range in Google Sheets

Using the BYROW function with XMATCH in Google Sheets allows us to match values...

Limit Formula Expansion to a Specific Row in Google Sheets

In this tutorial, I’ll explain how to limit the expansion of an array formula...

3-D Referencing Structured Data Tables in Google Sheets

When you have several tables within a single sheet—not across multiple sheets in a...

2 COMMENTS

  1. I’m not seeing where or how this is removing/deleting duplicates, I see it finds duplicates. But even using your sample pages I can’t get it to remove/delete duplicates.

    What am I missing?

    Thanks

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.