Find and Eliminate Duplicates Using QUERY in Google Sheets

Published on

Finding and eliminating duplicates using the QUERY formula in Google Sheets involves creating a new range without duplicates.

To clarify, if you want to remove duplicates directly from the source data, Google Sheets already provides a built-in tool for that. If you prefer using formulas, you can employ functions like UNIQUE or SORTN depending on your needs.

What sets the QUERY approach apart, and when should you use it for duplicate removal?

Consider the table below, where columns A to D display items, area, quantity (Qty), and amount:

ItemsAreaQtyAmount
AppleNorth1254
OrangeSouth732
AppleNorth1777
BananaEast1045
OrangeSouth523
AppleWest1463

The goal is to remove duplicates from columns A (Items) and B (Area) while retaining the values from duplicate rows in other columns by summing them up in the retained row.

For example, for “Apple – North”:

Combine quantities: 12 + 17 = 29
Combine amounts: 54 + 77 = 131

You can extract unique items and return the row with the minimum quantity, maximum quantity, or both using the QUERY function.

This comprehensive approach covers what is meant by finding and eliminating duplicates using the QUERY function in Google Sheets.

Example 1: Unique Text Columns and Sum Numeric Columns

The following QUERY formula will unique columns A and B (items and area) and sum the qty and amount columns:

=QUERY(A1:D, "SELECT A, B, SUM(C), SUM(D) WHERE A<>'' GROUP BY A, B", 1)

This will produce a table with unique items with respect to items and area.

Unique Text Columns and Sum Numeric Columns

Explanation:

QUERY Syntax: QUERY(data, query, [headers])

  • data: A1:D (The range of cells to perform the query on)
  • query: "SELECT A, B, SUM(C), SUM(D) WHERE A<>'' GROUP BY A, B"
    • SELECT A, B – selects columns A and B
    • SUM(C), SUM(D) – sums values in columns C and D
    • WHERE A<>'' – filters out any rows where column A is blank
    • GROUP BY A, B – groups by columns A and B
  • headers: 1 (the number of header rows in the ‘data’)

This approach demonstrates using the QUERY function to find and eliminate duplicates based on columns A and B while summing numeric values in columns C and D.

Note:

If you have only one text column, i.e., column A, SELECT A, B will become SELECT A and GROUP BY A, B will become GROUP BY A. Needless to say, SUM(C), SUM(D) will become SUM(B), SUM(C) since the data is in columns A1 to C.

Example 2: Find Max or Min and Eliminate Duplicates Using QUERY

This approach allows you to retain only the items from different areas with the maximum yield:

=QUERY(A1:D, "SELECT A, B, MAX(C) WHERE A<>'' GROUP BY A, B", 1)
Find Max or Min and Eliminate Duplicates Using QUERY

Replace MAX(C) with MIN(C) in the formula if you want to eliminate duplicates based on the minimum quantity rows.

QUERY is not the only function that can eliminate duplicates this way. You can use a combination of SORT and SORTN as follows:

=LET(data, SORT(A2:C, 1, 1, 3, 0, 2, 1), SORTN(data, 9^9, 2, CHOOSECOLS(data, 1)&CHOOSECOLS(data, 2), 1)) // Max
=LET(data, SORT(A2:C, 1, 1, 3, 1, 2, 1), SORTN(data, 9^9, 2, CHOOSECOLS(data, 1)&CHOOSECOLS(data, 2), 1)) // Min

This might be complicated to understand. Please check my corresponding tutorials in my function guide for more detailed explanations.

Another option is unique items and areas while keeping the minimum and maximum quantities. The following formula will do that:

=QUERY(A1:D, "SELECT A, B, MIN(C), MAX(C) WHERE A<>'' GROUP BY A, B", 1)
Move Max and Min to Same Row and Remove Rows In Between

Here, the SORT and SORTN combination won’t meet your requirements.

Wrap-Up

Google Sheets has a built-in tool in the Data menu (Data Clean-up) to remove duplicates from the source data.

If you want the unique rows in a new range, the suggested options are UNIQUE, SORTN, and QUERY.

While QUERY is not specifically designed to remove duplicates, it is an advanced function running a Google Visualization API Query Language query on your data. Its aggregation capabilities allow you to eliminate duplicate rows in ways that UNIQUE or SORTN cannot match.

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Expert Prashanth KV is a Diamond Product Expert in Google Sheets, officially recognized by Google for his contributions to the Docs Editors Help Community and featured in the Google Product Experts Directory. Explore his blog to learn advanced formulas, automation tips, and problem-solving techniques to elevate your spreadsheet skills.

Mode of Comma-Separated Numbers in Excel (Dynamic Array)

There is no dedicated function in Excel to directly find the mode of comma-separated...

How to Find Mode of Comma-Separated Numbers in Google Sheets

Working with comma-separated numbers inside a single cell is a common scenario in Google...

How to Count Merged Cells in Google Sheets (and Get Their Size)

Sometimes, you may have blocks of merged cells in a column or row. But...

Count Consecutive Workday Absences in Google Sheets

This tutorial offers a powerful formula-based solution to count consecutive workday absences in Google...

More like this

How to Find Mode of Comma-Separated Numbers in Google Sheets

Working with comma-separated numbers inside a single cell is a common scenario in Google...

How to Count Merged Cells in Google Sheets (and Get Their Size)

Sometimes, you may have blocks of merged cells in a column or row. But...

Count Consecutive Workday Absences in Google Sheets

This tutorial offers a powerful formula-based solution to count consecutive workday absences in Google...

3 COMMENTS

  1. Hey, from Br,
    thanks for the article.

    Did you try in a big sheet, like 10 thousand rows?
    I did the same thing using INDEX and FREQUENCY, but it’s very low to calculate entire sheet in each access.

    regards

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.