Find and Eliminate Duplicates Using QUERY in Google Sheets

Published on

Finding and eliminating duplicates using the QUERY formula in Google Sheets involves creating a new range without duplicates.

To clarify, if you want to remove duplicates directly from the source data, Google Sheets already provides a built-in tool for that. If you prefer using formulas, you can employ functions like UNIQUE or SORTN depending on your needs.

What sets the QUERY approach apart, and when should you use it for duplicate removal?

Consider the table below, where columns A to D display items, area, quantity (Qty), and amount:

ItemsAreaQtyAmount
AppleNorth1254
OrangeSouth732
AppleNorth1777
BananaEast1045
OrangeSouth523
AppleWest1463

The goal is to remove duplicates from columns A (Items) and B (Area) while retaining the values from duplicate rows in other columns by summing them up in the retained row.

For example, for “Apple – North”:

Combine quantities: 12 + 17 = 29
Combine amounts: 54 + 77 = 131

You can extract unique items and return the row with the minimum quantity, maximum quantity, or both using the QUERY function.

This comprehensive approach covers what is meant by finding and eliminating duplicates using the QUERY function in Google Sheets.

Example 1: Unique Text Columns and Sum Numeric Columns

The following QUERY formula will unique columns A and B (items and area) and sum the qty and amount columns:

=QUERY(A1:D, "SELECT A, B, SUM(C), SUM(D) WHERE A<>'' GROUP BY A, B", 1)

This will produce a table with unique items with respect to items and area.

Unique Text Columns and Sum Numeric Columns

Explanation:

QUERY Syntax: QUERY(data, query, [headers])

  • data: A1:D (The range of cells to perform the query on)
  • query: "SELECT A, B, SUM(C), SUM(D) WHERE A<>'' GROUP BY A, B"
    • SELECT A, B – selects columns A and B
    • SUM(C), SUM(D) – sums values in columns C and D
    • WHERE A<>'' – filters out any rows where column A is blank
    • GROUP BY A, B – groups by columns A and B
  • headers: 1 (the number of header rows in the ‘data’)

This approach demonstrates using the QUERY function to find and eliminate duplicates based on columns A and B while summing numeric values in columns C and D.

Note:

If you have only one text column, i.e., column A, SELECT A, B will become SELECT A and GROUP BY A, B will become GROUP BY A. Needless to say, SUM(C), SUM(D) will become SUM(B), SUM(C) since the data is in columns A1 to C.

Example 2: Find Max or Min and Eliminate Duplicates Using QUERY

This approach allows you to retain only the items from different areas with the maximum yield:

=QUERY(A1:D, "SELECT A, B, MAX(C) WHERE A<>'' GROUP BY A, B", 1)
Find Max or Min and Eliminate Duplicates Using QUERY

Replace MAX(C) with MIN(C) in the formula if you want to eliminate duplicates based on the minimum quantity rows.

QUERY is not the only function that can eliminate duplicates this way. You can use a combination of SORT and SORTN as follows:

=LET(data, SORT(A2:C, 1, 1, 3, 0, 2, 1), SORTN(data, 9^9, 2, CHOOSECOLS(data, 1)&CHOOSECOLS(data, 2), 1)) // Max
=LET(data, SORT(A2:C, 1, 1, 3, 1, 2, 1), SORTN(data, 9^9, 2, CHOOSECOLS(data, 1)&CHOOSECOLS(data, 2), 1)) // Min

This might be complicated to understand. Please check my corresponding tutorials in my function guide for more detailed explanations.

Another option is unique items and areas while keeping the minimum and maximum quantities. The following formula will do that:

=QUERY(A1:D, "SELECT A, B, MIN(C), MAX(C) WHERE A<>'' GROUP BY A, B", 1)
Move Max and Min to Same Row and Remove Rows In Between

Here, the SORT and SORTN combination won’t meet your requirements.

Wrap-Up

Google Sheets has a built-in tool in the Data menu (Data Clean-up) to remove duplicates from the source data.

If you want the unique rows in a new range, the suggested options are UNIQUE, SORTN, and QUERY.

While QUERY is not specifically designed to remove duplicates, it is an advanced function running a Google Visualization API Query Language query on your data. Its aggregation capabilities allow you to eliminate duplicate rows in ways that UNIQUE or SORTN cannot match.

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

How to Extract Numbers from Text in Excel with Regex

You can use the REGEXEXTRACT or REGEXREPLACE functions to easily extract numbers from text...

Using OFFSET and MATCH Together in Google Sheets: Advanced Tips

One powerful and flexible way to look up values is by combining the OFFSET...

How to Use OFFSET and XMATCH Functions Together in Excel

We often use the OFFSET and XMATCH functions together to match a value in...

How to Calculate Maximum Drawdown in Excel and Google Sheets

You can use the following dynamic array formula to calculate maximum drawdown (MDD) in...

More like this

Using OFFSET and MATCH Together in Google Sheets: Advanced Tips

One powerful and flexible way to look up values is by combining the OFFSET...

Running Count with Structured References in Google Sheets

Running a count with structured references is achievable in Google Sheets tables using the...

Running Total with Structured Table References in Google Sheets

You can use two types of formulas to create a running total with structured...

3 COMMENTS

  1. Hey, from Br,
    thanks for the article.

    Did you try in a big sheet, like 10 thousand rows?
    I did the same thing using INDEX and FREQUENCY, but it’s very low to calculate entire sheet in each access.

    regards

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.