HomeGoogle DocsSpreadsheetCumulative Count of Distinct Values in Google Sheets (How-To)

Cumulative Count of Distinct Values in Google Sheets (How-To)

Published on

It seems the DISTINCT clause is not part of Google Sheets Query. Then how do we get a cumulative count of distinct values in Google Sheets?

I have a formula for you to use in Google Spreadsheets (not in Excel Spreadsheets).

Let me first make you understand the cumulative distinct count of values. For that, I am using the below table in B1:C.

Example to Cumulative Count of Distinct Values in Google Sheets

There are few dates in sequence in the first column. The second column contains some strings (company names).

My sample data is in B2:C11 and the cumulative distinct count from this data is in E2:F5. I have calculated the cumulative distinct count of companies (column B) as follows.

If a company (I have simply used some alphabets to represent companies, you can use real company names) repeats at a later date, it should be ignored in the running count.

For example, the companies “A” and “B” are repeating in cells C5 and C6 (02/05/2020). It’s already in C2 and C3 (01/05/2020).

So the distinct running count will be the same (3) on 01/05/2020 and 02/05/2020.

If a company repeats on the same date (not in our sample data), for example, “A” twice on 01/05/2020, it will only be counted one time.

Please refer to the below screenshot to understand how the cumulative count of distinct values is calculated. Column D contains distinct count and E contains the cumulative distinct count.

Understand Distinct Count of Values and Cumulative Count of Distinct Values

Coding Cumulative (Running) Distinct Count Formula in Google Sheets

Here, in a few numbers of steps, you can learn to write an array formula in Google Sheets for cumulative aka running distinct count.

There are several functions involved. Try to focus on the output instead of the functions. You can learn the functions involved later from my Google Sheets function guide.

First, open a new file (Google Sheets). Refer to the above image for the sample data in B1:C11. Create it in the same range in your’s sheet.

Then follow the steps explained one by one under different subtitles below.

Remove Non-distinct Rows Using Sortn and Inserting a Number Column

I have highlighted distinct values for your reference in the above table. See column C in the table.

We should remove the rows 5, 6, and 10 to make the table distinct. Let’s do that with a SORTN and ROW combination formula.

Here is step # 1 formula for generating cumulative / running count of distinct values in Google Sheets.

=SORTN({B2:B,row(B2:B)^0,C2:C},9^9,2,3,1)

Insert this SORTN formula that removes non-distinct values in cell E2 (it inserts a number column too).

CDC - Step # 1 formula

Formula Explained

In this {B2:B,row(B2:B)^0,C2:C} creates a new table that contains an additional column with number values (number 1 in every row). The ROW formula is for returning that additional column in the middle of columns B and C.

The above is the range in SORTN.

SORTN Syntax (just for your quick reference): SORTN(range, [n], [display_ties_mode], [sort_column], [is_ascending], [sort_column2, …], [is_ascending2, …])

In the formula just ignore 9^9 (a large number that represents ‘n’) and 2 (tie mode) and the 1 (sort order called is_ascending). It will be the same even if your data range is different.

But you must understand the use of 3 (sort column). It actually refers to column C that contains the company names. So it should be 2, right?

Nope! Since we have used the range as {B2:B,row(B2:B)^0,C2:C}, column C became the third column.

Now to the second step of coding the formula for the cumulative count of distinct values in Google Sheets.

Array_Constrain to Constrain the First Two Columns

In this second step, we will remove the third column (company names) from the above output. Just use Array_Constrain function as below. I’m modifying the E2 formula itself.

Step # 2 formula for cumulative distinct count:

=array_constrain(SORTN({B2:B,row(B2:B)^0,C2:C},9^9,2,3,1),9^9,2)

We have now the date column and number column. The company names got removed by the Array_Constrain.

Preparing Data for Generating Cumulative Count of Distinct Values in Google Sheets

In the first step, we have removed some unwanted rows (duplicate rows) to make the data unique. We want those deleted rows added below the above step # 2 formula output in a ‘unique’ way.

Unique way?

Yes! The first column should contain the dates from the deleted rows and we want 0 against each deleted dates in the second column. You will understand the purpose of this in the next step.

Here we can follow the below logic using two UNIQUE formulas.

Extract all the unique dates from the first column B2:B using unique(B2:B) and insert an error value column as the second column of this output using unique(B2:B)/0.

Use IFERROR to convert error values to 0. Here is that step # 3 formula in cell G2.

=ArrayFormula(iferror({unique(B2:B),unique(B2:B)/0},0))
CDC - Step # 3 formula

Note: G2:G will contain date values instead of dates. Just ignore it. I’ve formatted it to dates by selecting G2:G and clicking Format > Number > Date.

Actually, we just want to extract the dates from row # 5, 6, and 10. But the formula returns all the dates. It’s not an issue as the values against all the dates are 0.

The next two steps are very important. What we want is the cumulative count of distinct values. For that, first, we should generate a count of distinct values table. How?

Query Grouping to Generate Distinct Count

In the above steps, we have generated the required data for returning the running count of distinct values (using Query) in Google Sheets.

Let’s combine the two tables E2:F and G2:H (I mean the step 2 formula and step # 3 formula).

=ArrayFormula(
     {array_constrain(SORTN({B2:B,row(B2:B)^0,C2:C},9^9,2,3,1),9^9,2);
     iferror({unique(B2:B),unique(B2:B)/0},0)}
)
Sample Data Preparation for Cumulative Distinct Count

Then group it using Query.

There are two columns. The first column is a date column and the second column is the numeric column that contains 1 or 0. Let’s group the first column and count the second column.

So we will get the distinct count. Here is that step # 4 formula (modify E2 formula as below).

=ArrayFormula(
     query(
        {array_constrain(SORTN({B2:B,row(B2:B)^0,C2:C},9^9,2,3,1),9^9,2);
        iferror({unique(B2:B),unique(B2:B)/0},0)},
        "Select Col1, sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''"
     )
)

So we will get distinct count, not running or cumulative distinct count.

CDC - Step # 4 formula

Google Sheets Formula for Cumulative Count of Distinct Values

We have almost completed it! The running sum of the values in F2:F will be the so-called cumulative count of the distinct values of the data in B1:C.

That means column F2:F (distinct count) will remain as a helper column. But don’t worry! Later we can remove that too.

We can use MMULT or SUMIF for running sum. Since Sumif won’t allow us to use an expression as sum_range (later we will use an expression instead of F2:F), I am choosing MMULT.

Note (Out of topic): Due to the above sum_range issue Sumif can’t be used as a custom formula in Pivot Table as well as in Filter command. I think Google Sheets’s development team (engineers) should address this to give wings to Sumif.

In cell G2, use the below running/cumulative sum formula which will generate the running count of distinct values (step # 5 formula).

=ArrayFormula(
     if(len(E2:E),
        MMULT(
           IF(ROW(E2:E)>=TRANSPOSE(ROW(E2:E))=TRUE,1,0),
           n(F2:F)
        ),
     )
)

For this formula explanation, please refer to the “Resource” section below. Pick the last tutorial. It would say Excel. Don’t worry. The formula in that tutorial is applicable to Google Sheets too.

Time to show you the output of the above formula.

CDC - Step # 5 formula

We have our required output now! Now I am going to remove the distinct count column F and only to keep the cumulative / running distinct count column G.

Removing Additional Column from the Running Distinct Count Formula (Final Formula)

The E2 formula (step # 4 formula) returns two columns – date column and distinct count column. Let’s separate or we can say split it to two columns.

First Column (in E2) – Column 1 Query

I’ve used one more Query with step # 4 formula to extract the first column (alternatively, we can use Array_Constrain).

=query(
     ARRAYFORMULA(
        query(
           {array_constrain(SORTN({B2:B,row(B2:B)^0,C2:C},9^9,2,3,1),9^9,2);
           iferror({unique(B2:B),unique(B2:B)/0},0)},
           "Select Col1, sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''"
        )
     ),
     "Select Col1"
)

Just understand that we can extract the second column by just changing the last part of the above formula form Col1 to Col2 as below.

Second Column – Column 2 Query:

=query(
     ARRAYFORMULA(
        query(
           {array_constrain(SORTN({B2:B,row(B2:B)^0,C2:C},9^9,2,3,1),9^9,2);
           iferror({unique(B2:B),unique(B2:B)/0},0)},
           "Select Col1, sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''"
        )
     ),
     "Select Col2"
)

Replace F2:F in the running sum formula in cell G2 with the above Query. You will get #VALUE errors because of incompatible matrix sizes.

It’s because we have used E2:E in the G2 formula which is an open range but the formula that replaced F2:F returns a limited/closed range.

So replace E2:E with indirect("E2:E"&counta(unique(B2:B))+1) (twice in the G2 formula – see the Cyan color highlighting on formula image below).

Also remove the If(Len(E2:E) too as it’s not required in a closed range. Please refer the below image (Yellow highlighting).

Formula for Cumulative Count of Distinct Values in Google Sheets

Final Formula to Generate Cumulative Count of Distinct Values in Google Sheets:

=ArrayFormula(MMULT(IF(ROW(indirect("E2:E"&counta(unique(B2:B))+1))>=TRANSPOSE(ROW(indirect("E2:E"&counta(unique(B2:B))+1)))=TRUE,1,0),n(query(ARRAYFORMULA(query({array_constrain(SORTN({B2:B,row(B2:B)^0,C2:C},9^9,2,3,1),9^9,2);iferror({unique(B2:B),unique(B2:B)/0},0)},"Select Col1, sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''")),"Select Col2"))))

Sample_Sheet_7820

Resources:

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

Appointment Schedule Template in Google Sheets

An appointment schedule template in Google Sheets can assist you in efficiently managing your...

Creating Sequential Dates in Equally Merged Cells in Google Sheets

Do you know how to create sequential dates in equally merged cells across a...

Running Total By Month in Excel

This tutorial demonstrates how to calculate the running total by month in a range...

SORT and SORTBY – Excel Vs Google Sheets

While Excel offers the SORT and SORTBY functions for sorting a table, Google Sheets...

More like this

Appointment Schedule Template in Google Sheets

An appointment schedule template in Google Sheets can assist you in efficiently managing your...

Creating Sequential Dates in Equally Merged Cells in Google Sheets

Do you know how to create sequential dates in equally merged cells across a...

Interactive Random Task Assigner in Google Sheets

You have multiple tasks and multiple people. Here's a fun way to randomly assign...

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.