How to Randomly Extract a Certain Percentage of the Rows in Google Sheets

Published on

I have a formula based solution to randomly extract a certain percentage of the rows in Google Sheets. This will be useful in cases like; select 10% of data from a large number of a Form submission or something similar.

Actually, I do have two solutions. One is fully formula oriented but the other is not so.

I am going to provide you with both the solutions and both have its own pros and cons.

The Formula to Randomly Extract Certain Percentage of the Rows in Sheets

This is the fully formula based solution. The formula logic is as follows.

  1. Generate a set of random numbers using the RANDBETWEEN function in a unique way.
  2. Use that random numbers to Sort the data.
  3. Find the percentage of rows to extract.
  4. Constrain the number of rows using Array_Constrain or Query.

I’ll take you thru’ each and every step below. But first what we want is a sample data to work with. Here is one.

randomize 'n' percentage of rows

Step # 1:

How to Generate a Set of Unique Random Numbers in Google Sheets

To generate unique random numbers in all the non-blank rows, you can use the below combo formula in Google Spreadsheets.

=ArrayFormula(if(len(A2:A),randbetween(row(A1:A),9^9),))

Since our sample data contains data in 13 rows leaving the field label, this formula would generate 13 random numbers.

This formula works like this.

Here is the RANDBETWEEN sytax for your quick reference.

RANDBETWEEN(low, high)
  • The LEN formula checks the non-blank cells in the data range.
  • The ROW function in RANDBETWEEN feeds 20 numbers to the “low” argument, i.e. from 1 to 20.
  • The “high” argument is the number 387420489 (9^9).

So the above formula generates 20 unique numbers between 1 to 387420489, 2 to 387420489 … 20 to 387420489.

Step # 2:

How to SORT the Data in Random Order in Google Sheets

I am going to SORT the above data randomly. Use the above unique random numbers as the sort column in the function SORT.

When using the function SORT you can omit the ArrayFormula.

=sort(A2:B,if(len(A2:A),randbetween(row(A1:A),9^9),),0)

This way we can randomly sort the data in Google Sheets.

Steps # 3:

How to Find Certain Percentage of Total Data in Google Sheets

Consider the total number of rows in the dataset as 100%. You can find the total number of rows in a dataset in Google Sheets using the COUNTA function.

Multiplying that number of rows with the percent value. For example, I want to find the 50% of the rows in the range A2:B14. I can use the below formula.

=round(counta(A2:A)*0.5)

As a side note, you can also use the PERCENTILE function here as below. This is for your reference only.

=ArrayFormula(PERCENTILE(if(LEN(A2:A),row(A1:A),),0.5))

Step # 4:

How to Constrain the Number of Rows to Certain Percentage

Actually, this is our final formula. I am combining all the above formulas here.

Syntax of ARRAY_CONSTRAIN Function:

ARRAY_CONSTRAIN(input_range, num_rows, num_cols)

Replace the “input_range” with the formula in Step # 2 above and “num_rows” with the formula in Step # 3. The “num_cols” is of course 2, i.e. the number of columns in our sample data.

Final Formula:

=array_constrain(sort(A2:B,if(len(A2:A),randbetween(row(A1:A),9^9),),0),round(counta(A2:A)*0.5),2)

This way you can randomly extract a certain percentage of the rows in Google Sheets.

Randomly Extract a Certain Percentage of the Rows

Now time to discuss about the pros and cons of the above formula.

Pros:

With the help of this formula, you can randomly copy ‘n’ percentage of rows in Google Sheets.

Cons:

The formula output will be randomized each time when you make changes to your sheet. Because the RANDBETWEEN is a volatile function.

You May Like: Frequency of Recalculation of Google Sheets NOW, TODAY, RAND, RANDBETWEEN Functions.

So you may need to copy and paste value the formula output to avoid changes.

Randomize Range and Copy ‘n’ Percentage of the Rows in Sheets

This time I am using the “Randomize Range” option in the data menu instead of the RANDBETWEEN function. Other steps will be similar as above.

The formula is so simple. This is the formula that you want to use to randomly copy ‘n’ percentage of the rows.

=array_constrain(A2:B,round(counta(A2:A)*0.5),2)

Here is the steps to use this formula.

  1. Select the data A2:B14.
  2. Go to the menu Data and select “Randomize range”.
  3. Apply the above formula in any blank cell.

That’s all. You can adopt this method also to randomly extract a certain percentage of the rows in Google Sheets. Here are the pros and cons.

Pros:

  1. Improve the performance of the Sheets.
  2. The formula output won’t change automatically. The changes will only happen if you make changes in the source data.

Cons:

  1. It’s semi-automatic. The user needs to manually randomize the range.

That’s all for now. Enjoy!

Additional Resources:

  1. How to Randomly Select N Numbers from a Column in Google Sheets.
  2. Google Sheets: Macro-based Random Name Picker.
  3. How to Shuffle Rows in Google Sheets Without Plugin.
  4. How to Pick a Random Name from a Long List in Google Sheets.
Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

Lookup Values Under Categories in Excel and Google Sheets

We can use a combination of XLOOKUP and VLOOKUP in both Excel and Google...

Extract Items Under a Category from the Same Column in Excel

In Excel, you can use a combination of the XLOOKUP and DROP functions to...

How to Create a Searchable Table in Excel Using the FILTER Function

Finding specific records, or rows containing the required information, is straightforward in Excel using...

Time Sequences in Excel by Minute, Hour, or Second Increments

Creating time sequences, whether by hour, minute, or second increments in Excel, can be...

More like this

XMATCH Row by Row: Finding Values Across a Range in Google Sheets

Using the BYROW function with XMATCH in Google Sheets allows us to match values...

Limit Formula Expansion to a Specific Row in Google Sheets

In this tutorial, I’ll explain how to limit the expansion of an array formula...

3-D Referencing Structured Data Tables in Google Sheets

When you have several tables within a single sheet—not across multiple sheets in a...

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.