How to Randomly Extract a Certain Percentage of the Rows in Google Sheets

Published on

I have a formula based solution to randomly extract a certain percentage of the rows in Google Sheets. This will be useful in cases like; select 10% of data from a large number of a Form submission or something similar.

Actually, I do have two solutions. One is fully formula oriented but the other is not so.

I am going to provide you with both the solutions and both have its own pros and cons.

The Formula to Randomly Extract Certain Percentage of the Rows in Sheets

This is the fully formula based solution. The formula logic is as follows.

  1. Generate a set of random numbers using the RANDBETWEEN function in a unique way.
  2. Use that random numbers to Sort the data.
  3. Find the percentage of rows to extract.
  4. Constrain the number of rows using Array_Constrain or Query.

I’ll take you thru’ each and every step below. But first what we want is a sample data to work with. Here is one.

randomize 'n' percentage of rows

Step # 1:

How to Generate a Set of Unique Random Numbers in Google Sheets

To generate unique random numbers in all the non-blank rows, you can use the below combo formula in Google Spreadsheets.

=ArrayFormula(if(len(A2:A),randbetween(row(A1:A),9^9),))

Since our sample data contains data in 13 rows leaving the field label, this formula would generate 13 random numbers.

This formula works like this.

Here is the RANDBETWEEN sytax for your quick reference.

RANDBETWEEN(low, high)
  • The LEN formula checks the non-blank cells in the data range.
  • The ROW function in RANDBETWEEN feeds 20 numbers to the “low” argument, i.e. from 1 to 20.
  • The “high” argument is the number 387420489 (9^9).

So the above formula generates 20 unique numbers between 1 to 387420489, 2 to 387420489 … 20 to 387420489.

Step # 2:

How to SORT the Data in Random Order in Google Sheets

I am going to SORT the above data randomly. Use the above unique random numbers as the sort column in the function SORT.

When using the function SORT you can omit the ArrayFormula.

=sort(A2:B,if(len(A2:A),randbetween(row(A1:A),9^9),),0)

This way we can randomly sort the data in Google Sheets.

Steps # 3:

How to Find Certain Percentage of Total Data in Google Sheets

Consider the total number of rows in the dataset as 100%. You can find the total number of rows in a dataset in Google Sheets using the COUNTA function.

Multiplying that number of rows with the percent value. For example, I want to find the 50% of the rows in the range A2:B14. I can use the below formula.

=round(counta(A2:A)*0.5)

As a side note, you can also use the PERCENTILE function here as below. This is for your reference only.

=ArrayFormula(PERCENTILE(if(LEN(A2:A),row(A1:A),),0.5))

Step # 4:

How to Constrain the Number of Rows to Certain Percentage

Actually, this is our final formula. I am combining all the above formulas here.

Syntax of ARRAY_CONSTRAIN Function:

ARRAY_CONSTRAIN(input_range, num_rows, num_cols)

Replace the “input_range” with the formula in Step # 2 above and “num_rows” with the formula in Step # 3. The “num_cols” is of course 2, i.e. the number of columns in our sample data.

Final Formula:

=array_constrain(sort(A2:B,if(len(A2:A),randbetween(row(A1:A),9^9),),0),round(counta(A2:A)*0.5),2)

This way you can randomly extract a certain percentage of the rows in Google Sheets.

Randomly Extract a Certain Percentage of the Rows

Now time to discuss about the pros and cons of the above formula.

Pros:

With the help of this formula, you can randomly copy ‘n’ percentage of rows in Google Sheets.

Cons:

The formula output will be randomized each time when you make changes to your sheet. Because the RANDBETWEEN is a volatile function.

You May Like: Frequency of Recalculation of Google Sheets NOW, TODAY, RAND, RANDBETWEEN Functions.

So you may need to copy and paste value the formula output to avoid changes.

Randomize Range and Copy ‘n’ Percentage of the Rows in Sheets

This time I am using the “Randomize Range” option in the data menu instead of the RANDBETWEEN function. Other steps will be similar as above.

The formula is so simple. This is the formula that you want to use to randomly copy ‘n’ percentage of the rows.

=array_constrain(A2:B,round(counta(A2:A)*0.5),2)

Here is the steps to use this formula.

  1. Select the data A2:B14.
  2. Go to the menu Data and select “Randomize range”.
  3. Apply the above formula in any blank cell.

That’s all. You can adopt this method also to randomly extract a certain percentage of the rows in Google Sheets. Here are the pros and cons.

Pros:

  1. Improve the performance of the Sheets.
  2. The formula output won’t change automatically. The changes will only happen if you make changes in the source data.

Cons:

  1. It’s semi-automatic. The user needs to manually randomize the range.

That’s all for now. Enjoy!

Additional Resources:

  1. How to Randomly Select N Numbers from a Column in Google Sheets.
  2. Google Sheets: Macro-based Random Name Picker.
  3. How to Shuffle Rows in Google Sheets Without Plugin.
  4. How to Pick a Random Name from a Long List in Google Sheets.
Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

How to Extract Numbers from Text in Excel with Regex

You can use the REGEXEXTRACT or REGEXREPLACE functions to easily extract numbers from text...

Using OFFSET and MATCH Together in Google Sheets: Advanced Tips

One powerful and flexible way to look up values is by combining the OFFSET...

How to Use OFFSET and XMATCH Functions Together in Excel

We often use the OFFSET and XMATCH functions together to match a value in...

How to Calculate Maximum Drawdown in Excel and Google Sheets

You can use the following dynamic array formula to calculate maximum drawdown (MDD) in...

More like this

Using OFFSET and MATCH Together in Google Sheets: Advanced Tips

One powerful and flexible way to look up values is by combining the OFFSET...

Running Count with Structured References in Google Sheets

Running a count with structured references is achievable in Google Sheets tables using the...

Running Total with Structured Table References in Google Sheets

You can use two types of formulas to create a running total with structured...

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.