I have a formula based solution to randomly extract a certain percentage of the rows in Google Sheets. This will be useful in cases like; select 10% of data from a large number of a Form submission or something similar.
Actually, I do have two solutions. One is fully formula oriented but the other is not so.
I am going to provide you with both the solutions and both have its own pros and cons.
The Formula to Randomly Extract Certain Percentage of the Rows in Sheets
This is the fully formula based solution. The formula logic is as follows.
- Generate a set of random numbers using the RANDBETWEEN function in a unique way.
- Use that random numbers to Sort the data.
- Find the percentage of rows to extract.
- Constrain the number of rows using Array_Constrain or Query.
I’ll take you thru’ each and every step below. But first what we want is a sample data to work with. Here is one.
Step # 1:
How to Generate a Set of Unique Random Numbers in Google Sheets
To generate unique random numbers in all the non-blank rows, you can use the below combo formula in Google Spreadsheets.
=ArrayFormula(if(len(A2:A),randbetween(row(A1:A),9^9),))
Since our sample data contains data in 13 rows leaving the field label, this formula would generate 13 random numbers.
This formula works like this.
Here is the RANDBETWEEN sytax for your quick reference.
RANDBETWEEN(low, high)
- The LEN formula checks the non-blank cells in the data range.
- The ROW function in RANDBETWEEN feeds 20 numbers to the “low” argument, i.e. from 1 to 20.
- The “high” argument is the number 387420489 (9^9).
So the above formula generates 20 unique numbers between 1 to 387420489, 2 to 387420489 … 20 to 387420489.
Step # 2:
How to SORT the Data in Random Order in Google Sheets
I am going to SORT the above data randomly. Use the above unique random numbers as the sort column in the function SORT.
When using the function SORT you can omit the ArrayFormula.
=sort(A2:B,if(len(A2:A),randbetween(row(A1:A),9^9),),0)
This way we can randomly sort the data in Google Sheets.
Steps # 3:
How to Find Certain Percentage of Total Data in Google Sheets
Consider the total number of rows in the dataset as 100%. You can find the total number of rows in a dataset in Google Sheets using the COUNTA function.
Multiplying that number of rows with the percent value. For example, I want to find the 50% of the rows in the range A2
=round(counta(A2:A)*0.5)
As a side note, you can also use the PERCENTILE function here as below. This is for your reference only.
=ArrayFormula(PERCENTILE(if(LEN(A2:A),row(A1:A),),0.5))
Step # 4:
How to Constrain the Number of Rows to Certain Percentage
Actually, this is our final formula. I am combining all the above formulas here.
Syntax of ARRAY_CONSTRAIN Function:
ARRAY_CONSTRAIN(input_range, num_rows, num_cols)
Replace the “input_range” with the formula in Step # 2 above and “num_rows” with the formula in Step # 3. The “num_cols” is of course 2, i.e. the number of columns in our sample data.
Final Formula:
=array_constrain(sort(A2:B,if(len(A2:A),randbetween(row(A1:A),9^9),),0),round(counta(A2:A)*0.5),2)
This way you can randomly extract a certain percentage of the rows in Google Sheets.
Now time to discuss about the pros and cons of the above formula.
Pros:
With the help of this formula, you can randomly copy ‘n’ percentage of rows in Google Sheets.
Cons:
The formula output will be randomized each time when you make changes to your sheet. Because the RANDBETWEEN is a volatile function.
You May Like: Frequency of Recalculation of Google Sheets NOW, TODAY, RAND, RANDBETWEEN Functions.
So you may need to copy and paste value the formula output to avoid changes.
Randomize Range and Copy ‘n’ Percentage of the Rows in Sheets
This time I am using the “Randomize Range” option in the data menu instead of the RANDBETWEEN function. Other steps will be similar as above.
The formula is so simple. This is the formula that you want to use to randomly copy ‘n’ percentage of the rows.
=array_constrain(A2:B,round(counta(A2:A)*0.5),2)
Here is the steps to use this formula.
- Select the data A2
:B14 . - Go to the menu Data and select “Randomize range”.
- Apply the above formula in any blank cell.
That’s all. You can adopt this method also to randomly extract a certain percentage of the rows in Google Sheets. Here are the pros and cons.
Pros:
- Improve the performance of the Sheets.
- The formula output won’t change automatically. The changes will only happen if you make changes in the source data.
Cons:
- It’s semi-automatic. The user needs to manually randomize the range.
That’s all for now. Enjoy!
Additional Resources: