Randomly Extract a Certain Percentage of Rows in Google Sheets

Published on

You may need to extract a certain percentage of data from a large dataset for analysis. Rather than extracting from the first, last, or any specific portion, it’s better to randomly extract a certain percentage of rows, such as 20% or 50%. You can easily achieve this using a formula in Google Sheets.

Formula to Extract N% of Rows in Google Sheets

=SORTN(data, ROUND(COUNTA(col)*50%), 0, RANDARRAY(ROWS(col)), TRUE)

When using this formula for proportional random sampling, make the following adjustments:

  • Replace data with the data range.
  • Replace col with the reference of any column in the dataset.
  • Replace 50% with the percentage of data you want to randomly extract.

Example: Randomly Extract a Certain Percentage of Rows in Google Sheets

Consider the following dataset with names in column A and email addresses in column B.

Data for random sampling in Google Sheets, with names in column A and email addresses in column B

Extracting ‘N’ Percentage of Rows

To extract 30% of the data from this range, use the formula as follows:

=SORTN(A2:B14, ROUND(COUNTA(A2:A14)*30%), 0, RANDARRAY(ROWS(A2:A14)), TRUE)

The dataset contains 13 rows. 30% of 13 is 3.9. Since row numbers must be whole numbers, the formula rounds up and returns 4 random rows.

Google Sheets formula extracting a random 30% of rows from the dataset, displaying four randomly selected rows

Additional Step: Convert to Static Data

The formula refreshes whenever the sheet changes or at specific intervals due to the use of the RANDARRAY function, which is volatile.

If you want static data, copy the result and apply Paste Special as follows:

  1. Assume the result is in D2:E5.
  2. Select D2:E5.
  3. Right-click and select Copy.
  4. Right-click again and select Paste special > Values only.

How the Formula Extracts N% of Rows

The formula uses the SORTN function, which returns n rows after sorting.

  • n is determined dynamically by: ROUND(COUNTA(A2:A14)*30%)
    This calculates the number of rows to extract based on the given percentage.
  • The data is sorted using: RANDARRAY(ROWS(A2:A14))
    This generates random numbers corresponding to the height of the dataset, effectively shuffling the data.

That’s the logic behind randomly extracting a certain percentage of rows for random sampling in Google Sheets.

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Expert Prashanth KV is a Diamond Product Expert in Google Sheets, officially recognized by Google for his contributions to the Docs Editors Help Community and featured in the Google Product Experts Directory. Explore his blog to learn advanced formulas, automation tips, and problem-solving techniques to elevate your spreadsheet skills.

Calculate Trip Days by Month (Start, End, and Full Days) in Google Sheets

If you're managing business travel in Google Sheets, you may need to calculate how...

Mode of Comma-Separated Numbers in Excel (Dynamic Array)

There is no dedicated function in Excel to directly find the mode of comma-separated...

How to Find Mode of Comma-Separated Numbers in Google Sheets

Working with comma-separated numbers inside a single cell is a common scenario in Google...

How to Count Merged Cells in Google Sheets (and Get Their Size)

Sometimes, you may have blocks of merged cells in a column or row. But...

More like this

Calculate Trip Days by Month (Start, End, and Full Days) in Google Sheets

If you're managing business travel in Google Sheets, you may need to calculate how...

How to Find Mode of Comma-Separated Numbers in Google Sheets

Working with comma-separated numbers inside a single cell is a common scenario in Google...

How to Count Merged Cells in Google Sheets (and Get Their Size)

Sometimes, you may have blocks of merged cells in a column or row. But...

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.