Randomly Extract a Certain Percentage of Rows in Google Sheets

Published on

You may need to extract a certain percentage of data from a large dataset for analysis. Rather than extracting from the first, last, or any specific portion, it’s better to randomly extract a certain percentage of rows, such as 20% or 50%. You can easily achieve this using a formula in Google Sheets.

Formula to Extract N% of Rows in Google Sheets

=SORTN(data, ROUND(COUNTA(col)*50%), 0, RANDARRAY(ROWS(col)), TRUE)

When using this formula for proportional random sampling, make the following adjustments:

  • Replace data with the data range.
  • Replace col with the reference of any column in the dataset.
  • Replace 50% with the percentage of data you want to randomly extract.

Example: Randomly Extract a Certain Percentage of Rows in Google Sheets

Consider the following dataset with names in column A and email addresses in column B.

Data for random sampling in Google Sheets, with names in column A and email addresses in column B

Extracting ‘N’ Percentage of Rows

To extract 30% of the data from this range, use the formula as follows:

=SORTN(A2:B14, ROUND(COUNTA(A2:A14)*30%), 0, RANDARRAY(ROWS(A2:A14)), TRUE)

The dataset contains 13 rows. 30% of 13 is 3.9. Since row numbers must be whole numbers, the formula rounds up and returns 4 random rows.

Google Sheets formula extracting a random 30% of rows from the dataset, displaying four randomly selected rows

Additional Step: Convert to Static Data

The formula refreshes whenever the sheet changes or at specific intervals due to the use of the RANDARRAY function, which is volatile.

If you want static data, copy the result and apply Paste Special as follows:

  1. Assume the result is in D2:E5.
  2. Select D2:E5.
  3. Right-click and select Copy.
  4. Right-click again and select Paste special > Values only.

How the Formula Extracts N% of Rows

The formula uses the SORTN function, which returns n rows after sorting.

  • n is determined dynamically by: ROUND(COUNTA(A2:A14)*30%)
    This calculates the number of rows to extract based on the given percentage.
  • The data is sorted using: RANDARRAY(ROWS(A2:A14))
    This generates random numbers corresponding to the height of the dataset, effectively shuffling the data.

That’s the logic behind randomly extracting a certain percentage of rows for random sampling in Google Sheets.

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Expert Prashanth KV is a Diamond Product Expert in Google Sheets, officially recognized by Google for his contributions to the Docs Editors Help Community and featured in the Google Product Experts Directory. Explore his blog to learn advanced formulas, automation tips, and problem-solving techniques to elevate your spreadsheet skills.

Top Discussions

More like this

Pivot Table Formatting, Output & Special Behavior in Google Sheets

Pivot Tables in Google Sheets are powerful—but they can get tricky once you move...

Pivot Table Calculations & Advanced Metrics in Google Sheets

When it comes to built-in tools for data analysis and visualization in Google Sheets,...

Google Sheets Pivot Table Tutorial: Basics, Setup, and Date Grouping

The easiest way to summarize, analyze, and report data in Google Sheets is by...

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.