Randomly Extract a Certain Percentage of Rows in Google Sheets

Published on

You may need to extract a certain percentage of data from a large dataset for analysis. Rather than extracting from the first, last, or any specific portion, it’s better to randomly extract a certain percentage of rows, such as 20% or 50%. You can easily achieve this using a formula in Google Sheets.

Formula to Extract N% of Rows in Google Sheets

=SORTN(data, ROUND(COUNTA(col)*50%), 0, RANDARRAY(ROWS(col)), TRUE)

When using this formula for proportional random sampling, make the following adjustments:

  • Replace data with the data range.
  • Replace col with the reference of any column in the dataset.
  • Replace 50% with the percentage of data you want to randomly extract.

Example: Randomly Extract a Certain Percentage of Rows in Google Sheets

Consider the following dataset with names in column A and email addresses in column B.

Data for random sampling in Google Sheets, with names in column A and email addresses in column B

Extracting ‘N’ Percentage of Rows

To extract 30% of the data from this range, use the formula as follows:

=SORTN(A2:B14, ROUND(COUNTA(A2:A14)*30%), 0, RANDARRAY(ROWS(A2:A14)), TRUE)

The dataset contains 13 rows. 30% of 13 is 3.9. Since row numbers must be whole numbers, the formula rounds up and returns 4 random rows.

Google Sheets formula extracting a random 30% of rows from the dataset, displaying four randomly selected rows

Additional Step: Convert to Static Data

The formula refreshes whenever the sheet changes or at specific intervals due to the use of the RANDARRAY function, which is volatile.

If you want static data, copy the result and apply Paste Special as follows:

  1. Assume the result is in D2:E5.
  2. Select D2:E5.
  3. Right-click and select Copy.
  4. Right-click again and select Paste special > Values only.

How the Formula Extracts N% of Rows

The formula uses the SORTN function, which returns n rows after sorting.

  • n is determined dynamically by: ROUND(COUNTA(A2:A14)*30%)
    This calculates the number of rows to extract based on the given percentage.
  • The data is sorted using: RANDARRAY(ROWS(A2:A14))
    This generates random numbers corresponding to the height of the dataset, effectively shuffling the data.

That’s the logic behind randomly extracting a certain percentage of rows for random sampling in Google Sheets.

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

How to Sort Pie Slices in Google Sheets

To sort pie slices in a pie chart, you need to sort the data...

Filter Items Unique to Groups in Google Sheets

In this tutorial, we'll learn how to filter items unique to groups in Google...

Find Common Items Across Multiple Columns in Google Sheets

This tutorial explains how to find common items across multiple columns in Google Sheets....

Sort Column by Length of Text in Google Sheets

To sort a column by length of text, you can either use the QUERY...

More like this

How to Sort Pie Slices in Google Sheets

To sort pie slices in a pie chart, you need to sort the data...

Filter Items Unique to Groups in Google Sheets

In this tutorial, we'll learn how to filter items unique to groups in Google...

Find Common Items Across Multiple Columns in Google Sheets

This tutorial explains how to find common items across multiple columns in Google Sheets....

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.