Randomly Extract a Certain Percentage of Rows in Google Sheets

Published on

You may need to extract a certain percentage of data from a large dataset for analysis. Rather than extracting from the first, last, or any specific portion, it’s better to randomly extract a certain percentage of rows, such as 20% or 50%. You can easily achieve this using a formula in Google Sheets.

Formula to Extract N% of Rows in Google Sheets

=SORTN(data, ROUND(COUNTA(col)*50%), 0, RANDARRAY(ROWS(col)), TRUE)

When using this formula for proportional random sampling, make the following adjustments:

  • Replace data with the data range.
  • Replace col with the reference of any column in the dataset.
  • Replace 50% with the percentage of data you want to randomly extract.

Example: Randomly Extract a Certain Percentage of Rows in Google Sheets

Consider the following dataset with names in column A and email addresses in column B.

Data for random sampling in Google Sheets, with names in column A and email addresses in column B

Extracting ‘N’ Percentage of Rows

To extract 30% of the data from this range, use the formula as follows:

=SORTN(A2:B14, ROUND(COUNTA(A2:A14)*30%), 0, RANDARRAY(ROWS(A2:A14)), TRUE)

The dataset contains 13 rows. 30% of 13 is 3.9. Since row numbers must be whole numbers, the formula rounds up and returns 4 random rows.

Google Sheets formula extracting a random 30% of rows from the dataset, displaying four randomly selected rows

Additional Step: Convert to Static Data

The formula refreshes whenever the sheet changes or at specific intervals due to the use of the RANDARRAY function, which is volatile.

If you want static data, copy the result and apply Paste Special as follows:

  1. Assume the result is in D2:E5.
  2. Select D2:E5.
  3. Right-click and select Copy.
  4. Right-click again and select Paste special > Values only.

How the Formula Extracts N% of Rows

The formula uses the SORTN function, which returns n rows after sorting.

  • n is determined dynamically by: ROUND(COUNTA(A2:A14)*30%)
    This calculates the number of rows to extract based on the given percentage.
  • The data is sorted using: RANDARRAY(ROWS(A2:A14))
    This generates random numbers corresponding to the height of the dataset, effectively shuffling the data.

That’s the logic behind randomly extracting a certain percentage of rows for random sampling in Google Sheets.

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

Filter Top N per Category in Excel (Using FILTER & COUNTIFS)

You can use a combination of FILTER and COUNTIFS to filter the top N...

Rank per Group in Excel

You have two groups of 20 students each. How do you determine the rank...

Real-Time Excel Filtering Using Combo Box & FILTER Function

In this tutorial, you’ll learn how to set up real-time Excel filtering using a...

Google Sheets: Extract Top N per Group from Query Aggregation

When working with grouped and aggregated data in Google Sheets, you might need to...

More like this

Google Sheets: Extract Top N per Group from Query Aggregation

When working with grouped and aggregated data in Google Sheets, you might need to...

How to Extract Top N from Aggregated Query Results in Google Sheets

To summarize data in Google Sheets, you can use the QUERY function. However, if...

How to Use RANK IF in Google Sheets (Conditional Ranking)

You can use the RANK function to rank values in an entire dataset. But...

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.