This time you can learn how to find the highest n values in each group. I am using Google Spreadsheets, and this tip is related to that.
The highest values or largest values are related to rank. But when finding the highest n ranks of each group, the functions RANK, MAX, MAXIFS, or even LARGE do not come to your use. But I have a formula for you!
I am using a combination of QUERY, SORT, MATCH, LET, CHOOSECOLS, and ROW functions. It may be the first time you are coming across such a formula.
With my formula, you can find the highest n values in each group in Google Sheets.
Here you can determine the number ‘n.’ I mean;
Find the highest or largest 2 values in each group, find the highest 3 values in each group, and so on.
Please see this example.
Here there are two groups: Player 1 and Player 2. I want to pick only their highest two scores. See my formula below.
Find the Highest N Values in Each Group in Google Sheets
Here is the formula that I have used in Cell D2 in the above example. It returns the highest 2 scores of each group.
=LET(
sorted,
SORT(A2:B,1,true,2,false),
ARRAYFORMULA(
QUERY(
HSTACK(
sorted,
IFERROR(ROW(A2:A)-
MATCH(CHOOSECOLS(sorted,1),CHOOSECOLS(sorted,1),0))
),"Select Col1,Col2 where Col3<3")
)
)
Note:- This is an updated version. I’ve included LET to improve the performance by avoiding repetitive calculations and used HSTACK instead of curly brackets to stack data horizontally.
This time I am not going to explain the logic of this formula. Instead, I will provide you with ample details about how to use this formula in your sheet.
I have categorized the tips under different sub-titles below.
How to Change the Highest 2 Value Rows in Each Group to N Value Rows
See the ‘n’ concept in this Google Sheets formula. I mean, how to change the highest 2 values to 3 values; or to a user-defined value?
Just see the last part of the formula. That is the Query “Where” clause as where Col3<3
.
Here <3
indicates the highest/largest 2 values/rows in each group.
If it is <4
, it indicates the highest 3 values in each group, <5
means the highest 4 values in each group, and so on.
You wish to ask the below question, right?
I have more than two columns. How can I adjust your formula, then?
Good question. Here are those tips.
How to Accommodate More Columns in My Formula that Finds the Largest N Rows in Group
You should only pay attention to the SORT formula part. Here is a four-column dataset.
You may replace the SORT formula in my above formula based on the below tips.
Earlier, we applied the formula in a 2-column dataset. There the SORT formula is like this.
SORT(A2:B,1,true,2,false)
The SORT formula must be as follows in our new four-column dataset.
SORT(A2:D,1,true,4,false)
It means, in the SORT formula, modifying the data range and changing the sort columns. You should;
- Sort the name (group) column in ascending order.
- Sort the value column that contains the highest n values in descending order.
Then you should slightly adjust the Query select clause (the last part of the formula).
In my two-column data set, the Select clause is used as below.
Select Col1,Col2 where Col3<3
In our new four-column dataset, it should be like below.
Select Col1,Col2,Col3,Col4 where Col5<3
What are the differences?
Here the number of columns is four. So it should reflect in the Query as above.
Then what about Col5<3
?
I have already detailed the role of <3
somewhere at the beginning of this Google Sheets tutorial. It determines the ‘n’ values.
Then why I used Col5
? We have only four columns, right?
My formula, which finds the highest n values in each group, creates an extra column.
So in a four-column data set, the number should be 5.
If you see my first formula (main formula), I used Col3
since the data set contains only two columns.
If you are using my formula in a 10-column dataset, the last part must be Col11<3
.
See the formula for the four-column data set. I have modified the SORT formula and Query Select clause as detailed above.
=LET(
sorted,
SORT(A2:D,1,true,4,false),
ARRAYFORMULA(
QUERY(
HSTACK(
sorted,
IFERROR(ROW(A2:A)-
MATCH(CHOOSECOLS(sorted,1),CHOOSECOLS(sorted,1),0))
),"Select Col1,Col2,Col3,Col4 where Col5<3")
)
)
Note:- Since the above is a new version, ignore the formula on the formula bar of the below image.
Accordingly, you can adjust this formula for any number of columns. Follow this tutorial to find the highest n values in each group in Google Sheets.
Hello,
I was wondering, if you knew how to accomplish this with two columns identifying the grouping.
I have ten records for a person on a given date/time and need the top 5 values from that day. Also I need another top 5 for a different day for the same person.
I have tried joining the Name and Date column to make a unique value but needed to split after. Wondering, if there is a simpler way.
Name, Date, Score
A, Jan 1, 20
A, Jan1, 40
A, Feb 1, 34
A, Feb 1, 33
B, Jan 1, 20
B, Jan 1, 30
Becomes…
Name, Date, Score
A, Jan 1, 40
A, Feb 1, 34
B, Jan 1, 30
It is expanded out to 10 records each day per person.
Hi, Taylor Wilson,
First and foremost, I’ve updated my formula in the post to enhance its performance using LET. Please check that.
Then regarding your question, we can do it as follows.
But before that, I could understand you have a Name and Date column.
If the Date column contains timestamps, you must remove the time element. So that grouping will work.
Range: A1:C
Where; A1, B1, and C1 contain the field labels Name, Date, and Score, respectively.
Formula:
=LET(sorted,SORT(A2:C,1,true,2,true,3,false),
ARRAYFORMULA(QUERY(HSTACK(sorted,IFERROR(ROW(A2:A)-
MATCH(CHOOSECOLS(sorted,1)&CHOOSECOLS(sorted,2),
CHOOSECOLS(sorted,1)&CHOOSECOLS(sorted,2),0))),
"Select Col1,Col2,Col3 where Col4<6")))
Hello Prashanth,
Ah, I did not see your update.
Amazing, this is spot on.
Thank you so much!
Hi Prashanth,
I just found this post, and thank you for this tutorial, I found it very helpful as I needed it, and I have a question, though.
If I have two or more groups, let’s say four columns; Level 0, Level 1, Level 2 groups and scores, and I want to sort it by top five and bottom five, from every Level 2 by each Level 0.
How to make it happen? Thanks.
Hi, Rach,
It seems doable!
I highly recommend you to share a sample/mockup sheet with me below. Include your hand entered (expected) result.
I will try to solve it ASAP.
Hi, Thank you. Below is the sample sheet.
— link (URL) removed by the Admin —
Hi, Rach,
I have added the required formulas to your sheet.
Thanks for sharing an easy-to-read sample sheet.
Hi, There,
Column M contains mixed Text and Numbers like this;
2160
2732
M293
2739
NP018
M149
Where can’t I replace
M2:M
withTO_TEXT(M2:M)
?Hi, Desmond Lee,
The best way is to select M2:M and apply Format > Number > Plain text.
If you do not want to format it like that, I have a new formula that works well with all data types. Here it is.
=filter(sort(A2:B,1,1,2,0),countifs(sort(A2:A,1,1,2,0),sort(A2:A,1,1,2,0),
row(A2:A),"<="&row(A2:A))<3)
Hello,
Thanks for the formula and it works exactly like you said it would. However, I am limiting my results to where I will need at least 5 occurrences to get a proper score.
With your formula, it will show everyone even those that only have one result which is exactly how it is programmed. Is there a way to limit the data to show those that have at least 5 results? Sorry if I didn’t explain it very well and I can try to clarify more if needed.
Caleb
Hi, Celeb,
We can achieve this without much effort.
Assume my data is in Sheet1!A2:B (as per my example above). If so, in another blank sheet (let’s call it Sheet2) in the same file, in cell A2, insert the below Filter.
=filter(Sheet1!A2:B,countif(Sheet1!A2:A,Sheet1!A2:A)>=5)
Use the formula given in my tutorial with this data in Sheet2, not with the one in Sheet1.
Thank you very much for your post. I found it very helpful! I do have a question though.
In my case, I need to get the highest three dates for a trailer number that is NOT greater than today’s date. Any suggestions on how to accomplish this using your methodology? Thank you for your help!
Hi, David,
You want a formula that returns the highest n dates that are less than or equal to today’s date. Also, it should be based on a criterion, i.e. Trailer number.
Eg.
A2:A contains dates.
B2:B contains trailer numbers (for example “XY 10001”)
(titles in A1:B1)
Here in this case, in cell C2 use this formula.
=sortn(filter(A2:A,B2:B="XY 10001",A2:A<=today()),3,0,1,0)
I know the SORTN parameters, i.e.
3,0,1,0,
maybe a little bit confusing.3 = number of values to return ('n')
0 = Show at most the first 'n' rows in the sorted range.
1 = sort column number (date column).
0 = sort in descending order.
This formula may return the same dates if there are duplicates. So here is a more refined one to extract the top 3 dates without duplicates based on a condition.
=sortn(filter(A2:A,B2:B="XY 10001",A2:A<=today()),3,2,1,0)
If you face issues using the above formulas, feel free to share the URL of a mockup sheet.
Hi, I can’t seem to get my formula to work correctly. It says there is an error with the last part of the formula
“Array Formula only takes 1 Argument, but this is argument 2”
On the transactions sheets, the rows are A(Date), B (Category), C (Description), D (Amount), E (Account).
Hi, Aaron,
If you share a mockup sheet, I would be happy to assist you.
Hi Prashanth,
Thanks for all your help with this. I guess now I will learn how to get rid of dashes when importdata via the
=importhtml
command:>) I will let you know if I find out.PS: I tried using rank out of curiosity, voila it works, it correctly highlights the top 10 in each column, and does not highlight the dashes. Formula:
=RANK(C3,C$3:C$121)<=10
. See the example in sheet 3.Regards,
Ron.
Hi, Ron,
Thanks for sharing your formula. It works well!
Cheers!
Hi Prashanth,
Can you please explain how this formula works, I see that it selects the first cell in the range, does the = after C3 mean that it looks in the whole range to look for dashes, but then I do not understand how the dashes are removed, as I cannot see this in the function?
Hi, Ron,
You are absolutely right. It’s called the relative reference. Read more about here – Relative Reference in Conditional Formatting in Google Sheets.
To remove the cell color, I have used white as the cell fill color in that rule. You can even hide the hyphen by changing the font color to white in the same formula rule.
So no need to specify the same in the formula.
Hi Prashanth,
That function works, however as some fields have dashes, it highlights them as well, thereby creating additional highlighted fields. How do you exclude dashes in the function for the given range?
Here is the link to the file: removed_by_admin
Hi, Ron,
Added a new rule to “Sheet2” in your file to remove highlighting the rows containing the hyphen character. It’s below.
=C3="-"
The above should be the first rule in the order.
Hi Prashanth,
Thanks for the code it works a treat. If I want to confirm the top 10 in each column I just use Rank
<=10
, instead of max, correct.Hi, Ron,
I think it won’t work. Try this formula.
=B$3:B$15>=large(arrayformula(n(B$3:B$15)),10)
This range is as per my previous answer.
Hi Prashanth,
I have a query about dynamically highlighting or identifying the highest value in each of five different columns (eg say 1 Mth, 3 Mths, 1 Yr, 3 Yrs & 5 Yrs.
What I would like to happen is when I open the Google worksheet it would automatically highlight the highest value in each of the five-column ranges.
Is it possible to do this under Conditional Formatting or does it have to be done via a Query? Please advise how I could achieve this.
Hi, Ron,
Yes! It’s possible using the “Format > Conditional formatting > Format rules > Custom formula is” in Google Sheets.
Assume the columns (1 Mth, 3 Mths, 1 Yr, 3 Yrs & 5 Yrs) to highlight are B3:F (5 columns).
With a single conditional format formula in Google Sheets, we can highlight the max values in each column.
In conditional formatting enter B3:F in “Apply to range” field.
Under “Custom Formula is” field (click the drop-down under “Format rules”) insert the following formula.
=B$3:B=max(B$3:B)
For the range B3:F100, the formula to use would be as below.
=B$3:B$100=max(B$3:B$100)
I hope this helps?
I can’t seem to get this to work for some reason. I even tried removing the “row(A2:A)-” like the person above suggested, but that resulted in an error stating “Function ARRAY_ROW parameter 2 has mismatched row size. Expected: 999. Actual: 995.”
For reference, the data I’m working with is 3 columns: Name, Team, Score. My goal is to find the highest score from each team and to display the team, score, and corresponding name.
I’ve come very close a number of times, but for some reason, it’s failing to actually display the highest record–it seems to be picking one from the middle.
Also, you explain that this formula adds in an extra column. While I’ve been operating with trust in what you’ve said, I’ve so far failed to identify wherein the logic that is dictated. Would you mind elaborating?
The formula I’ve ended up with is:
=ArrayFormula(QUERY({SORT(B2:D,2,true,3,false),IFERROR(row(B2:B)-match(query(SORT(B2:D,2,true,3,false),"Select Col3"),query(SORT(B2:D,2,true,3,false),"Select Col3"),0))},"Select Col1, Col2, Col3 where Col4<2"))
Column B is the name, Column C the team, and Column D the score.
Thank you in advance for any insight you can provide!
Hi, Punchcruff,
You were very close with your above attempt.
I assume, your data range is B2:D. In that, you want to sort the Team column (column C). You have correctly made changes in the SORT formulas.
But in the inside Query formulas, instead of column 2, the sort column, you have put 3. That’s the only issue.
Here is the correct formula.
=ArrayFormula(QUERY({SORT(B2:D,2,true,3,false),IFERROR(row(B2:B)-match(query(SORT(B2:D,2,true,3,false),"Select Col2"),query(SORT(B2:D,2,true,3,false),"Select Col2"),0))},"Select Col1,Col2,Col3 where Col4<2"))
Regarding the extra column, it is the grouped numbers. To see that use this formula.
=ArrayFormula(QUERY({SORT(B2:D,2,true,3,false),IFERROR(row(B2:B)-match(query(SORT(B2:D,2,true,3,false),"Select Col2"),query(SORT(B2:D,2,true,3,false),"Select Col2"),0))},"Select * where Col4<2"))
The above formula is useful to return the highest 'N' values. In your case, you want to find the highest 1 score from each team, not the highest 'N'. It can simply achieve with the below SORTN formula.
=sortn(sort(B2:D,3,0,2,1),9^9,2,2,0)
See the below two guides to learn the function used here.
1. How to Use SORTN Function in Google Sheets to Extract Sorted N Rows.
2. SORTN Tie Modes in Google Sheets – The Four Tiebreakers.
Best,
My apologies about the column–I knew that wasn’t correct, but had been trying whatever I could, and mistakenly copied the formula without correcting it. However, the formula still returns a value from the middle of each team, not the highest.
Regardless, though, I’ll be trying out the SORTN function, and seeing if that can provide results with more ease.
Thank you for your help!
The SORTN formula you wrote worked for me! Thanks again!
Thanks for the feedback!
Thank you for this tutorial, works like a charm for me.
The only change that I needed to do, is to remove “row(A2:A)-” from the formula