Here’s a dynamic array formula approach to merge duplicate rows and keep the latest values in your Excel spreadsheet. Before diving into the formula, let’s discuss when this method is useful.
Why Merge Duplicates and Keep Latest Values?
Assume you’re cleaning a student database. A student may have multiple entries, but you want to keep only the most recent one and remove the older records. However, the latest record might have missing data in fields, and you may want to fill those from the previous entries before deleting duplicates.
Generic Formula
=DROP(REDUCE("", SEQUENCE(n_col), LAMBDA(acc, val, HSTACK(acc, MAP(UNIQUE(id_col), LAMBDA(rowV, CHOOSEROWS(TOCOL(FILTER(INDEX(range, 0, val), id_col=rowV), 1), -1)))))), 0, 1)
Explanation of Parameters
n_col
– Replace with the number of columns in yourrange
.id_col
– Replace with the reference to the ID column.range
– The data range from which you want to merge duplicate rows and keep the latest values.
Prerequisites for Merging Duplicates and Keeping the Latest Values
- The dataset must have an ID column. If the same ID appears multiple times, only the second occurrence onward is treated as a duplicate.
- If there’s a date column that tracks entry updates, sort the data first by ID and then by date (ascending order). If there’s no date, keep the data as is. The formula assumes the latest records are at the bottom.
Now, let’s look at a sample dataset and apply the formula to merge duplicates while keeping the latest values.
Sample Data
Below is the sample data stored in the “student data” sheet (range A1:G5):
Student ID | Student Name | Address | Date of Birth | Major | GPA | Enrollment Date |
101 | Roy | 123, Main Street, Rohini | 2001-05-10 | CS | 3.9 | 2025-02-17 |
101 | Roy | 12, Inner Circle, Connaught Place | ||||
104 | Thanvi | B-45, Greater Kailash – II | 2001-07-05 | Physics | 3.8 | 2025-02-18 |
104 | Thanvi | 3.5 |
If you apply the merge duplicates and keep latest values formula, the output will be:
Student ID | Student Name | Address | Date of Birth | Major | GPA | Enrollment Date |
101 | Roy | 12, Inner Circle, Connaught Place | 10/05/2001 | CS | 3.9 | 17/02/2025 |
104 | Thanvi | B-45, Greater Kailash – II | 05/07/2001 | Physics | 3.5 | 18/02/2025 |
Example: Merging Duplicates and Keeping the Latest Values
To process this data dynamically, enter the following formula in cell A1 of a new worksheet:
=DROP(REDUCE("", SEQUENCE(7), LAMBDA(acc, val, HSTACK(acc, MAP(UNIQUE('student data'!A1:A5), LAMBDA(rowV, CHOOSEROWS(TOCOL(FILTER(INDEX('student data'!A1:G5, 0, val), 'student data'!A1:A5=rowV), 1), -1)))))), 0, 1)
This formula merges duplicates and keeps the latest values in Excel by:
- Extracting unique IDs from the dataset.
- Using
FILTER
,TOCOL
, andCHOOSEROWS
to retrieve the latest value for the first unique ID in the first column. - Applying
MAP
to extend this logic across all unique IDs. - Using
BYCOL
to apply this operation to each column individually. - Using
HSTACK
to combine the processed columns horizontally, forming the final dataset.
Things to Know
- The formula uses multiple LAMBDA functions, which may be resource-intensive for large datasets.
- If your dataset is too large, consider merging duplicates in smaller chunks and then combining results using
VSTACK
:=VSTACK(merge_duplicates_1, merge_duplicates_2, ...)
FAQs on Merging Duplicate Rows and Keeping Latest Values in Excel
1. How do I merge duplicate rows in Excel and keep the latest values without VBA or Power Query?
You can use a dynamic array formula in Excel to merge duplicates and keep the latest values without using VBA or Power Query. The formula extracts unique IDs and retrieves the latest non-blank values for each column. See the formula in this guide above.
2. What if my dataset doesn’t have a date column to determine the latest entry?
If your dataset lacks a timestamp or date column, Excel assumes that the latest values are at the bottom of the dataset. Ensure that your data is sorted properly before applying the formula.
3. Will this formula work on large datasets?
This formula is resource-intensive due to multiple LAMBDA
and FILTER
functions. If you have thousands of rows, consider processing data in smaller chunks.
4. Can I use this formula to merge duplicate rows across multiple sheets?
Consolidate your data into one sheet before applying the formula.
5. How do I fill missing values from other duplicate rows?
The formula automatically fills missing values in the latest row using data from older entries.
6. Does this work in Excel 2019 or earlier versions?
No. The formula relies on dynamic array functions (MAP, REDUCE, TOCOL, CHOOSEROWS), which are available only in Excel 365 and Excel 2021.