How to Remove Duplicate Values in Excel

by Matthew Kuo on October 12, 2013

in Error Checking, Excel

Five Ways to Find and Remove Duplicate Values in Excel

For people who work with large data sets in Excel, or databases for that matter, one of the most common error checking tasks is the removal of duplicates values.  Some people refer to this process as de-duping.  The main reason this is necessary in data sets is because any time you create a primary key field, you can only have unique values in that column.  Any repeating value is therefore an error.

Duplicate values can pop up in your data set for several reasons.  Thus it is always prudent to check for this error in any data that you receive from another person.  Whenever you want to check for duplicate entries, make sure to note which columns should only have unique values.  Then use one of the five methods described below to identify and remove the duplicates.  One important thing to note:

Each Excel duplicate removal method is slightly different and each has its share of pros and cons.

Therefore, always consider the context of the situation before you choose your approach.

Remove Duplicates 00

Remove Duplicates Data Function

Use When:

  • You need to quickly remove records, or entire rows of data, from a table
  • You are comfortable automatically removing the duplicates without actually seeing which entries are duplicated
  • You have an extremely large set that would be onerous to clean visually

Excel 2010 has a powerful data function that allows you to simply select a set of data and remove duplicate rows.  The data function approach is probably both the simplest and the fastest way to remove duplicate values in Excel.

The key disadvantage of using the data function is that it doesn’t really allow you to do any analysis before you remove the duplicates.  This can be risky if you want to visually confirm that your data is erroneous before getting rid of it.  Since the function automatically deletes the duplicates when you click enter, you won’t be able to perform a quick check of your data before it’s gone.

To use this function, simply select the data set in question.  Then go to the “Data” header in the Excel ribbon and click the button for Remove Duplicates.

Remove Duplicates 01

Within the data set you selected, the function will ask you which columns need to be unique.  If you only select one column, Excel will remove any row that has duplicated values in that column, regardless of the values in any other column.

If you select multiple columns to test, Excel will only remove rows that have the exact same values for all selected columns.  In this situation, assuming that two values are duplicated in the Unique ID field, as long as one of the other selected fields is different, Excel will not delete that row.  Checking all the columns is a better approach if you only want to remove rows that are an exact duplicate of another row.  You can then use one of the other methods described below to investigate any additional duplicates in the Unique ID field.

Remove Duplicates 02

After the process is completed, the data function tells you how many duplicates were identified and removed.

Remove Duplicates 03

Conditional Formatting

Use When:

  • You have a data set that is small enough to be cleansed visually
  • You want to visually identify, analyze, and confirm your duplicates before removing them
  • Your data is complex enough that highlighting of duplicates will help identify them

One of the most useful conditional formatting functions in Excel is the ability to highlight duplicate values.  To use this function, simply highlight the data in question.  Within our sample data set, since we are only concerned about duplicate values in the Unique ID field, we’ll highlight that specific column.  Then under the “Home” tab of the Excel ribbon, click:

  • Conditional Formatting
  • Highlight Cell Rules
  • Duplicate Values

 Remove Duplicates 04

If you use the default setting, your cells will be highlighted in red.  Since conditional formatting doesn’t automatically remove duplicates for you, you’ll have to manually delete them.  Because of this limitation, the conditional formatting approach is primarily useful for smaller data sets, where you can reasonably scan through the entire data set to make edits.

Remove Duplicates 05

Pivot Table

Use When:

  • You are creating a separate list of unique values
  • You want to compare that the count of unique values with the original data set to see if duplicates exist
  • You want to perform a roll up of the data using only the unique values

Using the Pivot Table approach is best if you want to quickly produce a unique list of values for a single column.  After you’ve done that, you can easily see if duplicates exist by comparing the count of the unique list to the count of your original data column.

The pivot table option is also good if you have a situation where having repeating values isn’t necessarily wrong, you just need to perform a roll up of any repeating values that exist.  For example, you might be working with a database where duplicate entries actually represent individual transactions and need to summed together rather than cleansed.

It’s important to remember that with Pivot Table, you are basically creating a brand new data set.  Therefore, if you actually need to make edits to your original data set, it would make sense to use one of the other methods.

To insert a Pivot Table, select the data set in question.  Then go to the “Insert” tab of the Excel Ribbon and click Pivot Table.

Remove Duplicates 06

Additionally, you can use the shortcut:  ALT + D + P + F

Once you’ve created your Pivot, simply grab the field name of the column you want to remove duplicates from and drop it into the Row Labels box.  After doing so, a unique list of values from that column will appear.

Remove Duplicates 07

If you do need to roll up your data based on this new list of unique values, simply use the SUMIF formula to sum specific values from the duplicate entries together.

Sort the Data

Use When:

  • You have a data set that is small enough to be cleansed visually
  • You want to visually identify, analyze, and confirm your duplicates before removing them
  • Your data is simple enough that you can easily tell the values apart

Sorting your data table is one of the fastest ways to find duplicates in Excel.  Assuming your data set is small and has simple values, a simple sort should allow you to see if any duplicates exist.

However, if you have complex values in your column, such as a ten digit alpha numeric code, it’ll be very difficult for you to see the repeating values visually.  In that case, you should use conditional formatting.

To sort the data, go to the “Data” tab of the Excel ribbon and click any one of the sort buttons.

Remove Duplicates 08

Additionally, to use the shortcut function, you can just select the column in question and type either:

ALT + A + S + D (for sort descending)
ALT + A + S + A (for sort ascending)

Advanced Filter

Use When:

  • You only want to see unique records in your data set
  • You don’t want to delete any records and would prefer just to hide them

The advanced filter works by hiding records that are the same as any other records in your data set.  Please note, that unique in this case is defined by the record, an entire row of data, not just an individual value.  Therefore, advanced filter will only filter out a row if the entire row is duplicated.

Because of its definition of unique, using the advanced filter function has a key disadvantage:  it won’t necessarily filter out repeating values in a primary key column.  If you have a table where two records have the same Unique ID, as long as one of the other fields is different, neither record will be filtered.  Another disadvantage of using this method is that, with a large data set, it can be difficult to see if any records we filtered out.

To use the advanced filter feature, select your data set and go to the “Data” header in the Excel ribbon.  In the “Sort & Filter” section, click the button for Advanced next to the large filter button.

Remove Duplicates 09

A message box will pop up with an option at the very bottom allowing you to filter so that you only see unique records.  Check the box and click OK.

Remove Duplicates 10

All duplicate rows will now be hidden.

Remove Duplicates 11

{ 1 comment… read it below or add one }

Jems May 17, 2014 at 4:12 am

This software is use to remove all comments and all duplicates in excel files.
http://www.windowindia.net/help-power-xl.html

Reply

Leave a Comment

Previous post:

Next post: