Jul
10

How to find and delete duplicate data in Excel

Every morning when we have just come to the office, turned on your PC and are sipping your first coffee, you have a few spare minutes to think over your today’s schedule. At this very moment you realize how huge the workload you have in front of you is. E-mails that wait patiently in your inbox folder, papers, reports, summaries, communiques and so on and so forth.

All of us have our own duties, but for many they often turn into a daily struggle with information flaws. Sometimes they become a real chore, fruitless attempts to weed out unnecessary information and find the data needed.

And hardly anybody realizes that all this can be done in a totally different way. Everything can be faster, more convenient and, which is the most important, more effective. It goes without saying that the MS Office Suite is one of the most frequently used applications in the Windows platform. Practical skills of work with Office programs long ago became a must for comfortable work in big corporations, small business and among home end-users. The wealth of its features and functions overwhelms, especially if to consider that end-users hardly exploit 30% of Office abilities.

However, I had one task in Office that could become a real dilemma. At first sight, the task looked quite easy: I needed to process some preliminary reports in Excel and make up a final report. But unexpectedly I got stuck when being delivered a number of Excel tables made by different employees. I found myself faced with the problem of duplicated data. It turned out that finding and removing duplicates in continuously updated Excel documents was taking the lion’s share of my time.

To my regret, the standard Excel Find\Replace function proved to be of no use. Finding data that were in an Excel document or data that shouldn’t be there turned out easier said than done. The standard Excel Find\Replace function (which is, by the way, the only one) is realized in such a way that before starting the search you need to know exactly what you are looking for.

Say, if you need to check a big table for duplicates, it could become a real problem. The Find\Replace function cannot be used just to search for duplicates. You can only search for a certain entry that later can be replaced or removed at your wish. That is, I had to search duplicates for each entry in my table. After that I needed to mark it (once again, manually) and proceed to the next one. In case your Excel table consists of hundreds of rows, it is the problem.

After a short googling on the net, I found a number of plug-ins intended for deleting duplicate data in Excel. Also, I made up a short list of requirements that must be met by the add-in.

  1. The add-in must be not too big in size and easy-to-install and get started. I don’t want to have any problems with the necessity to change my system settings. Also, I don’t want either the stability or speed of the system to suffer.
  2. Upon installation, the add-in should show up immediately and be ready to use (I have lots of other important things to do rather than wasting my time of searching for the add-in through the whole system).
  3. It should have an intuitive and user-friendly interface that wouldn’t set a puzzle to me and differ too much from the standard Excel interface.
  4. Should I have any questions, I want to have a clear and explicit help.
  5. When working with the add-in, I wish to know what is happening in my Excel worksheet, and have an ability to change the output data format rather than start the search from scratch.
  6. It would be nice to be notified if the record being entered already exists in the document.

After some search, I ended up with four plug-ins:

Add-in Vendor Price, USD Trial
Duplicate Remover for Excel AbleBits 29.95 yes
Duplicate Finder Macro Systems 49.95 no
Dupe Wizard Ozgrid 29.95 yes
Excel Unique & Duplicate Data Remover Sobolsoft 29 yes

Duplicate Remover for Excel (from AbleBits):

Upon installation the Duplicate Remover for Excel add-in placed its menu item on the Excel main menu (right to Help). The add-in can be also run by pressing Ctrl-Shift-R. You can see that hint if put the cursor on the Duplicate Remover` icon in the Excel menu.

Duplicate Remover for Excel - Main panel.

The interface is a step-by-step wizard with the possibility to return any number of steps back using the “Back” button. First, select the search mode and the search range (block of cells):

Duplicate Remover for Excel - Compare two tables.

Then, we decide what to search for: duplicates or unique entries.

Duplicate Remover for Excel - Find duplicate values.

The purpose of all options / functions is implied by their names. The search range can be set both before running the add-in (by selecting a block of cells in a sheet) and after the add-in is started (by clicking on the range selection button).

On the next step of the wizard, you can select the columns that will be used for comparison. If columns have headers, you can exclude them from search.

Duplicate Remover for Excel - Check the column for search.

Then, you decide what action to perform with search results. You can have duplicates or single /unique occurences selected, colored, copied or moved to another location and deleted.

Duplicate Remover for Excel - Check the action for a founded duplicates.

In case you need to compare two tables, you will see another window where you choose the worksheet and a region for search.

Duplicate Remover for Excel - Search in a two tables.

If on this step you accidentally entered or selected incorrect data, you will immediately be informed about it and after making necessary corrections will be able to continue the work further.

Duplicate Remover for Excel - User error alert.

Summing up, I can say that although someone may find the step-by-step wizard of the Duplicate Remover not convenient enough, for me this add-in is the friendliest one: it gives the most complete picture of current settings and a chance to change them at any time instead of starting the search from scratch. On the vendor’s web-site a detailed step-by-step instruction is available with lots of screenshots describing all abilities of the add-in.

Excel Unique & Duplicate Data Remover

This Excel Unique & Duplicate Data Remover add-in from Sobolsoft on the second step of the installation wizard suggested that I visit the developer’s web-site to find more information. Then, when I tried to run the Excel Unique & Duplicate Data Remover add-in in my Excel, it turned out to be not that easy. I couldn’t either see its icon in the Excel menu or find the add-in in the “Add-ins” list. Finally, I managed to find its startup file in the “All programs” list, but this time it proved impossible to run add-in in my Excel. It reported that the macros were not digitally signed or verified and suggested either to change the security level to a lower one (which for obvious reasons I was not quite willing to do) or have the maros digitally signed or verified (which, naturally, was impossible). So, I had no choice I set the security level to low (BTW, three other add-ins from the list installed quickly without any problems).

After restarting my Excel, I found The Excel Unique & Duplicate Data Remover menu item in the main Excel menu next to the Next button. Besides, its menu item can be moved to Tools if you wish.

The Excel Unique & Duplicate Data Remover add-in, as well as the other three, is driven by the intuitive and clear menu which is organized as a drop-down list of functions.

It has a number of options for duplicates and uniques, e.g. remove all duplicates, delete duplicates except for the first one, remove blank cells, and similar options for uniques (see the screenshot below). The available functions resemble those of the Duplicate Finder from Macro Systems, but in contrast to the Macro System’s add-in, the Excel Unique & Duplicate Data Remover doesn’t have tabs and all functions are presented in one list. That makes the menu look a bit confused. When starting a new search, you have to look through the whole list again which is not very convenient.

Excel Unique & Duplicate Data Remover - Main menu.

The indisputable advantage of the add-in is the most flexible options to display results.

Excel Unique & Duplicate Data Remover - Result display options.
Excel Unique & Duplicate Data Remover - Result display options - 2

Duplicate Finder (from Macro Systems)

The Duplicate Finder add-in as well as the other two (except for the Sobolsoft plug-in described above), installed quickly with no stumbling.

The menu of this add-in has three tabs: Search in a single list, Compare two lists and Special options. I personally find such approach very reasonable and convenient. The following options are available: coloring rows when entries change, extracting unique rows in a selection, selecting and coloring duplicates in a block of cells and blanking duplicate entries in a selection.

The Duplicate Finder can do the following:

  1. Find duplicates in a list.
  2. Copy duplicates to a new worksheet.
  3. Select duplicate data in a list.
  4. Delete duplicate data in a list.
  5. Find duplicate rows and the first occurrences in a list.
  6. Copy duplicate rows and the first occurrences in a list.
  7. Select duplicate rows and the first occurrences in a list.
  8. Delete duplicate rows and the first occurrences in a list.
  9. Find all 1st and unique occurrences in a list.
  10. Copy all 1st and unique occurrences in a list.
  11. Select all 1st and unique occurrences in a list.
  12. Find entries in one list that are in the second list.
  13. Find entries in one list that are NOT in the second list.

Duplicate Finder - Search in a single list options.
Duplicate Finder - Compare two lists options.
Duplicate Finder - Other options.

After you checked the necessary option, there appears another window to select the search range in which the current selection is shown. The desired block of cells is selected with the mouse. The add-in – user dialog is interactive and doesn’t differ from standard Windows wizards.

Dupe Wizard (from Ozgrid):

At fist sight the Dupe Wizard add-in has fewer functions but it offers a number of interesting features. For instance:

  1. Create a list of unique data. Hide duplicates.
  2. Summary report. The add-in creates a mini Pivot Table report and shows how many times each entry in the list occurs.
  3. Highlight Duplicates. This option will color code all duplicate entries: if an entry occurs twice it will have a Green background, three times – Yellow, more than three times – Red background.
  4. Prevent Duplicates. This option prevents users from adding duplicates to a list or table by showing a pop-up message.
  5. Compare Excel lists for duplicates. It compares 2 lists (sheets, files, workbooks) for matching entries and creates a mini Pivot Table report.
  6. Remove duplicates from a selected range.
  7. Remove entries from one list that exist in the other. The option compares 2 lists for matching entries (items that appear in both lists). Then the user selects the action: remove duplicates from List 1, List 2 or leave unchanged.
  8. Remove all but first occurrence of duplicates.
  9. Create a list of unique data. This option creates a fresh copy of your list on a new worksheet that only contains one occurrence of each entry.

As you can see it the screenshot below, each option has a context help that gives more details about this or that function.

Dupe Wizard - Main window.

Summary Table:

Duplicate finder Dupe Wizard Excel unique duplicate data remover Duplicate Remover for Excel
Installation and first start 4 of 5 4 of 5 3 of 5 4 of 5
Usability 5 of 5 4 of 5 4 of 5 5 of 5
Functionality 4 of 5 3 of 5 4 of 5 4 of 5
Help 5 of 5 4 of 5 3 of 5 5 of 5
Total 4.5 3.75 3.5 4.5

Grades:
5 – Easy, fast, clear, convenient, logical.
4 – Quite clear and logical.
3 – Inconvenient, ambiguous or illogical.
2 – Normal performance is impeded.
1 – Is not working or unavailable.

Thus, no ideal add-in was found. All have something to offer.

The Duplicate Finder from Macro Systems has color codes for duplicate entries based on the number of occurrences, plus the option of preventing users from adding duplicates to a list.

The Dupe Wizard from Ozgrid having a clear and easy-to-understand menu, in its full version offers a summary report on all found duplicate or unique occurrences. However, you cannot try this function till you buy the addin, the trial version doesn’t support it. Another advantage is the context help, but the documentation describing the add-in’s options and functions provided on the vendor’s web-site is complicated and embarrassing. Moreover the add-in doesn’t have any installer, the user should find it in Excel himself.

Excel Unique Duplicate Data Remover has duplicate counters, but huge delays in the work of the Duplicate Row Wizard and no reaction to incorrect settings set by the user largely spoil the whole picture. Also, the developers never cared of any help system or other ways to show its features to inexperienced users.

Duplicate Remover for Excel turned out to be the friendliest add-in in every respect, but the advantage of multi-step wizard may seem arguable to many. Trough, many users may find other solutions too overloaded.

To my mind the most worthy offers are the Duplicate Remover for Excel from AbleBits and the Duplicate Finder from Macro Systems. They applied totally different approaching to the realization of their interfaces, and each has its pros and cons. Macro Systems doesn’t provide any trial or evaluation versions, so you can play with the add-in only after purchasing it. Though, they do offer money back guarantee if the user is not satisfied. AbleBits offers a fully-functional 15-day trial. To tell you the truth, this is more appealing when it comes to paying money. I prefer to know what I pay for.

As to other ad-ins, the Dupe Wizard’s option for preventing duplicates seems to be highly interesting.

After all, besides processing existing documents you have to create some yourself. In my view,

It’s more effective to prevent errors and misprints than remove them later wasting your precious time.

The screenshot above is taken from the vendor’s web-site, it shows a pop-up informing the user that the entry he is trying to add already exists on the table. In a given case, it is “Jane D”. And what if the list contained a few variants of the same entry. Similar but not the same, e.g., : JaneD, Jane D, or Jane d?

And what if there are a few possible ways of writing of the same thing? It is the case with lists of clients, telephone numbers, abbreviations, excess spaces, initials, full names and so on. It looks like complicated and wearisome work, there are no clear-cut criteria, you know. Only fuzzy. This word, in fact, became the key that opened the chest.

The name of the add-in speaks for itself: Fuzzy Duplicate Finder for Excel from AbleBits. Priced at $29.95 this plug-in proved to be able to cope with this task as well as a number of others. You just select a column for search, and the add-in opens the dialog window. You indicate a number of possible misprints in an entry and click Start. In a few seconds, you will see all found occurrences at a glance structured by duplicate groups.

In this list of grouped duplicates, you can make any necessary corrections in a click:

Changes can be made both in a list of found duplicates and in an actual worksheet.

Then you click the “Apply changes” button and the corrections are saved.

In general, this plug-in is a great alternative to manual processing of multiple Excel tables, creating summary reports and polishing off typos and misprints from your worksheets.

Comments are closed.