• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

[Help Required] Hiding Rows based on duplicate values +1 and conditions

Lojik

New Member
Good Morning all,


*What we have*

I have a task at work that requires a little help from some knowledgeable people and would he highly grateful for any assistance anyone can provide


We have exported our entire local client database into excel format, this database contains 7859 rows of data in the following format:

A / B / C / D / I / J / M / O / P / R / U

Code / Full Name / UNLOCO / City / Created By / Created Time / Address 1 / Email / Fax / Phone / Mobile


Some rows have been hidden as the data is not required but will be needed when we "reimport" so I have hidden them for now


an example of one line would be:

132MARMEL / 132 MARKETING / AUMEL / COLLINGWOOD / RR / 28-Jul-09 15:22 / 36 WELLINGTON STREET / randomemail@optusmet.com.au / +61 (3) 9999-4444 / +61 (3) 9999-2222


*Our problem*

Due to overlapping data from older systems we have duplicates in our export file that have been created over time, these duplicates show up with a "1" at the end of the "Code", example below:


132MARMEL1 / 132 MARKETING / AUMEL / COLLINGWOOD / RR / 28-Jul-09 15:22 / 36 WELLINGTON STREET / randomemail@optusmet.com.au / +61 (3) 9999-4444 / +61 (3) 9999-2222


I have added a condition to bold and colour and code with a "1" at the end.


BUT


Due to the system using a specific way of coding not ALL the codes with "1" at the end are duplicates, for example "Bobs Manufacturing MEL" would be BOBMANMEL, but "Bobby Manufacturing MEL" would be BOBMANMEL1 as it can't be the same code and are different companies


The only way currently I can tell if a line with "1" a the end is to check the other data and see if it matches, I have added conditions to the document so that any Address that is a duplicate or number that is a duplicate is highlighted in Red.


This is great for me to look over and start deleting the duplicates


BUT


with 7859 lines to check over it will take too long and I need this done in a time frame


*what I need help with*

I need a VB code that will hide rows so that ONLY the codes with the "1" at the end AND their partner code "without the 1" are showing, for example BOBMANMEL and BOBMANMEL1 would not be hidden


Then I can check only the 1 codes vs their partner and see if any have different addresses, numbers and delete the rest.


Your help is highly appreciated


Regards

Jack
 
Hi, Lojik!


No need of a VBA procedure, you can do it with just three formulas and filter options.


Assuming your data start at row 2 and ranges from A:U columns, place these formulas:


AA1: Helper

AB1: General Count

AC1: Individual Count


AA2: =B2&C2&D2&I2&J2&M2&O2&P2&R2&U2

AB2: =CONTAR.SI(AA:AA;AA2) -----> in english: =COUNTIF(AA:AA,AA2)

AC2: =CONTAR.SI(AA$2:AA2;AA2) -----> in english: =COUNTIF(AA$2:AA2,AA2)

and copy down as required (2:7860).


You'll get in AB column the no. of ocurrences of that row and in AC the relative count. Then just filter on AC column for value 1 for unique rows or for value <> 1 for duplicate rows, depending on if you're going to copy the unique values or delete the duplicate ones.


Just advise if any issue.


Regards!
 
Good Morning SirJB7,


Thank you for the advice, this may work, a few issues that I can see though, this would require AA to contain identical values to show a number in AB correct?


If so I will need to restrict the data in AA to what I am checking for in duplicates?


I ask this as some lines have identical addresses but not an identical number (same building but different business) and some lines have an identical numbers but different address (same company but two different offices) also data entry errors play into effect and there may be a spelling mistake on one line compared to its "1" duplicate


Am I correct in my assumption?


I have a separate list of all the codes with "1" at the end I need to check, is there a formula I can use to show only the lines with a "1" at the end and their duplicate code that does not have the 1? then maybe I can filter from there?


Edit for clarification, all characters before the "1" on the code are based of the company name and my task is to weed out the duplicate lines and set aside the unique lines that end in 1, I have a list of all the codes ending in "1" and I need to seperate using my master list of all clients which ones are unique companies and which ones are duplicates of a company already in our system


Jack
 
Hi, Lojik!


AA column should contain the whole chain of data to decide uniqueness, that's why my formula only used columns B:D, I:J, M, O:p, R & U, assuming that non used columns might hold different data which shouldn't be checked, if they had. So the answer is yes, you should restrict the concatenation to the columns that you actually want to check.


Regarding typo errors it'd be hardly difficult either with formulas or VBA to handle all possible combinations and discern when it's a typo error or a different data. This formulas let you certainly identify duplicated values, but not approximated ones, I think that those might and should be manually analyzed. At least without having seen the actual data.


Since the 1-at-the-end could be either a valid code or a manually set indicator of duplicity I explicitly excluded column A from the concatenated chain. That's to say I didn't consider the column A data and just kept the 1st occurrence (column AAC = 1). If you are to provide a way to include the code in the test, please explain it (in other words if you've yet did it as then I didn't understand) and I'll try to adapt the formulas.


Now if you have a separate list of the codes ending in 1 that you should check, you could try entering these formula in column AD, assuming your gray list is in column A on worksheet Sheet2:

=SI.ERROR(BUSCARV(A2;Sheet2!A:A;1;FALSO);"") in english: -----> =IFERROR(VLOOKUP(A2,Sheet2!A:A,1,FALSE),"")

and then filter for non blank values.


Regards!
 
Hi SirJB7,


Thanks, I have adapted your first formula to work for me, the "code" column is generated by our system so there are no data entry errors on that column, so what I did was I filtered the entire list by "ending in 1" and then added another column with the below code:

=LEFT(A2,LEN(A2)-1)

and filled down


This left me with a column of codes without the "1" on the end


I then added another column with the below code:

=IF(AM2="",A2,AM2)


"AM" being my new column, this gave me a full column of all the codes including the "1" codes but without the 1's


I then added another column and used your formula:

=COUNTIF(AM:AM,AM2)


This gave me a column of 1's and 2's, the 2's being the duplicate codes that have a "1" and the 1's being unique with no duplicate, I then filtered by the column having the number 2 and was able to review the 682 duplicates from there


Thanks for the help


was very insightful :D
 
Hi, Lojik!

Glad you solved it. Thanks for your feedback and for your kind words too. And welcome back whenever needed or wanted.

Regards!
 
Back
Top