• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Formula that checks for duplicates, and then indicates first occurrence

Grumpy88

Member
Hi.


Can anyone assist please? I have a column full of names (call it Column A), many of which are duplicated many times over. Can anyone come up with a formula that I can place in the column alongside (Column B), which will a) search Column A for any duplicate entries that match the name in the same row as the formula in Column A, and then return the result "yes" in Column B in the same row as the first occurrence of that name in Column A.


In other words, if the name "John Smith" appears altogether 20 times in Column A, the word "yes" must appear next to it in Column B in the first row in which "John Smith" appears (i.e. the smallest row number - the other 19 "John Smith" occurrences should be blank in Column B).
 
i typically use the coutif formula for this:


in cell A2 write: =COUNTIF($B$1:B2;B2)


so as you pull that formula down, the count range top most border will stay at B1, but the bottom range will move along with the drag down. Meaning, the first occurence will be 1, the 2nd 2, etc etc.


if you nest that formula in an IF, you're set!


=if(COUNTIF($B$1:B2;B2)=1;"Yes;"")


ps: i use ; as a seperator, i know internationally a comma is used as well, adapt accordigly :)
 
Grumpy88,


You could also use the first formula by MrBramme in conditional formatting as well.


Regards
 
Sorry, another issue has popped up that I didn't originally anticipate. How would I adapt that formula if I wanted to include another criterion into the outcome?


For example, I want the "yes" in Column B for the first instance of a name in Column A, but only when there is also a certain entry in Column C. Thus, a "yes" must appear only next to the first combination of a given name and a given entry in Column C. It should not appear again for any further instances of that particular combination. However, should the same name in Column A appear next to a different entry in Column C, then the "yes" must again appear in the first row of that combination, etc.


Maybe in retrospect the wording should differ for each different type of Column C entry, such that it shouldn't always be "yes".


Am I explaining myself well?
 
If i'm reading your post right, a countifs is the way to go.


similar to this (the columns wont be right, i'm a bit confused as to what is where in your file :) )


=COUNTIFS($B$1:B2;B2;$C$1:C2;C2)


so it counts the combo of column B and C in this manner, the nesting in the IF formula should be identical.


Is that what you needed? :)
 
I'm tying myself up in knots here! In more practical terms, what I'm trying to do is the following:


I have my list of names in Column H and the additional criterion to be matched to each name in Column F. There are three different criterion options in Column F, such that the same name in Column H could be linked (i.e. be in the same row) to one, two or all three of the Column F options at one stage or another. There are also a few names that are not linked to any of the three Column F options.


Thus I need the formula (to be placed in Column I) to say "yes 1" for the very first combination of a given name with the first option in Column F (but never again for any reoccurances thereafter), "yes 2" for the very first combination of a given name with the second option in Column F (but never again for any reoccurances thereafter), and "yes 3" for the very first combination of a given name with the third option in Column F (but never again for any reoccurances thereafter).


It will thus happen that a given name in Column H could have all of a "yes 1", a "yes 2" and/or a "yes 3" next to it at some point in Column I (in the appropriate row), but never more than those three entries in total for any one name). For those names that do not have any of the three Column F options in the same row, a "-" should appear in Column I for them instead.


Have I explained that properly? Essentially there are three different COUNTIF formulas at play here, which as a combined single nested formula is looking for the first occurrence of each name with each of the three Column F criteria.
 
Okay.


Playing around I've eventually managed to figure it out. This formula seems to have worked:


=IF(COUNTIFS($H$2:H2,H2,$F$2:F2,"=Criterion One")=1,"yes 1",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,"=Criterion 2")=1,"yes 2",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,"=Criterion 3")=1,"yes 3","-")))


Couldn't have done it without you, MrBramme!
 
Actually, on closer inspection it's not perfect.


In the vast majority of cases it has produced the correct "yes" outcome, but for certain names for some reason or other the formula has generated an incorrect "yes" number. There's no reason for it that I can see, as the Column F entry is the same for that name as for the name in the row immediately above it (which is displaying the correct "yes" response), yet for the other names it will generate "yes 2" for example, and for that name alone it will show "yes 1".


Interestingly enough, for that same "problem" name it will continue to display "yes" outcomes, even though the name has appeared previously, which leads me to believe that there is some problem with the name itself. Can't think what or why though.


Very perplexing!
 
Hi ,


Can you try this ?


=IF(NOT(OR(IFERROR(SEARCH({"Criterion 1","Criterion 2","Criterion 3"},F2),0))),"-",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,F2)=1,"yes"&MATCH(F2,{"Criterion 1","Criterion 2","Criterion 3"},0),""))


Narayan
 
That produces the output "TRUE" in every row.


Do the curly brackets mean that I must enter it as an array formula?
 
Hi ,


I don't understand how it can produce TRUE in every row , since that is not the output I got !


Can you copy + paste just one row of your data ?


Narayan
 
Wow!! I have no idea what you did in that formula, but that seems to have solved the issue with those problem names! I must sit down and analyse it.


I always have such mixed emotions when posting in this forum - on the one hand I'm ecstatic about all of the excellent help that I get, but on the other hand I generally feel a little depressed about how much I don't know about Excel!


Thanks so much!
 
Hi ,


I'm also having mixed emotions ! Glad that your problem has been resolved , and sorry that you think your knowledge of Excel is inadequate. I am sure all of us are learning something new every day.


Narayan
 
Just another quick question though - where in your formula are the output results hidden, i.e. if I wanted to change the output to something more meaningful than "yes1", where would I tweak that?


Thanks.
 
Okay, I see that the "1" , "2" and "3" are position indicators that come from the MATCH formula used. So what I would then ideally like is to replace that output with specified text instead.


Not essential, but it would be nice!
 
Hi ,


The portion :


"yes"&MATCH(F2,{"Criterion 1","Criterion 2","Criterion 3"},0)


inserts the text "yes1" , "yes2" or "yes3" ( without the quotes ).


The MATCH function returns 1 , 2 or 3 depending on whether Criterion 1 or Criterion 2 or Criterion 3 was present in column F.


If you want that different text strings have to be output for the three Criteria , then you will have to replace the above segment by :


CHOOSE(MATCH(F2,{"Criterion 1","Criterion 2","Criterion 3"},0),"text 1","text 2","text 3")


Depending on whether the MATCH function returns 1 , 2 or 3 , the CHOOSE function will output one of the text strings "text 1" , "text 2" or "text 3" ( without the quotes ).


Narayan
 
Awesome! That worked a charm, thanks.


Any idea what the problem was with my initial nested multiple IF formula?
 
Hi ,


What I can think of is that the Criterion One / Criterion 2 / Crieterion 3 is being specified explicitly ; in the formula :


=IF(COUNTIFS($H$2:H2,H2,$F$2:F2,"=Criterion One")=1,"yes 1",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,"=Criterion 2")=1,"yes 2",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,"=Criterion 3")=1,"yes 3","-")))


what you are doing is checking whether the data in the current cell in column H has already occurred anywhere earlier in the column , and also whether the text "Criterion One" or "Crierion 2" or "Criterion 3" has occurred anywhere earlier in the column F.


Now suppose we are in row 7 , and the current content of F7 is "Criterion 2" , and the current content of H7 is "Name 3" ; the formula in I7 will check whether the combination "Criterion One" and "Name 3" has occurred anywhere earlier in the rows 2 through 6 ; it will also check for the combination "Criterion 2" and "Name 3" as well as the combination "Criterion 3" and "Name 3" ; which ever of these checks returns the value 1 , will output a corresponding "yes 1" , "yes 2" or "yes 3".


So , if the combination "Criterion One" and "Name 3" has occurred only once earlier , it will output "yes 1" even though the current criterion in F7 is "Criterion 2" !


So , the COUNTIFS function has to look for the combination ($F$2:F2,F2,$H$2:H2,H2).


Narayan
 
Okay, I get it - I think.


It's just strange to me that it did that for some names (consistently) and not for others. I suppose that if I really applied my mind to it I could work out what was different about those particular names, but it seems scant worth the effort considering that I do now have a formula that works!


There is definitely something to be said for letting sleeping dogs lie, in my opinion!
 
Hi ,


If you really want to go ahead with your formula , try this version and see if it is OK :


=IF(F2="Criterion One",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,F2)=1,"yes 1",""),IF(F2="Criterion 2",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,F2)=1,"yes 2",""),IF(F2="Criterion 3",IF(COUNTIFS($H$2:H2,H2,$F$2:F2,F2)=1,"yes 3",""),"-")))


Narayan
 
I appreciate the effort, thanks, but am more than happy with yours!


I will compare that one to mine though, and see where I went wrong.


Much obliged.
 
Would I really be pushing my luck if I asked for a further adaptation of the formula??!


I have realised that, handy as this formula is to me, it is still not quite going to help me to the extent that I require in subsequent filtering and pivot table analyses.


Can this formula therefore please also be expanded to include a third criterion in Column F, such that the CHOOSE(MATCH(F2,{"Criterion 1","Criterion 2","Criterion 3"},0),"text 1","text 2","text 3") portion will display the "text 1","text 2" or "text 3" output not just in the first row where a specific name and criterion combination is encountered for the first time, but also in all subsequent rows of that combination that share the same entry in Column F?


I need this because the same name / criterion combination can occupy several rows that have the same Column F entry, and I'd like all of these rows to be marked with the "text 1","text 2" or "text 3" formula output.


Sorry to be such a pain!
 
Back
Top