• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

creating an ID column

Manuela Imthurn

New Member
Hi,

I have a list of names - more than 1000. These names are listed in an order of appearance - to be exact, by date. Some people are listed 2, 3 or even more times. Unfortunately, some people have the same name (to complicate things... though - they have a different birth date).

How can I create am ID column that will count the number of people, but put the same number for the same person?

For example: If my name were the first and I would appear again in observation 100 - how can I make my number be 1 in observation 100 as well.

Manuela
 
Hi Manuela,

Welcome to the forum, what you could try is putting the DOB in another column, but format the cells to General to generate an ID number. Dates are numbers so the same person that appears a number of times will have the same ID number.
 
Assuming first name is in column A, Surname in column B, Birthdate in column C: Place the following in D2 and copy down
=A2&B2&C2COUNTIF($C$2:$C2,C2).
 
Assuming first name is in column A, Surname in column B, Birthdate in column C: Place the following in D2 and copy down
=A2&B2&C2COUNTIF($D$2:$D2,D2).


Thank you, I will try this! unfortunately, the fist and last names are together in A2, but I will try with A2&B2.
 
Good day Manuela Imthurn

It is not a good idea to have both first and last names in the same cell/column, what if you only have part of the name to do a search on?
You should do a text to column and split the name data, or any other data that share a single cell such as the address! the spread sheet will so become a cubersome monster.
Read the rules on normalization.
 
Hello Bobhc,

you have a good point. This was a data set that was given to me. I am trying to get it ready so that I can read it into R. How do I split a cell with text.
Column A is formatted in this way:

IMTHURN,MANUELA ESTHER

Is there a way I can separate the name into 2 or 3 cells? But what I really want is to give them numbers and "de-identify" the data.
 
Last edited:
Assuming first name is in column A, Surname in column B, Birthdate in column C: Place the following in D2 and copy down
=A2&B2&C2COUNTIF($C$2:$C2,C2).


Unfortunately, this does not work - it counts the number of times they appear... so most people are 1 - they only appear once.
 
One approach might be to get a clean list of non-dupes (hence my original approach), number those 1-whatever, then, on your master sheet with the dupes, do a Match or INDEX/MATCH to get the simliar numbers there. Perhaps a sample upload might help?

I guess I'm unclear as to what the end result is...a clean list of no duplicates, or, just a list of data that's ok to have dupes in?
 
One approach might be to get a clean list of non-dupes (hence my original approach), number those 1-whatever, then, on your master sheet with the dupes, do a Match or INDEX/MATCH to get the simliar numbers there. Perhaps a sample upload might help?

I guess I'm unclear as to what the end result is...a clean list of no duplicates, or, just a list of data that's ok to have dupes in?



Thank you!

This is very helpful!

yes, in the end I need the original data with the duplicate names - but just in numbers. I need to "de-identify" the data so instead of names, I need numbers.
 
Yeah...if you just want basic numbers, you can use the Match function on it's own to get the "value". Since they'll be unique you can base your IDs off of that.

Be aware though that you also need to create the A2&B2&C2 combo on the 2nd sheet to do the "match". If you post a sample, I can try to mock it up...
 
Hello Manuela,
An alternative, that keeps everything in one column, could be something like this.

Assuming Name in column A, DOB in column B, ID to go in column C. Enter an ID number in cell C2 (e.g. 1001), then in cell C3 down, enter this:

{=IF(ISERROR(INDEX($B$2:$B2,MATCH(A3&B3,$A$2:$A2&$B$2:$B2,0))),MAX($C$2:C2)+1,INDEX($C$2:$C2,MATCH(A3&B3,$A$2:$A2&$B$2:$B2,0)))}

Use CTRL+SHIFT+ENTER instead of Enter, as it is an array formula.

Essentially, for each row, it does an Index/Match for the range down to the row above it, searching for an instance of the Name & DOB together, if it doesn’t find one (implying that the current row is the first instance of that person), it gives an error, and assigns a new ID number by adding 1 to the highest ID number so far. If it isn’t an error, it means it has found a previous instance of the Name&DOB combination, and it performs an index/match on Name&DOB to give C, the ID already assigned.

I’ve attached an example worksheet.

Hope that helps.
 

Attachments

  • Creating an ID column.xlsx
    11.7 KB · Views: 5
Hello Manuela,
An alternative, that keeps everything in one column, could be something like this.

Assuming Name in column A, DOB in column B, ID to go in column C. Enter an ID number in cell C2 (e.g. 1001), then in cell C3 down, enter this:

{=IF(ISERROR(INDEX($B$2:$B2,MATCH(A3&B3,$A$2:$A2&$B$2:$B2,0))),MAX($C$2:C2)+1,INDEX($C$2:$C2,MATCH(A3&B3,$A$2:$A2&$B$2:$B2,0)))}

Use CTRL+SHIFT+ENTER instead of Enter, as it is an array formula.

Essentially, for each row, it does an Index/Match for the range down to the row above it, searching for an instance of the Name & DOB together, if it doesn’t find one (implying that the current row is the first instance of that person), it gives an error, and assigns a new ID number by adding 1 to the highest ID number so far. If it isn’t an error, it means it has found a previous instance of the Name&DOB combination, and it performs an index/match on Name&DOB to give C, the ID already assigned.

I’ve attached an example worksheet.

Hope that helps.



Dear Tom,

Today I was going to start working on my data set again, when I saw your answer to my request. The words I use right now, do not do justice to how much you have helped me! This is wonderful. It is exactly what I needed. I have put in about 4+ hours trying solutions and reading information about indexing. The attached worksheet was also great help.

Manuela
 
Back
Top