Generating sequence numbers from cluster values [VLOOKUP to the rescue]

Posted on June 2nd, 2016 in Excel Howtos - 12 comments

Last night I got an email from Joshua, one of our readers with the subject – Hard Excel problem. Hard?!?, at this stage of summer, the hard problems seem to be (in no particular order),

  1. Lack of good quality mangoes to eat
  2. Intense heat and humidity
  3. Lack of good quality mangoes to eat

Yes, I like mangoes.

Any how, back to Joshua’s email, So I got curious and read it. He is facing an interestingproblem.

I have a very difficult inquiry I am hoping you might be able to solve…

Is there a formula (i.e., without using VBA) that will look at another columns values and provide a new sequential number (i.e., reordered) when the value changes; however, keep the same sequential number for the duplicates?

Below is a table with two columns. […] I now need to rank order those cluster groups. Since cluster 12 appears first it would get a value of ‘1’ and all of the cluster 12’s should now be a ‘1’. Since cluster 4 appears next it would get a rank of 2, etc…

generate-squence-numbers-problem

Well, it is an interesting problem for sure. But hard problem, it isn’t. For really hard problems, refer to my list above.

So how to generate the sequence numbers?

Logic: If a value is already listed, we fetch corresponding sequence number. Else, we generate a new sequence number.

Implementation: Simple, we use VLOOKUP.

Assuming the cluster values are in column B, from B4 onwards, in C4, write

=IFERROR(VLOOKUP(B4,$B$3:C3,2,FALSE), SUM(MAX($C$3:C3),1))

Let’s examine the formula.

VLOOKUP(B4,$B$3:C3,2,FALSE) portion: This one looks value in column B and tries to find corresponding sequence value in column C.

SUM(MAX($C$3:C3),1) portion: Gives us next sequence number

IFERROR(VLOOKUP(…), SUM(…)) portion: This does the magic of choosing either existing sequence number or generating a new one.

For more, read about VLOOKUP and IFERROR formulas.

sequence-generation-formula-explained

Sequence number generation – Example spreadsheet

Play with the sequence number generation spreadsheet embedded below or Click here to grab a copy of the file.

How would you generate the sequence numbers?

Its your turn to take a crack at the hard problem. How would you solve it? Go ahead and share your answers in the comments.

More hard problems – solved:

Hard problems are not new at Chandoo.org. We take lob vlookups and sumproducts regularly to crack them. Here are few examples:

Written by Chandoo
Tags: , , , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

12 Responses to “Generating sequence numbers from cluster values [VLOOKUP to the rescue]”

  1. Anish says:

    What about IFERROR(VLOOKUP(B4,$B$3:C4,2,FALSE),MAX(C3:C4)+1), can omit one sum function.

  2. Jacob says:

    I know it is simple, but not EASY.. Thanks to inspire me.

  3. Nikki says:

    This is going to be very useful, thanks Chandoo.

    Fellow mango lover 🙂

  4. Sandeep says:

    Hi Chandoo,

    I seen your site recently and learned excel a lot thanks a ton for all your contributions..........

    The solution striked to me for the above is

    In column E we can filter all the duplicates (these are dynamic) and In column F serial no 1 to 10 (These are static)

    In column B we use vlookup formula...

    This may be not the accurate solution but I given one more solution...

    Plz correct me if i am wrong

  5. Kuldeep says:

    Awesome as usual 🙂

  6. Cary says:

    I would have used the following formula in cell C4 and copied it down to the remaining cells (although your formula is much more concise!) :

    =IF(COUNTIF(B$1:B3,B4),INDEX(C$1:C3,MATCH(B4,B$1:B3,0)),MAX(C$1:C3)+1)

  7. David N says:

    Unlike the solution from Chandoo's file and the solutions from previous comments, this one is different because it does not reference the answers/results from previous rows. It is shown referencing B4 and could be copied down to cover the other rows, but getting the correct answer for row 10 (as an example) does not require that the formula be entered for rows 4-9 as well.

    =SUMPRODUCT((MATCH($B$4:$B$20,$B$4:$B$20,0)<MATCH(B4,$B$4:$B$20,0))/COUNTIFS($B$4:$B$20,$B$4:$B$20))+1

    • Anish says:

      Dear David,
      Great formulation but it will be easy to learn if you could please explain how the formula works.

      • David N says:

        Since I mentioned row 10 in my original post, let's use that for this example. The first MATCH assigns a unique number to each value corresponding to the position of its first occurrence in the list.

        =SUMPRODUCT(({1;1;1;1;5;6;6;1;9;1;1;6;5;5;15;1;5}<MATCH(B10,$B$4:$B$20,0))/COUNTIFS($B$4:$B$20,$B$4:$B$20))+1

        The advantage is that those numbers are in a sort of ascending order, very much like using the RANK function, only with position being what is important as opposed to size. Then the combination of the second MATCH and the COUNTIFS is borrowed from a trick for getting a rank (without using the RANK function) where there are no skips in the sequence for duplicate values. It considers if the current value's first position is less than the first position of all other values and divides by the number of times each value appears in the list.

        =SUMPRODUCT({1;1;1;1;1;0;0;1;0;1;1;0;1;1;0;1;1}/{8;8;8;8;4;3;3;8;1;8;8;3;4;4;1;8;4})+1

        This effectively creates an array of fractions that allow us to cheat our way into an accurate count. Here we see the ones (trues) in the numerator correspond to the eights and fours in the denominator such that we have 1/8 a total of 8 times and 1/4 a total of 4 times, with all other fractions equating to zero.

        Finally, SUMPRODUCT adds up the fractions for a total of 2, and the +1 increments that count because the 20 on row 10 is in fact the 3rd unique value in the list. Technically, I could even have used an LTE (less than or equal to) between the two MATCH functions and avoided the need for a +1.

  8. Matt says:

    Simplest, but less elegant, solution would be to sort the values in B. Then set C4

    =IF(ISNUMBER(B3),IF(B4=B3,C3,C3+1),1)

    Of course, this assumes there is no intrinsic meaning to the cluster IDs (ie, 12 doesn't have to be 1) which isn't exactly clear from the question.

  9. Davidlynne says:

    Its Awesome way to learn the lesson.Thanks for posting this.

  10. Rahul Mangla says:

    I entered numbers in F13 TO F23 and formula started from G16. Default value in G15=1

    IF(ISERROR(VLOOKUP(F16,F$15:G15,2,0))=FALSE,VLOOKUP(F16,F$15:G15,2,0),MAX(G$15:G15)+1)

Leave a Reply