Last night I got an email from Joshua, one of our readers, with the subject “Hard Excel problem”. Hard?!? At this stage of summer, the hard problems seem to be (in no particular order):
- Lack of good quality mangoes to eat
- Intense heat and humidity
- Lack of good quality mangoes to eat
Yes, I like mangoes.
Anyhow, back to Joshua’s email. I got curious and read it. He is facing an interesting problem.
I have a very difficult inquiry I am hoping you might be able to solve…
Is there a formula (i.e., without using VBA) that will look at another columns values and provide a new sequential number (i.e., reordered) when the value changes; however, keep the same sequential number for the duplicates?
Below is a table with two columns. […] I now need to rank order those cluster groups. Since cluster 12 appears first it would get a value of ‘1’ and all of the cluster 12’s should now be a ‘1’. Since cluster 4 appears next it would get a rank of 2, etc…

Well, it is an interesting problem for sure. But a hard problem, it isn’t. For really hard problems, refer to my list above.
So how to generate the sequence numbers?
Logic: If a value is already listed in the rows above, we fetch its existing sequence number. Else, we generate a new sequence number.
Implementation: Simple, we use VLOOKUP.
Assuming the cluster values are in column B, from B4 onwards, write this in C4 and fill it down:
=IFERROR(VLOOKUP(B4,$B$3:C3,2,FALSE), SUM(MAX($C$3:C3),1))
Let’s examine the formula.
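VLOOKUP(B4,$B$3:C3,2,FALSE) portion: This looks for the value of B4 in the rows of column B above it and, if it finds an exact match, returns the corresponding sequence number from column C. If the value has not appeared yet, it returns an error.
SUM(MAX($C$3:C3),1) portion: This takes the largest sequence number assigned so far and adds 1, giving us the next sequence number.
IFERROR(VLOOKUP(…), SUM(…)) portion: This does the magic of choosing either the existing sequence number or, when VLOOKUP fails, generating a new one.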
For more, read about VLOOKUP and IFERROR formulas.
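If it helps to see the same logic outside a spreadsheet, here is a minimal Python sketch of it. This is not part of the Excel solution; the function name sequence_numbers and the sample cluster values are made up for illustration. The dictionary plays the role of the VLOOKUP over the rows above, and the max-plus-one step mirrors SUM(MAX(…),1).

```python
def sequence_numbers(clusters):
    """Number each distinct value by the order of its first appearance."""
    seen = {}    # value -> sequence number already assigned (the VLOOKUP part)
    result = []
    for value in clusters:
        if value not in seen:
            # New value: largest number so far plus one (the SUM(MAX(...),1) part)
            seen[value] = max(seen.values(), default=0) + 1
        result.append(seen[value])
    return result


# Hypothetical sample data: cluster 12 appears first, then cluster 4, then 7
print(sequence_numbers([12, 12, 4, 4, 12, 7, 4]))  # [1, 1, 2, 2, 1, 3, 2]
```

Every repeat of a cluster gets the number assigned at its first appearance, which is the behaviour Joshua asked for.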

Sequence number generation – Example spreadsheet
Play with the sequence number generation spreadsheet embedded below or Click here to grab a copy of the file.
How would you generate the sequence numbers?
It's your turn to take a crack at the hard problem. How would you solve it? Go ahead and share your answers in the comments.
More hard problems – solved:
Hard problems are not new at Chandoo.org. We regularly lob VLOOKUPs and SUMPRODUCTs at them to crack them. Here are a few examples:













7 Responses to “Extract data from PDF to Excel – Step by Step Tutorial”
Dear Chandoo,
Thank you very much for this and it is very helpful.
However, all the Credit Card Statements are now password protected.
Please advise how can we have a workaround for that
Hello sir,
How to check two names are present in the same column ?
Thanks and Regards
Hi, Thank you for the great tip. One problem, when I click on get data >> from file, I don't see the PDF source option. How can I add it?
I tried to add it from Quick Access toolbar >>> Data Tab, but again the PDF option is not listed there.
I am using Office 365
Hi, Thank you for your video. I see you used the composite table, but when I load my PDF, it does not load any composite table. It has 20 tables and 4 pages for one bank statement. I have about 30 bank statements that I want to combine. Your video would work except that I can't get the composite table, and each of the tables or pages I do get does not have all the info. What to do?
Dear Chandoo,
How do we select multiple tables/pages in one PDF, repeat the same for the rest of the PDFs in the same folder, and then extract only that data in Power Query?
Thank you
One bank statement takes up 20 tables and four pages in this document. I need to consolidate roughly thirty different bank statements that I have. Your video would be useful if I could only get the composite table, which I can't for some reason, and each of the tables or pages that I can get is missing some information.