• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to find the first number in a substring and display all values after the number?

JuliusV

Member
Hi All,

I am trying to figure out how I would go about finding the first number in a string and after that extract all the values after the number.

For example, asdfadsfadsf 3 My Address City, State ZipCode. Therefore, I want to extract 3 My Address City, State ZipCode.

The excel sheet I am working with has 8 separate columns for addresses that I used the "&" to combine everything together. From there, I want to extract a substring from all the 8 columns that I have combined. It is this way because it includes international addresses, but currently I am trying to just address US addresses that includes numbers first for most cases. I want to specifically only extract the address portion.

I asked a similar question last year, though I believe that was much more complicated. Below is the link for reference:
http://forum.chandoo.org/threads/he...cell-text-with-3-edge-cases.16206/#post-98468

I have tried using the formulas there and tailoring it for this situation but it has been futile. any help in this matter is greatly appreciated!

Thank you!!
 
Hi Narayank991,

Thank you for providing a previous thread to reference.

I had 3 additional questions below and have attached a spreadsheet of the data I would be using:

(1) How can you modify the formula below to include the numbers in the block of text?
=MID(A1,MATCH(1,MMULT(-ISERR(-MID(A1,ROW(OFFSET(A$1,,,LEN(A1))),{1,2})),{1;-1}),)+1,6^6)

From: http://chandoo.org/wp/2013/07/22/fo...ring-after-the-first-block-of-numbers-part-4/

(2) Also, is there a way when extracting the sub string to space the blocks of texts based on each of the address headings? (e.g., During the extraction, Number + Street Name + (Space ) + City + remaining text where each address heading contains a space. The headings would be the following:{Address 3 = Number + Street name} (space) {Address 4 = City} (space) Address 5 = State (space) Address 6 = Zip Code)

(3) Additionally, I am have issues with using the formula presented above. For some reason, when I try to use it, I am only getting 0, rather than the text.

Thank you very much for the help.
 

Attachments

  • Question - Substring including Numbers.xlsx
    10.7 KB · Views: 0
Hi.

Your formula in cell A2 is referencing cell A2 (i.e. referencing itself), hence causing a circular reference. I believe the formula here should be referencing cell B2.

Regards
 
Hi,

Can you post some real data ... I was thinking of alternate formula to combine last 4 columns where you have data..

Regards,
Prasad DN
 
Hi All,

@XOR LX - Thanks for the catch. However, the formula now only gives me #N/A.
@prasaddn - The data provided mimics exactly how the data would be presented. The only difference is the PII info is not there.

I have uploaded the data once again with the updated formula that is still giving me errors, which I realized is because there are spaces in the combined info column.

I still have the following outstanding questions/commentary:

(1) How can you modify the formula below to include the numbers in the block of text and account for spaces in the text?
=MID(A1,MATCH(1,MMULT(-ISERR(-MID(A1,ROW(OFFSET(A$1,,,LEN(A1))),{1,2})),{1;-1}),)+1,6^6)

- @NARAYANK991 - One of the differences between the formula is that there are spaces between my words and for this formula it needs to be a block of words. I tried using & to combine all the segments together, but it retains spaces. I then used substitute (A1, " ", "") to eliminate all the spaces, which allows me to use the current formula, however, the end result I want is the address with spaces as it would normally be presented.

- Additionally instead of adding +1, I know I need to subtract the amount of the numbers before the first instance of the number. How can I modify the +1 portion to accurately assess the amount of numbers that are in the formula so that all numbers are always included?
For example:
- asdfads342asdf - The first would need to start extracting from after position 7
- asd22222222asd - The second would need to start extracting from after position 4.

From: http://chandoo.org/wp/2013/07/22/fo...ring-after-the-first-block-of-numbers-part-4/


Thank you very much for the help.
 

Attachments

  • Question - Substring including Numbers_v2.xlsx
    10.8 KB · Views: 1
Last edited:
Hi ,

I am not sure about the formula posted ; see if this works for all inputs.

=MID(B2,MATCH(0,MMULT(-ISERR(-MID(B2,ROW(OFFSET($A$1,,,LEN(B2))),{1,2})),{1;1}),0),99)

entered as an array formula , using CTRL SHIFT ENTER.

I have not checked thoroughly , but it looks like the first constant array is to take one item and two items alternately ; this will give rise to a 2 column matrix , with as many rows as there are characters in the input string.

The -MID converts this to error values where alpha characters are involved , and proper digits where numeric characters are involved ; the -ISERR converts this so that where there is no error , we will get 0 , and -1 otherwise.

The MMULT by {1;1} converts the 2 column matrix back to a column vector.

Narayan
 
@NARAYANK991 - When I tried using the formula above, it works for only 2 of the 7 situations.
I tried stepping through the function, but I cannot pinpoint why for the scenarios that only have one number, why that first occurrence is not recognized. The error is specific to when there is only one number than a continuing block of text.
For example,
- (i) NameOneNameTwo3AddressCityState20999 - It does not recognize the 3, therefore only 20999 (the bolded part is part of the address).
- (ii) Name19CaliforniaAddressBakersfieldCA20999 - It recognizes the 19, therefore the 19CaliforniaAddressBakersfieldCA20999 (which is the address). This is correct! However, if there is a way to space it that would be ideal!

Any help in understanding why the first number is ignored in (i) would be greatly appreciated!
Please see my attached spreadsheet as reference. (line items highlighted in Red are correct )
 

Attachments

  • Question - Substring including Numbers_v2a - Modified.xlsx
    11.1 KB · Views: 0
Last edited:
Hi ,

The simple fact is you cannot use the formula you are trying to use.

The formula is trying to detect the changeover between a numeric character and an alpha character , which is why you have the constant array {1,2} , where you consider 1 character and then the 2 characters ; where there is a changeover , the 1 character will be numeric , while the two characters together will not be numeric.

Given your requirement , you can use a simpler formula ; see the file.

Narayan
 

Attachments

  • Question - Substring including Numbers_v2a - Modified.xlsx
    12.5 KB · Views: 0
Thank you very much for the help. It makes sense now why the original formula does not work.

Additionally, I am assuming that to accurately break out all the US addresses with spaces, I would need to do some coding to recognize the pattern of:

# Address City, State Zip Code Country
 
My contribution
Define range ={0;1;2;3;4;5;6;7;8;9} with name "IstNumbers" and use

{=MID(B3,MIN(IFERROR(FIND(IstNumbers,B3),"")),9^9)}

on A2 and drag down.

It retains the spaces between the words as I think you intended.

Regards
 
Hi ,

I would imagine that works without CSE, no?

Regards
Hi ,

I find it difficult to remember when CSE is necessary , and when we can do without it. I think the consequences of entering a normal formula as an array formula are not severe , or are they ? I thought that since MMULT is normally a function which needs to be array entered , this too would need CSE ; if it doesn't , thanks for the clarification.

Narayan
 
@Tiago MF - I tried using your suggestion to split the address but it did seem to work for me.

Can you clarify if there is a way to do this? I have attached the file for your reference. Optimally, I would like to be able to split:

3AddressCityState20999
to
3 Address City State 20999
 
Hi Julius

My solution was for the first file you sent so before you added the columns where you remove the spaces. See file attached.

To include spaces (or commas) between blocks of text of different columns, you can do that when you concatenate them (e.g. =C2&" "&C3&" "...), but that can be troublesome depending on how the data reaches you.
 

Attachments

  • Question - Substring including Numbers_v2.xlsx
    11.1 KB · Views: 0
Hi All,

In terms of a solution to extract USA Addresses including PO Boxes, the following is the solution I have used:

=IFERROR(IF($T2="USA",IF(IFERROR(FIND("PO BOX",$V2),0)>0, MID(V2,FIND("PO BOX",$V2),6) &" "& MID($V2,MIN(IFERROR(FIND(IstNumbers,$V2),"")),9^5),MID($V2,MIN(IFERROR(FIND(IstNumbers,$V2),"")),9^5)),""),"")

where IstNumbers is a global variable ={0;1;2;3;4;5;6;7;8;9}
and $V2 is a combination of A1& " "& B2&" "& ect, so you can space the words.

@Tiago MF - I realized the reason the spacing wasnt correct, was because for the situation of A1& " "& B2&" "& ect, the data in each of those cells had spaces, therefore I used TRIM() to eliminat those spaces.

One last question I had is can you please explain how the formula works:
{=MID(B3,MIN(IFERROR(FIND(IstNumbers,B3),"")),9^9)}?

I am usually confused on how to use the MID, LEFT, RIGHT functions appropriately.

Thank you for all of the help in helping me to figure out how to extract the data.
 
Last edited:
Hi Julius

I advise you to read about Text functions right here in chandoo so that you familiarize yourself with their syntax and basic use:
http://chandoo.org/excel-formulas/text-functions.shtml

But basically LEFT lets you extract a user-defined number of characters (argument "num_chars") counting from the leftmost part of a text (argument "text" which refers to a cell or manually input text). RIGHT does the same but counting from the rightmost part of a text. MID you use when you want to extract a portion of text that starts on some other position that not the left or rightmost part of the whole text.

The more trickier uses of LEFT, MID and RIGHT is when you use other text or lookup/reference functions for the arguments of start_num (for the MID funtion; lets you define the position from where you want to start extracting text) and num_chars so that the function is volatile and works in the desirable way independently of the content of the cell.

So on the formula:
{=MID(B3,MIN(IFERROR(FIND(IstNumbers,B3),"")),9^9)}

you are going to use a FIND within a MIN to define the position of the first number on the text string and this feeds the MID function, namely the start_num argument.
Since you want to extract everything starting from the first number found on the text string, you can define the num_chars as some big number (9^9 should do for the majority if not all cases) so you are confident you'll extract everything.
The reason why this has to be input as an array formula is because of how the FIND function works with the variable IstNumbers, basically testing if it finds any number from 0-9 within the text.

I hope this helps but if you need any further explanations, we'll happily provide :)

Regards,
 
Hi Tiago MF,

The parts I fully understand are the following:
- Find function tot find any numbers
- The 9^9 to extract all the characters in the sub-string

The questions I had are:
- For the formula, why do you choose to use Mid? Doesn't that mean you are only extracting from the middle of the sub-string? How does the formula know to start at the most left position?
- Why do you use Min? Doesnt that find the lowest number that exists? So if the numbers for the address are 201, would it identify 0? If so, how does it know to really start at 2?

Thanks for the clarification!
 
Hi Julius

- For the formula, why do you choose to use Mid? Doesn't that mean you are only extracting from the middle of the sub-string? How does the formula know to start at the most left position?

Let me clarify; the MID here isn't exactly extracting from the middle of the sub-string, it's extracting from the middle of the entire string to create a sub-string. For example, on cell B2 the entire string is "Name OneName Two3 AddressCityState20999", and from that you want to extract a sub-string which first character will be the first number found on the entire string.


- Why do you use Min? Doesnt that find the lowest number that exists? So if the numbers for the address are 201, would it identify 0? If so, how does it know to really start at 2?

We're not using MIN here to find the lowest number that exists but to find the position of the first number of the entire string. Firstly note that FIND there will give you the position of the occurrence of each number (0,1,2,3,4,5,6,7,8,9) on the entire text string. The result of this find on B2 ("Name OneName Two3 AddressCityState20999") is
{36;#VALUE!;35;17;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;37}
meaning that it's telling us that:
  • the first 0 is at position 36
  • the first 2 is at position 35
  • the first 3 is at position 17
  • it can't find any 1,4,5,6,7 and 8s on our text string (as you can confirm)
  • the first 9 is at position 37
This then is fed into Min so you can find the position of the first number which on B2 as we saw above is number 3 and is found on position 17 of our text string. Ultimately it's with this Min that we can tell MID to start extracting from this part onwards.

Hope this explains everything.

BR,
 
Back
Top