• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Text Formula Assistance Needed - What to use for various situation

Josiev

New Member
Hello.....
I am not sure if I am using the proper approach to a weekly challenge that I have to work with...I appreciate any assistance to ensure I can efficiently complete this weekly task which typically takes enormous amount of time and can be prone to mistakes.

I run a report which has over a thousand line items...For each line or row, I need to identify WO number (work order) that is referenced in one of the two column headers in the report. Unfortunately, the WO # format is not uniformly formatted as there are three types or ways they are identified. The WO's are either formatted in three ways:
1. WO w/ 9 characters: Begins with 3 letters, followed by -, followed by 5 digits (ex. DAL-76987, MDW-76855, BED-75549, etc.)
2. WO w/ 8 characters : Begins with 2 letters "CL" followed by -, followed by 5 digits (CL-615414, CL-71978, etc)
3. WO w/ 5 digits only: For example, 60511, 61400, 61563, etc. Numbers always start either with a 5, a 6 or a 7.

These WO's are referenced in the attached report in either column G "Reference" or column H "Narrative". The WO's exist in one of these columns and never in both. Most of the time, they are entered all by itself but other times, they are entered with additional narratives and can be placed in the middle of a narrative.
The challenge then is to identify the WO's and input them in a separate column, in column J "WO #". I typically sort column G first and and copy and paste them over to column J, then sort column H and do the same thing. I then clean up column J to ensure the blank spaces or extra characters that are sometimes entered before or after the WO#'s are removed. I realize this is not the most efficient way as the report is really long and I find mistakes that I missed inputting. Any ideas or better approach is appreciated. Thank you!
 

Attachments

  • Sample - Text Formula Challenge.xlsx
    16.9 KB · Views: 8
I do not believe this to be a simple task. I managed it only by having an evaluation copy of Charles Williams's FXL Speedtools on my laptop. He has written a function Rgx.MID( ) that will extract sub-strings that match a Regular Expression
= CONCAT( Rgx.MID(Table14[@[Reference]:[Narrative]], RegEx ) )
where my name RegEx referenced a cell containing
\w{3}-\d{5}|CL-\d{5}|\b[5-7]\d{4}
The "|" indicates alternatives:
The first option looks for 3 letters followed by a "-" and 5 digits;
the second looks for "CL-" followed by 5 digits;
and the third is a word boundary followed by 5,6,or 7 and 4 further digits.
Good luck if you believe you can perform such a task with Excel's search strings!

Somewhere to experiment with RegEx is https://regexr.com/
 
Extract specified text from Column G and Column H data

This formula solution used a FILTERXML function of which available in Excel 2013 or above.

65026


In K2, copied down :

=IFERROR(FILTERXML("<a><b>"&SUBSTITUTE(SUBSTITUTE(G2&" "&H2,",","")," ","</b><b>")&"</b></a>","//b[contains(.,'-') and string-length(.)<10 or .>9999 and .<99999]"),"")

Regards
Bosco
 

Attachments

  • Text Formula Challenge (BY).xlsx
    18 KB · Views: 3
Last edited:
@bosco_yip
I am impressed. I am not familiar with the criteria available within XML searches Like regular expressions it appear to be an acquired art. As it stands, though, the 9-character string must contain a "-" but there is no requirement for it to contain any digits. Hence, the string, "cat-treat", matched the criteria. That said, you appear to have returned the codes without picking up too much in the way of extraneous junk.
 
@Peter Bartholomew,

Good catch, in avoid something like "cat-treat", and

65030

re-read the OP's condition, mentioned that "...Numbers always start either with 5, 6 or 7...".

So,

the formula changed to :

=IFERROR(FILTERXML("<a><b>"&SUBSTITUTE(SUBSTITUTE(G2&" "&H2,",","")," ","</b><b>")&"</b></a>","//b[contains(.,'-5') or contains(.,'-6') or contains(.,'-7') and string-length(.)<10 or .>9999 and .<99999]"),"")

Regards
Bosco
 

Attachments

  • Text Formula Challenge (BY-R1).xlsx
    18.2 KB · Views: 5
Last edited:
You guys rock....I have tried all sorts of formulas but was only able to accomplish a portion of what I needed. I would have never thought of a way to do it this way. Thank you for this valuable replies. This has vastly improved my time management and assurance that reports that I provide are accurate.
 
Back
Top