• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract Date from Text String/Power Query

In our practice, often times we'll include dates and text in a description (09/13/17 SPED, 1-15-09 AUDIO). I am trying to figure out if there is a way in Power Query to extract the date portion of the text string and then format it as a date. The date is not always at the beginning of the text string and is not always "mm/dd/yy" format. I've searched the webs and the forum, though admittedly I may not be using the right search verbiage, but I haven't been able to find anything. Is there a way to do this in Power Query?

Thanks in advance!!
YL
 

Chihiro

Excel Ninja
I'd recommend uploading sample patterns that can occur. Along with expected output.

However, there is no robust mechanism that will extract all date patterns. You can build custom function to handle most common cases though.
 

Chihiro

Excel Ninja
Ok, if it's always going to be like what your sample has. Then it's very easy.

Select [DESC] -> Split Column -> By Delimiter.

Then use Space as delimiter and use Left Most.
upload_2018-12-19_14-59-15.png

That's it, if you have system date format set to "mm/dd/yyyy" then it should automatically apply data type change.
upload_2018-12-19_14-59-49.png

Edit: Added screen shots.
 
Thanks Chihiro. I guess I didn't provide a big enough sample, BUT, this will catch most of them for sure. I have other descriptions with the date at the end, so I will use a combo of steps to extract from the right most when I need to. Thanks for your help!
 
For reference, here is a better sampling of the differences in description. I would say that most of the dates will be at the beginning or end, which will make it easier to extract, but some are in the middle, and some are spelled out.
 

Attachments

Chihiro

Excel Ninja
That's going to be significantly more complex. Without use of R-script and PowerBi. I can't think of quick solution. Let me think on it a bit.

Is there a possibility of validating input? Rather than cleaning up output. I tend to enforce validation upfront to avoid headaches down the line.
 
Thanks Chihiro. We have been consistent the last few years, but I will make notes regarding how to input. I don't think the database we has any kind of built in data validation, so we'll have to have copious notes on it.

Thanks for your help!
 

Chihiro

Excel Ninja
I'm still working on ways to deal with text Month name pattern. (ex. April 20, 2004). But I've got function to extract dates from other patterns (m/dd/yy, mm-dd-yy, m/d/yy, m/d/yyyy etc).

See below function.
Code:
(myStr) =>
let
    myStr = Text.Replace(myStr, "- ", " "),
    Split = Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv)(myStr),
    Convert = List.Generate(
                            ()=> [Date=null, x=-1],
                            each [x]<List.Count(Split),
                            each [Date= try Date.From(Split{x}) otherwise null, x=[x]+1],
                            each [Date]),
    myDate = List.RemoveNulls(Convert){0}
in
    myDate
Sample usage and result:
upload_2018-12-20_12-22-29.png

I'll update when I've figured out logic for extracting date with month name and space between date parts.
 

Chihiro

Excel Ninja
Ok here's second function.
Code:
(myStr) =>
let
    Split = Splitter.SplitTextByDelimiter(", ")(myStr),
    Year = List.RemoveNulls(
                List.Generate(
                              ()=>[Year=null,x=0],
                              each [x]<List.Count(Split),
                              each [Year= try Number.From(Text.Start(Split{x},4)) otherwise null, x = [x]+1],
                              each [Year])
                            ){0},
    Split2 = Splitter.SplitTextByDelimiter(Number.ToText(Year))(myStr),
    Temp = List.LastN(Splitter.SplitTextByDelimiter(" ")(Text.Trim(Split2{0})), 2),
    myDate = Date.From(Text.Combine(Temp, " ") & " " & Number.ToText(Year))
in
    myDate
You could combine the two, but I kept it separate for ease of maintenance/use.

See attached sample for how it's used.
 

Attachments

Top