• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract Date from Text String/Power Query

Status
Not open for further replies.
In our practice, often times we'll include dates and text in a description (09/13/17 SPED, 1-15-09 AUDIO). I am trying to figure out if there is a way in Power Query to extract the date portion of the text string and then format it as a date. The date is not always at the beginning of the text string and is not always "mm/dd/yy" format. I've searched the webs and the forum, though admittedly I may not be using the right search verbiage, but I haven't been able to find anything. Is there a way to do this in Power Query?

Thanks in advance!!
YL
 
I'd recommend uploading sample patterns that can occur. Along with expected output.

However, there is no robust mechanism that will extract all date patterns. You can build custom function to handle most common cases though.
 
Thanks Chihiro. I've uploaded a sample of real-life examples. What I'd like is to extract the date as appearing in column B.

I haven't touched on or used custom functions very much so I will look into that some more.

YL
 

Attachments

  • Date Extraction Example.xlsx
    199.5 KB · Views: 6
  • Date Extraction Example.xlsx
    199.5 KB · Views: 2
Ok, if it's always going to be like what your sample has. Then it's very easy.

Select [DESC] -> Split Column -> By Delimiter.

Then use Space as delimiter and use Left Most.
upload_2018-12-19_14-59-15.png

That's it, if you have system date format set to "mm/dd/yyyy" then it should automatically apply data type change.
upload_2018-12-19_14-59-49.png

Edit: Added screen shots.
 
Thanks Chihiro. I guess I didn't provide a big enough sample, BUT, this will catch most of them for sure. I have other descriptions with the date at the end, so I will use a combo of steps to extract from the right most when I need to. Thanks for your help!
 
For reference, here is a better sampling of the differences in description. I would say that most of the dates will be at the beginning or end, which will make it easier to extract, but some are in the middle, and some are spelled out.
 

Attachments

  • Date Extraction Example.xlsx
    199.9 KB · Views: 15
That's going to be significantly more complex. Without use of R-script and PowerBi. I can't think of quick solution. Let me think on it a bit.

Is there a possibility of validating input? Rather than cleaning up output. I tend to enforce validation upfront to avoid headaches down the line.
 
Thanks Chihiro. We have been consistent the last few years, but I will make notes regarding how to input. I don't think the database we has any kind of built in data validation, so we'll have to have copious notes on it.

Thanks for your help!
 
I'm still working on ways to deal with text Month name pattern. (ex. April 20, 2004). But I've got function to extract dates from other patterns (m/dd/yy, mm-dd-yy, m/d/yy, m/d/yyyy etc).

See below function.
Code:
(myStr) =>
let
    myStr = Text.Replace(myStr, "- ", " "),
    Split = Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv)(myStr),
    Convert = List.Generate(
                            ()=> [Date=null, x=-1],
                            each [x]<List.Count(Split),
                            each [Date= try Date.From(Split{x}) otherwise null, x=[x]+1],
                            each [Date]),
    myDate = List.RemoveNulls(Convert){0}
in
    myDate

Sample usage and result:
upload_2018-12-20_12-22-29.png

I'll update when I've figured out logic for extracting date with month name and space between date parts.
 
Ok here's second function.
Code:
(myStr) =>
let
    Split = Splitter.SplitTextByDelimiter(", ")(myStr),
    Year = List.RemoveNulls(
                List.Generate(
                              ()=>[Year=null,x=0],
                              each [x]<List.Count(Split),
                              each [Year= try Number.From(Text.Start(Split{x},4)) otherwise null, x = [x]+1],
                              each [Year])
                            ){0},
    Split2 = Splitter.SplitTextByDelimiter(Number.ToText(Year))(myStr),
    Temp = List.LastN(Splitter.SplitTextByDelimiter(" ")(Text.Trim(Split2{0})), 2),
    myDate = Date.From(Text.Combine(Temp, " ") & " " & Number.ToText(Year))
in
    myDate

You could combine the two, but I kept it separate for ease of maintenance/use.

See attached sample for how it's used.
 

Attachments

  • Date Extraction Example (1).xlsx
    209.5 KB · Views: 49
Hi Chihiro,

That was excellent

In my case I have variable dates and formats, I have to pull start date & end date in to 2 columns
could you help me out
 

Attachments

  • Example.JPG
    Example.JPG
    312.9 KB · Views: 7
Hello @Chihiro

I have come across somewhat similar problem with longer text length and different date formats which is manually entered, need null where there is no date. I tried to copy your solution in my scenario but it didn't work perhaps due to my limited knowledge of Power Query.

Could you please look into the file attached and assist me in extracting dates from last comment column. The actual data contains more than 100K rows.

Thanks.
 

Attachments

  • Sample Data for Chandoo.xlsx
    11.4 KB · Views: 1
Status
Not open for further replies.
Back
Top