• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

VBA macro to extract the date from the text string

roshalphonso

New Member
Hello team.......i need a macro that can extract the date in the format DD/MM/YYYY from column A and store in the corresponding cell in Column B. These are the scenarios to be covered
1. Dates are in multiple formats like DD/MM/YYYY or DD/MM or DD.MM.YYYY or DD-MM-YY or DD-MM-YYYY. for every such format, i would like the output in DD/MM/YYYY format
2. If there is no year in the date format for eg. its just 23/10...then i should get the current year and the output should be 23/10/2023
3. More than one date in cell and i need the latest one extracted in DD/MM/YYYY
4. Dates are in the format for eg: Nov/21 which should actually stand for 21/11/2023, so extracted as DD/MM/YYYY
5. Dates are in the format D/MM/YYYY and need the output extracted as DD/MM/YYYY
6. Not always necessary that the date is at the right or left of the comment. It could be anywhere in the string and maybe in one of the formats given in point 1 to 5 above.
7. There maybe a typo error in the comment with the date put as 10/11/202 instead of 10/11/2023. For all such cases it should consider the year as the current year.
 

Attachments

  • Book3.xlsx
    8.8 KB · Views: 9
Hello team.......i need a macro that can extract the date in the format DD/MM/YYYY from column A and store in the corresponding cell in Column B. These are the scenarios to be covered
1. Dates are in multiple formats like DD/MM/YYYY or DD/MM or DD.MM.YYYY or DD-MM-YY or DD-MM-YYYY. for every such format, i would like the output in DD/MM/YYYY format
2. If there is no year in the date format for eg. its just 23/10...then i should get the current year and the output should be 23/10/2023
3. More than one date in cell and i need the latest one extracted in DD/MM/YYYY
4. Dates are in the format for eg: Nov/21 which should actually stand for 21/11/2023, so extracted as DD/MM/YYYY
5. Dates are in the format D/MM/YYYY and need the output extracted as DD/MM/YYYY
6. Not always necessary that the date is at the right or left of the comment. It could be anywhere in the string and maybe in one of the formats given in point 1 to 5 above.
7. There maybe a typo error in the comment with the date put as 10/11/202 instead of 10/11/2023. For all such cases it should consider the year as the current year.
Hello

Sorry couldn't open your excel file, However
Creating a macro to handle all these scenarios requires a bit of complexity. Assuming you are using Excel VBA, here's a basic outline for the macro:
Function ExtractDate(inputString As String) As Date
' Function to extract date from input string and return it in DD/MM/YYYY format

' Your code to extract and format the date here

End Function

Sub ProcessDates()
Dim lastRow As Long
Dim i As Long

' Assuming your data starts from row 2
lastRow = Cells(Rows.Count, "A").End(xlUp).Row

For i = 2 To lastRow
Dim inputDate As String
Dim extractedDate As Date

' Assuming your date is in column A
inputDate = Cells(i, 1).Value

' Extract date using the function
extractedDate = ExtractDate(inputDate)

' Output in DD/MM/YYYY format in column B
Cells(i, 2).Value = Format(extractedDate, "DD/MM/YYYY")
Next i
End Sub

Now, let's break down the `ExtractDate` function:

1. Handling multiple date formats:
- Use a combination of functions like `InStr`, `Mid`, and `IsDate` to identify and extract dates in various formats.

2. Adding current year:
- Check if the year is missing and append the current year if needed.

3. Handling multiple dates in a cell:
- Use a loop to iterate through potential dates and extract the latest one.

4. Handling month abbreviations:
- If the date contains month abbreviations, map them to their respective numeric values.

5. Handling D/MM/YYYY format:
- Check for single-digit day and add a leading zero if necessary.

6. Handling date position in the string:
- Implement logic to search for dates within the string, not just at the beginning or end.

7.Handling typo errors:
- Check for common typos and assume the current year in such cases.

Remember to thoroughly test the macro with various scenarios to ensure it covers all cases in your data.
 
I'm a situation where you are writing complex logic to get around poor data format, your first and best approach is to get a better data source. Can users be directed to a form that constrains how they add data

I assume you want the dates to actually behave as dates so step 1 is to format column B as date. This will have the added effect that if you drop in a date without a year, it will default to current year. Alo if you specify date format as dd/MM/yyyy then the missing zeroes will get added automatically
For the macro I would start by finding those the instances of Mon/dd and reversing them
Then replacing all instances of Jan, Feb, etc with the digit code. Then replacing dots and dashes with slashes
Then search backwards for slashes so you find only the last potential date.
From your examples there is always a space before the date so your start point for the date is the last space before the last slash. Let's call this ground zero

Pull out everything from the ground zero to the first character that is neither a slash nor a digit
Count the slashes in your substring to see how many segments in your date.

If there is one slash so two blocks then this is day and month so put it straight into the target volume and it will default to this year.
If there are two slashes , check whether the text after the last slash is a valid year within the expected range. Otherwise delete it so that ot will default to this year
 
Your approach to improve the data source by directing users to a constrained form
I'm a situation where you are writing complex logic to get around poor data format, your first and best approach is to get a better data source. Can users be directed to a form that constrains how they add data

I assume you want the dates to actually behave as dates so step 1 is to format column B as date. This will have the added effect that if you drop in a date without a year, it will default to current year. Alo if you specify date format as dd/MM/yyyy then the missing zeroes will get added automatically
For the macro I would start by finding those the instances of Mon/dd and reversing them
Then replacing all instances of Jan, Feb, etc with the digit code. Then replacing dots and dashes with slashes
Then search backwards for slashes so you find only the last potential date.
From your examples there is always a space before the date so your start point for the date is the last space before the last slash. Let's call this ground zero

Pull out everything from the ground zero to the first character that is neither a slash nor a digit
Count the slashes in your substring to see how many segments in your date.

If there is one slash so two blocks then this is day and month so put it straight into the target volume and it will default to this year.
If there are two slashes , check whether the text after the last slash is a valid year within the expected range. Otherwise delete it so that ot will default to this year

for data entry is a wise first step. It can prevent issues stemming from inconsistent data formats. However, when dealing with existing data challenges, your outlined logic for cleaning and transforming dates in the macro seems thorough and should help address the inconsistencies effectively. Just ensure robust testing to handle various scenarios and edge cases.
 
@Monty - since you were unable to open the file, here is the sample data. Can you confirm if your macro will work for the following scenarios:
Comment
Nov/9 - New email received
Nov/16 - New email received
08/11 New email arrived
TPA - approval from XXX - 7/11/2023
TPA - approval from XXX - 27/11/2023
TPA - approval from XXX - 27.11.2023
TPA - approval from XXX - 7/1/2023
TPA - approval from XXX - 28-July-2023
QA - Death Reco - 16/11
New forms arrived on 20-Nov
TPA - approval from XXX - 7/1/2023
 
I'm a situation where you are writing complex logic to get around poor data format, your first and best approach is to get a better data source. Can users be directed to a form that constrains how they add data

I assume you want the dates to actually behave as dates so step 1 is to format column B as date. This will have the added effect that if you drop in a date without a year, it will default to current year. Alo if you specify date format as dd/MM/yyyy then the missing zeroes will get added automatically
For the macro I would start by finding those the instances of Mon/dd and reversing them
Then replacing all instances of Jan, Feb, etc with the digit code. Then replacing dots and dashes with slashes
Then search backwards for slashes so you find only the last potential date.
From your examples there is always a space before the date so your start point for the date is the last space before the last slash. Let's call this ground zero

Pull out everything from the ground zero to the first character that is neither a slash nor a digit
Count the slashes in your substring to see how many segments in your date.

If there is one slash so two blocks then this is day and month so put it straight into the target volume and it will default to this year.
If there are two slashes , check whether the text after the last slash is a valid year within the expected range. Otherwise delete it so that ot will default to this year
Would you be able to translate this into a vba script?
 
Would you be able to translate this into Here's a simple example:
Here we go

Code:
Function ExtractDatesFromString(inputText As String) As String
    Dim regex As Object
    Dim matches As Object
    Dim match As Object


    Set regex = CreateObject("VBScript.RegExp")
    regex.Global = True
    regex.IgnoreCase = True
    regex.Pattern = "\b(?:0?[1-9]|[12][0-9]|3[01])[-\/.](0?[1-9]|1[0-2])[-\/.](?:\d{2}|\d{4})\b"


    Set matches = regex.Execute(inputText)
    
    Dim result As String
    For Each match In matches
        result = result & match & vbCrLf
    Next match


    ExtractDatesFromString = result
End Function


To use this function, open the VBA editor, insert a new module, and paste the code. You can then use it in your workbook like this:


Code:
Sub TestExtractDates()
    Dim inputText As String
    Dim extractedDates As String


    ' Replace this with your actual cell or text string
    inputText = "YourSampleTextHere"


    extractedDates = ExtractDatesFromString(inputText)


    ' Display the extracted dates in a message box
    MsgBox "Extracted Dates:" & vbCrLf & extractedDates
End Sub


Replace "YourSampleTextHere" with your actual text containing dates. Run the `TestExtractDates` macro to see the extracted dates in a message box. Note that this example uses a basic regular expression and may need adjustment based on the variations in your data.
 
Using the data in your file, perhaps something like this:
Code:
Sub DateExtract()
    Dim rng As Range, R As Range
    Dim I As Long
    Dim S As String
    Dim SA As Variant
    Dim D As Date

    With ActiveSheet
        Set rng = .Range("A2", .Range("A" & .Rows.Count).End(xlUp))
        rng.Offset(0, 1).NumberFormat = "dd/mm/yy"
    End With

    For Each R In rng
        S = Application.Trim(R.Value)
        SA = Split(S, " ")
        For I = LBound(SA) To UBound(SA)
            SA(I) = Replace(SA(I), ".", "-")
            If VBA.IsDate(SA(I)) Then
                D = CDate(SA(I))
                If Len(Year(D)) < 4 Then
                    D = VBA.DateSerial(Year(Date), Month(D), Day(D))
                End If
                R.Offset(0, 1).Value = CStr(D)
            End If
        Next I
    Next R
End Sub
 
Here we go

Code:
Function ExtractDatesFromString(inputText As String) As String
    Dim regex As Object
    Dim matches As Object
    Dim match As Object


    Set regex = CreateObject("VBScript.RegExp")
    regex.Global = True
    regex.IgnoreCase = True
    regex.Pattern = "\b(?:0?[1-9]|[12][0-9]|3[01])[-\/.](0?[1-9]|1[0-2])[-\/.](?:\d{2}|\d{4})\b"


    Set matches = regex.Execute(inputText)
   
    Dim result As String
    For Each match In matches
        result = result & match & vbCrLf
    Next match


    ExtractDatesFromString = result
End Function


To use this function, open the VBA editor, insert a new module, and paste the code. You can then use it in your workbook like this:


Code:
Sub TestExtractDates()
    Dim inputText As String
    Dim extractedDates As String


    ' Replace this with your actual cell or text string
    inputText = "YourSampleTextHere"


    extractedDates = ExtractDatesFromString(inputText)


    ' Display the extracted dates in a message box
    MsgBox "Extracted Dates:" & vbCrLf & extractedDates
End Sub


Replace "YourSampleTextHere" with your actual text containing dates. Run the `TestExtractDates` macro to see the extracted dates in a message box. Note that this example uses a basic regular expression and may need adjustment based on the variations in your data.
Cant get this to work. Can you put in a sample workbook with my data and show me an example
 
Thank you this makes more sense now. Couple of questions, to make it work in my original data:
a. My input column is F and output column where i need the dates extracted is column S. What part of the code do i update?
b. Also i want to run this macro from another sheet and not the sheet which has this data. The name of the sheet where data is called "RawData". What do i replace in the code?
 
Sorry one more requirement please. After testing the code, i realised that it does not extract the date if the date is in between 2 words (the date may not necessarily be at the extreme right).
 
My input column is F and output column where i need the dates extracted is column S. What part of the code do i update?

Replace R.Offset(0, 1).Value = D with R.Offset(0, 18).Value = D 'Col S

Sorry one more requirement please. After testing the code, i realised that it does not extract the date if the date is in between 2 words (the date may not necessarily be at the extreme right).

Please post an example of where it does not extract the date if the date is in between 2 words because when I test it, it does extract the date for that scenario. ,
 
Last edited:
Back
Top