• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extraction of information

gknath79

New Member
Hello Excel Gurus,

I would like to extract specific details from every cell [around 20000] in Column A using regex or any other possible/easiest way. I have attached the sample file with this message which contains 2 sheets. Sheet "Raw" contains the raw contents and the Sheet "Expected" has the example of how the extracted data should look like.

I need to read from Sheet1 "Raw" and the extracted information should goto a new Sheet named "Expected"

Any help would be appreciated.
 

Attachments

Marc L

Excel Ninja
Hi !

At least explain the algorithm (or even your logic) needed in order to
proceed to the expected extraction and join a more relevant workbook.
The better explanation and attachment, the better result …
 

gknath79

New Member
@Marc L, the attached workbook in #1 contains two sheets (Raw & Expected).

Raw:
It contains 3 rows (in Column A), actual data is around 15000 rows in Column A. What i wanted to do is to extract information out of these cells and put the resulted [extracted] information in another Sheet.

Condition 1:
Basically whatever details that is present under the first DATE is the actual data that is to be extracted.

2015-06-18 23:11:10 - Raki (Travel Details)

Booking has been confirmed.

Condition 2:

In some cases, rather than the body of the message it also may contain Hello and Regards which should be omitted and only body of the message has to be extracted out of it.

Hello <username>,

Message copied.

Regards,
<username>

Condition 3:
In some cases, instead of Hello and Regards, there can be Hey <username> and Thanks <username> which should be omitted and whatever message is in between them has to be taken out.

Note: All these 3 conditions are applicable for all the cells in Column A. To give you more details, please look at the second sheet in the attached workbook in #1. Let me know if something is unclear.
 

Marc L

Excel Ninja
As a starter, you can mod it accordingly to your needs :​
Code:
Sub Demo0()
         Dim Rg As Range, V, R&
    For Each Rg In Worksheets(1).UsedRange.Columns(1).Cells
        For Each V In Split(Rg.Value, vbLf)
            If V > "" Then
                If Val(V) = 0 Then
                    Select Case True
                        Case V = "Hi", V Like "Hello*", V Like "Please *", V Like "*regards*", V Like "VR"
                        Case Else
                            R = R + 1
                            Worksheets(2).Cells(R, 1).Value = V
                    End Select
                End If
            End If
        Next
    Next
End Sub
Do you like it ? So thanks to click on bottom right Like !
 

gknath79

New Member
@Marc L, I see a problem with the above logic. Could you please help me in fixing the same?

Please check the attached file "Book2" if there is a space after the first date and if there are two sentences. Then it copies it to two cells in the second sheet. For example, please look at the cell A1 in the Sheet (RAW) and its data has been extracted to the Sheet (Expected) but in A1 and A2 which is wrong. Both these should be in A1 only.
 

Attachments

Marc L

Excel Ninja
As a demonstration starter, you can mod the code for all
you didn't explained in your initial post and as it's logic is yours ‼

At least without any crystal clear & complete explanation of the need / issue
post a correct & complete expected result worksheet
according to the source worksheet …

Better than moding the code each time you forgot a logic element,
think about a parameters worksheet.
So you won't have to mod the code but just to update parameters …
 

vk7

Member
@Marc L, thank you for all the help.

I have gathered all the possibility's without missing out anything and attached here. Could you please help me out? I am not quite sure about the parameter solution which you are talking about. So if you could show me an example with my current data that would be great.
 

Attachments

gknath79

New Member
@Marc L, with the new set of data in #7, I have added all the possibilities. Apologies for missing it out for the first time. Could you please help me with the VBA code?

Thumb rule is if the Sheet1(Raw) has 9 rows then the results sheet "Expected" should also only contain 9 rows with the respective filtered values in each rows. Have been struggling with this quite a lot of time and any help on this would be great.
 

Marc L

Excel Ninja
Post #7 as it is just appears as a hijack !

Add in top of the module code Option Compare Text
then run Demo0 procedure, you will see the benefit !

For all those you forgot, so just mod the code
with all you do no want to see in result
as you will do each time a new element as to be removed …
Or as yet stated, create a parameter worksheet for those unwanted
so the next code will never have to be updated
each time but just the parameter worksheet …

Or warming a couple of neurones, maybe it could be easier
to just list what is authorized in result
instead of a list of all the unauthorized !
Just think about it and choose the easiest way between both …
 

Marc L

Excel Ninja
Option Compare Text may be useless
as it seems with your last attachment
a list of just 3 words is necessary to reach the result
(aka each first word of authorized)
but as it depends of all possible cases,
if you do not want to mod the code each time
a new element is needed within this list,
you must create this list in a list parameter worksheet …
 
Top