• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract Contact Information from a string

Bulberino

New Member
Excel file attached outline the current format of the data, which is POC information, e.g. last name, first name, email, phone, etc. I am trying to find a way to quickly separate all that out without manually having to do a find/mid or left/right type equation, which does not work well when there are multiple POCs listed in the string, or when there are hundreds of rows. any assistance would be greatly appreciated! i tried a search but could not find what I was looking for. thank you very much for your time and assistance!
 

Attachments

C8: =MID($C2,FIND(C$7,$C2)+LEN(C$7)+2,FIND("\",$C2,FIND(C$7,$C2)+LEN(C$7)+2)-(FIND(C$7,$C2)+LEN(C$7)+2))
copy down and across

This works except for City as the city eg: Albany doesn't have a trailing \ as all the other fields does

The other way would be to simply use the Data, Column to Text function
 
Wow, thank you for the prompt response! This works pretty well. the reason the city does not work (and doesn't have the trailing \) is because the poc info pattern repeats itself if there are multiple authors, which is indicated by the single ",".

Everything for Book1 worked great, but for Book 2, I could not extract more than the first author's contact info. thoughts on how to accomplish this? the column to text works, but it is quite labor intensive and crashes excel when I am doing this on hundreds of thousands of rows.

thank you again!
 
VBA, if you like.
Code:
Sub test()
    Dim a, i As Long, temp As String, m As Object, dic As Object
    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = 1
    a = Sheets("sheet1").[a2].CurrentRegion.Value
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = " *([^:,\\]+): *([^\\,]+)"
        For i = 1 To UBound(a, 1)
            If a(i, 2) <> "" Then
                temp = a(i, 2): a(i, 2) = ""
                For Each m In .Execute(temp)
                    If Not dic.exists(m.submatches(0)) Then
                        dic(m.submatches(0)) = dic.Count + 2
                        If UBound(a, 2) < dic.Count + 1 Then ReDim Preserve a(1 To UBound(a, 1), 1 To dic.Count + 1)
                    End If
                    a(i, dic(m.submatches(0))) = a(i, dic(m.submatches(0))) & _
                    IIf(a(i, dic(m.submatches(0))) <> "", " / ", "") & m.submatches(1)
                Next
            End If
        Next
    End With
    With Sheets.Add.Cells(1).Resize(, UBound(a, 2))
        .Rows(1).Cells(2).Resize(, dic.Count).Value = dic.keys
        .Rows(2).Resize(UBound(a, 1)).Value = a
        .CurrentRegion.Columns.AutoFit
    End With
End Sub
 

Attachments

Or try,

1] Current Data helper, A2, formula copy down :

=(LEN(C2)-LEN(SUBSTITUTE(C2,"Surname",)))/LEN("Surname")

2] Desired Output B8, formula copy down :

=LOOKUP(ROW(A1),SUMIF(OFFSET(A$1,,,ROW($1:$10),),"<>")+1,B$2:B$3)&""

3] Desired Output C8, formula copy across and down :

=IFERROR(SUBSTITUTE(TRIM(LEFT(SUBSTITUTE(MID(INDEX($C$2:$C$5,MATCH($B8,$B$2:$B$5,0)),FIND("ß",SUBSTITUTE(INDEX($C$2:$C$5,MATCH($B8,$B$2:$B$5,0)),C$7,"ß",COUNTIF($B$8:$B8,$B8)))+LEN(C$7)+1,90),",",REPT(" ",90)),90)),"\",),"")

Regards
Bosco
 

Attachments

Jindon - When I try to run the VBA code you provided, I receive a run time error 1004 and the following line is highlighted:
.Rows(1).Cells(2).Resize(, dic.Count).Value = dic.keys

I am not familiar enough to VBA to troubleshoot and understand what is going on. Any assistance you can provide is greatly appreciated. thank you!
 
sorry, i did not notice the file you uploaded. i used the code you provided and it gave me an error when i ran it. however, i opened your file and it worked. i am sure there is user error on my part.

one question, if it is possible. in the worksheet, a book with multiple authors shows all the authors' information in one cell, e.g. givenname cell is: bob/pat. can this be separated out so that there are two rows, one with bob's info and one with pat's info?

thank you again for your assistance!
 
If you upload a workbook with raw data and the exact result that you want, I will see if it is possible or not.
 
Looked at Sheet2 in your file...
Code:
Sub test()
    Dim a, i As Long, n As Long, temp As String
    Dim ws As Worksheet, m As Object, dic As Object
    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = 1
    a = Sheets("sheet1").[a2].CurrentRegion.Value
    Set ws = Sheets.Add: n = 1
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = " *([^:,\\]+): *([^\\,]+)"
        For i = 1 To UBound(a, 1)
            If a(i, 2) <> "" Then
                temp = a(i, 2)
                For Each m In .Execute(temp)
                    If m.submatches(0) = "Surname" Then n = n + 1: ws.Cells(n, 1) = a(i, 1)
                    If Not dic.exists(m.submatches(0)) Then
                        dic(m.submatches(0)) = dic.Count + 2
                        ws.Cells(n, 1).Value = a(i, 1): ws.Cells(1, dic.Count + 1).Value = m.submatches(0)
                    End If
                    ws.Cells(n, dic(m.submatches(0))) = m.submatches(1)
                Next
            End If
        Next
    End With
End Sub
 
Okay, I will take a look at that. I uploaded another file with a cleaner example of what I am trying to accomplish. the current data is what my data looks like now. the desired output is how I am trying to parse the data into separate columns. hopefully that makes more sense.

Bosco's solution worked great, however it did not scale well when trying to run that on a large data set.
 

Attachments

your updated vba code parsed everything out great. Hopefully when I try it on the same large data set it performs similarly! thank you again!
 
Back
Top