• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Using Regular Expression to Extract specific substring

arishy

Member
The string text is Huge (HTML web page).
The string starts with unique ID and end with </span)
Here is an example

<div id=currency_converter_result>1000 USD = <span class=bld>62785 INR</span>

I need to build a function with 3 parameters ( str , start, end)
start=" id=currency_converter_result"
end = "</span>"

The return results are two strings "USD" and "INR" ( these are examples )
AND two values 1000 and 62785

The regular expression is

( str =~ m/(?<=start)(?<string_extracted>.*?)(?=end)/g )

Since it is unique ID this will happen only ONCE

Your help will be greatly appreciated.
 
Hi, arishy!

Regular expressions are widely available in HTML world. Regarding MS products, in Visual Studio .NET validations, in any language, either C# or VB. But they're available just at the point to be used for field validation purposes. What does it mean? That VB doesn't know what to do or how to handle that regular expression by itself within code without using object functions or methods, i.e., VB natively doesn't understand regular expressions.

And even VB & VBA are languages that have lots of things in common, in VBA field forms regular expressions don't exist, and neither VBA is able to understand them, so they're just weird strings.

So the function you're looking for should receive as input argument 3 parameters:
- text to search into
- start search string
- end search string
and it would retrieve 1 argument:
- found text

If you want to refine it, you'll have to add parsing capabilities to extract the "1000 SD" and "62785 INR", in either 2 or 4 arguments.

Hope it helps.

Regards!
 
Of course it helped me ......moving away from RE.
How do I parse this string or "1000 USD" and "62785 INR"
Simple pointers will let me head in the right direction.
 
Hi, arishy!

Give a look at this topic:
http://chandoo.org/forum/threads/specific-data-and-html-beyond-excel-39-s-capabilities.9640/

In the uploaded file at 11th post:
http://chandoo.org/forum/threads/sp...yond-excel-39-s-capabilities.9640/#post-55329
you'll find the code for parsing and extracting data from web pages's source code. Check constantas ksSilkPrefix and ksSilkSuffix, which you should adapt to your particular case.

Unfortunately the posted code suffers from CDIAFM (aka collateral damage issue after forums migration) and the code is unindented and special characters are replaced by "&#nnn:" (unquoted) being nnn the ASCII code for that char. They're supposed to be fixed and updated each time we (Ninjas) stump into them (yes, manually, post by post), but as the code is within the linked file I hope you'll manage to handle it.

Just advise if any issue.

Regards!
 
I checked the link , and this will be my today's project to modify it to get what I want.
One issue which I need help with , the site I am getting data from is User/Password protected.
I have two questions:

1. How to use VBA to pass that login page to the page I am interesting in ( actually two linked pages)
2. Can I use Record macro to create the necessary code to GET IN to my pages or I have to dig in the login page
 
Thank you for the links...These will be "tomorrow"'s project !!!
My issue today is to go over the hurdle of logging in to secure Web page

The log in page has two boxes
Username and Password
And a submit button


the html code behind them ( only the relevant parts)

<div class="cr2tooltip">
<input id="_58_login" name="_58_login"
type="text" value=""
autocomplete="off" maxlength="30"/>

And

<input id="_58_password" name="_58_password"
type="password" value=""
autocomplete="off" maxlength="12"/>

What do I put in the following VBA (bold variable)

.document.all.Item("_58_login").Value ="UserName"
.document.all.Item("_58_password").Value ="Password"

Is that correct ??

Also what to do with :

<input type="submit" id="_58_signInButton" value="Sign In">
How to activate the submit in VBA to go to the main page of the site ???

The VBA that I need is :
1. Goto the login page (No problem)
2. Auto fill User Name and password (see above)
3. Click the submit button (what VBA used for that ???)

I do hope I provided enough info to get this done
 
Thanks Narayan....

One quick question: Is there an easy way to retrieve the element name that I am interested in.

An Example: In the HTML page I am interested in there are input box for user name.
In order for me to fill it with my user name using VBA, I need to know the variable name so I can set it up by VBA.

Some web designers use very sophisticated ways to hide these obvious names. There must be JS command to retrieve it rather than guessing it.

In the samples above,
.document.all.Item("_58_login").Value ="UserName"

I want to know what variable name is used for username in the input html line

The answer is _58_login ( I had to dig into the source code to get it)

I am looking for something like this:

.document.ElementName("Input ID") .....I know THAT does not exists; I am just trying to explain what I am looking for.

And THAT was supposed be a quick question !!!!!

 
Hi, arishy!

Give a look at this code, it's a procedure extracted from a project which I'm not allowed to distribute. It uses getElementsByTagName instead of getElementsByID, but I hope it'd guide you.
Code:
Private Sub cmdGo_Click()
    '
    ' constants
    '  ranges
    Const ksWSMain = "Main"
    Const ksURL = "ParamURL"
    Const ksWSWork = "Work"
    Const ksColumnBase = "ParamColumnBase"
    '  browser
    Const kdDelay1 = #12:00:01 AM#
    Const kdDelay3 = #12:00:03 AM#
    Const ksSubmit = "submit"
    Const ksSubmitName = "X"
    Const ksElementBy = "*" ' input a submit option list"
    Const kiElementMax = 255
    '
    ' declarations
    '  ranges
    Dim rngW As Range
    '  browser
    Dim sURL As String
    Dim HtmlDoc As Object, elemInputButton As HTMLInputButtonElement
    Dim elemCollection As IHTMLElementCollection, elemItem As IHTMLElement
    Dim iColumnBase As Integer, sElementBy() As String
    '  others
    Dim I As Long, J As Long, K As Long
    '
    ' start
    sURL = Worksheets(ksWSMain).Range(ksURL).Cells(1, 1).Value
    Set rngW = Worksheets(ksWSWork).Cells
    iColumnBase = Worksheets(ksWSWork).Range(ksColumnBase).Column
    sElementBy = Split(ksElementBy)
    '
    ' process
    With wbOne
        ' go page
        .Visible = True
        .Navigate2 sURL
        Do Until Not .Busy Or .ReadyState <> READYSTATE_COMPLETE
        Loop
        ' document
'        Application.Wait Now() + kdDelay3
        On Error Resume Next
        Set HtmlDoc = .Document
        If Err.Number <> 0 Then Application.Wait Now() + kdDelay1
        ' inspecting
        For I = 0 To UBound(sElementBy)
            Set elemCollection = HtmlDoc.getElementsByTagName(sElementBy(I))
            If Err.Number <> 0 Then Application.Wait Now() + kdDelay1
            ' clear
            If I = 0 Then
                rngW.ClearContents
                J = 0
                J = J + 1
                rngW.Cells(J, 1).Value = "Name" & vbLf
                rngW.Cells(J, 2).Value = "tagName"
                rngW.Cells(J, 3).Value = "ID"
                rngW.Cells(J, 4).Value = "Value"
                rngW.Cells(J, 5).Value = "Length"
                rngW.Cells(J, 6).Value = "innerText"
                rngW.Cells(J, 7).Value = "innerHTML"
                rngW.Cells(J, 8).Value = "getAttribute(""onClick"")"
                For K = 0 To kiElementMax
                    rngW.Cells(J, iColumnBase + K + 1).Formula = "=""Item(""&" & K & "&"")"""
                Next K
                Worksheets(rngW.Parent.Name).Cells.AutoFilter
            End If
            For Each elemItem In elemCollection
                J = J + 1
                rngW.Cells(J, 1).Value = elemItem.Name
                rngW.Cells(J, 2).Value = elemItem.tagName
                rngW.Cells(J, 3).Value = elemItem.ID
                rngW.Cells(J, 4).Value = elemItem.Value
                rngW.Cells(J, 5).Value = elemItem.Length
                rngW.Cells(J, 6).Value = elemItem.innerText
                rngW.Cells(J, 7).Value = elemItem.innerHTML
                rngW.Cells(J, 8).Value = elemItem.getAttribute("onClick")
                For K = 0 To elemItem.Length
                    rngW.Cells(J, iColumnBase + K + 1).Value = elemItem(K).Value
                Next K
            Next elemItem
        Next I
        ' quit
        GoTo Exit_Sub
        '
        ' click
        For Each elemItem In elemCollection
            If elemItem.Type = ksSubmit Then
                If elemItem.Name = ksSubmitName Then
                    Set elemInputButton = elemItem
                    elemInputButton.removeAttribute ("onClick")
                    elemInputButton.setAttribute "onClick", "return True"
                    elemInputButton.Click
                    SendKeys "~"
                    Exit For
                End If
            End If
        Next elemItem
    End With
    '
    ' end
Exit_Sub:
    Set rngW = Nothing
    Beep
    '
End Sub

This snippet (yes, truncated, I'm sorry) will help you too with the retrieving and assigning values from/to fields and submitting action:
Code:
'...
  ' constants
  '  object fields
  Const ksOEmail = "user_email"
  Const ksOSubmit = "Submit"
  '  values
  Const ksVEmail = "SirJB7@SirJB7.com"
'...
  ' declarations
  Dim objEmail As Object
  Dim objElements As Object, objElement As Object
'...
    With IE
        .Visible = True
        .Navigate2 sURL
        Do While .Busy Or .ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set HtmlDoc = .Document
        ' input values
'...
        '  email
        Set objEmail = .Document.getElementsByName(ksOEmail)
Debug.Print "|"; objEmail.Item(0).Value; "|"
        objEmail.Item(0).Value = ksVEmail
'...
Debug.Print "|"; objEmail.Item(0).Value; "|"
',,,
        ' submit form
        Set objElements = .Document.getElementsByTagName(ksHTMLInput)
        I = 0
        For Each objElement In objElements
            I = I + 1
            Debug.Print I; "|"; objElement.Value; "|"
            If objElement.Type = ksHTMLSubmit And _
              objElement.Name = ksOSubmit Then
                objElement.Click
                Exit For
            End If
        Next objElement
        Do While .Busy Or .ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
'...
    End With
'...

Regards!
 
Last edited:
Your last snippet shows that you can answer my "simple" question:

If I have a log in page that has two input boxes and a submit button.

WITHOUT digging into the source code, can I debug.print the variable name that I have to set by VBA (later)
for both username/password.

In my HTML example above, can I debug.print to display both "_58_login" and "_58_password"
Your answer : .Document.getElementsByName(??????)
I do not know how to fill in the ????? to get debug.print output of "_58_login" ....

I really hope that I succeeded in explaining this very frustrating issue.

Why I am doing this ? you may ask ! Digging into the source code of some pages is a pain
And in my view since we are dealing with oop, it is possible to retrieve the PROPERTY of an object ( in our case INPUT box) and get it's NAME ...not value

 
Hi, Arishy!
My first snippet does that, not the censored second. It builds a worksheet with all the webpage analysis. And sets Autofilter for easy searching.
If it's not classified, would you post the target's URL?
Regards!
 
Hi, arishy!
Give a look at the uploaded files, and check the data there (use filters) against the page source code to see if you get the required field names.
Regards!
 

Attachments

  • Using Regular Expression to Extract specific substring - 1 (for arishy at chandoo.org).xlsx
    29.6 KB · Views: 3
  • Using Regular Expression to Extract specific substring - 2 (for arishy at chandoo.org).xlsx
    54.9 KB · Views: 3
Thank you for taking the time to help me.
No, but I am sure you got my remark about dealing with complicated web pages.
Now if you look at the source code of the SECOND page (The log in page)
at
<div class="cr2tooltip">
<input id="_58_login" name="_58_login"
type="text" value=""
autocomplete="off" maxlength="30"/>

This where I need the "_58_login" to set my Username in the VBA I am creating.

I really hate to waste your time digging into it...
All I want is a simple VBA macro to login to the main page.after successfully entering the Username and password and hitting submit
 
Hi, arishy!
You started this topic asking for a function to handle string parameters and return parsed stuff. You were led to many links related, always assuming that you'd had the web page source code yet into a string variable. Then you changed to how to find out the object (not variable) names of a page, so as to fill them from VBA. ed by code In the links posted. I posted a full procedure with which I then posted the files with the field details for each page, and all the required parts to build a procedure to fill them. Now you're asking for a code that does all the job. It's a quite different issue than the 1st one, isn't it?
With my 2 procedures and the 2 uploaded files you have all the elements required for doing the job. I haven't analyzed the web page source code, and as long as it exposes all the data fields and doesn't handles them via Java Script or other non-exposing methods, the goal maybe achieved.
It's just a matter of time and dedication, that requires medium (not basic) VBA skills for piloting IE Object Browser, both things which I don't actually have. If I happen to find them in the next days I'll get back to you.
A last thought. Being an internet banking webpage, I'd never think about automatizing it from any other application, in fact I'd even try to use the virtual on-screen keyboard to log in. Just a matter of security and privacy concerns. And not in public or shared computers, but in my private ones as well; you'll never know until it's too late that your system has been compromised (antivirus not working or passed by, malware, etc.).
Regards!
 
Hi, SirJB7

You are absolutely right in your comments. After all the support you gave me, and solely my fault, I am still trying to solve this very common situation, from your side , and First for me.

The reason for changing gear was the advice I got here to avoid using RE if I can. And I did.

The function I started with was to parse data from a web page. This web page is the Main Page in my Bank Statement.

Once I have retrieved the page into excel I will use the function to extract the numbers I need.

I faced a hurdle of getting to this main page , mainly the logging in ( The rest of the story you know already)

As for your security issues with Banking system, all I am doing is emulating what the bank is allowing me to do:

1. Go to a web page
2. Sign in
3. Look, print, save what I want from MY main page and leave.

If I try to do something else their highly sophisticated web site will not allow me and may be even block me from further use of MY ACCOUNT. But again your concern is well taken.

Here what I did so far and I need your help
Code:
Sub Basic_Web_Query()

Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .Navigate "https://ebanking.cibeg.com/web/guest/home"

??????????????????????????????????????????????????????????????????????????????????????/

       
        .document.all.Item("_58_login").Value = "UserName"
        .document.all.Item("_58_password").Value = "Password"
        .document.forms(0).submit
                Do Until .ReadyState = 4
                DoEvents
                Loop
    End With

With ActiveSheet.QueryTables.Add(Connection:= _
"URL;https://ebanking.cibeg.com/web/guest/home?p_p_id=58&p_p_lifecycle=0&p_p_state=maximized&p_p_mode=view&saveLastPath=0&_58_struts_action=%2Flogin%2Flogin&login_event_id=0.5675766871015673&loginType=3", Destination:=Range("$B$4"))

.Name = "q?s=goog_2" ???????????????????????????
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "1,2"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False

End With
End Sub
The specific help :
As you know (you actually went to the links), to login you have to go to TWO pages before you allowed to see your main account page

1. First link is just asking you to LOG ON
2. If you press LOGON it will take you to a second page to key in your credentials (2nd page)

The code above will not do that ( see ?????) because I do not know how to click the LINK LOG ON using VBA to go to the second page.
3. The second link in the QueryTables statement is not needed because the LOG ON once pressed will take you there.(to the second page) I need to fix that too.

The purpose of the script is to reach My Account Main Page.
Once there I will dump the page to Excel and believe it or not use THE FUNCTION
 
Hi, arishy!

Very common situation (for me) vs. first (for you). Don't believe it's exactly like that, in fact I've always done web access things for freely available pages, just twice with login required.

I'm very short of time this weekend (as usual), including Monday that's a non working local day, so I don't promise to help you earlier than Tuesday. In the meanwhile, do come back and post any question, if I can I'd answer it.

Regards!

EDITED

PS: I didn't analyzed 1st page, but just gave a quick look to 2nd page. The submit button is _58_signInButton. You should proceed similarly as in my 2nd snippet and after filling ID & Pwd use the previous button as ksOSubmit constant. So why don't you proceed directly to the 2nd one and try to adapt the 2nd code?
 
Very good question.....Answer is Security.....
If you do that (use the second link directly) there is script to prevent you from doing it. It has to do of clearing cache... I will try it and send you the exact message.
Have a very relaxing long week end. And we will see then
 
Here where I stand:
1. Navigated to page1 (LOG ON)
2. Navigated to page 2 ( UserName/Password)
3. Successfully opened the main page where I can download it for further analysis.

Now, If I use :

With ActiveSheet.QueryTables.Add(Connection:= ......
I am stuck with WHAT URL I use ??? It is the CURRENTLY OPEN WEB PAGE and I do not have info about it ??

Is there a way to save the CURRENTLY OPENED WEB PAGE URL to a variable, then use that in the statement above ??

Your help will be greatly appreciated.
 
Hi, Arishy!
It'll all depend on how the website do the job. There're those which change the URL accordingly to each page that you're actually seeing and those which don't do any change (maintaining the same URL) doing everything with JavaScript, for example. As I can't access at that main page (neither yours nor any other) I can't confirm what case is it.
Check this link as reference of what happened there when selecting options and changing what displayed but not the URL:
http://chandoo.org/forum/threads/read-data-from-web-form-to-excel-sheet-using-vba.9782
Regards!
 
Back
Top