• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

VBA - extract unique words from range and count - optimization?

Hi cacos !​
I saw another way to count words but 'cause I was thinking counting comments is more logical,​
I didn't explore it …​
So in that particular case of many rows with same few comments, this is an evolution of my speedy function !​
In fact a 2 in 1 function counting either comments or words of your sample file in less than 0.2 second ‼​
(on a test computer just after boot time with no data in cache !)​
Try procedure CountWords :​
Code:
Sub DictionaryHashComments(Optional Words As Boolean)
    Dim Dict As New Dictionary
'    Set Dict = CreateObject("Scripting.Dictionary")
     Sheet2.UsedRange.Clear
    ReDim CT&(0)
    AR = Sheet1.Range("I2", Sheet1.[I2].End(xlDown)).Value
 
    For R& = 1 To UBound(AR)
        HV = Dict.HashVal(AR(R, 1))
 
        If Not Dict.Exists(HV) Then
            ReDim Preserve CT(UBound(CT) + 1)
            CT(UBound(CT)) = R
        End If
 
        Dict.Item(HV) = Dict.Item(HV) + 1
    Next
 
    Application.ScreenUpdating = False
    Sheet2.Activate
 
    If Words Then
        CC = Dict.Items
        Dict.RemoveAll
        ERASECHAR = [{",",".",";"}]
        KEEPSPACE = [{"—","'s "}]
 
        For R = 1 To UBound(CT)
            T$ = LCase$(Application.Clean(AR(CT(R), 1)))
            For Each C In ERASECHAR:            T = Replace$(T, C, ""):        Next
            For Each C In KEEPSPACE:            T = Replace$(T, C, " "):       Next
            For Each C In Split(T):  Dict.Item(C) = Dict.Item(C) + CC(R - 1):  Next
        Next R
 
        With [C2].Resize(Dict.Count, 2)
            .Value = Application.Transpose(Array(Dict.Items, Dict.Keys))
            .Sort [D2], xlAscending, Header:=xlNo, MatchCase:=False
        End With
 
    Else
        [C2].Resize(Dict.Count).Value = Application.Transpose(Dict.Items)
        For R = 1 To UBound(CT):  Cells(R + 1, 4).Value = Application.Clean(AR(CT(R), 1)):  Next
    End If
 
    Application.ScreenUpdating = True
    Dict.RemoveAll
    Set Dict = Nothing
End Sub
 
 
Sub CountComments()
    DictionaryHashComments
End Sub
 
 
Sub CountWords()
    DictionaryHashComments True
End Sub
And you can still CountComments !​
Enjoy & Like it ‼​
 
It's really cool!

I love the multi-word phrases, I'll definitely include this in my analysis.

Sorry for the late response Marc!
 
I remember I had seen a similar approach that took care of all the "unnecessary" phrases (such as those ending in "the", "of", "I", "in", etc).

Applying a procedure that'd take care of that seems like the logical step, I'll get to that.
 
I did it works perfectly, I'm going over the code since I'm unfamiliar with the dictionary approach. Thanks Hui for that reference as well.

Thank you!
 
For R& = 1 To UBound(AR)

never seen that before and it produces an error.
Please explain?

Nice coding :awesome: can we add also the row number of the first occurrence of each unique value ?
[be2].Resize(Dict.Count, 2).Value = Application.Transpose(Array(Dict.Keys, Dict.Items, dict.rownumberpos))
 
Hi, jan deschepper!
You're posting in a very old thread (more than 3 years!). Please start a new thread and post a link to this old one if you think it might help.
Regards!
PS: Don't reply to this message. Thank you.
 
Back
Top