• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

extract unique words from range and count

VijaySM

New Member
Hi Friends

I am asking this question with respect to the below discussion.

http://forum.chandoo.org/threads/vb...rom-range-and-count-optimization.13001/page-2

@Marc L - I tried your macro CountWords, it is working fine for less uniques in data. But for large data about 40K, it is not showing all values. I have kept both Macros in single file and uploaded it. Please check and suggest if there is any mistake. Also, Is hash value different from normal dictionary. I have gone through this great reference in the below link, but since I am a beginner I am not able to find the exact reason.

https://www.experts-exchange.com/articles/3391/Using-the-Dictionary-Class-in-VBA.html

I am unable to upload the file at the moment as the site is not taking the upload but will try and do asap.

Thank you in advance
 
Last edited:
Hi !

As it depends on your data …

Your great reference is same Dictionary object used in my code
which purpose is to count words from same few comments !

If your data do not have same repeating few comments,
so to just count words do not use HashVal
but the Key as usual like in this great reference link.

As Dictionary object is also in VBA inner help …
 
Thanks Marc for your suggestion. So, Hashval, in your code is to count the words in comments and not data. Please go through the code and check. Now the words are working but not comments. Also, can an option be added to ask the user to specify whether to count words in data or comments to the macro.

All this code is added in a single module.

Code:
Sub DictionaryHashComments(Optional Words As Boolean)
    Dim Dict As New Dictionary
'    Set Dict = CreateObject("Scripting.Dictionary")
    Sheet2.UsedRange.Clear
    ReDim CT&(0)
    AR = Sheet1.Range("I2", Sheet1.[I2].End(xlDown)).Value
    'AR = Sheet1.Range("I2", Sheet1.Cells(Rows.Count, "I").End(xlUp)).Value
   
    For R& = 1 To UBound(AR)
        HV = AR(R, 1)
        If Not Dict.Exists(HV) Then
            ReDim Preserve CT(UBound(CT) + 1)
            CT(UBound(CT)) = R
        End If
        Dict.Item(HV) = Dict.Item(HV) + 1
    Next
    Application.ScreenUpdating = False
    Sheet2.Activate
    If Words Then
        CC = Dict.Items
        Dict.RemoveAll
        ERASECHAR = [{",",".",";"}]
        KEEPSPACE = [{"—","'s"}]
        For R = 1 To UBound(CT)
            T$ = Application.Clean(AR(CT(R), 1))
            For Each C In ERASECHAR:            T = Replace$(T, C, ""):        Next
            'For Each C In KEEPSPACE:            T = Replace$(T, C, " "):      Next
            For Each C In Split(T):  Dict.Item(C) = Dict.Item(C) + CC(R - 1):  Next
        Next R
        With [D2].Resize(Dict.Count, 2)
            .Value = Application.Transpose(Array(Dict.Items, Dict.Keys))
            .Sort [D2], xlAscending, Header:=xlNo, MatchCase:=True
        End With
    Else
        [D2].Resize(Dict.Count).Value = Application.Transpose(Dict.Items)
        For R = 1 To UBound(CT):  Cells(R + 1, 4).Value = Application.Clean(AR(CT(R), 1)):  Next
    End If
    Application.ScreenUpdating = True
    Dict.RemoveAll
    Set Dict = Nothing
End Sub
 
Sub CountComments()
    DictionaryHashComments
End Sub
 
Sub CountWords()
    DictionaryHashComments True
End Sub

Thanks
 
HashVal is used to count same sentences, not words …

If you load the cacos sample in its post #16, you will see there are
only 15 sentences (or comments) in cells repeated between 60K rows.
You could try then my code whatever for comments or words.

In the same thread from its first page there are other codes
as well under my code the very insteresting Phrase Density link …
 
Thanks Marc. Actually I thought the comments are 'incell' comments and tried to insert and count them. Now I got it that it is the unique list of complete sentences.
I saw the other codes also and the Phrase density is very nice and interesting.
 
Last edited:
Back
Top