• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Selecting varied numbers of records from a giant list

Hayley

Member
I have a list of 255,000 email addresses. I need to pull 8% of each domain name and create a new list out from those. I can't figure out a way to make this process easier.

For example, I have 48,791 "yahoo.com" and I need to grab a random 3,903 of them; I have 536 "me.com" and I need to grab a random 43 of them. The big hassle is for the domains where I need only 1 or 2. There are hundreds for which I only need a couple.

Right now I'm manually counting, cutting and pasting. Is there an easier way to do this?
 
Hi Hayley,

Here's an example of how to build a random seed generator. First, the data setup. I don't know whether you have one or two columns currently. I assumed 2, one with full address and one with just the domain. I created a 3rd.

upload_2015-12-15_12-46-16.png

Formula in C2 is:
=RAND()

Select columns A:C and create a PivotTable. In Row labels, put Group then Domains. Put Random Seed into the data field. Change to be a Sum.

Change PivotTable layout to (Design - Layout) to tabular. Change filter on List of Domains to a Value Filter - Top 10, change settings
upload_2015-12-15_12-47-36.png

Top 10 dialogue:
upload_2015-12-15_12-47-58.png

Refreshing the PivotTable will draw a new group of addresses, as the RAND formulas will have changed.
 
Example workbook attached in this post.
 

Attachments

  • RandomSeeding.xlsx
    15.5 KB · Views: 5
It's not working correctly for me. On the list of domains filter, when I click to do the filter there's a little warning message on there that says "Not all items are showing" and when I click on that message it says "This field has more than 10,000 items. Only the first 10,000 items are displayed." And when I spot checked yahoo.com, it only gave me around 1,900 email addresses instead of the 3,900 that made up the 8%. I assume that message about the first 10,000 is the reason?
 
OH! I think I got it...I actually changed the Values back to Count of Random Seed and then it calculated the 8% properly!
 
Back
Top