Hi all,
I recently posted this on Mr Excel, but I've not had a reply in 48 hours so I thought it safe to see if anyone here at Chandoo could help. Hope someone can!
This is a tough one to explain, but I’ll do my best as I’ve not found anyone who can offer a solution.
I have a list of measures that a number of organizations are monitoring and I've been asked to find out which organizations are measuring similar things. There are thousands of measures, so I’ve used the fuzzy match plug-in to identify similarities in the text. I’ve set the threshold at 0.6.
The problem that I’ve now got is that I now need to categorically state who measures similar topics. However, the way the fuzzy match works means that some similar measures might be matched for some organizations and not others, that’s because there are differences in how the measures are worded.
In my example (see below), we’ve got one organization measuring ‘Banana supply’ and the fuzzy match shows that another 3 are measuring the same sort of thing. However, further down the list I’ve got an organization measuring ‘Supply of local bananas’ and the match against this measure has revealed that a further organization is measuring something along the same lines. So, if I wanted to state who has an interest in counting bananas I should be listing a total 5 organizations, I need to somehow amalgamate the two groups.
This is a very simple example. I’m guessing that I won’t be able to avoid doing a manual check at some point, but if anyone can give me any hints as to how I can cut hours out of this task I’d really, really appreciate it.
Let me know if this isn’t clear and I’ll try and clarify.
I'm struggling to format the example table and I can't access any upload sites from my work LAN. If the table below comes out all wonky, here is the example on Mr Excel: http://www.mrexcel.com/forum/excel-questions/670379-comprehensive-fuzzy-match-counting-bananas.html
[pre]
[/pre]
I recently posted this on Mr Excel, but I've not had a reply in 48 hours so I thought it safe to see if anyone here at Chandoo could help. Hope someone can!
This is a tough one to explain, but I’ll do my best as I’ve not found anyone who can offer a solution.
I have a list of measures that a number of organizations are monitoring and I've been asked to find out which organizations are measuring similar things. There are thousands of measures, so I’ve used the fuzzy match plug-in to identify similarities in the text. I’ve set the threshold at 0.6.
The problem that I’ve now got is that I now need to categorically state who measures similar topics. However, the way the fuzzy match works means that some similar measures might be matched for some organizations and not others, that’s because there are differences in how the measures are worded.
In my example (see below), we’ve got one organization measuring ‘Banana supply’ and the fuzzy match shows that another 3 are measuring the same sort of thing. However, further down the list I’ve got an organization measuring ‘Supply of local bananas’ and the match against this measure has revealed that a further organization is measuring something along the same lines. So, if I wanted to state who has an interest in counting bananas I should be listing a total 5 organizations, I need to somehow amalgamate the two groups.
This is a very simple example. I’m guessing that I won’t be able to avoid doing a manual check at some point, but if anyone can give me any hints as to how I can cut hours out of this task I’d really, really appreciate it.
Let me know if this isn’t clear and I’ll try and clarify.
I'm struggling to format the example table and I can't access any upload sites from my work LAN. If the table below comes out all wonky, here is the example on Mr Excel: http://www.mrexcel.com/forum/excel-questions/670379-comprehensive-fuzzy-match-counting-bananas.html
[pre]
Code:
Organisation Measure Organisation Similar Measure
Org 1 Banana supply Org 2 Supply of bananas
Org 3 Banana Volume
Org 4 The number of bananas
Org 2 Supply of local bananas Org 1 Banana supply
Org 3 Banana Volume
Org 4 The number of bananas
Org 5 Supply of yellow bananas