Analysing sets of data

TheMouse · Nov 17, 2009

Hi all,

I've got a (hopefully) interesting problem; I need to analyse multiple sets of data to get the best match possible. Let me explain.

I'm starting with 2000 location ids, each of which is assigned to one of about 200 groups, identified by a group id. The locations have been regrouped due to a change in the grouping criteria. What I need to do is compare the original groups with the new groups and rename the new groups so that as few locations as possible end up with a different group id.

Example:

Original groups:

ID    Group ID
123   CE01
124   CE01
125   CE01
126   CE02
127   CE02
128   CE02
129   CE03
130   CE03
131   CE03

New groups:

ID    Temp Group ID
123   T01
124   T01
131   T01
126   T02
127   T02
130   T02
129   T03
125   T03
128   T03

In the example above, T01 would become CE01 because two of its three locations (a 66% match) came from CE01. Similarly, T02 would become CE02. T03 would then become CE03 by default, as that is the only original name left.
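
To make the percentages concrete, the overlap between every new group and every original group can be tallied. The following is only a minimal Python sketch of that count: the thread does not name a language, and the dictionary layout and the function name overlap_counts are assumptions made for illustration. The data is copied from the example tables above.

```python
from collections import Counter

# Sample data from the example: {location id: group id}.
original = {123: "CE01", 124: "CE01", 125: "CE01",
            126: "CE02", 127: "CE02", 128: "CE02",
            129: "CE03", 130: "CE03", 131: "CE03"}
new = {123: "T01", 124: "T01", 131: "T01",
       126: "T02", 127: "T02", 130: "T02",
       129: "T03", 125: "T03", 128: "T03"}


def overlap_counts(original, new):
    """Count shared location ids for each (new group, original group) pair."""
    counts = Counter()
    for loc_id, new_group in new.items():
        # Locations missing from the original data are simply ignored.
        if loc_id in original:
            counts[(new_group, original[loc_id])] += 1
    return counts


counts = overlap_counts(original, new)
for (new_group, old_group), shared in sorted(counts.items()):
    print(new_group, old_group, shared)
```

With this data, T01 and T02 each share two of their three locations with CE01 and CE02 respectively, while T03 shares exactly one location with each original group, which is why it only gets its name by elimination.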

Can anyone help?

Thanks

Phil

keymaster · Nov 17, 2009

You can write a simple UDF that takes two parameters:

(1) the old group name

(2) the list of new groups

The UDF then scans the list of new groups to find the closest match.
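
As a rough illustration of that idea, here is a sketch in Python rather than as a spreadsheet UDF. Everything named here (members_by_group, closest_match, rename_new_groups, and the dictionary inputs) is hypothetical, and "closest" is assumed to mean "shares the most location ids". Because several old groups can have the same closest match, the sketch also includes a greedy pass that accepts the strongest overlaps first; that pass is an addition to the suggestion above, not part of it.

```python
from collections import defaultdict


def members_by_group(assignments):
    """Turn {location id: group id} into {group id: set of location ids}."""
    groups = defaultdict(set)
    for loc_id, group_id in assignments.items():
        groups[group_id].add(loc_id)
    return dict(groups)


def closest_match(old_group, old_groups, new_groups):
    """Scan the new groups; return (name, shared count) of the best overlap.

    Ties go to the alphabetically first new group name.
    """
    best_name, best_shared = None, -1
    for name in sorted(new_groups):
        shared = len(old_groups[old_group] & new_groups[name])
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name, best_shared


def rename_new_groups(original, new):
    """Greedy renaming: accept the largest overlaps first, one name per group.

    Returns {new group id: original group id}. New groups left without a
    partner (for example if there are more new groups than old) are omitted.
    """
    old_groups = members_by_group(original)
    new_groups = members_by_group(new)
    pairs = sorted(
        ((len(o_ids & n_ids), old_name, new_name)
         for old_name, o_ids in old_groups.items()
         for new_name, n_ids in new_groups.items()),
        key=lambda p: (-p[0], p[1], p[2]),  # biggest overlap first
    )
    renames, used_old = {}, set()
    for shared, old_name, new_name in pairs:
        if new_name not in renames and old_name not in used_old:
            renames[new_name] = old_name
            used_old.add(old_name)
    return renames


# Data from the example in the question.
original = {123: "CE01", 124: "CE01", 125: "CE01",
            126: "CE02", 127: "CE02", 128: "CE02",
            129: "CE03", 130: "CE03", 131: "CE03"}
new = {123: "T01", 124: "T01", 131: "T01",
       126: "T02", 127: "T02", 130: "T02",
       129: "T03", 125: "T03", 128: "T03"}

print(closest_match("CE01", members_by_group(original), members_by_group(new)))
print(rename_new_groups(original, new))
```

A greedy pass like this is simple but is not guaranteed to keep the largest possible number of locations unchanged in every case; if that matters across 200 groups, the same overlap counts could be fed to a proper assignment solver instead.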
