michaelaltose
New Member
Hello,
THE QUESTION:
My ultimate goal is to look at how a set of students answered all of the questions on a multiple-choice exam, and compare each student with each other student to see if there are any pairs of students who are more similar to each other than all the other pairs - suggesting that copying (or "collaboration") might have occurred between those two students.
I am open to your suggestions, but my idea is to create a "heat map" showing how similar each pair is.
THE DETAILS:
This is a sample set of data, where the students are named S1 thru S10, and there are 10 questions (Q1 thru Q10) in this data set. Each question could be answered A,B,C, or D.
. Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
S1 A B D C A B C D A A
S2 B B A C A A D D A A
S3 A B D C A B D D A A
S4 B A A C B B C D B A
S5 A B B D C D C D A D
S6 C B D C B C D D C D
S7 A D D C A B D A A D
S8 A B C A A B C D A A
S9 D B D C D A C D D C
S10 A C D B A B C D A C
So far, I can compare S1 to all the other students for just Q1 using the formula:
{=IF(B2=$B$2:$B$11,1,0)}
which will give me the output:
. S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 1
S2 0
S3 1
S4 0
S5 1
S6 0
S7 1
S8 1
S9 0
S10 1
But automatically extending this table to compare S2 to the class in the second column, then comparing S3 to the class in the third column, etc, requires a lot of manual input. If there were a way to do this easily, I would end up with a table that looks like this:
. S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 1 0 1 0 1 0 1 1 0 1
S2 0 1 0 1 0 0 0 0 0 0
S3 1 0 1 0 1 0 1 1 0 1
S4 0 1 0 1 0 0 0 0 0 0
S5 1 0 1 0 1 0 1 1 0 1
S6 0 0 0 0 0 1 0 0 0 0
S7 1 0 1 0 1 0 1 1 0 1
S8 1 0 1 0 1 0 1 1 0 1
S9 0 0 0 0 0 0 0 0 1 0
S10 1 0 1 0 1 0 1 1 0 1
Repeating this process again for questions Q2 thru Q10 would yield a total of 10 tables like the table above. And adding all 10 tables together would yield a table like this (I just made up the values):
. S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 10 6 1 5 5 5 2 3 2 3
S2 6 10 4 7 4 6 4 10 1 7
S3 1 4 10 7 6 3 1 4 8 3
S4 5 7 7 10 7 3 8 3 7 7
S5 5 4 6 7 10 6 7 6 4 2
S6 5 6 3 3 6 10 4 8 5 6
S7 2 4 1 8 7 4 10 9 2 4
S8 3 10 4 3 6 8 9 10 1 7
S9 2 1 8 7 4 5 2 1 10 2
S10 3 7 3 7 2 6 4 7 2 10
Obviously each student is a perfect match with him/herself (and the top right of the table is a mirror image of the bottom left). But otherwise, high matches might suggest that cheating occurred. I could also use the conditional formatting function to create a color heat map.
And of course, I have more than 10 students (usually about 70) and more than 10 questions (usually 20-40 per exam).
I don't have any VBA experience (yet!), so if this can be done with formulas or existing Excel tools, that would be easiest for me, but I'm willing to take any suggestions you can offer!
Thank you for reading this far and helping me ponder this question and improve my Excel skills!
THE QUESTION:
My ultimate goal is to look at how a set of students answered all of the questions on a multiple-choice exam, and compare each student with each other student to see if there are any pairs of students who are more similar to each other than all the other pairs - suggesting that copying (or "collaboration") might have occurred between those two students.
I am open to your suggestions, but my idea is to create a "heat map" showing how similar each pair is.
THE DETAILS:
This is a sample set of data, where the students are named S1 thru S10, and there are 10 questions (Q1 thru Q10) in this data set. Each question could be answered A,B,C, or D.
. Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
S1 A B D C A B C D A A
S2 B B A C A A D D A A
S3 A B D C A B D D A A
S4 B A A C B B C D B A
S5 A B B D C D C D A D
S6 C B D C B C D D C D
S7 A D D C A B D A A D
S8 A B C A A B C D A A
S9 D B D C D A C D D C
S10 A C D B A B C D A C
So far, I can compare S1 to all the other students for just Q1 using the formula:
{=IF(B2=$B$2:$B$11,1,0)}
which will give me the output:
. S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 1
S2 0
S3 1
S4 0
S5 1
S6 0
S7 1
S8 1
S9 0
S10 1
But automatically extending this table to compare S2 to the class in the second column, then comparing S3 to the class in the third column, etc, requires a lot of manual input. If there were a way to do this easily, I would end up with a table that looks like this:
. S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 1 0 1 0 1 0 1 1 0 1
S2 0 1 0 1 0 0 0 0 0 0
S3 1 0 1 0 1 0 1 1 0 1
S4 0 1 0 1 0 0 0 0 0 0
S5 1 0 1 0 1 0 1 1 0 1
S6 0 0 0 0 0 1 0 0 0 0
S7 1 0 1 0 1 0 1 1 0 1
S8 1 0 1 0 1 0 1 1 0 1
S9 0 0 0 0 0 0 0 0 1 0
S10 1 0 1 0 1 0 1 1 0 1
Repeating this process again for questions Q2 thru Q10 would yield a total of 10 tables like the table above. And adding all 10 tables together would yield a table like this (I just made up the values):
. S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 10 6 1 5 5 5 2 3 2 3
S2 6 10 4 7 4 6 4 10 1 7
S3 1 4 10 7 6 3 1 4 8 3
S4 5 7 7 10 7 3 8 3 7 7
S5 5 4 6 7 10 6 7 6 4 2
S6 5 6 3 3 6 10 4 8 5 6
S7 2 4 1 8 7 4 10 9 2 4
S8 3 10 4 3 6 8 9 10 1 7
S9 2 1 8 7 4 5 2 1 10 2
S10 3 7 3 7 2 6 4 7 2 10
Obviously each student is a perfect match with him/herself (and the top right of the table is a mirror image of the bottom left). But otherwise, high matches might suggest that cheating occurred. I could also use the conditional formatting function to create a color heat map.
And of course, I have more than 10 students (usually about 70) and more than 10 questions (usually 20-40 per exam).
I don't have any VBA experience (yet!), so if this can be done with formulas or existing Excel tools, that would be easiest for me, but I'm willing to take any suggestions you can offer!
Thank you for reading this far and helping me ponder this question and improve my Excel skills!