Only counting cells when different from a reference cell?

Austinma · Sep 20, 2013

Hi
I am attempting a relatively simple task but I’m struggling, and was hoping that someone might have some advice.

Specifically: I would like to count how many times a cell in my data range is NOT the same as my reference cell.

I have a reference cell and a non-numerical data range.

For example

Reference cell A1 = Apple

Data range in B1:F1 = Apple, Apple, Pear, Peach, Apple Apple

The answer to my question is 2 (i.e. Pear and Peach)

I can use the following Countif formula to give me the reciprocal answer (i.e 4)
=Countif(B1:F1,A1) but is there a way of including a NOT logical in there somewhere?

I’ve also fudged it with the following formula taken from previous Chandoo posts:

=SUMPRODUCT(1/COUNTIF(B1:F1,B1:F1))-1

But I think this is a bit clunky….is there a better, simpler way?

Thanks in advance for your help in this matter,

Regards,

Mark

Sajan · Sep 20, 2013

Hi Mark,
You can do
=Countif(B1:F1,"<>"&A1)

Cheers,
Sajan.

Austinma · Sep 20, 2013

Hi Sajan,

That's fantastic!! Thanks for taking the time to reply.

Best wishes,

Mark

Austinma · Sep 20, 2013

Thanks to Sajan for his quick reply

Just to up the ante a bit. Is it possible to take Sajan's formula:

=Countif(B1:F1,"<>"&A1)

and combine it with another....

In an ideal world I'd like to ask more about those cells in my data that do not match my reference sequence. For example is it possible to list what those different values are and list and tally them individually in a range on the worksheet somewhere??

For example in the following data range B1:H1

Apple, Apple, Pear, Peach, Apple, Apple, Pear

The answer I would really like would be:

Pear = 2
Peach = 1

I'm guessing I would have to have a table with the fruits listed (Pear, Peach etc). But what if I don't know what the fruits are and can't pre-populated the table accordingly?

Thanks once again in advance for all your help,

Regards,

Mark

Sajan · Sep 20, 2013

Hi Mark,
There are many options for listing out the names of the fruits, from a source list that may have them listed one or more times.

Do you have the list (i.e. all in a single row or single column) of fruits somewhere, but do not know if the list is going to grow or shrink? Or do you have a data area (rows and columns) where fruits are listed, that may grow or shrink? We can accommodate either case, but the solution is different for these scenarios.

By the way, in case your actual data domain is not about fruits and vegetables, please take the time to describe your entire situation so that we can determine a solution that fits your total question.

-Sajan.

Austinma · Sep 21, 2013

Dear Sajan,

Thanks once again for your reply and advice - it's very much appreciated

I've (hopefully) attached a spreadsheet with a typical data set that should make things a little clearer/specific....

Specifically what I'm trying to do is compare a reference range (protein amino acid sequence) to a number of related amino acid sequences (clones 1 to n). In my example file I have the reference data running from D2:S2. The different clone sequences are then listed below the reference (D4:S23). The reference sequence will remain unchanged, but the number of related clones may vary (i.e. D4:Sn). The number of alternative amino acid codes (A, R, N, D.... etc) is invariable at 20 and these are listed in a separate table (U4:U23).

My latest query centres on counting the number of times an alternative amino acid appears at a given position in the reference sequence. I've manually filled in the first column by way of illustration (V4:V23).

Hopefully this makes sense? Please do ask for further clarification if required.

Thanks once again for all your help,

Best wishes,

Mark

NARAYANK991 · Sep 21, 2013

Hi Mark ,

I am sure Sajan will give you the solution , as and when he comes online ; in the meantime , can you clarify how you got the data in V4 through V23 ?

The reference list ( D2 through S2 ) is as under :

Q V L Q W Y P A A A G H W E A R

The first row of data pertaining to Clone 1 is as under :

Q V L L W Y P A F V G H W E A E

So , from the above , how do you get the data in V4 through V23 ?

Narayan

Austinma · Sep 21, 2013

Dear Narayan,

Thanks for your reply.

In answer to your question the clones sequences listed in V4:V23 are, to all sense and purposes, cut and pasted from an output file generated by an amino acid/protein sequence analysis programme. All clone sequences are going to be the same length (i.e 16 individual characters), so it's just a case of populating my clones below the reference sequence by a quick Ctrl V. To reiterate a point I highlighted earlier: the number of clones may vary from 1 to n.

Best wishes,

Mark

NARAYANK991 · Sep 21, 2013

Hi Mark ,

Sorry , but I am still in the dark ; your earlier post had the following sentence :

My latest query centres on counting the number of times an alternative amino acid appears at a given position in the reference sequence. I've manually filled in the first column by way of illustration (V4:V23).

According to that , I aassumed you have filled in the following data in V4 through V23 :

A ----- 1
E ----- 1
F ----- 1
V ----- 2

I also assumed that you want the remaining cells in the range W4 through AO23 to be filled in with appropriate numbers , based on the data you have given in the range D5 through S23.

Are my assumptions correct ? If so , then what is the basis for arriving at the numbers in column V ?

Narayan

Austinma · Sep 21, 2013

Hi Narayan,

Sorry for the confusion.

You're absolutely right - what I actually want is to know how many times one of the 20 alternative amino acids appears at a given position in the reference amino acid sequence. (Actually I suppose that should be 19 alternative amino acids to account for the one amino acid defined in the reference cell)

I've amended the excel file to reflect this.

Hopefully this should make more sense now.....?

Best wishes,

Mark

NARAYANK991 · Sep 21, 2013

Hi Mark ,

Sure. Can you check your file and see if the figures are correct ?

Narayan

Austinma · Sep 21, 2013

Dear Narayan,

Perfect - very elegant!!

Thank you for your input in this - it's very much appreciated.

Have a great weekend.

Regards,

Mark

Austinma · Sep 23, 2013

Hi Narayan

I've now attempted to use the formula you suggested:

=IF($U7=D$5,"",COUNTIF(D$7:$26,$U7))

And it works perfectly providing me with the information that I need - which is great.

That siad, I was just wondering if it was possible to tweak the formula slightly.....

Specifically the formula as described counts the number of times a defined amino acid is seen in my range

, adds a " " in the cell when the amino acid is the same

, but if one of the 20 potential amino acid codes are not seen in my range and it is not the same as the reference then it returns a 0

i.e. the formula is doing its job correctly but recording an absence as a zero.

Is there a way of perhaps including an OR logical that will also record a " " when the count is 0??

Thanks in advance for your advice,

Best wishes,

Mark

NARAYANK991 · Sep 23, 2013

Hi Mark ,

You can either format the range to not display the zeros , or you can extend the formula to take care of the zeros by :

Code:

=IF($U7=D$5,"",IF(COUNTIF(D$7:D$26,$U7)=0,"",COUNTIF(D$7:D$26,$U7)))

Narayan

Austinma · Sep 23, 2013

Dear Narayan,

Thanks once again for all your help - your fix(es) works really well.

Best wishes,

Mark

Austinma · Sep 30, 2013

Dear Narayan,

My worksheet still continues to evolve....

I'm trying to complete another countif style formula. Specifically I would like to count the number of differences contained within a given clone (I expect it is easier to describe by looking at the attached file).

I've attempted to adapt your previous formula:

=COUNTIF(D7:26,"<>"&D5)

In this instance, the range part of the formula is fine, but I struggle to adapt the "<>"&D5 section as I'm now dealing with a range of values, rather than a single cell.

Also, as a nice to have, is there a conditional format option I can use to colour in the different amino acids?? The worksheet as it stands has had these differences manually highlighted in purple.

Your help is very much appreciated.

Best wishes,

Mark

NARAYANK991 · Sep 30, 2013

Hi Mark ,

Somewhat confusing !

As I can see from your worksheet , the clone amino acids are in rows , so that Clone 1's amino acids extend over the range $D7 through $S7 ; is this correct ?

When you say you wish to count the differences , do you mean that you wish to find out how many of the positions have differences between the reference range from $D5 through $S5 , and each of the clone rows ?

For instance , for Clone 1 , the number of differences will be : 4 , in positions 4 , 9 , 10 and 16. Is this correct ?

If so , then in U7 , you can use the following formula :

=SUMPRODUCT(--($D$5:$S$5<>$D7:$S7))

Copy this down , to get the differences for the remaining clones.

Narayan

Austinma · Sep 30, 2013

Dear Narayan,

Apologies for the confusion - but you have deciphered what I mean/want correctly

The formula works really well - thanks.

Could expand what the '--' part of the formula means? I've never seen this used before.

Best wishes,

Mark

NARAYANK991 · Sep 30, 2013

Hi Mark ,

This is what is called double negation , and is commonly used to convert logical results ( TRUE / FALSE ) to numeric values ( 1 / 0 ).

The SUMPRODUCT does an array comparison , comparing each of the values in one array ( $D5 through $S5 ) with each of the values in the other array ( $D7 through $S7 ) ; for each comparison , the result will either be TRUE if the two values do not match , or FALSE if the two values do match.

Thus for U7 , the intermediate result of this array comparison , would be the following array :

{FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE}

where the TRUE values correspond to the mismatches in positions 4 , 9 , 10 and 16.

Instead of counting how many values are TRUE , we convert the above logical results into numeric values , with 0 standing for FALSE , and 1 standing for TRUE. This will result in the following array :

{0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,1}

Now , it's a simple matter to use the SUMPRODUCT function to sum the elements of the above array , which results in 4.

The conversion from the TRUE / FALSE results to the 1 / 0 results is done by using the double negation --

You may ask why we need the double negation , why not just one ?

If you have just one minus sign , then the resulting array is like this :

{0,0,0,-1,0,0,0,0,-1,-1,0,0,0,0,0,-1}
Using SUMPRODUCT with this will give -4 as the result.

You can ask why we need to use two minus signs , why not use one plus (+) sign ?

Unfortunately , Excel does not respond to a + sign the way it does to a - sign ; using a + sign does nothing.

Narayan

Austinma · Sep 30, 2013

Hi Narayan,

Thanks for taking the time to explain your formula in such detail. It all makes sense now seeing it broken down like that.

Have a good day,

Regards,

Mark

Only counting cells when different from a reference cell?

Member

Excel Ninja

Member

Member

Excel Ninja

Member

Attachments

Excel Ninja

Member

Excel Ninja

Member

Attachments

Excel Ninja

Attachments

Member

Member

Excel Ninja

Member

Member

Attachments

Excel Ninja

Member

Excel Ninja

Member