• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Only counting cells when different from a reference cell?

Austinma

Member
Hi
I am attempting a relatively simple task but I’m struggling, and was hoping that someone might have some advice.

Specifically: I would like to count how many times a cell in my data range is NOT the same as my reference cell.

I have a reference cell and a non-numerical data range.

For example

Reference cell A1 = Apple

Data range in B1:F1 = Apple, Apple, Pear, Peach, Apple Apple

The answer to my question is 2 (i.e. Pear and Peach)

I can use the following Countif formula to give me the reciprocal answer (i.e 4)
=Countif(B1:F1,A1) but is there a way of including a NOT logical in there somewhere?

I’ve also fudged it with the following formula taken from previous Chandoo posts:

=SUMPRODUCT(1/COUNTIF(B1:F1,B1:F1))-1

But I think this is a bit clunky….is there a better, simpler way?

Thanks in advance for your help in this matter,

Regards,

Mark
 
Thanks to Sajan for his quick reply :)

Just to up the ante a bit. Is it possible to take Sajan's formula:

=Countif(B1:F1,"<>"&A1)

and combine it with another....

In an ideal world I'd like to ask more about those cells in my data that do not match my reference sequence. For example is it possible to list what those different values are and list and tally them individually in a range on the worksheet somewhere??

For example in the following data range B1:H1

Apple, Apple, Pear, Peach, Apple, Apple, Pear

The answer I would really like would be:

Pear = 2
Peach = 1

I'm guessing I would have to have a table with the fruits listed (Pear, Peach etc). But what if I don't know what the fruits are and can't pre-populated the table accordingly?

Thanks once again in advance for all your help,

Regards,

Mark
 
Hi Mark,
There are many options for listing out the names of the fruits, from a source list that may have them listed one or more times.

Do you have the list (i.e. all in a single row or single column) of fruits somewhere, but do not know if the list is going to grow or shrink? Or do you have a data area (rows and columns) where fruits are listed, that may grow or shrink? We can accommodate either case, but the solution is different for these scenarios.

By the way, in case your actual data domain is not about fruits and vegetables, please take the time to describe your entire situation so that we can determine a solution that fits your total question.

-Sajan.
 
Dear Sajan,

Thanks once again for your reply and advice - it's very much appreciated :)

I've (hopefully) attached a spreadsheet with a typical data set that should make things a little clearer/specific....

Specifically what I'm trying to do is compare a reference range (protein amino acid sequence) to a number of related amino acid sequences (clones 1 to n). In my example file I have the reference data running from D2:S2. The different clone sequences are then listed below the reference (D4:S23). The reference sequence will remain unchanged, but the number of related clones may vary (i.e. D4:Sn). The number of alternative amino acid codes (A, R, N, D.... etc) is invariable at 20 and these are listed in a separate table (U4:U23).

My latest query centres on counting the number of times an alternative amino acid appears at a given position in the reference sequence. I've manually filled in the first column by way of illustration (V4:V23).

Hopefully this makes sense? Please do ask for further clarification if required.

Thanks once again for all your help,

Best wishes,

Mark
 

Attachments

  • MAustin_Workbook 1.xlsx
    16.3 KB · Views: 1
Hi Mark ,

I am sure Sajan will give you the solution , as and when he comes online ; in the meantime , can you clarify how you got the data in V4 through V23 ?

The reference list ( D2 through S2 ) is as under :

Q V L Q W Y P A A A G H W E A R

The first row of data pertaining to Clone 1 is as under :

Q V L L W Y P A F V G H W E A E

So , from the above , how do you get the data in V4 through V23 ?

Narayan
 
Dear Narayan,

Thanks for your reply.

In answer to your question the clones sequences listed in V4:V23 are, to all sense and purposes, cut and pasted from an output file generated by an amino acid/protein sequence analysis programme. All clone sequences are going to be the same length (i.e 16 individual characters), so it's just a case of populating my clones below the reference sequence by a quick Ctrl V. To reiterate a point I highlighted earlier: the number of clones may vary from 1 to n.

Best wishes,

Mark
 
Hi Mark ,

Sorry , but I am still in the dark ; your earlier post had the following sentence :
My latest query centres on counting the number of times an alternative amino acid appears at a given position in the reference sequence. I've manually filled in the first column by way of illustration (V4:V23).
According to that , I aassumed you have filled in the following data in V4 through V23 :

A ----- 1
E ----- 1
F ----- 1
V ----- 2

I also assumed that you want the remaining cells in the range W4 through AO23 to be filled in with appropriate numbers , based on the data you have given in the range D5 through S23.

Are my assumptions correct ? If so , then what is the basis for arriving at the numbers in column V ?

Narayan
 
Hi Narayan,

Sorry for the confusion. :oops:

You're absolutely right - what I actually want is to know how many times one of the 20 alternative amino acids appears at a given position in the reference amino acid sequence. (Actually I suppose that should be 19 alternative amino acids to account for the one amino acid defined in the reference cell)

I've amended the excel file to reflect this.

Hopefully this should make more sense now.....?

Best wishes,

Mark
 

Attachments

  • MAustin_Workbook 2.xlsx
    16.1 KB · Views: 4
Hi Mark ,

Sure. Can you check your file and see if the figures are correct ?

Narayan
 

Attachments

  • MAustin_Workbook 2.xlsx
    16.9 KB · Views: 4
Dear Narayan,

Perfect - very elegant!!

Thank you for your input in this - it's very much appreciated.

Have a great weekend.

Regards,

Mark
 
Hi Narayan

I've now attempted to use the formula you suggested:

=IF($U7=D$5,"",COUNTIF(D$7:$26,$U7))

And it works perfectly providing me with the information that I need - which is great.

That siad, I was just wondering if it was possible to tweak the formula slightly.....

Specifically the formula as described counts the number of times a defined amino acid is seen in my range :), adds a " " in the cell when the amino acid is the same :), but if one of the 20 potential amino acid codes are not seen in my range and it is not the same as the reference then it returns a 0 :)

i.e. the formula is doing its job correctly but recording an absence as a zero.

Is there a way of perhaps including an OR logical that will also record a " " when the count is 0??

Thanks in advance for your advice,

Best wishes,

Mark
 
Hi Mark ,

You can either format the range to not display the zeros , or you can extend the formula to take care of the zeros by :
Code:
=IF($U7=D$5,"",IF(COUNTIF(D$7:D$26,$U7)=0,"",COUNTIF(D$7:D$26,$U7)))
Narayan
 
Dear Narayan,

My worksheet still continues to evolve....

I'm trying to complete another countif style formula. Specifically I would like to count the number of differences contained within a given clone (I expect it is easier to describe by looking at the attached file).

I've attempted to adapt your previous formula:

=COUNTIF(D7:26,"<>"&D5)

In this instance, the range part of the formula is fine, but I struggle to adapt the "<>"&D5 section as I'm now dealing with a range of values, rather than a single cell.

Also, as a nice to have, is there a conditional format option I can use to colour in the different amino acids?? The worksheet as it stands has had these differences manually highlighted in purple.

Your help is very much appreciated.

Best wishes,

Mark
 

Attachments

  • MAustin_Workbook_3.xlsx
    16.7 KB · Views: 3
Hi Mark ,

Somewhat confusing !

As I can see from your worksheet , the clone amino acids are in rows , so that Clone 1's amino acids extend over the range $D7 through $S7 ; is this correct ?

When you say you wish to count the differences , do you mean that you wish to find out how many of the positions have differences between the reference range from $D5 through $S5 , and each of the clone rows ?

For instance , for Clone 1 , the number of differences will be : 4 , in positions 4 , 9 , 10 and 16. Is this correct ?

If so , then in U7 , you can use the following formula :

=SUMPRODUCT(--($D$5:$S$5<>$D7:$S7))

Copy this down , to get the differences for the remaining clones.

Narayan
 
Dear Narayan,

Apologies for the confusion - but you have deciphered what I mean/want correctly :)

The formula works really well - thanks.

Could expand what the '--' part of the formula means? I've never seen this used before.

Best wishes,

Mark
 
Hi Mark ,

This is what is called double negation , and is commonly used to convert logical results ( TRUE / FALSE ) to numeric values ( 1 / 0 ).

The SUMPRODUCT does an array comparison , comparing each of the values in one array ( $D5 through $S5 ) with each of the values in the other array ( $D7 through $S7 ) ; for each comparison , the result will either be TRUE if the two values do not match , or FALSE if the two values do match.

Thus for U7 , the intermediate result of this array comparison , would be the following array :

{FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE}

where the TRUE values correspond to the mismatches in positions 4 , 9 , 10 and 16.

Instead of counting how many values are TRUE , we convert the above logical results into numeric values , with 0 standing for FALSE , and 1 standing for TRUE. This will result in the following array :

{0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,1}

Now , it's a simple matter to use the SUMPRODUCT function to sum the elements of the above array , which results in 4.

The conversion from the TRUE / FALSE results to the 1 / 0 results is done by using the double negation --

You may ask why we need the double negation , why not just one ?

If you have just one minus sign , then the resulting array is like this :

{0,0,0,-1,0,0,0,0,-1,-1,0,0,0,0,0,-1}
Using SUMPRODUCT with this will give -4 as the result.

You can ask why we need to use two minus signs , why not use one plus (+) sign ?

Unfortunately , Excel does not respond to a + sign the way it does to a - sign ; using a + sign does nothing.

Narayan
 
Hi Narayan,

Thanks for taking the time to explain your formula in such detail. It all makes sense now seeing it broken down like that.

Have a good day,

Regards,

Mark
 
Back
Top