• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Dynamic Sum Formula

polarisking

Member
I want to create a SUM formula whose range is fungible. In rows 1 through 49, there's random, unpredictable cells of data. Row 50 is empty. In row 51, I have column titles; in rows 52 though N, I have data.

I'd like to put a SUM formula above each Column title in row 50 - something like

sum(offset(B$52,0,0,counta($A$52:$A$10000),1))

I'm doing COUNTA for Column A because as it contains descriptors.

I'd really like the end of the data in column A (beginning in row 52) to be returned in a more precise way rather than guessing that I'll never have more than 10,000 rows of data. Any ideas?
 
You're current formula looks pretty good, as the COUNTA should be limiting the data...can you expand on what part of the formula you want to improve?
 
I want to create a SUM formula whose range is fungible. In rows 1 through 49, there's random, unpredictable cells of data. Row 50 is empty. In row 51, I have column titles; in rows 52 though N, I have data.

I'd like to put a SUM formula above each Column title in row 50 - something like

sum(offset(B$52,0,0,counta($A$52:$A$10000),1))

I'm doing COUNTA for Column A because as it contains descriptors.

I'd really like the end of the data in column A (beginning in row 52) to be returned in a more precise way rather than guessing that I'll never have more than 10,000 rows of data. Any ideas?
Hi,

It doesn't matter how you do it but you've got to check a number of rows that is larger than the likely amount of data and if you've concluded that you'll never have more then 10k rows of data; it's actually 10k-51 but that's nit-picking, then that's the amount of rows you need to test. I suspect that most people here would think that the solution you have is pretty good.

One small thought, why calculate the last row at all? If the rows below the data you want to sum are empty then what's the matter with.

=SUM(B52:B10000)

or

=SUMIF(A52:A10000,"<>",B52:B10000)
 
Mike & Luke, thanks for the response. I guess I'm trying to be too clever and arrive at the precise end of the data column in some misguided attempt to save several nanaseconds of performance overhead.

Which brings up an interesting (I think) question: Rather than stopping at 10,000, why wouldn't one just extend to the end of the column - 1,048,576 - and insure that you're never short?

Is there meaningful overhead associated with this? I suspect not, inasmuch as Excel only "carrys" as much data as actually exists.
 
Hi ,

Can you not define a named range with the formula :

=MATCH("zzzz",$A:$A)

This will return the last row with data in column A ; you can use this with your formula , as in :

=SUM(OFFSET(B$52,0,0,lastrow-51,1))

Narayan
 
Mike & Luke, thanks for the response. I guess I'm trying to be too clever and arrive at the precise end of the data column in some misguided attempt to save several nanaseconds of performance overhead.

Which brings up an interesting (I think) question: Rather than stopping at 10,000, why wouldn't one just extend to the end of the column - 1,048,576 - and insure that you're never short?

Is there meaningful overhead associated with this? I suspect not, inasmuch as Excel only "carrys" as much data as actually exists.
Mike & Luke, thanks for the response. I guess I'm trying to be too clever and arrive at the precise end of the data column in some misguided attempt to save several nanaseconds of performance overhead.

Which brings up an interesting (I think) question: Rather than stopping at 10,000, why wouldn't one just extend to the end of the column - 1,048,576 - and insure that you're never short?

Is there meaningful overhead associated with this? I suspect not, inasmuch as Excel only "carrys" as much data as actually exists.
Hi,

I increased the data to 50k rows to get some times and as you can see in the attached file the difference between the fastest and slowest formula is 0.00095 seconds. IMO irrelevant. Where it could become an issue is with say SUMPRODUCT and I summed the same range using that and as you can see it's an order of magnitude slower.
 

Attachments

  • Book1.xlsx
    565.7 KB · Views: 2
I think Mike is on the right track. Sometimes, it just doesn't make a difference, so go ahead and call-out entire column. Other functions can take longer. Typically, if the function has to do a more complex analsis, like COUNTIFS, SUMIFS, SUMPRODUCT, where we start dealing with multiple arrays, you want to keep things small.

One other small note, if you still want to worry about <=2003 compatibility, calling out anything greater than row 65536 would cause problems for older versions.

Also on the subject of speed, I would recommend getting rid of OFFSET since it's volatile, and using INDEX instead. something like:
=A1:INDEX(A:A,MATCH("ZZZ",A:A))
Will generate a range from col A going from row 1 to last cell with text in col A.
 
The Match("ZZZ",A:A) was what I was looking for . . .but, I have a question

If the last value in Column A is numeric, the MATCH formula returns #N/A. I also tried CHAR(255) to insure I was testing for the highest possible value (ÿ). Any idea as to why it's failing? Logically, if you change the numeric value to text, you're back in business.
 
If the last entry is going to be number than try 9.9999999E+307 in place of "zzzz".

Or for anything be it text or number try:
=MATCH(TRUE,ISBLANK($A:$A),0)-1

Edit: Note with above formula make sure your data in say column A should not have blanks in between, say data is in A1:A10, so this range should not be blank and A11 should be blank. Same way when data expands the rule should repeat.


Regards,
 
I can't duplicate your results about getting an error. :(
Row 8 is last row with text. All blank cells you see are truly blank.
upload_2014-10-8_13-7-49.png
 
Hi ,

The reason has to do with the way the MATCH function works when the third ( optional ) parameter is either omitted or put to 1.

When the third parameter is omitted , Excel uses the default value of 1 ; in this case , the MATCH function returns the location of the largest value in the range which is less than the lookup value.

When you use the lookup value of "zzzz" , we are using an alpha string which is unlikely to appear in any text data item , and which is greater than any other text string. Thus , when we have any text item in the data range , it is guaranteed that it will be less than the lookup value of "zzzz".

However , when we use a lookup value of even the largest number which is 9E307 , even this value is less than the single character A ! Thus , when we use a number as the lookup value , it fails when there are only text items in the data range.

The rule is therefore as follows :

1. When you use a lookup value of "zzzz" , the data range should have at least one text item ; if all the items are numbers , the MATCH function will return the #N/A error value.

2. When you use a number as a lookup value , the data range should have at least one number which is less than the lookup value ; it can have text items , but if it does not have even a single number which is less than the lookup value , the MATCH function will return the #N/A error value.

As to why using either a number or "zzzz" will return the last value in the range , and not the largest value which is less than the lookup value , this is because the MATCH function expects the data range to be sorted ; it does a binary search , and this can return unexpected results depending on the data.

Narayan
 
Back
Top