You are using an out of date browser. It may not display this or other websites correctly.

You should upgrade or use an alternative browser.

You should upgrade or use an alternative browser.

- Thread starter Hui
- Start date

Hi Hui,

Perhaps an idea for formula forensics on the -blog?

I will be doing a FF post on it in due course.

Last edited by a moderator:

There are actually two key concepts which have been used here ; John Jairo's brilliance is putting them to work in this unfamiliar setting.

1. Using two running totals to firstly find out where to stop , and then to get the exact changeover value.

For example , it is easy to see that we need to calculate :

22,559 x 55.4% +

12,737 x 62.5%

However , the question is how do we calculate the third value , since part way through this third value , we are exceeding the target of 40,000.

This is where the idea of 2 running totals helps.

Consider the first running total starting with the first value ; we will have :

22,559 ; 35,296 ; 47,633

Now consider the second running total which starts with the blank cell above the data in column C ; we will have :

0 ; 22,559 ; 35,296

Now , we subtract these values from the target of 40,000 ; the first set gives us :

17,441 ; 4,704 ; -7,633

while the second gives us :

40,000 ; 17,441 ; 4,704

As long as the first set is positive , we take the original data values of :

22,559 ; 12,737

When it turns negative , we take from the second set , which will give us :

4,704

Now , the issue comes to the second key concept ; how do we ignore the remaining values ? Since 4,704 was the changeover point , all the remaining values in the second set will be negative.

Ignoring these negative values forms the second key concept viz. using the LN and EXP functions , which essentially return positive values as they were originally , and convert all negative values to error values.

These error values are then trapped by the IFERROR.

Narayan

Good catch Hui!

Just for refine a little bit... I really like when all the solutions have a single formula for all the range. So, I make the changes:

[F6] : =MIN(40000,$C32-SUM($E6:E6))

[F38] (CSE) : =IFERROR((SUM(IFERROR(EXP(LN(IF(SUM($E6:F6)-MMULT(N(ROW(_r1)>=TRANSPOSE(ROW(_r1))),N(+_r1))<_r,SUM($E6:F6)-MMULT(N(ROW(_r1)>=TRANSPOSE(ROW(_r1))),N(+_r1)),_r))),)*$D10:$D31)-SUM(N(+$E6:E6)*N(+$E38:E38)))/F$6,)

Both formulas drag it to the right. Blessings!

=SUM(IFERROR(EXP(LN(IF(F$6-MMULT(N(ROW(_r1)>=TRANSPOSE(ROW(_r1))),N(+_r1))<_r,F$6-MMULT(N(ROW(_r1)>=TRANSPOSE(ROW(_r1))),N(+_r1)),_r))),)*$D10:$D31)/F$6

=IFERROR((SUM(IFERROR(EXP(LN(IF(SUM($E6:F6)-MMULT(N(ROW(_r1)>=TRANSPOSE(ROW(_r1))),N(+_r1))<_r,SUM($E6:F6)-MMULT(N(ROW(_r1)>=TRANSPOSE(ROW(_r1))),N(+_r1)),_r))),)*$D10:$D31)-SUM(N(+$E6:E6)*N(+$E38:E38)))/F$6,)

Hi John,

You are great

Both formulas are astonishing and beyond my understanding.

Regards,

Another approach, a bit shorter:

[F38] (CSE) : =IFERROR((SUM(IFERROR(EXP(LN(MMULT(MMULT(CHOOSE({1,2},SUM($E6:F6)-MMULT(N(ROW(_r1)>=TRANSPOSE(ROW(_r1))),N(+_r1)),_r),{1,1;1,-1}/2)^{1,2}^{1,0.5},{1;-1}))),)*$D10:$D31)-SUM(N(+$E6:E6)*N(+$E38:E38)))/F$6,)

Thanks for all your kind words.

By the way... Excellent explanation for @NARAYANK991. Blessings!

Sorry if I have double-posted this but I cannot find it above. I thought I posted it at 9:00 before going out for the day but it seems not. It still needs some tidying given that the workbook was written in a bit of a rush sometime between 8:00 and 9:00. Period 12 is wrong; this morning it was a #REF! error I am not sure whether that is better or worse!

I have worked through the problem again using more basic functions (no MMULT, just SUMIFS). There are no clever formulas, just a sequence of multi-cell array formulas that can be packaged within the corresponding Named formulas. The file with helper cells is the workbook in its original form, as developed, and does go some way to explaining the process.

I am sure no one will wish to adopt this approach but I hope it will be of some interest. The solution answers @Hui's challenge but without actually providing the formula. It is implicit within the solution if one cares to substitute out all of the names.

I am sure no one will wish to adopt this approach but I hope it will be of some interest. The solution answers @Hui's challenge but without actually providing the formula. It is implicit within the solution if one cares to substitute out all of the names.

@Hui

It took me a long time to realise that named formulas were just that, formulas and not variables. Then Charles Williams's observations that named formulas are not evaluated unless they are used within a cell suddenly made sense.

Today's effort would have gone much faster if I hadn't changed strategies mid-stream. I was tempted to use matrix accumulation and differencing operators but then decided that SUMIFS would be more efficient despite doubling the calculation to work with start and end of period cases.

Peter

It took me a long time to realise that named formulas were just that, formulas and not variables. Then Charles Williams's observations that named formulas are not evaluated unless they are used within a cell suddenly made sense.

Today's effort would have gone much faster if I hadn't changed strategies mid-stream. I was tempted to use matrix accumulation and differencing operators but then decided that SUMIFS would be more efficient despite doubling the calculation to work with start and end of period cases.

Peter

Fwiw, here's one more non-cse option referring only to constant cell values filled across (F38):

=SUMPRODUCT(MMULT(ABS(F6-MMULT(MMULT(($C10:$C31+MMULT(MMULT(SUMIF($F8:$Q8,{"<=","<"}&F8,$F6:$Q6)-SUMIF($B10:$B31,{"<","<="}&$B10:$B31,$C10:$C31),{1,1;-1,1}/2)^{1,2}^{1,0.5},{1,-1;-1,1}))^{1,2}^{1,0.5},{1;-1}/2)^{1,2}^{1,0.5},{1,-1;1,-1}/2)),{-1;1}/2),$D$10:$D$31)/F$6

It follows JJV's technique of making pairwise MIN/MAX replacements using MMULT (+ headbanging!) in:

=MAX(0,MIN(F$6,$C10,(SUMIF($F$8:$Q$8,{"<=","<"}&F$8,$F$6:$Q$6)-SUMIF($B$10:$B$31,{"<","<="}&$B10,$C$10:$C$31))*{1,-1}))

Hi ,

You should write a Formula Forensics post on how you develop such formulae !

Narayan

You should write a Formula Forensics post on how you develop such formulae !

Narayan

Hi ,

You should write a Formula Forensics post on how you develop such formulae !

Narayan

I will be Narayan

Just a bit busy at work at present

This will also take a bit more thinking than the usual FF!

@Hui

It took me a long time to realise that named formulas were just that, formulas and not variables. Then Charles Williams's observations that named formulas are not evaluated unless they are used within a cell suddenly made sense.

Today's effort would have gone much faster if I hadn't changed strategies mid-stream. I was tempted to use matrix accumulation and differencing operators but then decided that SUMIFS would be more efficient despite doubling the calculation to work with start and end of period cases.

Peter

Peter

Named Formula are also Array Formula

Which can be handy from time to time

Hi ,I will be Narayan

Just a bit busy at work at present

This will also take a bit more thinking than the usual FF!

I meant a post by Lori herself on how she develops such formulae.

Understanding the formula I can do on my own.

Narayan

In another post, Narayan has spoken of compiling formula patterns for solving problems. The key patterns needed here are arrays of running sums together with combinations of constraints.

1. For the constraints, results need to be positive, less than row and column limits as well as less than cumulative row and column limits.

This led to the second formula from the previous post entered into F10 and filled over the range. One can see the row and column symmetry in the formula, I'll leave details to readers or others with better explanatory skills.

2. Normally, I would be more than happy to have got this far! The real challenge is in making suitable modifications to support arrays of running sums and constraints by making the changes:

$B10 -> $B10:$B31

$C10 -> $C10:$C31

For this purpose, we can break down the formula into pairs of inputs:

=MAX(0,MIN(r,MIN(c,MIN(x,y))))

Here, r and c are row and column quantities and x and y are the overlapping totals in the row and column quantities.

3. Consider first MAX(0,z) where z=MIN(...) is a long formula.

The trouble with the obvious IF alternative is repeating this long formula twice, i.e. IF(z>0,z,0).

A second way, used above by John, is to find an expression that errors for negative values like EXP(LN(z)) or z^0.5^2 or EFFECT(z,1), and then pass to IFERROR.

A third slightly less obvious alternative which ties in with methods in the next section is to use ABS(z) = z^2^0.5 as in:

=(z+ABS(z))/2

=MMULT(z^{1,2}^{1,0.5},{1;1}/2)

4. For MIN(x,y) the obvious replacement =IF(x<y, x, y) again results in formula duplication. Instead we can use:

=(x+y-ABS(x-y))/2

=MMULT((x+{1,-1}*y)^{1,2}^{1,0.5},{1;-1}/2)

5. This method can also be applied to MIN(c,.) and MIN(r,.). Since r and c are both positive the formula can be rewritten:

=MIN(r,MIN(c,MAX(0,MIN(x,y))))

and then the exponents ^{1,2}^{1,0.5} can be replaced by ABS for the r and c parts.

At each step of this formula conversion process, results were checked by filling over the whole range. Last working versions were stored in a text file which proved far too useful!

Just in case anyone is interested, I thought I should document my approach for comparison purposes. This time I have used a verbose notation which may assist readability.

**The starting point is to name all the input data:**

On the supply side -

An index column identifying each parcel: parcel.index

The ore in tonnes for each parcel: Ore.supply\parcel

The percentage iron within the parcel: Fe.percent

The iron content in tonnes for each parcel: Fe\parcel

**On the demand side -**

An index column identifying each period: period.index

The ore in tonnes required over each period: Ore.demand\period

The calculation is based upon matching the cumulative demand to the cumulative supply to identify the parcel used to fulfil each month's requirement.

**The following is calculated -**

On the supply side -

The cumulative weight of ore available prior to the current parcel:*Ore.cumulative\parcel*

= SUMIFS(*Ore.supply\parcel*, *parcel.index*, "<" & *parcel.index* )

The iron content available prior to the current parcel:*Fe.cumulative\parcel*

= SUMIFS(*Fe\parcel*, *parcel.index*, "<" & *parcel.index* )

Because the total ore available might place a limit on the demand that can be met the total is calculated:*Ore.totalSupply*

= SUM(*Ore.supply\parcel*)

**On the demand side -**

The cumulative total ore required at the start and end of each period:

*Ore.cumulativeDemand\period*

= SUMIFS(*Ore.demand\period*, *period.index*, {"<","<="} & *period.index* )

[this borrows from Lori]

Because supply may limit the extent to which the requirement is met the tonnage actually delivered within each period is calculated as a worksheet formula:

= MMULT( IF(*Ore.cumulativeDemand\period*<*Ore.totalSupply*, *Ore.cumulativeDemand\period*, *Ore.totalSupply* ), {-1;1} )

The range is named*Ore.supplied\Period* and is used in subsequent calculations, starting with a variation *Ore.cumulativeSupply\period* on the previous running total

= SUMIFS(*Ore.supplied\Period*, *period.index*, {"<","<="} & *period.index* )

**The ore and iron content delivered for each period:**

At the start and end of any period the total weight of ore delivered to date using entire parcels can be looked up and the balance to be taken from the current parcel*Ore.balance\period *is calculated by:

=*Ore.cumulativeSupply\period* -

LOOKUP(*Ore.cumulativeSupply\period*, *Ore.cumulative\parcel* )

The cumulative weight of iron Fe.cumulativeSupply\period supplied for the beginning and end of each period is given in terms of the total for the last parcel and the weight and iron content of the balance:

= LOOKUP(*Ore.cumulativeSupply\period*, *Ore.cumulative\parcel*, *Fe.cumulative\parcel* ) +

*Ore.balance\period* * LOOKUP( *Ore.cumulativeSupply\period*, *Ore.cumulative\parcel*, *Fe.percent* ) / 100

**The iron content supplied over each period is then given by a final worksheet array formula**:

= 100 * MMULT(*Fe.cumulativeSupply\period*, {-1;1} ) / *Ore.supplied\Period*

What I have used this challenge for is to demonstrate (to myself) that it is possible to develop a solution to a complex problem entirely within terms of the domain of application and with no reference to cells or their relative position. That is to deny the assumptions of end user computing?

On the supply side -

An index column identifying each parcel: parcel.index

The ore in tonnes for each parcel: Ore.supply\parcel

The percentage iron within the parcel: Fe.percent

The iron content in tonnes for each parcel: Fe\parcel

An index column identifying each period: period.index

The ore in tonnes required over each period: Ore.demand\period

The calculation is based upon matching the cumulative demand to the cumulative supply to identify the parcel used to fulfil each month's requirement.

On the supply side -

The cumulative weight of ore available prior to the current parcel:

= SUMIFS(

The iron content available prior to the current parcel:

= SUMIFS(

Because the total ore available might place a limit on the demand that can be met the total is calculated:

= SUM(

The cumulative total ore required at the start and end of each period:

= SUMIFS(

[this borrows from Lori]

Because supply may limit the extent to which the requirement is met the tonnage actually delivered within each period is calculated as a worksheet formula:

= MMULT( IF(

The range is named

= SUMIFS(

At the start and end of any period the total weight of ore delivered to date using entire parcels can be looked up and the balance to be taken from the current parcel

=

LOOKUP(

The cumulative weight of iron Fe.cumulativeSupply\period supplied for the beginning and end of each period is given in terms of the total for the last parcel and the weight and iron content of the balance:

= LOOKUP(

= 100 * MMULT(

What I have used this challenge for is to demonstrate (to myself) that it is possible to develop a solution to a complex problem entirely within terms of the domain of application and with no reference to cells or their relative position. That is to deny the assumptions of end user computing?

...there was a simpler approach...

Someday we might need a debate as to what constitutes 'simplicity'. For me 'simplicity' lies in capturing the structure that underpins the problem domain and carries meaning step by step towards a solution.

You and others here seem to specialise in producing solutions within a single formula, like a little jewel of amazing ingenuity that dazzles and bewitches.

For F1F9 and the FAST standard 'simplicity' lies in pedantry and the use of naïve techniques on the assumption that the user may wish to check the calculation with a pocket calculator.

Can we all be right?

I think Lori's usage of the word

If you remove this constraint , nothing can be much simpler than Hui's original approach which used helper cells.

Narayan

My original example is simple, can manually be followed and audited.

But I was recently confronted with a situation where the client wanted a solution which would have required between 50 and 100 blocks of that solution where each block could have extended to up to 1000 rows and 240 columns. Clearly unworkable.

So yes, it would have been simple to setup, simple to follow and audit, it is very visual and explainable, but it would not have been a usefull tool

Simplifying the problem to 50-100 data areas, with 50 solution rows, and a half dozen summary rows, gave a workable solution, despite the issues of unautitable cell formulas.

In terms of auditability I use my original simple mockup as a sample where they can see both solutions side by side.

But the solution to situations which cannot use helper cells , is not single formulae which can be understood by barely 1% of Excellers.

The correct solution would be to use VBA , which can have code that is well developed , properly commented , adequately tested and which is scalable in future.

There have been enough instances in this forum itself , where brilliant formulae are presented , and which are found to have flaws later ; in some cases , these flaws are pointed out by others , and in some cases by the creator of the formula.

Clearly such a situation cannot be allowed to develop in a working process , where such flaws can prove to be costly.

Narayan

A few reasons for adopting formula based solutions are:

- speed - sometimes they are faster than udfs (sometimes they are slower too!)
- portability - works with eg google sheets and mobile versions (previous formula was constructed on phone while on a train)
- creativity - leads one to think "outside the box" and can help develop skills in problem solving