• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How would you define a relative dynamic range

Peter Bartholomew

Well-Known Member
The assumption is that you need to work on a workbook that has repeating blocks of data of varying length. The start of each block is marked, in general by an Order Number or Project Code or some such. In the attached Office 2016 workbook I have used the presence of a 1 in a
field '
BlockMarker'.

The objective is to create a range name 'block' that is defined relative to the cell that references it and starts with the preceding marker and ends with the row before the next marker. Thus, in the example, = SUM(block value) will only sum of values occurring within the current block irrespective of the cell actually chosen for the formula.

The use case for such would be to allow a layout and analysis that applies to a single instance of an activity to be reused for an arbitrary number of instances with minimal modification. My solution, based upon the use of MINIFS/MAXIFS (hence the requirement for Office 2016) is over-complex and computationally heavy, so there is plenty of room for improvement! Maybe well-chosen helper cells would speed recalculation?
 

Attachments

  • BlockSelection.xlsx
    15.6 KB · Views: 23
First, Change back to Normal cell References
Second Use distinct Block markers in the BlockMarker Column, eg 1,2, 3 or anything unique

Names:

BlockName: =I9
BlockStart: =MATCH(activeBlockStart,Table1[BlockMarker],0)
BlockLen: =MATCH(activeBlockStart+1,Table1[BlockMarker],0)-Blockstart
BlockSum: =SUM(OFFSET(Table1[[#Headers],[Value]],BlockStart,,BlockLen,1))

upload_2018-4-17_23-19-1.png

Enjoy
 
Hui. Thanks for the thoughts. The challenge was prompted by the Excel question:

https://chandoo.org/forum/threads/c...ore-than-50-of-total-value.38180/#post-228933

It occurred to me that there are many instances where it is easy to answer a question for a unique instance but far harder when the blocks repeat. In that case the formula
= IF( ( (SUMIFS( price, category, "Equipment" ) / SUM( price ) ) > 50%, "Equipment", "Aftermarket" )
would provide an easy classification for a single order but is harder to apply to multiple orders within a partitioned table.

Of course 'hand crafting' direct cell references would be an easy option but, if there is an error, how would you detect which of the 900 orders were in error? I abandoned direct referencing several years ago because it failed to capture the intent of the process step, even though the numbers could be right! A sensible option at that point might have been to ditch spreadsheets altogether and move to something more structured -- but I didn't.

It is possible that I should have posted this as a question rather than as a challenge but I thought that a good solution the problem might put off the moment when the only solution is to transform an entire dataset to something more standard in terms of database tables.
 
Peter

Whether the question is a Challange or a Question should be made clear up font, so that people don't waste time unnecessarily on Challenges if they don't want to

There is a large class of Excel problems can be solved much more simply by using better data layouts

This is one such problem

My preferred solution is that every record has a Block Number appropriately associated with it
My next preferred solution is that Block Numbers are incremental
Finally is random block Numbers
There is no way I would use the same block numbers as you provided.
 
@Hui, Thank you for the time you have spent on this; you have already prompted one or two thoughts.

I classified the post as a challenge because it is not connected with a specific problem, more an issue that I have seen arise in a number of different contexts. The basic situation is a worksheet that has a block-repetitive nature with some output metrics that need to be calculated within each block but with negligible needs to aggregate beyond a given block.

Examples that come to mind are survey results from Doodle Poll (that is organised question by question across the sheet), the duty rosters for an organisation (months down the page, departments from sheet to sheet), projects with materials requirements and resources organised by project. In each case, one could treat the source data as a form, extracting the data to a relational data table; analyse the data; place a filtered report back alongside each form. All pretty heavy in terms of IT content?

The 1s used to indicate a new block were simply an abstraction which amounted to 'assume we have some means of recognising the start of a new block' without making any further assumption concerning the nature of any identifiers that characterise the content.

One further issue is that I accept I may have made a rod for my own back in this context by refusing to use direct cell referencing. To explain why I believe Dan Bricklin's original decision to introduce direct referencing rather that 'doing it the programmer's way' relegates spreadsheets to the world of 'ad-hoc' computing as opposed to being a platform upon which reliable solutions may be built, would take an essay which I would not try to inflict upon you.
 
@Hui
Thank you for your prompting. It at least got me to where I wanted to be in being able to deal with block structured data without giving up on the use of names (to the exclusion of direct referencing). I reached a point where I could use name as if they were scoped relative to the block, the Name 'orderNumber' being one such.

Your comment on strategies for identifying the data from externally placed formulas prompted me to try a new solution strategy, namely using Power Query to analyse the structure of the data and create a Table of Contents and an Index Table that provide an effective means of navigating non-normalised data structures. A refresh is only needed if the structure is of the data is changed by increasing block sizes or adding blocks. Changes of values work in the normal immediate response one expects of spreadsheet.

I have noticed some pretty amazing levels of knowledge from your most frequent contributors on this forum. All the knowledge I have needed to move away from grid based thinking and more besides! I suspect that differences in the work practices will make my solutions to problems appear as something of an oddity but, for me, each problem provides another test case.
 

Attachments

  • BlockSelectionETL.xlsx
    39.5 KB · Views: 8
As per Hui's comments, a first step could be to restructure data into a tabular format more amenable to analysis by BI tools thereby avoiding the need for (clunky) dynamic references.

A simple approach, based on the set up in the initial post, could be to define:

blockID: =MATCH(rowNum, rowNum/blockMarker)

(like in challenge 021). In the above formula 'rowNum' could be substituted with 'ROW(blockMarker)' to avoid the extra column, or with 'TREND(1^blockMarker,,,0)' in the case that blockMarker is an array.

For evaluating names, the VBE immediate window can also be useful, eg:
- enter '[transpose(blockID)]' then highlight and drag to the watch window
- for a given (3rd) element enter ?Array([transpose(blockID)])(0)(3)
 
@Lori
I got the first formula working with text markers by using N(blockMarker<>""). I tend to overlook the innovative use of the intentionally generated error!

At the moment I do not understand how the TREND function is used but I will take another look on Thursday. As with the Watch window; that is an entirely new bag of tricks to me!
 
Hi peter, i have added a number of tricks to my collection from your demo files such as a self-referencing shuffle formula that I didn't believe even possible previously.

The TREND suggestion returns an increasing sequence of numbers between 0 and 1.5. I posted on the vault forum some time back and have since found various applications to 2d array indexing and unique ranks. It also leads to possibilities for creating sequences without any sheet references like:

=INT(TREND(2*n+ISERR(FILTERXML("<x>"&REPT("<x/>",n-2)&"</x>","//.")),,,0)/3)

(n<=8192), I'd much prefer if there were a function like SEQ(n) for this purpose, however.
 
@Lori
That seems to live up to the description "the innovative use of the intentionally generated error" in Spades! If I try that, I would definitely encapsulate the method within a named formula. To string some more clichés together, "it does what it says on the tin" but it is most certainly something that "shouldn't take place in front of the children"!

Out of curiosity I tried applying the name 'n' to a two cell row array to see whether I got a ragged list of indices. Something rather strange happened.

I never understood the assumption that your proposed SEQ(n) is not a fundamental requirement; I guess it comes from the fixation on cell location on the worksheet and workaround of relative referencing between two arbitrarily-placed items. 1- and 0-based indexing is a key element of Power Query so why not the worksheet formula?

Such a function couldn't be called INDEX because that name has already been misapplied to a function that returns an element of an array or a component range. Similarly, ROWS that does not return a collection of ROWS. With the excitement of developing an interactive WYSIWYG interface all sorts of basic programming principles got thrown to the wind!

More rigorous constructs have crept in over the years but, as far as I can see, these are deprecated (hated?) by the majority of spreadsheet users.
 
upload_2018-5-18_22-13-15.png

1] Similar to Hui's layout table in post #2, but with one formula solution

2] Click I8, select 1,2,3,4 and 5 from the dropdown list

3] In I10, enter :

=SUMPRODUCT((LOOKUP(ROW(8:37),ROW(8:37)/(C8:C37<>""),C8:C37)=I8)*F8:F37)

Regards
Bosco
 

Attachments

  • BlockSum.xlsx
    14.6 KB · Views: 5
@bosco_yip
Very neat. As I have observed before, I tend to neglect the option of generating a deliberate error and use a somewhat more cumbersome IF clause instead. At least I can read these formulas now, even if I fail when it comes to writing them.

Out of curiosity, what makes you opt for direct referencing over structured references? For me

=SUMPRODUCT((LOOKUP(Table1[rowNum],
Table1[rowNum]/(Table1[BlockMarker]<>""),
Table1[BlockMarker])=$I$8)
*Table1[Value])

is more intelligible but I guess it is the reverse for you.
 
@Haz
Thanks. The lack of MINIFS had made the workbook unusable on my Office 2010 desk machine. I implemented the workbook on my 'new' toy from Christmas (a Microsoft Surface with 365).
 
Because I'm unable to make such formulae (and I'm lazy by nature), I do this with PQ.
Note I added a column to keep static values, it was a copy paste of the values made by the randbetween.
upload_2018-6-1_20-40-50.png

Code to read "parameter", it is basically just a drill down done via the UI. Query is called qSelectBlock.
Code:
let
  Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
  #"Changed Type" = Table.TransformColumnTypes(Source,{{"active range", type text}, {"Column1", Int64.Type}}),
  Column1 = #"Changed Type"{0}[Column1]
in
  Column1
Code to get block data:
Code:
let
  Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
  #"Removed Columns" = Table.RemoveColumns(Source,{"ValueRand"}),
  #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"rowNum", Int64.Type}, {"BlockMarker", Int64.Type}, {"Value", Int64.Type}}),
  AddDistinctBlockName = Table.AddColumn(#"Changed Type", "Custom", each if [BlockMarker] <> null then [rowNum] else null),
  FillDownBlockName = Table.FillDown(AddDistinctBlockName,{"Custom"}),
  GroupByBlockName = Table.Group(FillDownBlockName, {"Custom"}, {{"KeepTable", each _, type table}}),
  AddIndex = Table.AddIndexColumn(GroupByBlockName, "BlockName", 1, 1),
  ApplyDynamicFilter = Table.SelectRows(AddIndex, each ([BlockName] = qSelectBlock)),
  GetBlockStart = Table.AddColumn(ApplyDynamicFilter, "Block Start", each List.Min(Table.Column([KeepTable],"rowNum"))),
  GetBlockLenght = Table.AddColumn(GetBlockStart, "Block Lenght", each Table.RowCount([KeepTable])),
  CalcSumOfBlock = Table.AddColumn(GetBlockLenght, "Sum Of Block", each List.Sum(Table.Column([KeepTable],"Value"))),
  RemoveCols = Table.RemoveColumns(CalcSumOfBlock,{"Custom", "KeepTable"}),
  UnpivotCols = Table.UnpivotOtherColumns(RemoveCols, {}, "Attribute", "Value")
in
  UnpivotCols
Could make the refresh run automatically with a single macro line, but I'm not bothered by the extra mouse click.
 

Attachments

  • Copy of BlockSelection.xlsx
    25.4 KB · Views: 4
Small rework of the previous with a slicer on pivot.
 

Attachments

  • Copy of Copy of BlockSelection_2.xlsx
    28 KB · Views: 1
Back
Top