• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Relative referencing is wrong

Peter Bartholomew

Well-Known Member
Surely not!

After all almost every spreadsheet that has a formula has relative references; they are the default. They are fundamental to the way in which spreadsheet technology is conceived. Each cell may contain a formula that references a number of other cells but it can only return a single value. From the outset this 'many to one' behaviour distorts the way business problems are analysed. Solutions are built bottom up in a piecemeal fashion.

It is, however, unusual for a cell to contain data with unique characteristics; just as sheep tend to come in flocks, so Cells tend to come in Ranges that contain lists or, less commonly, arrays. Unfortunately, 'normal' practice is to ignore the lists and arrays and concentrate on the formula to build a field within a single record of the list or a single element of an array.

Most commonly, one sees large, semantically opaque formulas being built within a cell and then the final instruction is 'and then fill down' (or across or both). There are even special notations to support this by occasionally invoking the mysteries of 'row or column anchoring'.

What other option would be possible?
I believe it is possible to build solutions perfectly well using absolute references only. Instead of
= X2 + Y2
filled down to give
= X3 + Y3
= X4 + Y4

= …
and so on ad-nauseum one could/should simply have an array relationship
= X + Y
The number of values returned should not be left to the user (i.e. how far should the formula be filled down), it is completely determined by the dimension of the arrays X and Y. All the user can bring to the party is error.

If a reference to a single element of an array is needed, the relationship could be built from a meaningful expression. A function
= RELATIVE( A, B, C )
could be an absolute reference to A but with a offsets that matches that of the formula cell to its containing range B (or if it is not in B then is may, optionally, be in C). A compact notation for a relative reference simply encourages excessive use.

When are relative references either meaningful or simply needed?
The use of @ to reference fields within the same record as the formula cell is both meaningful and important. References to other records are not desirable since sorting and filtering are legitimate list operations that can easily destroy other relative references.

The other use for relative referencing is to cope with data dimensionality in excess of two. If one of the dimensions can be eliminated by relative referencing, that enables a spreadsheet solution rather than switching to another application for model building.

If you do not like what I have written please pitch in. Even if you do not convince me, it might still help me get some understanding of the motivation behind normal spreadsheet practice. It took a video of Levi Bailey working on a spreadsheet for me to realise how fast a the conventional 'action-led' development practices could be.
 

Chihiro

Excel Ninja
My take. Excel is visual oriented tool not code/programming oriented tool.

For it to gain acceptance among larger audience. It's beneficial to have numbered relative reference, instead of contextual reference.

In DAX it does use [Column] as reference (i.e. array/list), and does away with relative reference.
 

Peter Bartholomew

Well-Known Member
I agree with each of your observations.

I probably also took the fact that both M and DAX work well without requiring formulas that apply to single values within the list as providing some vindication of the ideas that were gradually taking shape within my mind.

The problem I as I see it being "Is it reasonable to expect a visual, action-led and intuitive process for manipulating numerical data to provide a robust basis for building models that (purport to) form the basis for important decisions?"

… or should spreadsheets be seen as fit only for quick ad-hoc calculations of minor significance? I suspect that is how most IT professionals see it.
 

Hui

Excel Ninja
Staff member
There are some new Dynamic Formula functions about to be released.
They are available on the Excel Fast release version already

They will add a lot of functionality to spreadsheets and nearly remove the Ctrl+Shift+Enter functionality
They also expand as the ranges require

Stay tuned
 

Peter Bartholomew

Well-Known Member
I probably have posted this discussion sooner, since I have long held the view that both relative referencing and the direct referencing notation are unsuitable for serious solution development. Now it may well be that the distinction between array and non-array formula now simply fades away. Unless, of course, users simply ignore the new functionality when it is introduced :(.

I was first alerted to the possibility of the new developments at the ModelOff GTC London in 2016 and have been eagerly awaiting change since that time. As someone who often builds models entirely from array formulae, I am keen to put dynamic arrays through their paces, though I might need to get the case to buy a further 365 license for that purpose past senior management (my wife) :eek:.
 

Peter Bartholomew

Well-Known Member
Hui
Thanks so much for the reference. I have printed it off and read pages 1-34 in detail! The new functionality seems to be far more pervasive than I had expected. Soon, I will have to unlearn my behaviour of committing all formulas with CSE :)!

Over the last year or so I have been publishing a series of Excel articles 'Excel for Engineers and other STEM Professionals' in an engineering magazine for analysts. The subtitle could have been 'If you're so smart and use matrix algebra every working day, why do your spreadsheets look as cr*p as everyone else's?'. I can see a rework being called for in due course!
 
  • Like
Reactions: Hui

Peter Bartholomew

Well-Known Member
I had held off posting this thread because there is a limit to how many people one is willing to risk upsetting and how often. Now, though, I have been well and truly overtaken by events. Check out this 20min conference clip.

https://myignite.techcommunity.microsoft.com/sessions/64705

Do you believe array functionality will now be the starting point for all Excel calculation?

Will 'and copy down' disappear forever?

OR

Will disinterest and reluctance to change mean that things simply stay the same? Backward compatibility would allow that after all.
 

Peter Bartholomew

Well-Known Member
I learnt some new things this week.

The first is that CSE is not needed to make any Excel function or operator evaluate an array formula. The Excel calculation engine works with 2D arrays and always has. This applies to F9 in the formula bar, Named formulas, conditional formatting formulae …

This can be seen from the addition of arrays shown below. The first row might suggest that CSE is needed. The second row shows that the '+' operator is perfectly aware of the arrays and has, in fact, calculated each term in the 3x3 array without CSE.

upload_2018-10-7_11-40-42.png

The problem is that a formula in the grid does not convert a range into the corresponding array whenever a value is expected by the formula. It will instead try to pick a value out of the array by implicit intersection. Most of the errors you see when you forget CSE arise during the implicit intersection process [#N/A - no intersection, #VALUE! - multiple intersections].
CSE is simply the instruction to omit the implicit intersection step and load the array as requested. SUMPRODUCT is different, not because it magically does something extra; it is different because it omits the implicit intersection step. ROWS does the same, otherwise it would always return 1.

For those amongst you for whom this is a statement of the obvious, please accept my apologies.
 
Last edited:

Peter Bartholomew

Well-Known Member
Having spent some of the last month or so looking at the new dynamic ranges (including some great video material by Mike Girvin) I return to the topic of this discussion.

Is there any calculation that really still needs the concept of relative referencing or could it be left to gracefully slide into oblivion along with Excel's XLM macro language? Of course, human psychology might chart a different path.
 

Chihiro

Excel Ninja
I'd argue that there's still place for it.

Particularly in string manipulation. While PQ and new dynamic ranges do handle things. It's often beneficial to just use single cell references to construct SQL string on Excel sheet, making minor adjustments on the fly. Quick copy paste into SQL MS, to insert many records quickly in one shot.

I often use this approach, when I need to insert into multiple dependent tables, and where I need to reference ID generated in the first step in the second.

But in general, for analysis, I've mostly done away with traditional formula and primarily use data model and DAX.
 

Peter Bartholomew

Well-Known Member
I need to reference ID generated in the first step in the second
Chihiro
I am not fully sure I understand the setup but then, I rarely have datasets to analyse with Excel. The snippet above, taken from your description, reminds me of the labels used to tag and reference statements in M.

Something I wonder, is the relative referencing you describe an essential feature of the solution or merely a device that is convenient and quick?

The other possibilities I have in mind are:
1) an absolute reference
2) an array with referencing by index
3) a table with foreign/primary key pairings
4) a linked list
Would any of these work for the use-case you describe?

There are places where I do use relative references but the contexts are very restricted. Ones that come to mind are
1) the '@' operator in structured referencing to reference properties of the current record
2) where the number of array dimensions exceeds two, looking up an index can be achieved by explicit intersection of the row through the formula cell and an index column used as steering data.
 

Chihiro

Excel Ninja
device that is convenient and quick
This. It's just one time insert into db. So, I'm not concerned with rigid integrity (that's handled on db side). Nor do I want to spend time studying pattern for potential exceptions. I just handle exceptions as I find them. It's fast way of updating/inserting 100~1k rows into existing table in db (using text functions to construct SQL insert statement etc).

I've always believed Excel's strength is in it's flexibility and adhoc reporting/transformation. Why I believe there is place for relative referencing along with more structured referencing types.
 

Peter Bartholomew

Well-Known Member
OK. I accept that relative referencing can give rapid results in support of ad-hoc calculation, and can be especially useful in cases where the workbook as not going to be saved for re-use.

When one considers that 96% of workbooks do not contain formulas, that does set the bar pretty low for the next tranche of workbooks that do perform minor or ad-hoc calculation.

At the other extreme, I have seen large, highly complex models build on single-cell relative referencing. The proponents of such workbooks, built using only the most basic (primitive?) techniques, claim that their workbooks are 'simple' and 'can be easily audited'. I agree except for the last two statements!

Simplicity in excessive quantity can almost be considered a definition of complexity.
 

Peter Bartholomew

Well-Known Member
Since I first opened this discussion to express my distaste for traditional spreadsheet practices of direct cell referencing and the use of single cells in preference to entire ranges (or, equivalently, values in place of arrays) the options available within Excel have developed almost beyond recognition.

In the attached workbook, I have unpivoted a crosstab array to produce a two column list with a single formula that spills to display the entire list.
66469

The LET functions allows named formulas to be defined using pairs of parameters and then used within the later parameters the formula. In the picture below, I have pasted the formula into a merged range and annotated it using cells to the left. To be honest, it looks very little like anything one has come to expect of spreadsheet solutions. Do you have any thoughts on this?

66470
 

Attachments

GraH - Guido

Well-Known Member
The same function technique already existed in DAX (VAR?) so for the Chihiros' of this world this is not so new.
I would be needing some practical practice before concluding anything insightful. Having watched some examples on LET - those MVPs battle to be the first one announcing it to the world it seems - it is looking promising. I have made some awful ridiculous formulas in my life that could have benifitted from LET.
Sidenote: I'm hating the insider program more and more :rolleyes:.
 

Peter Bartholomew

Well-Known Member
Sidenote: I'm hating the insider program more and more :rolleyes:.
Why is that? Is it because you feel excluded from the 'good bits' or have you had the opportunity to try it and found problems?

I am somewhat envious of @Chihiro's experience and painfully aware of my own limitations when it comes DAX, as well as diverse areas such as JavaScript or connection objects, but the paths one treads are to an extent conditioned by the problems one encounters. :)
 

GraH - Guido

Well-Known Member
Why that is? 2 years since dynamic arrays were announced and still I don't have them available. So yeah for feeling excluded...

So true and perhaps add Python and R. At least I remember he (@Chihiro) had some add-ons on his ribbon for these.
 

Peter Bartholomew

Well-Known Member
I tried to be patient and wait it out (I was using Excel 2010 on a desktop and Office 365 semi-annual on a laptop) but eventually I cracked and went out to buy an additional 365 subscription license. I wasn't expecting to have to run it on to a 2nd year though.

I had been badgering Microsoft staff at every opportunity to improve the experience of using arrays. I must admit, what I had envisaged was to declare an Array object, just as one might a Table, and have the object resize as necessary and only within the object should everything be governed by matrix algebra; no CSE! I had been 'soothed' with statements like 'I can't tell you about it but something is coming soon and you will really like it'.

I didn't foresee the possibility that the array operation would become the default, unless specific action is taken to prevent it. But then, I had no idea that the only data type recognised by the Excel calculation engine is the 2D array (number, text, Boolean or error).

The new LET function was really unexpected (for us non-MVPs anyway). What I have found is that the majority of the more complicated solutions that I have developed over the years (developed using named helper ranges and then uploaded by placing the formula directly into the name's 'refers to' box) are reproducible using formula-local variable within the LET function. Sometimes I look at the results and think 'What the …!' so I have no idea what others may be thinking!
 

Chihiro

Excel Ninja
LET is most useful when you want to reduce calculation time, in my opinion.

As it assigns names to calculation results. Thus eliminating need to re-evaluate calculation multiple times in same expression (i.e. cell calculation).

Readability benefit... I still question as users will need to be trained on it.

As for my add-in, I don't use R all that much anymore. As Python is more general purpose language and can be used for wide variety of applications.

For python I have following environment and add-in.
xlWings - Excel extension for Python with add-in on Excel side.
python - version 3.7
distribution - Anaconda
IDE - Either Spyder 4.x or VS Code python extension

Though primary use of python is for operation not suited/available in Excel.
- Running Machine Learning Model and quickly evaluating regression curve (using sci-kit learn)
- Some iterative calculation models

I haven't spent too much time exploring new array function behavior, as I spend most of my time in M query and DAX when using Excel.

In my opinion, it doesn't matter what tools/method you use, as long as you can validate result and are comfortable using the tool ;)
 
Top