The A1 notation is an abomination that has no place within serious spreadsheet design -- discuss?

Peter Bartholomew · Jun 17, 2018

My posts strike a slightly discordant note within the flow of discussion generated by 'Ask a Question' so I would like to invite comment on the somewhat contentious statement above.

My involvement with Excel only goes back to the early 2000's (I was using company computers from Crays to VAXStations but not PCs) so I do not have the historical perspective that many of you share. What I have pieced together is that the idea of electronic spreadsheets was conceived at a time when 4GL languages were the way of the future and intuitive, action-led WYSIWYG interfaces were the holy grail. Programming without visible programming steps as it were.

The idea of referencing content by its location rather than by name made for a more fluent interaction and avoided the need for 'tedious' (to quote Dan Bricklin) definitions. This allowed an explosion of 'end user computing' in which users had direct access to their data and had the means to manipulate it.

Now what I perceive as the downside. Single-cell, direct referencing captures the user action and records the result but fails to capture their intent or to link it to the terminology of the problem domain in which the problem to be solved is defined.

The inevitable result of the repetition of simple user actions is the gradual accumulation of errors, leading to the present situation in which only a small proportion (~10%) of workbooks is without significant error. The lack of domain-specific notation makes planning errors (omissions tending to be the most serious in that they are rarely detected) difficult to identify because the corporate standards are not expressed in the same terms as the spreadsheet solutions.

I also believe that further weaknesses are also introduced by the concept of relative referencing but I think I should defer that discussion. There is only so much in the way of opinionated monologue that should reasonably be inflicted upon a reader!

Logit · Jun 18, 2018

Really ? This is an issue for you ?

Hmmm ...

NARAYANK991 · Jun 18, 2018

Hi Peter ,

Humans think of data as data , but in the computer world , we have database systems thinking of it as records , while at the hardware level we have bits , bytes , words ,... Object oriented programming thinks of data as objects.

The basic issue is that we need granularity , and those who developed the worksheet thought of a cell in a worksheet.

If we need to store a person's name and age and other related details , it is clear that we cannot have all of this data stored together in one entity , what ever you may call that entity , since we might want to manipulate these items separately.

One way that Excel allows the user to store this information is in individual cells.

Programming languages allow for abstraction , but as far as a user is concerned , I think they may prefer something concrete. A cell is one such concrete entity , which they can see for themselves.

Narayan

Hui · Jun 18, 2018

I don't see why it is an issue?

If you don't like it don't use it?

I don't like the taste of Guinness Beer, so I don't buy it.

There are a few areas where i like to use R1C1 notion and that is in checking formulas, because it highlights structural inconsistencies when you see a whole worksheet with only a few different formula

eg: Find the odd man out?

vs

Its a lot easier in the second image

Peter Bartholomew · Jun 18, 2018

Hello Hui

My issue is not simply a matter of A1 versus R1C1! I have serious concerns about the wisdom of using any notation that opens up an address space of over 17billion cells on a sheet, only a minute proportion of which actually form a part of the model / database.

Whilst I am about it, I also believe the indiscriminate use of relative referencing is also a mistake. Relative referencing may perform calculations involving the correct numbers (as you have shown) but it fails to recognise the existence of the arrays or tables which provide context and meaning for the references and the associated calculations.

In your example, I would normally go for the array formula
{= numberline * TRANSPOSE(numberline)}
but I would have no objection to implicit intersection
= numberColumnHeading * numberRowHeading
At least those formulas have the decency (integrity?) to return errors when the formula cell takes them out of scope (referencing non-existent negative numbers or numbers greater than 10 in the present context).

Peter Bartholomew · Jun 18, 2018

Narayan

Thank you for injecting some programming facts of life! I hope to build on the points you make later.

One issue arises from "a user … may prefer something concrete". I would argue that the abstraction you talk about does not benefit the computation process (abstraction is simply removed layer by layer during the calculation and it does represent an overhead). It is there to allow the developer to manage problems of greater complexity; its key role is to make solutions more scalable.

Without names, arrays, tables etc., traditional practice is great for ad-hoc, throwaway sums but is seriously lacking when it comes to more substantive work.

NARAYANK991 · Jun 18, 2018

Hi ,

I agree that abstraction is more for the programmer to visualize the real world in a simplistic manner.

Whether Excel could have , in addition to a cell by cell approach , also incorporated bigger entities such as arrays , is debatable. Only the developers of Excel can throw some light on this.

However , since VBA allows this , I think a user who is familiar with VBA can still leverage the power of these structures.

Narayan

Peter Bartholomew · Jun 18, 2018

Whether Excel could have, in addition to a cell by cell approach, also incorporated bigger entities such as arrays, is debatable.

… but that is the debate I am seeking here!

I would contend that Excel shows the hand of many developers and is almost schizophrenic in its ability to be all things to all people. Besides an individual cell notation and relative referencing, it supports the naming of ranges, multi-cell array formulas, the use of defined names to identify the steps of a nested formula, tables with their structured references and their more limited (but more meaningful) form of relative referencing using [@], pivot tables and, now, Power Query.

I haven't used a direct cell reference for five years now (I sometimes use VBA but that is typically to achieve objectives that cannot be performed by worksheet formulas), so it is possible. Perhaps the difference was that when I used Excel (in 2006 I think) to prototype a spectral analysis of the motion of a warship in the North Atlantic, I had no idea that other people used direct cell referencing; I had never read the 'tips and tricks' genre.

Debaser · Jun 18, 2018

Peter Bartholomew said:
I haven't used a direct cell reference for five years now

Purely out of curiosity, what do you use?

Chihiro · Jun 18, 2018

Excel Tables and named ranges?

Though I admit there are some significant benefits to using tables. There are many cases where I prefer simple ranges (such as when working with ADO; when I want to keep raw data separate from reporting workbook; when I'm using code to update sheet etc).

Chihiro · Jun 18, 2018

Oh, and I think of cells as visual representation of elements.

It makes it easy for users to organize data in meaningful way and perform ad-hoc analysis quickly without programming/coding skill. Also, by forcing alignment of column, row, structure it makes it so much easier for users to follow flow of data, and allows for simple referencing scheme for formula (programmatic referencing).

Personally, I believe merged cells to be worst feature in spreadsheets by allowing user to break away from 1 to 1 grid structure.

Peter Bartholomew · Jun 18, 2018

@Debaser

I tend to use Tables for source data and control cells. My default Normal style shows only a pale colour wash so, until I format the range by applying input style or a table, there are no visible cells. Something I like about tables is the way in which they keep the data, cell formats and Structured References synchronised to provide good visual queues as to the regions of the sheet that have been declared as being part of the model.

Where next depends upon the type of model. If it involves lists of data, I stick with the tables, maybe Get and Transform. Relative referencing is limited to using [@[field name]] to ensure the formulas a robust to sorting.

If it involves difference equations (such as running totals), I tend to use multi-cell array formulas to ensure they can contain only a single formula and I can guarantee there are no exceptions (a normal formula based upon cell-by-cell relative references is easily corrupted whenever the end user employs drag and drop to adjust their input data). For this, I use Named Ranges and select formulas that apply to the entire range where there is a choice of 2D, 1D, or named relative references.

Another device I use when I have contiguous subsets of the data to operate on is to define a dynamic named range and use its intersection with existing ranges to extract related subsets of the data. That has the effect of simplifying downstream formulas.

Peter Bartholomew · Jun 18, 2018

Chihiro

I am happy to use a grid and perfectly willing to see cells as a visible representation of an element. Where I have issues with normal practice is that the element is normally an element of something. It is the 'something' that is important and should be the entity to which formulas are applied. The cell formula is simply an instance of the overall relationship and should not require any individual representation of the relationship.

The flow of a calculation is important but I have little need to follow precedent graphs. With names, each formula is readable as is and there is no requirement to be able to track back to find out what '$PQ273' might be. Since following precedents is harder with names, it is probably a good thing that there is little need to do it!

Chihiro · Jun 18, 2018

I don't dispute that point. However, keep in mind, Excel (or Office suite) is designed with very wide audience in mind. It's advantage is in it's flexibility, and it's ease of use, most people can pick it up and start using without heavy investment in learning the software

Hence, I think it is good thing that Excel allows for so many different solutions/approach to wide variety of problems. Though sometime we get frustrated because of it's flexibility.

AliGW · Jun 18, 2018

Hui said:
Its a lot easier in the second image

The green flag is a bit of a giveaway ...

Peter Bartholomew · Jun 18, 2018

Last year I came across Dan Bricklin's TED talk on the origins of VisiCalc.

https://www.ted.com/talks/dan_bricklin_meet_the_inventor_of_the_electronic_spreadsheet?language=en

About 6½ minutes in Dan explains the origin of the grid and the map coordinates that are used to identify cells. Somewhat frustrating for me, the 'programmer's way' of named variables was the first thing he thought of but dismissed it a 'too tedious' since the user would have to provide the names. The next idea was to represent values in formulas by computer-generated names such as Value1, Value2 … That was dismissed on the grounds that the user would rapidly lose track of what was what. Hence the grid and the A1 coordinates that describe content by its location but provide no clue as to its significance.

In reality, is it that much more time-consuming to name a range than to annotate it with a label? Or is the suggestion that a spreadsheet should contain only the calculation?

If one looks at Shapes, Tables, Charts etc. one sees a different strategy in play. Names such as 'Rectangle 1' or 'Table 1' are generated automatically but the developer is free to replace them with something more meaningful that would serve to tie the object to some concept within the problem domain.

Instead the default action of 'click and pray' when building a formula populates the formula with single-cell relative references. Dragging the formula down, right or even up will propagate the formula as far as the developer chooses with no regard to the significance of the reference or its dimension. Subjects like 'partial anchoring' become an advanced mystic skill but little regard is paid to the properties of array algebra or database tables that make the relationship useful.

Peter Bartholomew · Jun 18, 2018

Despite being a 4GL environment, Excel recreates some of the problems of 1GL machine code in that data is referenced by location rather than content.

It took me a long time fighting the system before I could convince myself that 'the programmer's way' was still possible without leaving Excel and adopting some other code such as Quantrix or Analytica.

The granularity mentioned by Narayan is important but I seek more in the way of abstraction layered over the base functionality. The new BI tools offer such abstraction but they are not primarily aimed at model building.

shrivallabha · Jun 19, 2018

Interesting discussion!

a] I agree with Chihiro's viewpoint (post #14) in general. When an application is catering to wide range of audience then one should expect it to be far more general in its design and approach. There will definitely be something which will irritate one user.

b] Even applications designed specifically for an intent sometimes leave/miss something which someone will deem critical. I deal with multiple 3D CAD tools which serve to build a 3D model which is then used to generate various deliverables but each CAD suite comes with its own advantages and disadvantages.

To overcome this, the software licensing companies provide ability to customize the application to individual or organizational needs by exposing Component Object Model for developers.

c] In my opinion, your choice of application (rather than contesting the way selected application is designed) shall be the entry point. What shall be the choice of application to build the model per your requirements?

d] With respect to formulas you are referring to from a single standpoint. That is relationship of variables and their process to reach the final calculated value.

Let's take example of simple interest. If three individual cells were named as principal, nrofyears, rateofinterest then the formula would read quite clearly:
=(principal*nrofyears*rateofinterest)/100
Clearly intuitive when you are looking at the formula cell along with calculated value. However, when you have to trace the precedents which gave the resultant value you will have to either use Goto(F5) or name manager or some other indirect technique. But the way people build normally is by using another cell next to formula cell which acts as a label (of course it isn't relational but putting next suffices human intuition) which informs e.g. Simple Interest, Grand Total etc and the formula simply is:
=A2*B2*C2/100
and beauty of human mind is if it is copied down for hundreds of rows it quickly understands the relative positioning without having to state explicitly.

Using alphanumeric representation to identify the grid is a practice which is common in Engineering drawings as well. References are an easy way for people to locate information. It is easier to interpret. I concede that it becomes quite ugly when one is looking at reference like XFB1048575.

e] Finally, here's a TED talk which doesn't directly relate to our discussion
https://www.ted.com/talks/lera_boroditsky_how_language_shapes_the_way_we_think
but it does shed light on the way we think / interpret with respect to sentences being formulated. Perhaps, the architects of MS Excel gave more emphasis on relative positioning of references as that's what seemed more natural and intuitive to them.

Peter Bartholomew · Jun 19, 2018

@shrivallabha

Thank you so much for your comments. I will respond but at the moment I think I have swamped any discussion by 'laying out the stall' in too great a level of detail. I apologize to anyone who finds it difficult to know how to respond to an essay.

One thing I do take from Lera Boroditshy's talk is "Language guides reasoning". That would suggest that the notation we adopt for creating spreadsheets is important because it will facilitate some reasoning tasks at the expense of others.

Lori · Jun 19, 2018

Some like to plan trips from guidebooks; others like to work routes out from maps (with an A1-style grid)

Different work practices may help reduce errors but then A1-notation improves productivity for the many users who find it easier to work with. So from a company perspective there's a risk-reward trade off.

And while names are generally considered preferable from a developer standpoint, i take Shrivallabha's point that it's not a big mental leap from a cell reference to a variable name shown in an adjacent cell.

The 2d grid paradigm is more of a limitation than the notational convention in my mind.

Peter Bartholomew · Jun 19, 2018

Lori

The 2d grid paradigm

It is sufficient to hold database tables, 1d and 2d arrays, lists of lists (i.e. ragged-right arrays). Surely that exceeds the requirements of most users?
I agree that, once the dimensionality of the problem exceeds two, the going gets tough and Excel lacks operators that will sum over the 3rd index of four (say).
[Have I understood your point correctly?]

What I dislike about the A1 notation is that none of these data structures are explicitly recognised within the syntax. It breeds a bottom-up mind-set in which it is difficult to see 'the wood for the trees'.

Programming languages allow for abstraction , but as far as a user is concerned , I think they may prefer something concrete.

As Narayan pointed out this may suit many users but such approaches impose limitations and I would contend, increases the probability of error.

Lori · Jun 19, 2018

NarayanK makes very valid points relating to a general need for data granularity. I must confess I didn't have time to read the whole thread.

My comment above derived from the imperfect dichotomy of a siteseeing map (with a grid overlaid and annotations) versus a guidebook with numbered instructions of things to do. The former being more like a spreadsheet approach and the latter being more like structured code.

A 2d grid can work well as the basis of a UI but, depending on the model, it can quickly become cluttered with lots of variable sized input ranges. Functional code (based on array variables) may then be a better option..

Peter Bartholomew · Jun 19, 2018

The reason I find the A1 notation abhorrent is that by removing names one strips out meaning (derived from concepts within the problem domain) and strips out structure.

Putting the serious stuff aside for the moment, I also have a number of petty gripes that prove I was a grumpy old man long before my time!

Firstly A1 defies mathematical convention. There is a standard nomenclature for naming the elements of a matrix. Running across the top row one has
A₁₁, A₁₂, A₁₃, A₁₄ …
followed on the next row by
A₂₁, A₂₂, A₂₃, A₂₄ …
That is the row index is followed by the column index.

A1 notation? The other way round AND it uses letters for sequencing that should be done with numbers. If the columns are not in numerical order then at least I can assume they are in alphabetical order. Oh no, another fail.
AB should come between A and B -- but it doesn't.

Switching letters and numbers can be used as a naïve code. I could have greeted you with 8,5,12,12,15 (HELLO) but what is the point; that would simply be irritating. In a similar way using letters in place of numbers creates trivial but irritating problems.
e.g. Starting with the 21st column, allocate 50 columns for a table.
We all know without thinking that the table starts in column T but where should it end? Column BR of course, how could I have a problem with that!

It is not only as a developer that I found the A1 notation trite. I also took exception to it as a user. Many years ago, I used to assess research proposals submitted to a now defunct department (DTI) of the UK government. As an assessor I had to score each proposal against 10 questions to determine how well the bid met the criteria. A warning drawn from and incomplete form read
Current Scoresheet Status: Incomplete
The following cells require input to I38 I43 E45
complete the scoresheet:

Why on Earth should I be expected to look up cell references on a hidden grid?
All the questions were numbered. Wouldn't the following communicate the requirement better?

Current Scoresheet Status: Incomplete
Scores are needed for Questions: 4 and 5.

Justification is required for your assessment of Question: 5.

Why not simply use the question numbers? Then there was the 'tasteful' Excel 1997 color palette … but no more of that!

DO NOT TAKE THIS GRIPE SERIOUSLY. IT IS INTENDED AS 'TONGUE IN CHEEK'!

shrivallabha · Jun 20, 2018

Could you describe, if you had the chance to re-structure the spreadsheet grid, how would you want it to be?

Peter Bartholomew · Jun 20, 2018

@shrivallabha

Wow! That's a challenge; no longer working within the constraints of Excel!
What I would ditch is the notation (A1 and R1C1) that provides a pre-defined address space of 17billion cells.

To follow Dan Bricklin's discarded ideas, each time you enter a value or a formula, the next unused name in the sequence 'Cell 1', 'Cell 2' would be created. Using a fill handle to extend the range might cause the default name to morph into 'Row 3'; 'Column 4'; 'Range 5' etc. in much the same way as a Table can be resized / reshaped [Something similar is already used for Shapes].

For small ad-hoc calculations that may not even be saved, this might suffice. For anything more substantive, the developer could be expected to rename the range to provide a meaningful description expressed in terms of the problem domain. In the case of the example you introduced:

If three individual cells were named as principal, nrofyears, rateofinterest then the formula would read quite clearly:
=(principal*nrofyears*rateofinterest)/100

this would start as a single cell formula but extending the named ranges to make them columns 100s of cells long would require the resulting formula interestPaid

to become a similarly dimensioned column range. The precedents to such a formula range are the entire ranges principal, nrofyears rather than single cells. rateofinterest might be a similar sized range or a single cell. There is little point in following precedents because the name already describes the content. Good programming style might suggest that the Name definition includes the following:
1] the location of the named range
2] a full description of the name in terms of the problem domain terminology
3] an explanation of the way it is calculated
4] a description of its intended usage
That begs the question of
"How would one define the location without the A1 notation?"
Typically named ranges would be defined interactively by entering content and adjusting the range with the fill handle. For reporting, the most succinct way might be as a parameter string
Sheet1, (3, 8), 365x1
to be read as
SheetName, (rowNumber, columnNumber), rowCount x columnCount

What I have found using such techniques, where already possible within Excel, is that one's thought process changes. Instead of creating a single formula and replicating it, the objective is to create a formula that relates entire ranges and then the individual numbers are just instances of the overall calculation.

When checking the workbook, following precedents is a moderately unusual occurrence. One is not navigating the precedent tree to find out the significance of a term; rather it represents a positive decision to examine the precedent formula. The logical alternative is to go from source to result and examine the dependent formulas but that is the auditor's choice.

I hope this helps explain the ideas even though you may have thoughts such "What [planet / drug of choice] is he on?" or simply "It will never fly"

.

The A1 notation is an abomination that has no place within serious spreadsheet design -- discuss?

Well-Known Member

Active Member

Excel Ninja

Excel Ninja

Well-Known Member

Well-Known Member

Excel Ninja

Well-Known Member

Well-Known Member

Excel Ninja

Excel Ninja

Well-Known Member

Well-Known Member

Excel Ninja

Well-Known Member

Well-Known Member

Well-Known Member

Excel Ninja

Well-Known Member

Active Member

Well-Known Member

Active Member

Well-Known Member

Excel Ninja

Well-Known Member