
The A1 notation is an abomination that has no place within serious spreadsheet design -- discuss?

The dominance of MS products might be one explanation: Excel is used in many offices. It is somewhat in human nature to get hooked on and stay loyal to a brand or product (even when knowing better).

As per the discussion... - it is an impressive thread you've started, @Peter Bartholomew. On the provocative side, let me say that I like the A1 style and that I liked it as a user from the beginning. Note that I'm not a programmer nor an engineer, and I have no IT background whatsoever. Simplicity for the customer is more likely to be my domain of expertise, or at least of comfort.

Somehow those "A1" cell addresses made logical sense to me when I started using Excel in the Y2K year - remember that madness? Perhaps it reminded me of the battleship game grid. "Cruiser down!"
Talking about grids: the spreadsheet took me back to primary school, where I made my first calculations on "squared paper" (I hope that is the right word). It all looked rather familiar, though I was clueless back then about both the power and the consequences of cell referencing. But it made me want to use Excel.
And perhaps the early Lotus developers were seeking such a "user response" with their A1 reference style. Looking back from about 40 years later, it is easy to say they made a conceptual mistake. I'm wondering, and doubting, whether your insights were already that common back then.

Sure, I've evolved since those early days, and I gradually started to use named ranges, structured table references and even DAX and M. I like them all; still, I find myself reverting to A1 quite a lot, for reasons already explained by others - very impressive replies btw, folks, sometimes beyond my understanding. I apologize if you find me repeating things you've already pointed out.

As far as I'm able to understand your position on ditching the A1 style, and I do follow some of your arguments, I can't help but wonder whether Excel developers aren't doing that as well. Look at DAX. It is as structured as it can get. Yet many rapidly run into trouble because of a lack of insight into how referencing actually works, though it looks easy enough. EARLIER(), EARLIEST()? And I'm not going into the details of row, query, filter and even shadow context. The latter is something new the Italians came up with to explain a behavior of DAX they could not explain before. Hmm... Even the highest of experts get lost in it. Perhaps it is a good idea to stick to A1 for a little while longer for normal users like myself.
To be fair: named ranges are just a layer above A1 references. They are great for array formulas, as you have already proven in other threads.

Also in M there are new kinds of references that can play tricks with your head. Tables, lists, records, values... All come with a specific notation that opens up a vast variety of possibilities I often wish I'd had at my disposal earlier in native Excel. But if I'm honest with myself, if someone had presented Power Query to my younger self, I would have run away very fast.

I recently had a discussion with an experienced database guy at work - my team lead - about a project in which table design matters. He explained to me the concept of using variable columns in Oracle. Briefly: you have one column that contains a context/label and a second column that contains a value for that context/label. The data type is driven by the context, not by the column definition. This table design is not the commonly known row-by-column stored records.
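As a rough sketch of that variable-column idea (sometimes called an entity-attribute-value design) - the labels, values and parsers below are all invented, and Python stands in for the database:

```python
# Each row stores a label plus a value; the label, not the column
# definition, decides how the value should be interpreted.
rows = [
    ("employee_id", "E042"),      # text
    ("hire_date", "2018-07-05"),  # a date, stored as text here
    ("salary", "52000"),          # a number, stored as text here
]

# The consumer interprets each value according to its label.
parsers = {"employee_id": str, "hire_date": str, "salary": int}
record = {label: parsers[label](value) for label, value in rows}

print(record["salary"])  # -> 52000 (as a number, not text)
```

The flexibility is that a new label needs no schema change - at the cost of pushing type checking out of the table definition and into the consuming code.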

The point I want to take home is that I see value and pitfalls in all the reference styles possible in native Excel, and even in the added "power" features. The challenge is perhaps finding the best practice - the most efficient and effective for the specific use case at hand - or perhaps settling for the one you know and can handle. I don't think there is any reference style that would completely prevent design, and thus calculation, errors. I believe I'm following @Lori in #22. But we also need guides that show better alternatives and open our perspectives. Please continue to poke.
 
There are around 750 million users of Excel per this blog post:
https://irishtechnews.ie/seven-reasons-why-excel-is-still-used-by-half-a-billion-people-worldwide/

What percentage of users will use formula features? I feel it will be a figure which is much lower. Among formula users, advanced-level users will be fewer. So there will be many people (including me, before reading this post) who won't even bother to think that the A1 convention could or should be something else.

So what are the chances Microsoft listens and takes notice? Have you posted it on the MS Excel Developer forum as well? It will be interesting to see what the MSFT team has to say about this.

Like @GraH - Guido has mentioned, an RDB will have a flexible grid as well as cross-linkages and custom views to see the information, and user input can be handled with tighter control through a GUI. SQL will come into the picture, and there won't be formula flexibility.

I think one should select the package which suits one's requirements the most and then ignore the flaws/drawbacks as a trade-off.
 
shrivallabha, I can't recall the exact reference, but I've read somewhere that only about 3% of Excel users also "design" Excel solutions. So a bit more would be able to write formulae. But far fewer would be able to write advanced stuff, and fewer still would "evaluate" the A1 style. It seems we have a white raven in our midst. :DD
 
@shrivallabha

True, the majority of spreadsheets do not contain formulas; they are more likely to be basic lists. Rather worryingly, auditing has suggested that, of those that contain formulas, more than 90% contain significant errors. Spreadsheets are known for being a tool that empowers end user computing, so it is hardly surprising that the techniques that are employed are primitive.

When I presented some of these ideas at EuSpRIG, one response was along the lines of: "The ICAEW considers that only about 1% of users are capable of authoring workbooks; only about 1 in a thousand would be considered to be developers. You are therefore addressing a minute proportion of spreadsheet users. Even if your methods substantially reduced error when applied to creating solutions, they would have a negligible effect upon the number of decisions made on the basis of flawed spreadsheets". I do see the logic in that.

I evaluated the methods I used largely as a result of a disagreement with the FAST financial modelling standard. When I saw such statements as

Do not use Names except … They are only good for small problems …
Do not use Array formulas …

and found myself thinking

Always use Names … they enable solutions to scale without increasing complexity
Always think in terms of array formulas … only decompose the problem to components if the problem dimensionality is too high or you are in a desperate battle to optimise for speed (much the same justification for using machine code)

Oddly, we both believed 'simplicity' to be important …
 
This conversation reminded me of my oldest brother, decades ago when he was a junior in High School. He was caught talking in class. The teacher gave him an essay assignment as discipline, to be presented the next day in class. "Write a 3,500 word essay on the Ping Pong ball".

He completed the task with lots of research the night before.

Even with all the research, writing, editing, good ole sweat and class discussion ... in the end ... it was still a ping pong ball.
 
Simplicity

To me, simplicity is to have one theory that allows you to predict the outturn of many experiments. It involves abstraction and may be difficult to master.

To the creators of the FAST Standard 'simplicity' meant something that anyone could understand. They preferred long lists of primitive formulas with references repeated so that one could check the spreadsheet with a pocket calculator (and infinite time?).

Perhaps the conflict between thinkers (System 2 reasoning) and doers (System 1 gut-feel beliefs) is at the heart of the problem.

Ref: Kahneman, D. (2011) Thinking, Fast and Slow, New York: Farrar, Straus and Giroux.
 
@Logit Have you ever played 'ping-pong' against someone who knows how to play? I swear the ball has a mind of its own! One touch and it's gone - anywhere but where you hoped to see it!
 
@NARAYANK
I have come across only one company that traps all direct references as an error under their quality procedures and a Hong Kong based trader that models entirely with row and column intersection using named ranges.

I was beginning to feel something of an oddity (white raven :))! The systems you draw attention to, and their successors, at least suggest that others do not regard the spreadsheet paradigm as offering perfection.

Why do they not cut it commercially? I guess I haven't had the need to buy one; and would my clients have the requisite licences anyway? When I hear the sales pitch ridiculing Excel though, I do sometimes think 'I understand what you are saying but I can already do that in Excel. It is there if you persevere and know where to look!'

The other thing I take encouragement from, despite being a complete novice with Power Query and Power Pivot :awesome:, is that if the newer functionality does not rely upon the ability to input formulas record by record, it is likely that it was never needed in Excel ;).
 
This conversation reminded me of my oldest brother, decades ago when he was a junior in High School. He was caught talking in class. The teacher gave him an essay assignment as discipline, to be presented the next day in class. "Write a 3,500 word essay on the Ping Pong ball".

He completed the task with lots of research the night before.

Even with all the research, writing, editing, good ole sweat and class discussion ... in the end ... it was still a ping pong ball.
It took man pretty long to understand the importance of round objects!
 
@Peter Bartholomew
I see your point. However, my opinion is: it is bad design, and not the notation, which should be held up as the culprit. All applications, however smart and rule-based, follow the GIGO rule:
Garbage In = Garbage Out
Using named ranges can bring in errors as well.

I remember this being discussed at length in “Professional Excel Development”, where tight and robust design is discussed. The term they used, if I remember correctly, was "dictator application". In the VBA context the same is discussed in “VBA Developer’s Handbook”, where they use the term "bulletproof coding". Obviously, even the best won't be able to stop errors from creeping in.

My manager uses the phrase “fit for purpose”. I agree with it. I will accept a result within tolerance if perfection comes at a hefty price.

With this, I rest my case.
 
Hello shrivallabha

I agree with the garbage in / garbage out observation. No matter how robust the software, plausible but wrong data will give garbage. I remember being told of a research engineer who was tasked with investigating the potential for an improved material spec to improve their product. The task went fine, but the engineer forgot to reset the reference value on the corporate database when he finished :(!

What I gain from an array formula is that I can guarantee that the formula is absolutely consistent over all time periods (none of Hui's little green triangles, though it comes at a price) and from names I stand a chance of detecting a planning error in which domain knowledge is not correctly represented. For example, I believe 'rate' may be given by either
= APR / 12
or
= 100 * ( 1 + APR/100 )^(1/12) - 100
according to geographical location.

The use of a standard designation such as APR should make the intention clearer to the auditor than
= $G$4 / 12
which, I would contend, is more likely to slip through unnoticed.
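As a hedged illustration of how much the two conventions differ (the APR value is made up, and Python stands in for the spreadsheet arithmetic):

```python
# Two conventions for deriving a monthly rate from an annual
# percentage rate, mirroring the formulas quoted above.
APR = 6.0  # illustrative annual rate, in percent

# Convention 1: nominal - simple division by 12.
nominal_monthly = APR / 12

# Convention 2: effective - the rate that compounds to APR over 12 months.
effective_monthly = 100 * (1 + APR / 100) ** (1 / 12) - 100

print(round(nominal_monthly, 4))    # -> 0.5
print(round(effective_monthly, 4))  # -> 0.4868
```

Over the life of a loan, the gap between those two monthly rates compounds into a material difference - exactly the kind of domain-knowledge error that a name like APR makes easier to spot than $G$4.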

There are no miracles on offer though. I seem to remember reading in a EuSpRIG paper that moving from spaghetti code to structured code roughly halves the error rate. Since I see no reason to believe that spreadsheets will fare any better (they start from a lower base, but the genre doesn't lend itself to being completely structured), that still leaves plenty of opportunity for error.

To be honest, the spreadsheets I have seen are nowhere near perfection and I would suggest underinvestment is rife :eek::eek:.
 
it is bad design and not the notation which should be held as culprit

I would respectfully suggest that some of the blame must fall upon the notation.
Unless you use multi-cell array formulas, the solution is predicated on the idea of a formula within a single cell that references a number of other cells using relative references. That is, all formulas are many (inputs) to one (output value).

A frequent consequence of this is not 'bad design'; it is a total failure to provide any design. It is the computational equivalent of a bricklayer who starts at one corner of the proposed building, sets a brick, and then replicates the process until he chooses to stop. As long as the bricks are going down OK, the design can look after itself.

What is missing in each case are the more abstract concepts which allow the solution architect to plan the overall design - concepts like rooms, walls, doorways. In the IT context, maybe "a function of time called xxxx, accumulated monthly over a period of 5 years".

I believe that is a stronger starting point than "the cell $BQ$246 that contains a value £25,334.90 and, if you want to know what it is, scroll back to cell $B$246 where you might find some relevant annotation that will help"
 
...more than 90% contain significant errors.
I wonder if a similar study was done on databases and/or applications that have a db as back-end.

I have done quite a few audits on these systems, and the chance of finding significant errors in them is still high (around 80% of systems had major errors, either in code/stored procedures or in the data itself).

No matter what tool/system is used, without a regular manual audit process there are going to be errors that creep into the system. A good system should always have at least two different ways of arriving at the same result, and give a warning when there is a mismatch.
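A minimal sketch of that dual-path cross-check idea, with made-up numbers: the same grand total is computed independently by rows and by columns, and any mismatch raises a warning.

```python
# Toy data grid (e.g. amounts by region x month).
data = [
    [10, 20, 30],
    [ 5, 15, 25],
]

# Path 1: sum each row, then total the row sums.
total_by_rows = sum(sum(row) for row in data)

# Path 2: sum each column, then total the column sums.
total_by_cols = sum(sum(col) for col in zip(*data))

# The audit check: two independent routes must agree.
if total_by_rows != total_by_cols:
    print("WARNING: cross-check mismatch")

print(total_by_rows)  # -> 105
```

A genuine audit would use two computations that do not share intermediate results (or even two tools), so a single slip cannot corrupt both paths.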
 
Chihiro

Ray Panko was the source of much of my background understanding of errors in spreadsheets. I have forgotten the extent to which the spreadsheets were data analysis or model building.

Panko, R. (2014a) Audits of Operational Spreadsheets, http://panko.com/ssr/Audits.html.

Panko, R. (2014b) Human Error in Simple but Nontrivial Cognitive Actions, http://panko.com/HumanErr/SimpleNontrivial.html

Where I have mentioned Qualitative Errors, my intention was to remain consistent with Ray's taxonomy:

Panko, R. (2015). What We Don’t Know About Spreadsheet Errors Today: The Facts, Why We Don’t Believe Them, and What We Need to Do, EuSpRIG Conference Proceedings, https://arxiv.org/ftp/arxiv/papers/1602/1602.02601.pdf
 
GraH - Guido

Concept of using variable columns over in Oracle

What I have seen of relational databases is that they work well when the data can be constrained to conform to the schema. It gets to be more of a struggle when the data to be captured is not predictable. At worst, with PLM systems, I have been told of systems in which the response time for adding a new data type can exceed a year; meanwhile, from the user perspective, it's more a case of "Take your time, as long as it's ready for me to use by tomorrow!"

If flexibility is the main driver, switching to an RDF triple-store could provide an answer. Each statement is simply a triple

Subject (URI as primary key)
Predicate (attribute type / field name )
Object (URI or attribute value)

where I have used the parentheses to suggest the relational equivalent. An instance of an unknown attribute type can be stored simply by adding the new predicate on the fly. If the user can classify the instance of a new attribute then this is declared with a further statement

Subject (attribute URI)
Predicate: Type
Object (attribute class)
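For illustration, a toy triple store can be sketched as a plain list of (subject, predicate, object) statements; the URIs and attribute names below are invented:

```python
# Each statement is simply a (subject, predicate, object) triple.
triples = [
    ("urn:part:42", "hasMass", "3.5"),
    ("urn:part:42", "madeOf", "urn:material:steel"),
]

# A previously unknown attribute type is captured by adding a
# statement with a new predicate on the fly - no schema change.
triples.append(("urn:part:42", "surfaceFinish", "anodised"))

# If the user can classify the new attribute, that is declared
# with a further statement about the predicate itself.
triples.append(("surfaceFinish", "type", "FinishAttribute"))

# Query: all statements about one subject.
about_42 = [t for t in triples if t[0] == "urn:part:42"]
print(len(about_42))  # -> 3
```

A real triple store adds indexing and a query language such as SPARQL on top, but the schema-free "add a statement" flexibility is exactly this.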

As you suggest, there probably is no universal panacea but having the right data model for the task in hand is surely a great step forward.

https://www.w3.org/2001/sw/wiki/RDF
 
I'm a fan of Peter's work. So much so that I dropped cell references completely around 2010. I have rewritten several ModelOff challenges (I don't compete) without cell references so I know it is possible to do Financial Modeling and many other types of spreadsheets without them.

Why?
  1. Names and structured references (SRs) make formulas self-documenting. I write solutions for other people. In that context, documenting formulas is a best practice and, in some environments, a requirement.

  2. Excel's auto-complete of names and SRs speeds development and prevents errors. Unlike with cell references, if I fat-finger formulas with names, Excel bitterly complains, so my chances of hidden errors due to bad references are greatly reduced.

  3. Names and SRs are dynamic. I can expand tables and SRs automatically expand. I can rename columns and my formulas automatically change to the new name. I can move things around, even between worksheets, and my formulas don't break.

For my work, I prefer Names and SRs over cell addresses.
 
I'm a fan of Peter's work

Those of you who do not know @Craig Hatmaker may not realise what a compliment that is. Craig is a leader when it comes to Excel, not a follower. To form your own opinion take a look at his blog

https://sites.google.com/site/beyondexcel/home

I could spend months there! It is what happens when an IT professional meets and adopts Excel. A fairly recent and, to me, startling example is Craig's MAKE utility but his thoughts on 'Excel Model Transparency' are also interesting.

When we discussed the idea of using Names (often combined with multi-cell array formulas) quite a number of years ago, Craig's immediate reaction was 'That is important', and he proceeded to articulate the reasons better than I ever had. What one sees in Craig's work is mainly structured references, but named ranges and named formulas are also exploited where relevant.

I had the pleasure of meeting and spending time with Craig last year at the EuSpRIG meeting in London UK. This year's meeting is on Thursday (July 5).
Anyone going?

http://www.eusprig.org/Eusprig-2018-programme.pdf

p.s. Good thing this is the lounge; I would probably get chucked out of any other part for 'deviation from the topic under discussion'!
 
My personal viewpoint: Names and Arrays are good, but it honestly depends on two things
1) How quickly a solution needs to be developed
2) Whether the user is capable of maintaining such a file
I know for a fact that in my regional office I'm pretty much the only guy who understands them, so there is no point in my bothering with the setup if the end user is unable to maintain the file.
 
Whether the user is capable of maintaining such a file
That's why I'd typically use dynamic named ranges built with INDEX(), along with a business rule sheet and a training session.

Usually no maintenance is needed after initial set up, with dynamic named ranges.

In addition, I have in the past used VBA and a UserForm to ensure that data entered conforms to the rules. Of late, though, I've pretty much moved all data entry into web-based forms, using a database as the back-end.

In any sort of data-entry/dashboard workbook, it's my belief that a business rule sheet is a must, outlining the basic set of rules to follow, along with the assumptions used, parameters defined, threshold values set, etc.
 
Whether the user is capable of maintaining such a file

I guess I have never fully bought into the end user computing paradigm. I have tended to see the developer and user roles as very different, even in the circumstance where the developer is creating an Excel-based application for their own use.

I do not expect an end user to know anything about the grid, cell references or the ribbon, though, if they do, that is fine. My expectation of the user is that they should have good domain knowledge and be able to lay out the requirements and a few test cases, with clarity. Having the user dabble with formulas is about as much use as having diners wander into the kitchen to season the food; the chances of success are greater if one leaves the task to the chef.

Of course, I do agree that some support arrangements need to be in place. For me, it is a mark of success when I am called in less as the in-house support grows in competence and confidence.

Dynamic named ranges
These days I tend to use a Table as the basis of dynamic named ranges so that the resize handle can be used even though the headings, filters, stripes and other decorative features are all hidden.
 
Excel has more ways to skin a cat than needed, but the choices are there for the users - why restrict the options? The only thing that should be consigned to the waste bin is the merge option.
 
Why restrict the options …

I believe that the options selected should be fit for purpose. The spreadsheet defaults are hugely biased towards ad-hoc computing; the emphasis is upon speed, with a plethora of 'tips and tricks' which provide shortcuts.

Where I begin to have problems is when I see the same techniques used for the development of corporate applications with lifetimes of 5 years or more and distributed to multiple users.

In software development, one sees a progression from the monolithic spaghetti code, developed naturally by a novice programmer, towards the far more constrained and structured approaches used by professional programmers. I do not see a similar change of strategy when one looks at spreadsheets developed by novices and those of experts. In fact the FAST financial modelling 'standard' deliberately restricts the techniques it advocates to the most primitive so that a basic user could understand each formula and check the calculation with a pocket calculator.
 