• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Excel function reference

Lori

Active Member
The first sheet of the FuncList workbook attachment contains a formatted list of all functions and arguments and was created by copying the last two columns from the List sheet as text and pasting into the table.

Array types are shown in bold;
Reference types are shown in italics
Value types are underlined (except for plain values)

Based on the combinations of input and output types in the table I have found this to provide a useful aid in understanding how formulas are evaluated so I thought I would share it here.

The other FuncTypes workbook is macro-enabled and provides some further details and methodology of how this data was compiled.
 

Attachments

  • FuncList.xlsx
    226.2 KB · Views: 87
  • FuncTypes.xlsm
    704.8 KB · Views: 78
Ah, thanks for that clarification.

i tried to make the list as comprehensive as possible - 600 functions in all (including little known ones such as NUMBERSTRING and DATESTRING used in far east versions.)

A slicer was added to show only the more common ones but i had forgotten this is only compatible with later versions.
 
Thank you for this. It will save much trial and error, not to mention frustration when a function that appears relevant simply returns a value error.

Something that makes a function appear very different is when an array is used as a criterion turning a single cell aggregation into an array calculation. For example COUNTIFS accepts array criteria as in
= COUNTIFS(lastName, lastName, firstName, "<="&firstName)
to order family members by first name.
 
@Peter Bartholomew -
Indeed array processing can be applied generally to such value arguments - whether that's a good thing or not is open to debate. I noticed on the EuSprig website an article by an author with the same name deprecating the use of such array formulas in practice.

Your COUNTIFS example is an interesting case in point. The criteria arguments are fundamentally value types, as shown in the function wizard. Entering that formula in a column adjacent to the data returns the same values with or without ctrl+shift+enter. The OpenFormula standard describes the two kinds of evaluation as "implicit intersection" and "implicit iteration" respectively.
 
I am happy with implicit intersection provided it references data within the same data structure (equivalent to the @ operator in a list object). Once it goes beyond that and requires a particular alignment of distinct data objects on their respective worksheets I object to the concept.

I have not come across the term implicit iteration until now but I take it that it covers the normal rules of array arithmetic. I believe that multi-cell formulas should be applied to a range (so making implicit iteration possible?) and not created by replicating single cell formulas. These ideas led me to using array formulas for simple formulas and only resorting to the complexity of single cell formulas for extremely irregular problems in which little structure is evident.

As you point out, committing the formula with CSE is unnecessary in my example above, at least until you remove the helper range from the worksheet. Then the named formula 'rank' could refer to

= 1 + COUNTIFS(lastName, "<"&lastName)
+ COUNTIFS(lastName, lastName, firstName, "<"&firstName)

and a pointer to the data in sort order would be returned by searching the array

= MATCH( k, rank, 0 )

The worksheet would show the sorted list as

= INDEX( fullName, pointer, {1,2} )

It is the array form of the formula that permits the use of named formulas as intermediate variables.

Notes: 1) I realise an event-driven Range.Sort method would be far more efficient for long lists.
2) I did present a EuSpRIG 2016.
3) I have made many assertions which go beyond the scope of the original discussion; please feel free to disagree.
 
I'd agree that when a function has named formulas as arguments it is often preferable to enter over a range of cells with CSE than to enter individual cell formulas. This can cut down on time consuming recalculations of names like in your definition of pointer.

Open document format specifies that:
• When evaluating a formula in 'matrix' mode, and a non-scalar value is passed to a function that expects a scalar, the function is evaluated multiple times, iterating over the non-scalar input(s)
• Functions returning arrays are not eligible for implicit iteration

From the function listing, examples of functions returning arrays are INDEX, HLOOKUP, VLOOKUP, CELL, HYPERLINK, and FILTERXML. Cell formulas containing any of these functions need extra functions inserted to be evaluated in 'matrix' mode. For example the following formula would allow for a sorted array to be returned that can also be passed to another function:

=INDEX(fullName, N(pointer), N(INDEX({1,2},)))

The function listing shows reference types in both of these functions. The INDEX result together with the value argument of the N function force an array result, as shown in the Evaluate Formula tool. UDFs and names are also variable reference types (i.e. pointer ) that can be coerced to arrays via the N function.
 
Lori

Thank you so much for taking the time to educate me.
Your first paragraph captures some conclusions I had come to but I have never seen the use of named formulas presented (other than in the context of dynamic ranges), much less evaluated from an efficiency viewpoint.

Coercion is very much a 'black art' to me so seeing a coherent explanation is something of a revelation. Have you published similar material to which you could direct me? I implemented your suggested formula and variants in which only one index was coerced and I am happy to say that I obtained the relevant row sums and column sums. I had rather thought that once I had used INDEX, I would need to write the formula to a range before I could aggregate it further.

Peter
 
@Lori

As a digression from a digression, I have recently been trialling some solutions in which I deliberately introduced the INDEX function to break a 2D array formula down into a sequence of 1D formulas.

For example, the use of INDEX to return a list of countries means that the array matches is only treated as a sequence of row arrays by the SMALL formula ('k' is just a counter, as is 'p').

upload_2017-11-11_0-3-28.png

The names of athletes representing each country are then returned by
= IFERROR( INDEX( athlete.name, athlete.index ), "" )

In a different workbook I introduced a somewhat odd-looking named formula k-prime
upload_2017-11-11_0-12-44.png
which allowed me to build and search arrays =COMBIN( p, k′ ).

There seems to be so much unused functionality in Excel, in particular. I makes me wonder where it came from and why.
 
Your digressions sound interesting - perhaps you could attach a demo file?

In a simplified set up containing first and last names in A1:B5, this multi-cell version:

=IFERROR(IF(1,INDEX(A1:B5,{1;3;4},)),"")

can be entered into D1:E5 using CSE to return a subset of these names. I'm not quite sure why the extra IF function is needed to avoid errors showing up in D4:E5?

For example to return just unique names from the list, replace {1;3;4} with a dynamic name like 'pointer.unique' defined as:

=MODE.MULT(IFNA(pointer,{"",""}))

Anyway, glad the analysis here is of some use. A few related posts are:
https://chandoo.org/forum/threads/something-informative.25700/#post-155252
https://chandoo.org/forum/threads/if-within-sumproduct.35744/#post-214351
https://excelxor.com/2014/11/04/coercing-array-returns-from-cse-resistant-formulas/#comment-248


Since dedicated formula sites like Excelxor and Excelhero are now inactive, i guess it's up to the rest of us to keep this line of investigation alive...
 
@Lori
Thank you for the links to related posts. I was familiar with the ExcelXOR article but not the ones on this site.

I have uploaded two copies of the workbook in which I constrained the SMALL function to operate over the rows of a two dimensional array built within a named formula. The workbook ...build... contains the solution in its development state with the arrays calculated (or simply replicated) using worksheet ranges whilst ...final... shows it 'packaged' using a hierarchy of named formulas.

There is an underlying agenda in my work of getting rid of direct cell referencing and replicated single-cell formulas (fills), which I regard as product of end user computing and the cause of the 'ultra-low fidelity' solutions that characterise 'normal' spreadsheet development. These are contentious points which might be better left to one side but I though some explanation was due.
 

Attachments

  • Sorted List IAAF 2017 final Article 2.xlsx
    32.1 KB · Views: 18
  • Sorted List IAAF 2017 build Article 2.xlsx
    44.3 KB · Views: 15
The second 'digression' is a formula that interpolates missing data and forecasts beyond the last 'actual', both using exponential growth, to produce an extrapolated chart. This is one of the solutions that contains the slightly strange formula
Name Refers to
pᵢ = INDEX(p, p)
This allows me to use terms such as
= MAX( IF((p<=pᵢ)*v?, p) )
which tests previous periods to see whether an actual value is present and returns the latest period for which that is true.
 

Attachments

  • chart - interpolation - period based - documented.xlsx
    27.2 KB · Views: 18
Nice! I like the unicode naming convention and the technique to reference shapes from formulas. Combining INDEX with IFERROR to remove errors in the return array is new to me and could be useful in lots of situations (by inserting an extra IF function if necessary as mentioned previoisly).

From an auditing perspective, while the function wizard may be unavailable for named formulas, the ability to select a formula cell and step into the formula names with the Evaluate formula tool can be very useful as is the F9 key. To ensure arguments are correctly evaluated using these formula tools a few minor alterations to name definitions can be made - one way is to modify INDEX arguments to use ROW() or COLUMN() instead of uncoerced arrays.

In any case, a well structured and thought provoking set of sample files. Thanks for sharing, i hope others also benefit from this work.
 
Playing around with these ideas further, another option is to define 'p' as,

=ROW()-MIN(ROW())+1

This returns a reference-free 1-based when array-entered in a range. Then array-enter:

=IFERROR(INDEX(...,p),"")

where ... is any formula that returns an array (which can be much more efficient than filling down individual formulas.)
 
Lori

I am happy that you found some things of interest in the workbooks. Very often they just create shock and bewilderment! Mind you, I have only just sorted out the significance of your
=MODE.MULT(IFNA(pointer,{"",""}))
A distinctly devious piece of lateral thinking to achieve its objective!

Having decided that I was not prepared to use reference by location, only by name, I often needed row or column indices and have fallen into habits, depending whether I wish to index a range or simply create integers 1,...,n.
The first I tend to do with
k: = ROW(rangeName) - ROW(INDEX(rangeName,1,1) + 1
[ though I would far prefer that ROW took a 'relative to' parameter so that I could write =ROW(rangeName, rangeName) ]

For indices not related to a specific range, I tend to define the current (relative) row as the 'numberline' and then pick off a suitable range using INDEX
k: = COLUMN( INDEX(numberline, 1) : INDEX(numberline, n) )
If n is a column vector this will produce an index block with a ragged right border.

These are just habits so I will explore what
=ROW()-MIN(ROW())+1
has to offer.

The insight you have provided has been of huge value to me in that my empirical knowledge of array formula is now backed by some understanding of the reasons behind the odder behaviours.
 
Back
Top