• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

"Summary" Report needs array INDEX / MATCH formula - but it's slow.

Mark Gravo

New Member
My manager is looking to create a summary of dates per project using scheduling data output. I'm getting date output from the master schedule in Primavera P6 from the scheduler, so naturally I get multiple records (rows) of data per project (each activity in P6 represents a different date and record). I have a solution (attached) but when I try to filter the Summary by say Program Coordinator, the multiple array formulas take quite a while to process. The P6 schedule is updated and republished fortnightly, and hence this summary would be updated fortnightly.

This morning I've been researching online various Excel help sites in regards to speeding up an array lookup formula. Despite a couple of tweaks to try and reduce the array sizes (now references the correct single columns for lookup and returning data), it still takes a long time to process any changes. The corporate version of Excel is 2013 (15.0.4997.1000) 32 bit as part of MS Office Professional Plus 2013.

I'm thinking the best way is to have a macro that at every fortnightly update creates this summary worksheet as values only (i.e. populates the summary page, rather than the summary page being a lookup), unless there's something I'm missing. Of course, that's a whole other set of problems for me as I'm a real novice at VBA!

Note I'm currently part-way into development, there's another database I need to interrogate for some of the missing summary info which I was going to add in as a new data source worksheet.
 

Attachments

  • PTS Summary (redacted sample).xlsx
    154.7 KB · Views: 11
There are things that concern me about the workbook.

The first is the number of times you search for the Project ID using the Project Number on the Summary sheet. Since you always get the same record, I would make sense to capture the result in a helper column.

In lookups within columns further to the right the problem is compounded by having exact matches rather than approximate. I have a personal distaste for traditional spreadsheet strategies such as whole-column referencing. Fortunately, Excel usually does what you mean, rather than what you write, and ignores anything outside the Used Range. That might not be the case when you use array formulas and match the product against 1 as TRUE. Reworking the formulae to lookup, then test the result rather than searching for a desired result that may not exist should save time.

As for VBA, there are places where a more procedural coding approach pays off but I do not think this is likely to be such a case. I believe worksheet formulas are considerably faster than VBA where they apply.

I suspect that order of magnitude performance improvement are possible with the worksheet formulas you presently have but I would also consider Power Query ahead of launching into VBA.
 

Attachments

  • PTS Summary (redacted sample) (PB).xlsx
    163.9 KB · Views: 3
Another approach, a pivot table. No formulae at all.
In the attached I've put a table which gives more or less the same values as your table. Many headers can be changed to match your headers.

I haven't done it, but an almost exact version of your table could be put together with GETPIVOTDATA formulae, and that should still be faster and easier to maintain than longish sheet formulae.
 

Attachments

  • Chandoo39524of PTS Summary (redacted sample).xlsx
    162.2 KB · Views: 2
Hi Mark
The attached contains the following changes/recommendations:
  1. Helper fields have been introduced to hold index numbers for the first and final records of each project (sorting Data by project ID is critical)
  2. This allows one to build a dynamic relative range 'project' comprising all the records associated with the given 'PROJECT Number'. The formula is shown in Cell A14 but the range itself is selected starting in AJ3 (this is for demo purposes only and is not used)
  3. Columns K and J show an original formula and a new formula to replace it. Instead of searching the entire data table against the product of two criteria (Project ID and Reporting) a much smaller range is searched by intersecting the ranges 'project' and 'reporting.
  4. If the string "2.BH" is found, [@FirstMatch] - 1 is added to the index returned to make it relative to the entire table rather than the rows associated with the project.
=IF(ISERROR(
MATCH("2.BH",( project Data[Reporting] ),0)),

"No 2.BH date",
INDEX(Data[Finish],
MATCH("2.BH",( project Data[Reporting] ),0) + [@FirstMatch] - 1))


In the past, I have found the formula to be faster to extract data for a specific project than a pivot table but Power Query or Pivot Tables should will hands down when it comes to a full analysis.
 

Attachments

  • PTS Summary (redacted sample) (PB).xlsx
    166.5 KB · Views: 4
I'm thinking you should be able to query Primavera directly and use it as a pivot data source without importing the source data en masse.
You can certainly make an ODBC connection to it:
http://blog.msllc.biz/2015/03/30/p6-sdk-not-what-you-think/
http://blog.msllc.biz/2015/03/30/p6-sdk-not-what-you-think/
also:
https://datamonkeysite.com/2017/03/11/using-powerquery-to-read-an-xer-file/
and:
https://www.planacademy.com/p6-project-dashboard-in-excel/

I'm pretty new to the organisation and not sure I'd get the resources to do this just yet. At a previous organisation I pretty much was able to go from an Excel snapshot report (data extracted from a Notes DB and output to a macro enabled Excel workbook), to a live dashboard which was in Tableau, so I get what you're driving at here.
 
Another approach, a pivot table. No formulae at all.
In the attached I've put a table which gives more or less the same values as your table. Many headers can be changed to match your headers.

I haven't done it, but an almost exact version of your table could be put together with GETPIVOTDATA formulae, and that should still be faster and easier to maintain than longish sheet formulae.

The scheduler suggested a pivot table and I did play around with it, but I didn't quite like the layout. :)

I've not tried GETPIVOTDATA yet and might look at this approach at some stage.
 
Hi Mark
The attached contains the following changes/recommendations:
  1. ... [
Hi Peter,

Thanks for this reply - I am going to work through the suggestions. Some clarifications though:
  1. Helper fields have been introduced to hold index numbers for the first and final records of each project (sorting Data by project ID is critical)
Just to clarify, all source data should be sorted by Project ID?
In the Name Manager I see in your spreadsheet that you've defined the source data as a table named "Data" and the summarised as a table named "Summary" - correct? This is what allows the formula in the dynamic range called "project"?
This allows one to build a dynamic relative range 'project' comprising all the records associated with the given 'PROJECT Number'. The formula is shown in Cell A14 but the range itself is selected starting in AJ3 (this is for demo purposes only and is not used)
The "table" over AJ1:AY11 is not needed for the function of the alternate formula?

I do like this approach - add in a couple of helper columns to allow creation of a dynamic name range which reduces the size of the array being searched.

While others think I'm pretty advanced with Excel, there's a ton of stuff I don't know or use. I'll let you know how it goes!
 
@Mark Gravo

I am please that you have got something of value from the exchange. Don't forget to document the range intersection bit. Whilst it is not an Excel secret, neither is it exactly mainstream. I can confirm your interpretations of my statements. It is not absolutely essential to use named ranges (or to generate them as structured references using Tables); you could use direct cell references such as $A$2:$P$21, if you consider that to be an improvement!

If you take the black and grey table to the right and drag it down a row and a time you will see that the entire table is evaluated relative to data on the first row. Once you have played with it, just delete it.

Whilst you are implementing changes, it may be that IFERROR( X, Y ) could be used to simplify the syntax of your IF( ISERROR(X), Y, X) formulae that requires two passes through a non-erroring formula.

= IFERROR(
INDEX(Data[Finish],
MATCH("2.BH",( project Data[Reporting] ), 0) + [@FirstMatch] - 1)
,
"No 2.BH date")

If you try to implement the pivot table solutions with GETPIVOTDATA(), don't forget to replace the hard-wired fieldnames by references to the relevant list of values. Otherwise it will feel like a life-sentence of hard labour.
 

Attachments

  • PTS Summary (redacted sample) (PB).xlsx
    165.8 KB · Views: 15
Back
Top