• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Display the very next top value of a row

tango

Member
Greetings!

I have a thousands of item that requires me to identify which item is part or belong to what item/level.

I have attached an excel sample where
a) Column B (level), C (Item Number) and D (DESC) are the given data.

b) Column A is the output/result that I would like to have which the results are taken from Column C based on the next top value of Column B.

Hope I explain it well but the sample will show/explain it better ;-). Appreciate if someone can provide a formula to achieve the result as per sample attached.

Thanks in advance.
 

Attachments

  • TEST2-chandoo.xls
    27.5 KB · Views: 33
Can you please walk through each value in Column A and explain the logic why it is there
 
In Cell E2 put following formula (and copy down) and see if it gives desired results with your original data. It seems to work with sample posted.
=IF(B2=1,"",LOOKUP(2,1/ISERROR(1/($B$1:B1-(B2-1))),$C$1:C1))
 
@Hui, column A is manually added, just to get an idea of the objective. It is a result of condition from Column B and C. Thanks.

@ Shrivallabha, Amazing! it works exactly as required... SOLVED ;-) Big thanks Shrivallabha.
 
Just to bump this thread to whoever can provide an alternative formula for

=IF(B2=1,"",LOOKUP(2,1/ISERROR(1/($B$1:B1-(B2-1))),$C$1:C1))

My excel/computer hangs due to a file of mine that contains tens of thousand of rows being processed. Please help.
 
Hi @tango

The problem is not with the formula but with the way in which you have structured your data. The problems are
1. The interpretation of your data depends upon the order in which the records are stored (the BOM has to be traversed depth first to give the correct sequence)
2. To determine the parent of any node required a search over part or all of the data. This gives rise to an n² calculation time.

Instead of calculating the key corresponding to the assembly of which the product is a part, that should be the stored data from which the BOM may be inferred. The level should be the calculated quantity, not the assembly.

With data stored the way I have suggested it is crucial that the records are stored with primary keys are sorted so that bisection search can be used in place of full a search.

Does it matter?
Only if you wish to speed up your calculation by a factor of 1000 or so.
 
I have had some further thoughts on the problem. A solution might lie in the use of a helper field corresponding to each level in the hierarchy. The formula
=IF([@LEVEL]>n,priorEntry,[@[ITEM NUMBER]])
would require only a single pass through the dataset for each level, n. The name 'priorEntry' is a relative reference to the cell immediately above the formula cell.
The final lookup of the parent assembly is of the form
= IF( [LEVEL]>1, INDEX( Table1[@[Column1]:[Column4]], [@LEVEL]-1 ), "" )
 
Just to bump this thread to whoever can provide an alternative formula for

=IF(B2=1,"",LOOKUP(2,1/ISERROR(1/($B$1:B1-(B2-1))),$C$1:C1))

My excel/computer hangs due to a file of mine that contains tens of thousand of rows being processed. Please help.
Or………….

Try this modified shorter formula instead.

=IF(B3=1,"",LOOKUP(1,-FIND(0,$B$1:B2-(B3-1)),C$1:C2))

Regards
Bosco
 
Thanks Peter.
Thanks Bosco.

I recorded a Macro and run it on the original attachment above and it works Bosco.... Would it be possible to have the code below converted in VBA with an objective that every output/result in each cell will simply contains 'text' and not with "code/function '=IF(C3=1,"",LOOKUP(1,-FIND(0,$C$1:C2-(C3-1)),D$1:D2))'?

Also, please consider the 'number of records' which will not be limited to A2:A22 and list would depend on every file that I work on hence should be flexible.

Thanks again and appreciate your reply. thank you.


---------
Sub Macro3()
' Macro3 Macro
'
Columns("A:A").Select
Selection.Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
Range("A1").Select
ActiveCell.FormulaR1C1 = "Bosco-Code"
Range("A2").Select
ActiveCell.FormulaR1C1 = _
"=IF(RC[2]=1,"""",LOOKUP(1,-FIND(0,R1C3:R[-1]C[2]-(RC[2]-1)),R1C[3]:R[-1]C[3]))"
Range("A2").Select
Selection.AutoFill Destination:=Range("A2:A22")
Range("A2:A22").Select
Range("B23").Select
Application.Left = 97.75
Application.Top = 22
End Sub
 
Hi @tango
The modified macro still writes the formula to the spreadsheet but then replaces the calculation by the resulting values.

Code:
Option Explicit
Sub Macro3()
Dim rng As Range
' Macro3 Macro
    Columns("A:A").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
    Range("A1").Value = "Bosco-Code"
    Set rng = Range("A1").CurrentRegion
    Set rng = rng.Offset(1).Resize(rng.Rows.Count - 1, 1)
    With rng
        .FormulaR1C1 = _
        "=IF(RC[2]=1,"""",LOOKUP(1,-FIND(0,R1C3:R[-1]C[2]-(RC[2]-1)),R1C[3]:R[-1]C[3]))"
        .Value = .Value
    End With
End Sub
 
Its me again.... I have asked a friend to modified Peter's code but one way or the other, the process still takes so much time (131711 rows) and need to wait for so looong to get the results everytime I run the code, and most of the time, my computer hangs.

Is there an altrnative way/code to have a quick result please. Appreciate Peter or someone could improve it please. Thanks.

----------
Dim rng As Range
Columns("A:A").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
Range("A3").Value = "NAME"
Set rng = Range("A3").CurrentRegion
Set rng = rng.Offset(3).Resize(rng.Rows.Count - 1, 1)
Dim i As Long
For i = rng.Row To rng.Row + rng.Rows.Count
Cells(i, 1).FormulaR1C1 = _
"=IF(RC[1]=1,"""",LOOKUP(1,-FIND(0,R1C2:R[-1]C[1]-(RC[1]-1)),R1C[2]:RC[2]))"
Cells(i, 1).Value = Cells(i, 1).Value

' With rng
' .FormulaR1C1 = _
' "=IF(RC[1]=1,"""",LOOKUP(1,-FIND(0,R1C2:R[-1]C[1]-(RC[1]-1)),R1C[2]:RC[2]))"
' .Value = .Value
' End With

End Sub

---
Im using Excel 2013, i7-6700 CPU, 3.4Ghz, 8GB ram, windows 10-64bit
 
Nearly a month late, but this should be a somewhat faster. It runs on the Active sheet and overwrites column A. It was designed to work on data as arranged in your attachment in msg#1:
Code:
Sub blah()
LR = Cells(Rows.Count, "B").End(xlUp).Row
vals = Range("B1:C" & LR).Value
myResults = vals
myResults(1, 1) = "Parent"
For rw = 2 To UBound(vals)
  If vals(rw, 1) = 1 Then
    myResults(rw, 1) = Empty
  Else
    myLevel = vals(rw, 1) - 1
    For upRw = rw - 1 To 2 Step -1
      If vals(upRw, 1) = myLevel Then
        myResults(rw, 1) = vals(upRw, 2)
        Exit For
      End If
    Next upRw
  End If
Next rw
Range("A1:A" & LR).Value = myResults
End Sub

edit post posting:
the process still takes so much time (131711 rows) and need to wait for so looong to get the results everytime
<snip>
Im using Excel 2013, i7-6700 CPU, 3.4Ghz, 8GB ram, windows 10-64bit
There was me trying to test for 130k+ rows but you only supplied a .xls file with 65k rows!
Anyway, I tried on more than 170k rows and it took half a second. On your more powerful machine it should take less than a quarter of a second.
 
Last edited:
You are the best. Your code rocks and execute 131k rows in a blink of an eye. Big thanks.

Furthermore, willing to wait for a month or so ;-) if somehow the line of codes could be explained for learners information. Thank you.
 
The speed increase is chiefly down to a few main reasons:
1. Only one read from the sheet and only one write to the sheet
2. No repeated referring to the sheet during the macro
3. No formula calculation time because there are no formulae; the formulae added by the other code toward the bottom of your table refer to quite large ranges which often slows things down. All the calculation happens in the macro blah itself and it's not really a calculation, just a comparison for equality amongst a few values for each row.

It's point number 3 that, in this case, is the most time consuming.


The first line:
LR = Cells(Rows.Rount,"B").end(xlup).row
puts a value into LR being the row number of the last cell in column B with any data in.
It's the same as selecting cell B1048576, then pressing End on the keyboard, then pressing the UP cursor key on the keyboard, and reading out what row the selection ends up at. (Lets say 131711, in your case)

The next line:
vals = Range("B1:C" & LR).Value
puts the values found in range B1:C131711 into an array:

upload_2018-8-24_19-30-36.png

I was lazy with the next line, I made a copy of the vals array, so that myResults array would have the correct vertical dimensions. This array will hold the results.
I should have instead written:
ReDim myResults(1 to UBound(vals),1 to 1)
The complete revised code is at the bottom of this message.
myResults(1, 1) = "Parent" just puts a header into the first slot (member) of the myResults array. This will end up in cell A1.

The next section:
For rw = 2 To UBound(vals)
'some code
next rw

will loop, with rw starting at 2 and adding 1 each time until it gets to 131711.
rw will be like a row number of the array, used like this:
vals(rw, 1)
equivalent to vals(row number, column number)

So within this loop we ask whether vals(rw,1) (1 is the column of levels) is equal to 1, if so, because 1 is the highest level, there's no parent to look for, so we makes sure the result is blank (with my lazy assignment of myResults we have to explicitly make it blank in order to overwrite what's already in there, but with my later ReDim, all the members are already blank so there is nothing to do (that's why I ask instead that vals(rw,1) <> 1 to avoid needing an Else part).
So in the case of a level not being 1, we need to search the levels column, one row at a time upwards, looking for the first occurence of a level which is one less that the level at row rw. That's what the next mini-loop does:

myLevel = vals(rw, 1) - 1
is the level less 1 that we're looking for.

For upRw = rw - 1 To 2 Step -1
'some code
Next upRw

We start searching at rw-1, the row above the current row we're examining, and will search right up to the 2nd row of the array if necessary. upRw will decrement by 1 at each loop because of the Step -1.

In the middle of this mini loop is:
If vals(upRw, 1) = myLevel Then
myResults(rw, 1) = vals(upRw, 2)
Exit For
End If


which asks 'does this row contain a level exactly 1 less than the level of the row I'm trying to find the parent of? If so then we've found the parent and we can update the myResults array with its name. Crucially, for speed, if the parent has been found, we abandon this mini-loop with Exit For, which takes the code to the line after Next upRw. That line is End If, and the one after that is Next rw, that is the next row down in the list that needs its parent finding.

Once rw reaches 131711, that loop terminates and we're left with an array in myResults that need placing on the sheet.
With my lazy method, myResults contains 2 columns but we're interested only in the first column, and certainly don't want to paste the second column anywhere at all. When we paste the array, it will only paste to the range we specify, so if we paste to only one column, the second column won't be pasted at all.
With the second method of initializing myResults (ReDim) there's only one column anyway.
All we need to do for this last line is to ensure the numbers of rows pasted is correct. That information is held in LR still, so we use that:
Range("A1:A" & LR).Value = myResults

QED.

Revised code:
Code:
Sub blah2()
LR = Cells(Rows.Count, "B").End(xlUp).Row
vals = Range("B1:C" & LR).Value
ReDim myResults(1 To UBound(vals), 1 To 1)
myResults(1, 1) = "Parent"    ' header
For rw = 2 To UBound(vals)    'starting at the top, run down the rows
  If vals(rw, 1) <> 1 Then    '1=top level already, no higher levels to search for.
    'do a search up from that row:
    myLevel = vals(rw, 1) - 1    'the level you're looking for.
    For upRw = rw - 1 To 2 Step -1    'starting from the row above, look upwards for the first instance of myLevel
      If vals(upRw, 1) = myLevel Then    'if TRUE then it's found
        myResults(rw, 1) = vals(upRw, 2)    'add the contents of the adjacent cell
        Exit For    'no need to keep looking further up, already found a match
      End If
    Next upRw
    'this is where the code jumps to once Exit For is executed.
  End If
Next rw    'next row down.
Range("A1:A" & LR).Value = myResults    ' write the array to the sheet.
End Sub
 
The speed increase is chiefly down to a few main reasons:
1. Only one read from the sheet and only one write to the sheet
2. No repeated referring to the sheet during the macro
3. No formula calculation time because there are no formulae; the formulae added by the other code toward the bottom of your table refer to quite large ranges which often slows things down. All the calculation happens in the macro blah itself and it's not really a calculation, just a comparison for equality amongst a few values for each row.

It's point number 3 that, in this case, is the most time consuming.

@p45cal
I tend to think of your point 2 as the more significant whereas you have identified 3.

I intentionally reduced the interactions with the worksheet to a single insertion of formula, followed by a read/write of values (requiring 3 large data exchanges whereas your code only requires 2). The question is then 'does a worksheet formula evaluate faster or slower than the equivalent in VBA?' I do not know and would be interested in any evidence or even opinion you have to offer.

Note: I was surprised to see in #14 that @tango had changed setting the range's .formula property to a loop applying it cell by cell. There must have been a reason for it but it is a disaster when it comes to optimising for speed.
 
@p45cal
I tend to think of your point 2 as the more significant whereas you have identified 3.

I intentionally reduced the interactions with the worksheet to a single insertion of formula
I did try using the cell by cell approach and you're right, it is a disaster. I was able to interrupt the macro and it was making extremely slow progress. I made the mistake, having interrupted the cell by cell approach at around row 5000 (of a 43,000 row test sheet), of double-clicking on the bottom right little square to copy down the formula - it took half an hour to update and give me back control.
I was using your code from msg#10, so sheet access was kept to a minumum. I concluded that it was the calculation that took the time rather than the writing of the formulae to the sheet - I could be wrong.
I tried this formula at row 43000:
=IF(B43000=1,"",LOOKUP(1,-FIND(0,$B$1:B42999-(B43000-1)),C$1:C42999))
and copied it down 10 cells. Instead of returning with results in the blink of an eye, there was a noticeable delay of a fraction of a second, two or three blinks of an eye you might say, which multiplied up to 131,000 I'm guessing might acount for the long time taken for the whole column.


followed by a read/write of values (requiring 3 large data exchanges whereas your code only requires 2). The question is then 'does a worksheet formula evaluate faster or slower than the equivalent in VBA?' I do not know and would be interested in any evidence or even opinion you have to offer.
I'm struggling to picture what you mean by the equivalent in VBA and whether you'd want it to refer to cells on the sheet or an array of data in memory. Could you expand a little more on this last?
Sometimes, if you can convert a sheet formula to an array formula you can create an array from it entirely in vba ready to write to the sheet en masse, but I'd hate to try and convert these sheet formulae to an array-entered one!
 
Last edited:
The equivalent in VBA

Sorry, you are right. I had simply used the formula as given without evaluating it. The current processing by formula is an n² process in which every cell is evaluated by searching all cells above it, whereas an accumulation-based process would require helper cells but would be computationally efficient.

Array entering isn't an issue per se. (LOOKUP hides an array calculation); it is merely a case of setting the range's .formulaArray property rather than its .formula property but the operation count is hugely different in this case.
 
@p45cal

I had forgotten what this discussion was about. I notice that after moaning a bit in #6 I went on to implement a linear-scaling solution (order n rather than n²) in #7 that used helper ranges like they were going out of fashion (that means freely).

The attached implements a hybrid strategy. That is it creates and populates a helper range with the necessary formula as well as writing a formula to the result column to read results back from the scratch calculation.

The macro finishes by hard wiring the results and clearing the intermediate calculations from the workbook. This is certainly a strategy that works but how it performs relative to a straight VBA calculation I have no idea. I think it holds its own but I have no evidence.
 

Attachments

  • test product hierachy.xlsm
    22.5 KB · Views: 2
7 secs for 170k rows.
There's a problem with:
.FormulaR1C1 = "= IF( [@LEVEL] > 1, INDEX(scratch,[@[RECORD'#]],[@LEVEL]-1 ), """" )"
where you've used scratch which is an unchanging hard-coded range. You'll need to add:
rng.Name = "Scratch"
somewhere above that.
It does take some setting up.
 
You'll need to add:
rng.Name = "Scratch"
Yes, I fully agree. It should come immediately after
Set rng = rng.Resize(n, 4)
I nearly did it simply to be tidy and consistent but I forgot and, since I didn't actually change the data, the issue never came the top of the action list.

I don't know how you generated 170k rows. I didn't get around to thinking that one through. How does the 7 sec compare? It strikes me as a long time but then, it is a lot of records so perhaps it is acceptable.

Using the spreadsheet as a temporary scratch space for calculation is a somewhat odd concept but it does seem to work. I am usually slightly reluctant to move to solutions where the calculation is performed using VBA, not because I have any objection to using code but, rather, because it becomes a case of "Is the spreadsheet necessary if it is doing no more than providing a blank canvas for results?".

My use of VBA tends to be more limited, typically providing a custom user interface to allow the user to treat spreadsheet-based solutions as though they were self-contained apps (no knowledge of Excel is expected of the client).
 
Here's a variation based on p45cal's observation of VBA providing faster results than formula.

In your test data there were 5 levels. If number of levels are more then change number to suit.

In terms of performance, I'd hope that it will not change anything. It is just fewer lines of code and that is all.

Code:
Public Sub FillPrevLevel()
Dim arInput
Dim arLevelVal(5) As String 'Set the number to suit maximum number of levels
Dim i As Long
arInput = Range("A2:C" & Range("C" & Rows.Count).End(xlUp).Row).Value
For i = LBound(arInput) To UBound(arInput)
    arLevelVal(arInput(i, 2)) = arInput(i, 3)
    arInput(i, 1) = arLevelVal(arInput(i, 2) - 1)
Next
Range("A2:C" & Range("C" & Rows.Count).End(xlUp).Row).Value = arInput
End Sub
 
@shrivallabha
Straightforward!
It wouldn't be very resource hungry if you made it:
Dim arLevelVal(99) As String

I did some time tests (on 43k rows), expecting your variation to be faster, but I was astonished to find it 1.8 times slower! But I was able to bring that difference down to 1.2 times slower by changing one letter of your code (in red below):
Range("A2:A" & Range("C" & Rows.Count).End(xlUp).Row).Value = arInput

Further tweaks made it faster than my offering (by 1%):
Code:
Sub FillPrevLevel()
Dim arInput
Dim arLevelVal(5) As String    'Set the number to suit maximum number of levels
Dim i As Long
arInput = Range("B2:C" & Range("C" & Rows.Count).End(xlUp).Row).Value
ReDim myResults(1 To UBound(arInput), 1 To 1)
For i = LBound(arInput) To UBound(arInput)
  arLevelVal(arInput(i, 1)) = arInput(i, 2)
  myResults(i, 1) = arLevelVal(arInput(i, 1) - 1)
Next
Range("A2:A" & Range("C" & Rows.Count).End(xlUp).Row).Value = myResults
End Sub
Which shows that writing and reading from the sheet takes significant time.
Setting Application.ScreenUpdating to False only knocked a further 1% off the time.
 
How does the 7 sec compare?
With the other methods of putting a formula into the range then converting the results to plain values, testing on 130k rows, took 20 to 30 minutes here.
7 seconds is peanuts.

(Just looking at that last sentence; everything is plural yet it sounds right to use is!)
 
Last edited:
@p45cal

Point noted about 99 or some bigger number.

Your observations tell me two things!

1. Shorter doesn't necessarily mean faster code. Don't pick something up at its apparent face value, especially if it is VBA code! I know it is almost bromide and has been said so many times.

2. It is memory that affects performance in this case.
 
Back
Top