• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Array to lookup based on two different criteria, without duplicates and sorted?

cacos

Member
Hi everybody!


I'm working on an array to lookup categories based and create a list based on two different criteria. What i'm struggling with, is to have the results avoid duplicates, and also sorted alphabetically. All with the same array.


Currently I'm using two arrays to do the work, one for lookup, one for removing duplicates and sorting.


Here's an example on the data (is 35,000 rows in reality and growing)


Column A - Column B - Column C - Column D

June Belgium Building House

June USA Vehicle Car

May Spain Vehicle Bus

April UK Building Appartment

January Spain Building Land

February UK Vehicle Bike

February USA Etc Etc

April Belgium Etc Etc


I'm trying to pull what's on column D, based on criterias that match with columns B and C.
 
Hi Cacos,


Assuming your data is present in between A1:D8 like bellow

[pre]
Code:
June	     USA	Building	House
June	     USA	Vehicle	        Car
May	     Spain	Vehicle	        Bus
April	     UK	        Building	Appartment
January	     USA	Building	Bike
February     UK	        Vehicle	        Bike
February     USA	Etc	        Etc
April	     Belgium	Etc	        Etc[/pre]
you can get the result with one helper column, write:


In E1, [code]=IF(AND(B1="USA",C1="Building"),ROW(A1),"-"),

In F1, [code]=IFERROR(INDEX($B$1:$B$8,SMALL($E$1:$E$8,ROW(A1)),0),"")

In G1, =IFERROR(INDEX($C$1:$C$8,SMALL($E$1:$E$8,ROW(A1)),0),"")[/code]

In H1, =IFERROR(INDEX($D$1:$D$8,SMALL($E$1:$E$8,ROW(A1)),0),"")[/code]


...Drag down, hope that works.


Side Note: If you search the forum, you will find similar solution in abundance, some will be using macros as well so see what suits your need.


Regards,
 
Pivot table!


I tried to do something similar to this with several criteria... with an array formula working on about 20,000 rows of data it ended up taking about 2 minutes to look up. I realised that what I was trying to do could be run practically instantly by a pivot table; only thing is it won't automatically update, so if you want it to you need a macro to refresh it.
 
Yes that's the same i'm experiencing. It does the job, but it takes a REALLY long time to update.


I'm not that good with vba so I'll keep on looking to see if I find some other solution on the forum.


Thanks!
 
In fact the vba to refresh can be really simple - just put something like this on the worksheet that shows your output:


Private Sub Worksheet_Activate()

ThisWorkbook.RefreshAll

End Sub


Then it will update when you open that sheet.


Gets slightly more complicated if you want to change the filters based on other cell values (what I was doing), had to use named ranges and a slightly cumbersome macro...
 
Mbegg, do you have an example that I can adapt to my template? Because that's exactly what I need, since the filter is based on cells that change according the user's input
 
Or else, does anyone know how to make this array (that pulls from a list according to two criteria), avoid duplicates?


=IFERROR(INDEX($A$1:$I$30377,SMALL(IF($D$1:$D$30377=$N$8,IF($F$1:$F$30377=$N$9,ROW($A$1:$A$30377))),ROWS($J$4:$J4)),7),"")
 
@cacos,


You could try the following formula (courtesy of Haseeb A, adapted to support additional criteria):


First of all, here are my assumptions:

1. You are trying to get a list of the items in columnD, based on criteria for ColumnB and ColumnC

2. You do not want duplicate of items listed from columnD


For ease of reading, I have used some named data ranges:

colD is the range in column D

colB is the range in column B

colC is the range in column C

colBCriteria is the name for the cell where the criteria for columnB is stored. (e.g. $N$8)

colCCriteria is the name for the cell where the criteria for columnC is stored. (e.g. $N$9)


Put the following formula in cell G2, and copy down. Since it is an array formula, you need to enter it with Control + Shift + Enter.


=IFERROR(INDEX(colD, MODE(IF((colB=colBCriteria)*(colC=colCCriteria)*(colD<>"")*ISNA(MATCH(colD, $G$1:$G1, 0)),MATCH(colD,colD,0)*{1,1}))), "…")


(If you need to change the location for the list, change the reference to $G$1:$G1 as needed)


It will list the first most frequently occurring item from columnD that meets the criteria for colB and colC, then the next most frequently occurring, etc. When no more items are found, it will display "..."


I tested this formula with the following sample data:

[pre]
Code:
#	ColA	ColB	ColC	   ColD
1	June	Belgium	Etc	   House
2	June	USA	Vehicle	   Car
3	May	Spain	Vehicle	   Bus
4	April	UK	Building   Appartment
5	January	Spain	Building   Land
6	FebruaryUK	Vehicle	   Bike
7	FebruaryBelgium	Etc	   Etc
8	April	Belgium	Etc	   Etc
Criteria for ColB that I used was "Belgium"

Criteria for ColC that I used was "Etc"


I got the following results:

Etc
House
…
…
[/pre]
Hope this helps.


Cheers,

Sajan.
 
Hi,

The following is a tweak to the formula to sort the results as well. (i.e. it will list the most frequent item first, then the next most frequent, etc. However, if several items occur with the same frequency, it will present them sorted.)


The formula is based on ideas by Haseeb A and Oscar Cronquist. (Thanks for introducing me to the MODE(MATCH(...)) and MATCH(MAX(COUNTIF(...),COUNTIF(...)) techniques.)


Sample data:

[pre]
Code:
#	ColA	ColB	ColC	  ColD
1	June	Belgium	Building  House
2	June	Belgium	Building  Car
3	May	Spain	Vehicle	  Bus
4	April	UK	Building  Appartment
5	January	Spain	Building  Land
6	FebruaryBelgium	Building  Bike
7	FebruaryBelgium	Building  Etc
8	April	Belgium	Building  Etc
[/pre]
The formula returns the following results, when "Belgium" and "Building" are selected as the criteria for ColB and ColC, respectively.

'

Etc

Bike

Car

House







`


The following formula is in cell G2, entered with Ctrl+Shift+Enter. Copy down to additional cells (like G3, G4, etc.)


=IFERROR(INDEX(colD, MATCH(MAX(MODE.MULT(IF((colD <> "") * (colB=colBCriteria)*(colC=colCCriteria)*NOT(COUNTIF($G$1:$G1, colD)), (COUNTIF(colD, ">"&colD)+1)*{1,1}))), IF((colD<>"")* (colB=colBCriteria)*(colC=colCCriteria),COUNTIF(colD, ">"&colD)+{1}), 0)), "…")


Perhaps someone will suggest ways to simplify the formula.


Cheers,

Sajan.
 
Hi Sajan! Thank your for post, I'm sorry I wasn't able to reply sooner. Both formulas are great, the 1st one is the one I need.


I'm still struggling with the response time until it actually runs the formula, but I guess I'll have to distribute the list or something of the sort since there doesn't seem to be any other way.


Thanks!
 
Cacos, please send me an email - malcolm[dot]begg[at]gmail.com and I will send you an example file.
 
Hi cacos,

I read your original post again and realized that you are working with a large data set. Since the formulas above perform repeated calculations on the full data set, it will slow down as the data set size grows.


If you are not able to use pivot tables, etc, and require a formula based solution, we will need a different approach. In order to deal with the large data set, the solution should ideally return the complete result set at once, so that Excel has to calculate just once. I have an idea about using the FREQUENCY() function to do just that. Please give me a day or so, and I will find some time to explore this idea.


Regards,

Sajan.
 
Maybe this:

Data in A1:D9999 (A1:D1 --> Column Headings)

E1: Column B Match

F1: Column C Match.


Formula (Array):

=INDEX($D$1:$D$9999,SMALL(IFERROR(IF(MATCH($E$1&$F$1&$D$1:$D$9999,$B$1:$B$9999&$C$1:$C$9999&$D$1:$D$9999,0)=ROW($B$1:$B$9999),ROW($B$1:$B$9999),10000),10000),ROWS($A$1:$A1)))


Note 10000 is value outside your database. Copy down until you get errors. Results will be unsorted (first found --> Reported first). If you can sort database then you will get sorted results.
 
Hi shrivallabha, thanks for your post. I've tried it and it freezes excel, it's just that it's too much information I think.
 
Hi cacos,

I think your best bet is a VBA based solution, such as the one you were exploring already (to refresh a pivot table).


I used the following formula with a 5000 row data set, and it took a few seconds. Even simple Excel formulas such as COUNTIF seems to take a long time for larger data sets.


=ROWS(List)+1-ROUND(MOD(LARGE(IFERROR(ListFrequency + LARGE(IF(ListCriteria, ROWS(List)+1-ROW($A$1:INDEX(A:A,ROWS(List)))), ROW($A$1:INDEX(A:A,SUMPRODUCT(ListCriteria)))) / ROUNDUP(1, -(LEN(ROWS(List)))), 0), ROW($A$1:INDEX(A:A, SUMPRODUCT(N(ListFrequency <> 0))))), 1)*ROUNDUP(1, -(LEN(ROWS(List)))), 0)


Entered with Ctrl + Shift + Enter


"List" is a Named range for your source list

"ListCriteria" is a Named range for the criteria to apply to your list

e.g. =(List <> "")


"ListFrequency" is a Named range that indicates the frequency for each item, and uses the following formula to calculate it:

=FREQUENCY(IF(ListCriteria, MATCH(List,List,0)),IF(ListCriteria, MATCH(List,List,0)))


To get the few second response time for 5000 rows, I array entered the first formula over 5000 rows. (That ensures that Excel has to calculate the results only once.)


Again, for larger data sets, your best bet is to use a combination of the Pivot table, and VBA to refresh the pivot table on demand.


Regards,

Sajan.
 
Back
Top