• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Select optimum value from an array of data

sameergaur

New Member
Hi there,


My data is in the following format:

Job# Location

1002612 L610

1002619 L610

1002622 L610

1003360 L560

1003360 L871

1003470 L2925

1003470 L244

1003470 L120

1003498 L2925

1003498 L244

1003498 L120

1003523 L2925

1003523 L244

1003523 L120

1003527 L140

1003537 L3545

1003537 L1637

1003549 L2129

1003549 L1893

1003550 L2925

1003550 L244

1003550 L120

1003555 L244

1003555 L281

1003555 L3545

1003561 L2925

1003561 L140

1003561 L2148

1003566 L2925

1003579 L244

1003580 L281

1003580 L3545

1003580 L2148

1003582 L2925

1003582 L244


I want to come up with five optimum locations which will help me complete maximum number of jobs.


As you could see in some cases a job needs only one location but sometimes it could need more than 1. Also one location could be associated with multiple jobs. Usually the data ranges in 100s of records and its hard to manually run pivot tables and try to come up with five optimum locations.


Could anyone help?


Thanks

Sameer
 
Sameer


Running a Pivot Table with Count Job# and looking at the Total Column/Row should give you the answer your after
 
Running a pivot table tells me how many jobs are associated with a particular location. But like I said, those jobs could require more locations to be completed and those locations again might have more jobs associated with them and so on.


Running the pivot gives me the details of 50+ locations. I only want 4 or 5 optimum locations which will be able to complete maximum number of jobs.


From the above example you could see that location L610 could complete top three jobs as they require only one location. But in complete data L610 is also associated with 7 other jobs which again require multiple locations.


Another example is location L2925 which is associated with multiple jobs but does not complete them. Those jobs require other locations too.
 
Hi, sameergaur!


Give a look at this preliminary approach:

https://dl.dropbox.com/u/60558749/Select%20optimum%20value%20from%20an%20array%20of%20data%20%28for%20sameergaur%20at%20chandoo.org%29.xlsx


It only lacks of the handle of locations with equal numbers of jobs, I guess, but with your posted data it works fine. I'll get back to you later if I can find a Carlseberg... excuse me... a solution.


Regards!


EDIT: Hmmm... I think it doesn't work as expected... gonna re-test it... please perform your own controls too.


EDIT 2: Re-uploaded correct workbook, solved as expected, still remains duplicates issue. I apologize, mixed two workbook names.
 
Hi, sameergaur!

Please download again the updated final version of the file from the same previous link.

Just advise if any issue.

Regards!
 
Hi Sameer ,


This is a wonderful problem ; however I wonder whether any formula based solution alone can give you the really optimal result. Please correct me if my understanding is wrong.


My point is what is your definition of optimality ?


Do you want a list of locationss , each of which , in isolation gives you the maximum number of jobs , or do you want an optimal list of those 5 locations , which taken together will allow you to complete the maximum number of jobs ?


To take your data as an example , consider any of the supposedly optimal locations 2925 , 244 and 120 ; the fact is that since all three locations are required to complete one job such as 1003470 , 1003498 , 1003523 , 1003550 , the fact that each of them is capable of doing 6 , 7 or 4 jobs has no meaning ; together , these 3 locations can complete only 7 jobs ; in contrast , a location like 610 can complete 3 jobs.


Suppose we look at the supposedly optimal locations :


2925 , 244 , 120 , 610 and 3545


The jobs 1003537 , 1003555 and 1003580 which are associated with location L3545 can never be completed , since they also require other locations which are not in the above list of 5 locations !


I think a really optimal solution will need a VBA solution.


Narayan
 
Hi, sameergaur

After reading NARAYANK991's post I'm wondering if you actually mind leaving incomplete jobs but maximizing the completed jobs at the selected locations, as you stated "I want to come up with five optimum locations which will help me complete maximum number of jobs.".

I assumed that this was for scheduling a trip and that you wouldn't be obliged to do all the jobs at all the top 5 selected locations, but only those that would let you achieve your goal. That's to say, travel less, do only the necessary and completable jobs, and come back home smiling.

Just to check if the only formula based solution posted might be updated or not.

Regards!
 
Following Narayan's lead, I think we can use solver to find the optimum. My assumptions are,


- You want to find any combination of 5 locations that lets you complete maximum number of jobs. A job is completed when all its locations are in the 5 locations selected.

- You do not mind if a location can work on some other job too but that job remains unfinished. Your intention is to just maximize completed jobs and you do not care if few jobs are left unfinished.


This means, we need to find the optimum 5 location combination out of all unique locations. There will be COMBIN(count of unique locations, 5)


The next step is to set up a solver problem.


Unfortunately your original data structure makes it difficult to set this up easily (still possible, just that the formulas become too lengthy).


So, I did a transformation. I converted your table to something like this.

[pre]
Code:
Job Location1 Location2 Location3 ...
... ......... ......... .........

As per your data, maximum locations needed for any job is 3, so I stopped there, but you can easily extend this.


Then, I extracted all unique jobs and filled the above table using lookup formulas.


Then, I extracted all unique locations and filled them in below table structure.


Location Included?
L610     1
L560     0
L871     1
............
[/pre]
The included column is binary, 1 for inclusion, 0 for exclusion.


Then in 5 cells, I extracted the 5 chosen locations by doing lookups and counted how many jobs can be completed by each using array countif formulas.


Finally, I used solver to set up the optimization problem.


We need to maximize the count of jobs that can be completed.


Subject to below constraints.


The job inclusion column should be binary (1 or 0 alone)

The number of jobs included should be 5


The cells in inclusion column can be changed to optimize the solution.


This seems to work ok for your data set. Excel tells me the combination of

L610

L560

L871

L2925

L244


will let you finish 7 jobs

1002612

1002619

1002622

1003360

1003566

1003579

1003582


See this file:

http://img.chandoo.org/playground/best-5-job-locations-problem.xlsx


My thoughts on this approach:



It is too time consuming to set this up manually everytime your data changes. You should try automating this thru VBA (that is paste data, click a button that will create separate sheet with transformed data & set up solver table and launch solver)


Another option is to skip solver and use VBA to iterate thru all combinations. I would not recommend this if you have anything more than few dozen locations as this can be terribly slow.
 
Hi All ,


I think the problem is more complicated than we can imagine , since we are at present only going by the sample data that has been posted. Considering that the data can vary in any fashion , I doubt that anything less than a full-fledged optimization software can give us the optimal solution.


I have not gone through Chandoo's workbook , but I had just created a matrix out of the columnar data posted , with unique jobs as row headers and unique locations as column headers ; we can now fill in this matrix with a 1 wherever a job can be performed at a location.


If the location is the only one needed for that job , then there will be only one entry in a row ; also , if a location can take up multiple jobs , there will be several entries in a column.


Given such a setup , it is logical to expect that we first look at those locations which can give us the maximum number of jobs on their own ; in the given setup , 610 is one such , since it can be used to complete 3 jobs ; however , I doubt that this is in any way significant , since 2 locations on their own may not be able to complete any job , but together can be used to complete even 20 jobs !


My point is that everything depends on the data ; an algorithm will have to explore all combinations of 1 , 2 , 3 , 4 and 5 locations to maximize the sum total of jobs that can be completed using a particular set of 5 locations.


Just to prove this , consider Chandoo's result from the solver , which gives a result of 7 jobs using 5 locations ; surely , this can be improved upon , since 610 itself gives 3 jobs ; using 2925 , 244 and 120 together allows us to complete another 7 jobs ; adding 140 to these 4 will now give us a total of 11 jobs , using 5 locations which are all within the set.


Narayan
 
You are right Narayan... I found that Solver is excellent for solving linear or usual optimization problems, but is very poor when it comes to integer or binary problems. I am sure a brute-force (or any other search) program can reveal the best possible 5 location combination.
 
SirJB7, Narayank991, Chandoo,


Thank you all for replying!


That is right, I am looking for the best possible set of 4 or 5 (preferably 5) locations that would be able to complete maximum jobs, while some of the jobs associated with these locations are left uncompleted.


However I did have an impression that this might need vba. The data changes everyday, it could have 20-40 locations and 100-200 jobs. A code would be of great help as then anyone can run it when they get a new file for the day.


I just read through all your posts, I will try these options and let you know how it goes.


PS: I am in the early stages of my Excel learning, it might take me few hours to understand and try these options!


Thanks again.


cheers

Sameer
 
SirJB7, I just tried your sheet, it only gives me top 5 locations with maximum number of jobs associated with them. However it does not tell me how many of those job would I be able to complete through those 5 locations. I think its similar to running a pivot on the data and getting the number of jobs associated with each location and taking the top five.


As Chandoo pointed out.. A job is completed when all its locations are in the 5 locations selected.


I will try Chandoo's approach now. I will need another cup of coffee though as this one seems to be a bit beyond my Excel understanding so far!
 
Hi, sameergaur!


Well, now that you confirmed my earlier assumptions, and considering the number of locations and jobs, I think you could give a try to my only-formula solution and then check the calculation time, which I suppose it'd be acceptable. I tested it with 500/50 and it worked fine.


However, if you happen to have a much greater (10X, e.g.) number of items it'd be advisable to think on another approach.


Just advise if any issue.


Regards!
 
Thanks SirJB7. Although I tried Chandoo's workbook and populated it with the entire data and modified the tables and formulas accordingly and it seems to be working. I understand Narayan's point that it could be improved upon but it still gives pretty good results. Thanks a lot Chandoo! I am reading your tutorial on Solver right now to understand it better!


I will however have to look into the option of somehow automating so anyone can run this as a report.


Thanks again guys.


cheers

Sameer
 
Hi, sameergaur!

Glad you solved it, no matter what solution you chose the important thing is having a wider scope to make the more suitable election.

Thanks for your feedback and for your kind words too. And welcome back whenever needed or wanted.

Regards!
 
Thanks SirJB7.


Chandoo, I ran the solver on the file using all three solving methods, Simplex LP, GRG Nonlinear and Evolutionary.


Simplex LP always selected the top 8 (8 Locations - 28 Jobs) and GRG Non linear always selected the bottom 8 (8 Locations - 22 Jobs). However Evolutionary method took sometime to run (unlike the other two which came up with a solution in a sec), it looked like it did optimize the solution. The 8 locations it selected could complete 36 jobs.


I am good with the solution, just wanted to bring this up.


cheers

Sameer
 
Hi, sameergaur!

Thanks for your feedback again. Did you get 36 for top 8 with your posted data? I only arrive to 30.

Regards!
 
Back
Top