• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Multiple Lookup Problem

Deepak Singh

New Member
Hi,
I have a source data which provides information in a fixed format (Have attached the sample file). As of now, I have been using a lot of manual calculations to analyze the data in order to present in the required format. While going in for automation, I am stuck at 1 point.
In the source data, there are individual activities with their outputs. The outputs are alphanumerical.

I am trying to calculate the availability of the output date-wise as well as hour-wise, which I am able to do for the individual activities by using the Countifs formula, however, when I am trying to do the same for the groups, am unable to do the same.
I am not sure how to go about doing the same.

Request you to please look into the attached sheet and help me out with the same.

Thanks,
Deepak Singh
 

Attachments

Don't think this can be done with COUNTNIFS, but SUMPRODUCT is powerful enough to handle it. That lets us use an MATCH/INDEX to compare the values in the Raw table with the columns in the Group table.

Formula in K8, copied to K8:L10:
=1-SUMPRODUCT((ISNUMBER(MATCH(raw[Tasks],INDEX(group,,MATCH($J8,group[#Headers],0)))))*(raw[Date]=K$2)*(raw[Output]="" &$F$10))/
SUMPRODUCT((ISNUMBER(MATCH(raw[Tasks],INDEX(group,,MATCH($J8,group[#Headers],0)))))*(raw[Date]=K$2))

Formula in M8, copied to M8:N10:
=1-SUMPRODUCT((ISNUMBER(MATCH(raw[Tasks],INDEX(group,,MATCH($J8,group[#Headers],0)))))*(ROUNDUP(raw[Time],0)=M$2)*(raw[Output]="" &$F$10))/
SUMPRODUCT((ISNUMBER(MATCH(raw[Tasks],INDEX(group,,MATCH($J8,group[#Headers],0)))))*(ROUNDUP(raw[Time],0)=M$2))
 
Hi Luke,
Thanks for the reply. However, I believe I did not explain the problem statement in its entirety.
I will try to simplify the same using the example, below.
1. Consider Set1 in the data file, for activity1 and activity2, which are components of set1, the output values are not relevant, however, these are mutually inclusive (for want of a better term). If either or both of the activities sports an error, then the whole output of the set is an Error and the same is considered to be unavailable.
2. Hence, for the date 1st Jan, Set3 will have the availability as 50% as 1 out of 2 datapoints was an error and the output was not available.
This was the reason for using Countifs for the individual activities.

The formula result as per your reply gives me 90% for 1st Jan and 80% for 2nd Jan for all Sets, which is not the desired result.

Hope I have simplified the problem statement.

Regards,
Deepak Singh
 
Hmm. In that case, each Set will have the minimum % of the activities that make up the set. That means the single array formula is:
=MIN(IF(ISNUMBER(MATCH($M$3:$M$7,INDEX(group,,MATCH($M8,group[#Headers],0)),0)),N$3:N$7))

Confirm formula with Ctrl+Shift+Enter, not just Enter.
 
I tried the formula mentioned above, however, it is pointing to a circular reference. Also, I am getting the output as 0 all the times, even though I am changing the source values.
Am I missing something here?
 
Hi Deepak ,

I am not sure what exactly you wish to do , but one way to get the counts of the sets is as follows :

=SUMPRODUCT(MMULT(--(raw[Tasks]=TRANSPOSE(INDIRECT("group"&"["&J8&"]"))),ROW(INDIRECT("group"&"["&J8&"]"))^0)*(raw[Date]=K2)*(raw[Output]="E"))

which will give the count of entries satisfying the following criteria :

1. The tasks match the tasks in Set1
2. The dates match the date in K2
3. The outputs equal "E"

Note that the formula will have to be array entered , using CTRL SHIFT ENTER.

Narayan
 
@Narayan,
Thanks for the help. You have understood my requirements correctly upto point 2. For point 3, the output that I require is the percentage of time when there is no error in the Output column corresponding to the referred Activity or Group.
Suppose for Set1, which has activity1 and 2, if either or both of them has an error, then the whole set was not available for that instant and the % availability as per the sample sheet will be 50% for 1st Jan as there were only 2 activities done that day. In a similar vein, I am trying to find the availability hour-wise, day-wise, monthly summary.
The error codes are variable and the activity groups may also be variable.
If you see the formula that I have used for calculating availability for the individual activities, it may be able to help you understand my problem.
I hope I have been able to explain the problem in a better way.
 
Hi Deepak ,

Sorry , but I am still not able to understand your requirement ; if you see the data , Set3 will have an error entry on Jan-1 , while Set2 will have an error entry on Jan-2.

Therefore , the numerator is 1 in each case ; what is the denominator ? There are a total of 6 entries on Jan-1 by items from Set3 ; is the denominator supposed to be 6 or is it supposed to be 2 ? Your data is very symmetrical with two entries each from Activity1 , Activity3 and Activity4 ; what if this had been asymmetrical , with only one entry of Activity1 , two entries of Activity3 and say 5 entries of Activity4 ?

Rather than giving a sample file , it would be better if you could upload a file with real-life data , and preferably lots of it.

Narayan
 
@NARAYANK991 , Thanks.
Have uploaded the working file.
1. In the summary tab, I am unable to work out the way in which to populate the greyed area.
2. In the Analysis_Txn tab, I need help on everything.

There are multiple Error codes possible in my data.
I am able to automate for the individual pages, however, when I am trying to do the same for the Transaction groups, I am unable to do the same.

I would really love to know how to do this. This is the first time I am attempting such a task.
 

Attachments

I tried the formula mentioned above, however, it is pointing to a circular reference. Also, I am getting the output as 0 all the times, even though I am changing the source values.
Am I missing something here?
The 0's are due to a circular formula. However, I apologize, I copied the formula wrong. :(
Formula in cell K8 should be:
=MIN(IF(ISNUMBER(MATCH($J$3:$J$7,INDEX(group,,MATCH($J8,group[#Headers],0)),0)),K$3:K$7))
 
Hi Deepak ,

Thanks for uploading your working data file.

Now , can you answer my earlier posted question ?

What will be the figures that should come in the areas where you want formulae to do the job ? Unless I understand the logic behind deriving a value , I am not able to come up with a valid formula.

I have added some helper columns to your raw data ; now if we filter on the run dates and error codes , what values should appear in at least one or two cells in the greyed area ? Can you explain ?

http://speedy.sh/uSeZp/Working.xlsx

The file size has gone above the 1 MB limit , which is why I have used the public file-sharing site.

Narayan
 
@Luke M ,
Thanks. The formula is working exactly as desired.

@NARAYANK991 ,
I am unable to download the file uploaded by you as file sharing sites are blocked in my office. I will download the same at home in the evening today and check out the same.
Regarding the queries in your earlier post, the logic used is as follows:

1. We are measuring 4 KPIs in this sheet:
a. Performance - Measure of the output of the individual pages and transactions. I have used basic average function for the individual pages and for transactions, have modified Luke's formula using Sum, Index+Match, which is giving me the desired output. Apart from the average, I also need to calculate the percentile values for the entire data range. For example, the 90th percentile value of the output. The percentile value required will be dynamic and gets changed as per the flavor of the month.

b. Availability - Percentage No. of times the individua pages and transactions were available to users. For individual pages, I am using the formula in C2 on Summary Tab, (=1-(COUNTIFS(raw[page_name],$A2,raw[response_final],"E")+COUNTIFS(raw[page_name],$A2,raw[response_final],"SE"))/COUNTIF(raw[page_name],$A2). For transactions, I have tried modifying Luke's formula and is giving me the desired result for one condition. The logic used for availability of transactions is, "If any or all of the pages of the specific transaction for a specific time instance are showing an error, then the entire transaction is an error.

c. End-User Availability - It is the same as Availability, however, this takes into consideration all the error codes, whereas for Availability, we are considering specific error codes.

d. UEI - User Experience Index - We are trying to ascertain the percentage of users who were satisfied, tolerating or frustrated with the experience on the individual pages or transactions. In this case, supposed we decide the threshold for satisfied users is 30 and for tolerating users is 60. So any transaction which has an output of less than 30 will be satisfied, between 30-60 will be tolerating and greater than 60 will be frustrated. These thresholds are dynamic for each page and transaction and are dependent on the SLAs desired.

2. Raw Data: For each time instance, the complete set of pages is used and the output of the same is calculated by the system and is populated in the format provided. In case of any error, there is a possibility that the pages post the page on which the error is encountered may not be used. The actual raw data provided by the system is Page_name, Date, Time, Response_Time and Error_Code. The rest of the columns are helper columns which I had created for the purpose of calculations.

I have attached the working file again, where I have modified and used the formula provided by Luke in the last post.

If you have reached till here, thanks for your patience with me and I hope I have been able to explain in proper detail.

Sorry for the delayed reply.

Regards,
Deepak Singh
 

Attachments

Back
Top