# Tell me about an analysis problem that you couldn’t solve with Excel?

Posted on September 9th, 2016 in Analytics - 38 comments

Time for a quick show & tell.

**Tell me about an analysis problem that you couldn’t solve with Excel?**

It can be because you didn’t know how to solve the problem or Excel isn’t the tool for it or any other reason.

Go ahead and speak up. Post your tricky analysis problems in the comments section.

### Leave a Reply

Announcing 50 ways to analyze data course to make you an awesome analyst – We are open |
Stacked Bar and Indicator Arrow Chart – Tutorial |

## 38 Responses to “Tell me about an analysis problem that you couldn’t solve with Excel?”

HOW TO PICK AN ARRAY BY SPECIFIC CRITERIA (TO COMBINED WITH MATCH, MAX, OR OTHER FUNCTION), ALL THIS TIME I ALWAYS USE CHOOSE FUNCTION TO PICK SOME CRITERIA (EX: =CHOOSE($A$1,$B$1:$B$100,$B$101:$B$901)) BUT THE WEAKNESS OF THIS FUNCTION IS I HAVE TO SORT THE CRITERIA. I WISH YOU GET MY POINT AND SORRY FOR THE CAPSLOCK.

I think I understand what you mean. I also wanted something similar once- to create a array or a list based on one condition and could not find an out of box solution. I hope someone gives you a solution 🙂

You can use one of two approaches.

=INDEX((range1, range2, range3, range4, range5...),,4) will return the range corresponding to the number you pass at the end.

=CHOOSE(MATCH("a",{"a","b","c","d","e"},0),range1, range2, range3, range4, range5...)

I suggest using the INDEX approach with 4th parameter to pass on the range number.

Deduplication. I agree that data deduplication is itself not an analysis, but data cleansing is part of the process. We were trying to estimate the amount of time it would take to deduplicate some entity records (like Contacts , Addresses) in a software system. We wanted to estimate how many duplicate records were present, how many times, so we could give an accurate estimate to the customer. This is because we also have to test the validity of the final dataset, and that depended on the number of duplicates removed.

We have done these kinds of estimates previously using de-duplicate feature in excel, but this one project had a legacy system, and had millions of records to be processed. Excel kept crashing, even 64bit version.

I finally used a unix command line shell to write a small script to do the deduplication for me.

To count "Data duplication" I use countif function, which works fine.

Syntax:

=COUNTIF(Column Fixed Range, "Criteria")

If you could send me the datafile, I will see it accordingly.

My dataset consisted of 33,000 rows and 163 columns. Each row had one invoice, so in total there were 33,000 invoices. There were 3 columns for each "part" of the invoice.

"Part Name 1" + "Part Price 1" + "Part Description 1" ... etc

The invoices were rather scrambled. So I wanted to summarize how many of each "Part" was in the full dataset. This was challenging as each part for each invoice could be in a different column. How could I have gone about summarizing this data knowing that each unique part in each invoice could be in any column?

I would create an extra three columns to organize your initial data

Ex: Let's say that "Part Name 1" + "Part Price 1" + "Part Description 1"

are contained in the column A,B,C

For Price new Column , you can use the folowing formula : IF(ISNUMBER(A1); A1; IF(ISNUMBER(B1);B1; C1))

And after that you can do a simple pivot table to sum up the invoices

I believe there is also something that helps you to recognize the name field or the description field. Hope it helps

I've got partial-year data from B10:B200, increasing or decreasing sequentially by row, formatted as a number (i.e., 3.2, 2.2, 1.2, 0.2, -1.2, -2.2, etc....). In the adjacent C column, I've got a number I need to find (let's say the result needs to go in A1), based on when the number in column B is >=0 and =0 and B <1. I hate making unnecessary columns, even if the column can be hidden or the text colored white. Please help.

1) Create an Excel 2013 Tree Chart

2) Make slicers that can be hidden, toggled on/off or scrolled through. Once, I make my selections, I want them to go away. Or I want to allow the users to select which slicer(s) they see.

3) Show me a list of all of the filters & slicers the user has made

Hi Jon,

Ugh, I prefer to use form controls and pull the data via formulas just because slicers are so difficult to control and work with. (and you cant restrict multiple selection which can ruin your chart formats).

2. Sometimes to show and hide things I just use a white square and cover the thing - maybe that trick can work for you? That means I don't hide/show the thing, but the white square (or whatever is background color).

3. I go to the source pivot table and reference the exact cell where the filter is - so for example C5 (not the long reference you get if you click on a pivot table cell). Then I would use concatenate, if, etc. to get the list of filters...

Best, Iva

I'll never forget the first time I hit a wall with Excel. I ended up doing it in Excel any way... which was probably a mistake. A co-worker helped me with some stats work and used R and calculated confidence intervals using bootstrap. I wanted to be able to repeat the analysis in Excel so I wrote my own bootstrap algorithm in VBA. It was quite a bit of code compared to the single command in R. That's when I decided to learn R. I still use Excel for most of my work but jump into R when I have more complex analysis.

Thanks Ian for insight, kindly tell me what is the use of bootstrap and how it works in excel or R.

If you give example it will be good.

I'd like excel to be able to solve some integral calculus problems. I can plot data in a scatter chart and excel will define a trend line and equation for a polynomial line (as well as others). I would like to have a relatively straight forward way to integrate the generated equation to find the area under the curve between the points I select. Any thoughts on this?

Rank an array of rows in date/time order for each transaction ID.

Multiple transaction IDs, with multiple transactions for each ID.

Example:

1234 ]

1234 ]]]> Which came first and last for ID 1234?

1234 ]

1235

1235

1236

1237

Sara is thinking of opening a copy shop, and needs to decide how many copiers to lease. It costs $5,000 to lease a copier for a year. Operating the copier costs $0.020 per copy for paper, ink, and other variable expenses. Fixed costs for rent and store operations add up to $300 per month. Sara plans to charge customers $0.11 per copy. The store will be open year round (365 days), and each copier can make up to 100,000 copies per year (i.e., 100,000/ 365 copies per day).

The copy shop’s revenues depends on the daily demand for copies (i.e., the total number of copies that arriving customers will want to make) and the available copier capacity. If all available copiers are fully utilized in a day, any excess demand is lost, i.e., customers will go to a competing copy shop. Sara thinks that the daily demand will be stable (i.e., assume that demand on every day of the year will be the same or very close to the daily average), but she is not sure what value this demand will take. She thinks that the daily demand can take one of four possible values -- 500, 1000, 1500, or 2000 copies per day.

Develop a spreadsheet model to address the following questions:

(i) Provide a mathematical model formulation to evaluate the total profit for a given daily demand and a given number of copiers leased. Be sure to clearly define and use general notation (not specific numbers), and write equations to calculate the outputs (and any intermediate quantities).

(ii) To determine the “optimal” number of copiers to lease, Sara wants to evaluate profits as the number of copiers increases from one to six. For each of these six options, compute annual profit for each value of daily demand (500, 1000, 1500, or 2000 copies per day).

I have created a draft model, it will give some initial understanding, may require further polishing, if I understood it well.

There are several ways using "goal-seek" Excel module to arrive at required optimum. Likewise you can also use "solver" as well.

If you would like to see the draft model send me your email.

Thank You. This is my email.

c.d.carr@tx.rr.com

Applying a hoover effect to graph, so that defined value only appears on data point when hoover over that point.

Hi there! First time sorry by the english, i don´t speak english fluently, but i try with the help of the google. However i try by my self, only when i can´t, i call google. So, i don´t know how i can count, in a pivot table, the distincts registers by some column. I need to do a calculation between two columns, one needs have the count of distincts registers of the customers, another column needs have a count of all occurrences of the registers of the customers. One more time sorry by the english. I´do make my self clear? Thank you very much!! Hugs!!

Hi chandoo sir,

I want to restrict my site person only to enter data on a daily basis in excel. That person should be unable to edit that data.

I tried VBA but it shows something wrong please help me on this

Thank you

I have two problems which might not quite be data analysis. One is finding a best match instead of an exact match for labels. This is useful when looking up names that are mis-spelled, such as Mississippi, Misissippi, Missisippi, and allows the grouping of the similar records which should be considered the same. I want to look up the mis-typed label and return the lookup which is the closest fit. (I have UDF to do this, but for large tables it can be slow, and I need to include the UDF in each spreadsheet, or in the Excel file)

The second is selecting (in order) unique records from a list of records using formulae. So I want a compressed list which includes a single instance of e.g., each US States from a table which has all the states included several times. For the English readers among you - choose Leicester, Lester, Lescester, and UK Counties 🙂

I think both of these fall under data cleaning when it comes to analysis.

thanks for this opportunity. I deal with customer follow up after recording Date, Name,TelNo, Course enquired and remarks.

Each time i call to followup and interested in first retrieving a record of the last date we called the client and what he said

You can use filters for your data

Here is a challenge:

Input: Vehicle deliveries per region/country and year (see below)

Calculation tools: PowerQuery/Pivot only! (Already solved with Excel formulas)

Output: Add 2 columns;

Column4: 5yr vehicle parc (sum of 5 full yrs) per region/country/yr,

Column5: Average parc age per country/yr

--------

Example (Column1-3):

let

Source = Excel.Workbook(Web.Contents("https://www.scania.com/group/en/wp-content/uploads/sites/2/2015/09/Monthly-truck-registrations.xlsx"), null, true),

Europe_Sheet = Source{[Item="Europe",Kind="Sheet"]}[Data],

#"Changed Type" = Table.TransformColumnTypes(Europe_Sheet,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type any}, {"Column5", Int64.Type}, {"Column6", Int64.Type}, {"Column7", type any}, {"Column8", Int64.Type}, {"Column9", Int64.Type}, {"Column10", type any}, {"Column11", Int64.Type}, {"Column12", Int64.Type}, {"Column13", Int64.Type}, {"Column14", type any}, {"Column15", Int64.Type}, {"Column16", Int64.Type}, {"Column17", Int64.Type}, {"Column18", Int64.Type}, {"Column19", type any}, {"Column20", type any}, {"Column21", Int64.Type}, {"Column22", Int64.Type}, {"Column23", Int64.Type}, {"Column24", Int64.Type}, {"Column25", type any}, {"Column26", type any}, {"Column27", type any}, {"Column28", type any}, {"Column29", Int64.Type}, {"Column30", Int64.Type}, {"Column31", type any}, {"Column32", type any}, {"Column33", type any}, {"Column34", Int64.Type}, {"Column35", type any}, {"Column36", type any}, {"Column37", Int64.Type}, {"Column38", Int64.Type}, {"Column39", type any}, {"Column40", type any}, {"Column41", type any}, {"Column42", Int64.Type}, {"Column43", Int64.Type}, {"Column44", Int64.Type}, {"Column45", Int64.Type}, {"Column46", Int64.Type}, {"Column47", Int64.Type}, {"Column48", Int64.Type}, {"Column49", Int64.Type}, {"Column50", Int64.Type}, {"Column51", Int64.Type}, {"Column52", Int64.Type}, {"Column53", Int64.Type}, {"Column54", Int64.Type}, {"Column55", Int64.Type}, {"Column56", Int64.Type}, {"Column57", type any}, {"Column58", type any}, {"Column59", Int64.Type}, {"Column60", Int64.Type}, {"Column61", Int64.Type}, {"Column62", Int64.Type}, {"Column63", Int64.Type}, {"Column64", Int64.Type}, {"Column65", Int64.Type}, {"Column66", Int64.Type}, {"Column67", Int64.Type}, {"Column68", type any}, {"Column69", Int64.Type}, {"Column70", type any}, {"Column71", Int64.Type}, {"Column72", Int64.Type}, {"Column73", Int64.Type}, {"Column74", type any}, {"Column75", Int64.Type}, {"Column76", Int64.Type}, {"Column77", type any}, {"Column78", Int64.Type}, {"Column79", Int64.Type}, {"Column80", Int64.Type}, {"Column81", Int64.Type}, {"Column82", Int64.Type}, {"Column83", type any}, {"Column84", type any}, {"Column85", type any}, {"Column86", type any}, {"Column87", type any}, {"Column88", type any}, {"Column89", type any}, {"Column90", type any}, {"Column91", type any}, {"Column92", type any}, {"Column93", type any}, {"Column94", type any}, {"Column95", type any}, {"Column96", type any}, {"Column97", type any}, {"Column98", type any}, {"Column99", type any}, {"Column100", type any}, {"Column101", type any}, {"Column102", type any}, {"Column103", type any}, {"Column104", type any}, {"Column105", type any}, {"Column106", type any}, {"Column107", type any}, {"Column108", type any}, {"Column109", type any}, {"Column110", type any}, {"Column111", type any}, {"Column112", type any}, {"Column113", type any}, {"Column114", type any}, {"Column115", type any}, {"Column116", type any}, {"Column117", type any}, {"Column118", type any}, {"Column119", type any}, {"Column120", type any}, {"Column121", type any}, {"Column122", type any}, {"Column123", type any}, {"Column124", type any}, {"Column125", type any}, {"Column126", type any}, {"Column127", type any}, {"Column128", type any}, {"Column129", type any}, {"Column130", type any}, {"Column131", type any}, {"Column132", type any}, {"Column133", type any}, {"Column134", type any}, {"Column135", type any}, {"Column136", type any}, {"Column137", type any}, {"Column138", type any}, {"Column139", type any}, {"Column140", type any}, {"Column141", type any}, {"Column142", type any}, {"Column143", type any}, {"Column144", type any}, {"Column145", type any}, {"Column146", type any}, {"Column147", type any}, {"Column148", type any}, {"Column149", type any}, {"Column150", type any}, {"Column151", type any}, {"Column152", type any}, {"Column153", type any}, {"Column154", type any}}),

#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),

#"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each ([Year Month] = "Scania")),

#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"External Use", "Year Month"}),

#"Replaced Value1" = Table.ReplaceValue(#"Removed Columns","-","",Replacer.ReplaceValue,{"201607", "201606", "201605", "201604", "201603", "201602", "201601", "201512", "201511", "201510", "201509", "201508", "201507", "201506", "201505", "201504", "201503", "201502", "201501", "201412", "201411", "201410", "201409", "201408", "201407", "201406", "201405", "201404", "201403", "201402", "201401", "201312", "201311", "201310", "201309", "201308", "201307", "201306", "201305", "201304", "201303", "201302", "201301", "201212", "201211", "201210", "201209", "201208", "201207", "201206", "201205", "201204", "201203", "201202", "201201", "201112", "201111", "201110", "201109", "201108", "201107", "201106", "201105", "201104", "201103", "201102", "201101", "201012", "201011", "201010", "201009", "201008", "201007", "201006", "201005", "201004", "201003", "201002", "201001", "200912", "200911", "200910", "200909", "200908", "200907", "200906", "200905", "200904", "200903", "200902", "200901", "200812", "200811", "200810", "200809", "200808", "200807", "200806", "200805", "200804", "200803", "200802", "200801", "200712", "200711", "200710", "200709", "200708", "200707", "200706", "200705", "200704", "200703", "200702", "200701", "200612", "200611", "200610", "200609", "200608", "200607", "200606", "200605", "200604", "200603", "200602", "200601", "200512", "200511", "200510", "200509", "200508", "200507", "200506", "200505", "200504", "200503", "200502", "200501", "200412", "200411", "200410", "200409", "200408", "200407", "200406", "200405", "200404", "200403", "200402", "200401"}),

#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Replaced Value1", {"Country"}, "Attribute", "Value"),

#"Extracted First Characters" = Table.TransformColumns(#"Unpivoted Columns", {{"Attribute", each Text.Start(_, 4), type text}}),

#"Changed Type1" = Table.TransformColumnTypes(#"Extracted First Characters",{{"Value", Int64.Type}}),

#"Grouped Rows" = Table.Group(#"Changed Type1", {"Country", "Attribute"}, {{"Vehicles", each List.Sum([Value]), type number}}),

#"Filtered Rows1" = Table.SelectRows(#"Grouped Rows", each ([Attribute] "2016")),

#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows1",{{"Attribute", "Year"}})

in

#"Renamed Columns"

I had three columns - Plant, Material, Price. Basically same material was supposed to have same price in all plants.. but it might not be the case always. So i wanted to arrange MATERIAL on left and have PLANT on TOP and then Price in middle so that i can do a simple comparision formula and then filter wherever it is false. How can i get to arrange data in that required way from what i have...

In data cleansing and db compare analysis, Levenstein functions are very powerful. Saw them being used by a fine colleague of mine and I was impressed.

In short levenstein calculates the distance between 2 strings, (can be a percent as well). If you would compare names "Chandoo" with "Chandeep", Levenstein tells you that in 3 moves you can standardize both names: replace "o" by "e" twice and add a "p". The percent match is 62,5%. A third one even looks at positions of strings, allowing you to compare 'INC, Google" with "Google Inc". Really handy to compare street names in different databases.

Really wishing I knew those in my days of business analyst.

So these functions are available on the i-net in XL-VBA. Challenge to make these with regular excel formulas. 🙂

Try doing a Shopping basket analysis using Solver

Visualizing outlier points in large data sets using Excel scatter charts. It will not let you do it if you have more than 250 points.

@Antonio

Those limits were lifted I think when Excel 2007 was introduced

Scatter and Line charts can now chart Thousands of points and is limited by your available memory

The number of series is still limited to 255 series per chart.

Refer :

http://peltiertech.com/chart-point-limits-in-excel-2010/

&

https://support.office.com/en-us/article/Excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

Sir,

I want to use REPT Function to show 5 star rating.. Not able to show star which is available in shape i.e. 5 Point Star.. Can you please help me..

Thanks and Regards,

Kundanlal

You can just use Wingdings 2 Font characters.

For Frecnh users, it would be ê which appears as a star with the Wingdings 2 Font

Hi, I had a list of names in excel. I had partitioned these names into three different categories/headers in the same excel sheet. I was trying to figure out which names occur multiple times(the same name can be present in two or three categories but won't occur more than once in the same categor) and if they occur twice, I would colour the row and if they occur thrice, then I would give them a different colour. Plus, if they occur in 1st and 2nd category then a colour(eg red) to denote that and if 2nd and 3rd category then blue nd 1st and 3rd category then green.

Hi, my problem is with transferring visual text data to useful data.

To explain further I have to prepare a ghant like overview, where the colomns represent the weeks and rows represent the segments. By merging the cells (I know, ew) the duration of the project is indicated and then project description is input into this merged monstrosity. Engage cell wrap and hey presto, everyone retypes stuff out of this file, because there is no better way to pull content from this merged cells.

I'd like a macro that would analyze a (lengthy) list of keywords that are in one column than in a nearby column populate a word repetition list from most used to least used. The macro should be adjustable to accommodate various types of URL's or a simply list of words. In other words, think a word cloud in list fom.

I'd like a macro to perform three functions:

1. Analyze a list of various types of URL's or simply list of words to understand repetition;

2. Than in a nearby column list from most used words to least used words;

3. Next to the list of words the count of occurrences.

The visual portion of the word cloud wouldn't be needed. I'm not very technical but I understand that a VB macro can consider multiple tasks than produce a result but Excel filters only perform one task at a time.

I've searched and searched for this but I don't have an answer.

I have data that looks like this:

Client

Hey all,

I have a tracker with a master tab which has all historical agreements between a pharma company and its many hospitals (for clinical trials). Columns include: Contract signature date, Hospital Name, Protocol Number and 4/5 others.

New, updated data for ongoing agreements need to be added to the master tab. This new data comes from a report that has different column names for the same fields, e.g. Hospital name = Trust Name, and Protocol Number = Study Number.

I wanted a tracker that when the new data is exported from the report and moved to a new tab in the same workbook as "master" tab, it will do two things: 1) automatically extract the new data from the correct columns in the fresh data tab and put them in the master data tab - ie line up data of "Protocol Number" and "study Number", and 2) add the new data to the existing historical data, at the bottom of the data block, ie do not overwrite the old data, just add it to the existing stuff.

For my sins, I have not been able to achieve this after many attempts.

🙁

Let me know if you can help! 🙂

Hi Chandoo & all..

I would like to create a trending line chart that has the feature of highlighting the line on weekly basis... almost similar to the charts mentioned in http://chandoo.org/wp/2012/12/12/highlight-best-week-month-in-charts/

Here i am panning to do is to select the week from drop down and the line will be highlighted..

Thanks and Regards

UNK

I'm trying to find the overlap down time of machines in a factory. To make things simple for myself, I sort related columns by date and time. Then based on user request, rows are compared with the next row and only those rows that overlap are highligted. I faced a problem that one row can overlap with more that one other row. As a solution, I added columns to find and calculate the overlap time which worked fine UNTIL a new row is added and I started to get "Inconsistent column formula".

to try it:

stand on the last cell of the last row ..

Press TAB to move to the row which will be automatically added..

notice the green error icons start to appear on some cell .. like P458, Q458, T457, T458...etc..

sample sheet is here:

http://www.mediafire.com/download/odtcjh3si1d33db/record.xlsx

Thanks for the help