• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Help Automate Raw Data Processing

Hello,

I have a large amount of raw data coming off a instrument. The workbook consists multiple worksheets of data with Date, Sample ID, Time Header (seconds) and Signal (Columns A:DE).

I then need to apply the same basic calculations (mean, standard deviation, median, min, max, etc) on the first 7 seconds, last 15 seconds of data and the rest of the data in between this range. But the number of columns and rows for a specific sample id will vary for every experiment.

How do I reduce the workload of manually trying to do this with formulas that will not copy over from one previous file to the other. Is this a pivot table or macro solution?

Help please :)

Best

Frank (I am overloaded with data!!)
 

Attachments

  • RawDataTemplate.xlsx
    149.9 KB · Views: 1
A few questions (Please answer them all)

Q1. Should DK4: =MEDIAN(C4:C4)
be: =MEDIAN(C4 : P4)

Q2. Should EB4: =AVERAGE(AF4:CA4)
be: =AVERAGE(Q4:CA4)

I realise that you want to automate the column selection, but just want to make sure my logic is right first ?

Q3. You want the statistics for the following periods
1. 0-7 Seconds
2. 7-45 Seconds
3. 45-60 seconds

or is 45 the maximum time minus 15 seconds as your column coloring suggests ?
 
You may wish to have a look at Tables 1-3
The Yellow cells you may need to adjust
I think all the logic is right based on my comments above
 

Attachments

  • RawDataTemplate_Hui.xlsx
    156.8 KB · Views: 0
Hi,

You are amazing. Thank you so much for your help!!!

You are correct and the range you selected were the areas that I need to calculate.

If I have 1000 line of similar data but of variable times. How do I automatically apply these formulas to the entire worksheet?

I need to apply similar calculations to the normalized data in Table 4. Are these array formula or can I can just copy and paste to Table 5?

Again, thank you so much, I have been drowning in excel numbers with no quick way to summarize a huge amount of numbers.

Best,

Francis.

A few questions (Please answer them all)

Q1. Should DK4: =MEDIAN(C4:C4)
be: =MEDIAN(C4 : P4)

Q2. Should EB4: =AVERAGE(AF4:CA4)
be: =AVERAGE(Q4:CA4)

I realise that you want to automate the column selection, but just want to make sure my logic is right first ?

Q3. You want the statistics for the following periods
1. 0-7 Seconds
2. 7-45 Seconds
3. 45-60 seconds

or is 45 the maximum time minus 15 seconds as your column coloring suggests ?
 
Hi,

Here is an example of a large spreadsheet with the length of time for every line of data in the TimeMatrix spreadsheet. I will have 1000's of rows with variable Times (from 60seconds to 360 seconds in this particular example). I need to be paste the formula and quickly drag them to bottom of the spreadsheet for 20 worksheets.

https://www.dropbox.com/s/tmw03fgeqccmvow/RawDataTemplatev2.xlsx?dl=0

So manual editing of the ranges for each spreadsheet will be really time consuming. What do you thing would work?

Thank you so much for your help so far and I am nearly close to my goal of getting all this data processed.

Best,

Francis


Copy the bottom row of formula down
 
Francis

That link isn't working?

I would change using the ranges of C4:AP4 to something like Index(Data,Row()-Row($A$3))
Where data is a Named formula referencing your data

I have time tomorrow to look at this so a new link will be great
 
Hi,

Here is an update link to the latest version of the spreadsheet.

https://www.dropbox.com/s/yds9zqvanqkr3ys/20150514_DataTemplate.xlsx?dl=0

The raw data is located at C2 and the calculation that I am trying to carry out calculations at SA2. I included a new spreadsheet called timematrix which has the scan times for every line of sample data. That way the variable time for every scan is solved. I need to be able to transfer all the formulas to 19 worksheets in the worksheet. Once that is done I need to be able to filter the data in every worksheet according to Line Name.

Best,

Francis


Francis

That link isn't working?

I would change using the ranges of C4:AP4 to something like Index(Data,Row()-Row($A$3))
Where data is a Named formula referencing your data

I have time tomorrow to look at this so a new link will be great
 
Hi,

I am including a typical example of the raw data which I loaded into a powerpivot table (it is 92mb in size!). I was hoping maybe powerpivot could be used to apply your calculation formulas in a more efficient way. I then wanted to create summary pivot tables based on specific questions. For example What is the average of the background and signal for the first 7 seconds of data for each Sample A line and for all the Sample A lines combined?

https://www.dropbox.com/s/bjzyxf2mq8uv8zv/PivotTableForum.xlsx?dl=0

Best,

Francis


Hi,

Here is an update link to the latest version of the spreadsheet.

https://www.dropbox.com/s/yds9zqvanqkr3ys/20150514_DataTemplate.xlsx?dl=0

The raw data is located at C2 and the calculation that I am trying to carry out calculations at SA2. I included a new spreadsheet called timematrix which has the scan times for every line of sample data. That way the variable time for every scan is solved. I need to be able to transfer all the formulas to 19 worksheets in the worksheet. Once that is done I need to be able to filter the data in every worksheet according to Line Name.

Best,

Francis
 
Can I delete the Pivot Table and TimeMatrix worksheets and just work on the raw data?
 
When you say Sample A lines combined, do you mean ?

A. Sum the data then apply the functions
B. Average the data then apply the functions
C. Combine the summary data by Average or Sum?
 
In Table 1 Raw Data. SampleA_Line 1 is a measurement of signal intensity versus time. Each column represents 0.5 seconds intervals and the time numbers for that particular line are in the time matrix sheet.

We need the "TimeMatrix" sheet as it contains the time data for the experiment and the can vary in length depending on the sample analysed.
The sheet has calculations (see cell reference SA1) to define the specific regions of the raw data to analyse.

In the "RawData" sheet We then apply those time range rules to generate Table 2 Background (First 7seconds) ,Table 3 Background (Last 15 seconds) and Table 4 Signal (>7 and <15 of the max time) for every line of raw data (See cell reference RY2) .

I then need to apply calculations to all of the SampleA_line1-x in the dataset (average, min, max, standard deviation, median percentile) for each column of data (e.g. columns c to de) and place those results in C500 to C505.This process will be repeated for Sample B, Sample C etc. These calculations will allow me to chart the average signal versus time for the sample.

You can delete the pivotsheet but I loaded the 20 other raw data sheets into a powerpivot part of the file using excel 2013. The trick will be applying these calculations across all the worksheets and not have excel choke on his massive amount of numbers and formulas.

Best,
Francis




Can I delete the Pivot Table and TimeMatrix worksheets and just work on the raw data?
 
The calcs in the Time Matrix sheet simply calculate the Max Time, Max Time minus 15 secs and 7 secs for each Row
Why couldn't they stay on the data sheet with all the other calcs?
 
So the instrument data creates a spreadsheet with all the times for every line of data in the "Timematrix" sheet. Some sample id will be 50 seconds while others could be 200 seconds in length. I am using this sheet to calculate the max, middle range values and I pull those numbers into the raw data calculations worksheet (see cell SA1). I manually created time row in the raw data sheet to have the largest time range in the experiment but doing this every time would be a pain.

I will have to apply these calculation's to multiple experiments with 20 worksheets of data so avoiding manual copying of formulas if possible is key.

Best,

Francis

In Table 1 Raw Data. SampleA_Line 1 is a measurement of signal intensity versus time. Each column represents 0.5 seconds intervals and the time numbers for that particular line are in the time matrix sheet.

We need the "TimeMatrix" sheet as it contains the time data for the experiment and the can vary in length depending on the sample analysed.
The sheet has calculations (see cell reference SA1) to define the specific regions of the raw data to analyse.

In the "RawData" sheet We then apply those time range rules to generate Table 2 Background (First 7seconds) ,Table 3 Background (Last 15 seconds) and Table 4 Signal (>7 and <15 of the max time) for every line of raw data (See cell reference RY2) .

I then need to apply calculations to all of the SampleA_line1-x in the dataset (average, min, max, standard deviation, median percentile) for each column of data (e.g. columns c to de) and place those results in C500 to C505.This process will be repeated for Sample B, Sample C etc. These calculations will allow me to chart the average signal versus time for the sample.

You can delete the pivotsheet but I loaded the 20 other raw data sheets into a powerpivot part of the file using excel 2013. The trick will be applying these calculations across all the worksheets and not have excel choke on his massive amount of numbers and formulas.

Best,
Francis
 
Back
Top