• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Cleaning up Messy Data

electricmice

New Member
I have a data export that comes in a rather difficult format to work with. I looked at creating a macro and/or formulas to clean it up but could not come up with a good way to get a nice clean data output. I have attached a sheet with the source data (messy) what I thought would be a good clean format (goal format) and then how I was going to use the data within a pivot chart to display it neatly. The goal is to be able to download the messy data and then easily create a chart that displays the rank over time, which can be selected by keyword and engine.


https://www.dropbox.com/s/09ucnc1uxi1qyvs/ranking%20sample.xlsx
 
Hi ,


An interesting problem.


Having downloaded your file , the first 7 columns have the following captions :


Keyword ID , Keyword , Engine , Relative Volume , Date of Interest , URL , Types


After these 7 columns , the last 3 columns viz. Date of Interest , URL , Types are repeated for successive dates.


You have given your Goal Format , but I would have thought rather than have the dates going down one column , you could have them across , so that comparing how one Keyword has moved over time would be easier to chart.


The only numeric value in your data is the Ranking ( I am excluding Relative Volume since it does not repeat over the dates ) , so the only possibilities are to see :


Use Keyword / Engine / Date / URL interactively , to see how any particular combination fares with respect to the other variables.


If you use 3D charts , you can display more data , but the volume is such that your chart will become cluttered.


Is it possible to upload your source file ? Are you at present just opening your source file within Excel ? If so , it may be possible to write a VB procedure to open the source file and read data so that Excel gets populated in the required format. This will depend on the source file format being absolutely fixed.


Narayan
 
I am open to other ideas for the end format, the source file is simply a CSV of the source data tab. ultimately , i do not really care much about the types columns, the keyword ID number or the relative volume. The URL while may be handy to reference later, i am less interested in charting, but it may still be helpful to have in the organized data.


The reason i had the dates going down a column is that i saw it as an easier opportunity to use the pivot chart vs. traditional chart, in terms of using slicers to help narrow down the charted view.


the source file format is fixed with the exception of the number of columns, as sometimes the data will be recorded for 3 or 6 month time frames. the first 7 columns will remain the same as well as the spacing for the information above the main data set.
 
Hi ,


Certainly ; you will find all of us are willing to help !


The problem is one of specification ; if you can spell out exactly what you want , I'll certainly do my best.


Assuming that you want to do more than one thing with your data / workbook , can you list the first job that is to be done ? We'll take it one step at a time.


Narayan
 
Task one, is to graph ranking over time. Task to is to be able to do a lookup and correlate ranking to traffic from another report by keyword.
 
Just took a quick look, is there anyway you can upload the .csv file? Seems odd to have the dates going across instead of down. Since you got a table how you want it. If the newest data will always becoming in the same way, seems relatively straight forward to able to create a macro to import that data and only put the data you want into the clean table.


And from there I would look into using PowerPivot. It's really a great product that leverages the power of cubes without having the user doing any of the work. And since it's using cubes you can do a lot more with it then you can with standard pivot tables.


So...

1) Automatically import new data

2) Automatically only put the newest data in formatted table

3) Refresh Linked Table in PowerPivot so that your pivot table/chart will have all the latest data
 
Here is the sample CSV:

https://www.dropbox.com/s/9wh0tw1yxu1ee4x/sample%20csv.csv


new downloads will have the same keywords, however the dates will change and potentially the number of dates will change. when running the report there is an option for 1 month, 3 months, 6 months, or custom, so it has the potential to vary in column width.
 
Hi ,


Can you see this file ?


http://speedy.sh/Tm9wS/ranking-sample.xlsx


The chart plots the google ranking of one keyword over the period 09/11/2012 to 10/11/2012. Is this anything like what you are looking for ?


Eventually , we should have dropdowns so that the user can decide whether to plot using Keyword ID / Keyword or Engine ; this is the reason for the Calculations sheet , which has the unique distinct values of the Keyword IDs , Keywords and Engines.


Please comment.


Narayan
 
Yes, that is like what I am looking for, I would like to be able to select the keyword and engine. beyond the plot, laying out the data is also important so that I can lookup the keyword in other data sets to pull in additional data.
 
Back
Top