Introducing Data from Hell:
Watch out, its data from hell. In this new video series, we are going to look at some nutty, frustrating and fun data reshaping challenges and solve them using Excel. We will use Power Query, Formulas, VBA or other features as needed to free this data from damnation.
For our first installment, let’s reshape unevenly spaced list of values to a table.
Unevenly Spaced List -> Tabular format
Let’s say you have a list with some values. They are separated by blank cells. You want to convert this to a table, so related values are on same row. But the table can have any number of columns depending on how big the chunks are.
Something like this:
Enter Power Query (Get & Transform Data)
You guessed right. This data is too much out of whack for formula based solution. So let’s reshape it using Power Query aka Get & Transform Data.
The process goes like this:
- Load the data in to query editor
- Replace nulls (blank cells) with something that is not part of your data, like %^%
- Using Text.Combine, concatenate all values to a single value, separated by some symbol that is not part of data, like =
- Convert this single value to a table and split by the symbol used in step 2.
- Calculate number of = symbols in each row using Text.Length and Text.Replace functions in to a new column
- Figure out the maximum of this new column as a variable. Name it as MaxCols
- Remove the extra column created in step 5
- Split the first column by delimiter set in step 3 with number of columns set to MaxCols.
- Load data back to Excel.
Data from Hell – Convert unevenly spaced list to table – Power Query tutorial
As this process requires a bit of tinkering with Power Query functions and a few false starts, I made a tutorial. I woke at 4AM to record this. Just kidding, I woke up because my nose was all blocked up and couldn’t go back to sleep. So I opened up Excel for some play and recorded this.  Check it out below.
You can also watch this video on our YouTube Channel.
Download Example Workbook
Click here to download the example workbook for this Power Query tutorial. Try reshaping the data yourself first, you will learn a lot about PQ and how to use it for your advantage.
More Power Query Recipes for you
Power Query (or Get & Transform Data as it is called in newer versions of Excel) is an incredibly powerful tool to extract, reshape and merge your data sets. Check out below tutorials and recipes for more.
24 Responses to “Convert unevenly spaced list to table [Data from Hell]”
Thanks for the tutorial Chandoo!
Thanks for sharing Chandoo. Never knew about #shared use as source to list M functions.
It's very useful tool as PQ is function driven language, and makes looking up function syntax so much easier than using Web search every time. 🙂
Get well soon, Chandoo.
When I'm feeling ill, the last thing that comes to my mind is: "I'm firing up Excel and play for a while."
You are a bit of a weird dude! 🙂
Thanks for starting to share some fine PQ wizardry.
Below is an alternative solution. It takes a set-based approach by separating each group of cells into their own table. This avoids the string manipulation and allows for more flexibility if you want to take the data in different direction:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "IndexFillDown", each if [Index] = 0 then [Index] else if [Data] = null then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"IndexFillDown"}),
#"Grouped Rows" = Table.Group(#"Filled Down", {"IndexFillDown"}, {{"SeparatedData", each Table.FromList(List.RemoveNulls(_[Data])), type table}}),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "Transpose", each Table.Transpose([SeparatedData])),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom1",{"Transpose"}),
#"List of All Columns" = List.Distinct(List.Combine(List.Transform(#"Removed Other Columns"[Transpose], each Table.ColumnNames(_)))), // This step dynamically gathers all column names, e.g. column1..columnN, so the next step is not hard-coded.
#"Expanded Transpose" = Table.ExpandTableColumn(#"Removed Other Columns", "Transpose", #"List of All Columns",#"List of All Columns")
in
#"Expanded Transpose"
Neat trick. I didn't know about Table.ColumnNames(_) thingie. Will certainly use it next time I need to keep the column names dynamic.
Nice Simon 🙂
Instead of one blank row, imagine there are uneven blank rows e.g. sometimes 2 or 3 or 4 blank rows. We would need to add one more step of removing blank rows. Rest seems to be great.
[…] https://chandoo.org/wp/2017/08/30/d4h-unevenly-spaced-pq/ […]
I have no idea about power query however i love the new tab in excel 2016 tried to learn but there is less documentation available about power query and power pivot.
Sub Test()
Dim I As Long, LRow As Long, Lcol As Long
Lcol = 4
LRow = 4
With Sheet1
For I = 4 To .Cells(Rows.Count, "B").End(xlUp).Row
If Not VBA.IsEmpty(.Cells(I, 2)) Then
.Cells(LRow, Lcol) = .Cells(I, 2)
Lcol = Lcol + 1
End If
If VBA.IsEmpty(.Cells(I, 2)) Then
LRow = LRow + 1
Lcol = 4
End If
Next I
End With
End Sub
You could also add another column to the right of the data labelled "<table><td>" then fill down the formula:
=IF([@Data]="","<tr><td>","<td>")
The result should be as below (note closing tags are not needed.) Copy these two columns to notepad then copy and paste to another cell on the sheet.
Data <table><td>
Arif <td>
Lala <td>
<tr><td>
Donny <td>
Kay <td>
Chou <td>
<tr><td>
Marry <td>
John <td>
Smith <td>
something <td>
<tr><td>
Harry <td>
Matthew <td>
Rery <td>
Brave <td>
Let's <td>
<tr><td>
Sea <td>
Earth <td>
Ace <td>
Few <td>
More <td>
things <td>
<tr><td>
at <td>
the <td>
end <td>
That was an awesome Workaround.
Thanks for sharing.
The above data didn't come out as clearly as hoped. For another example of this method (with data formatting) see:
https://chandoo.org/forum/threads/excel-function-reference.35686/
Amazing trick. We could use the idea in PQ as well, by first constructing the table code using Text.Combine() and then loading that as Web.Content().
Thanks for sharing this. I prefer this method and would be using it a lot.
Checkout this solution:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Demoted Headers" = Table.DemoteHeaders(Source),
#"Replaced Value" = Table.ReplaceValue(#"Demoted Headers","Data",null,Replacer.ReplaceValue,{"Column1"}),
#"Added Index" = Table.AddIndexColumn(#"Replaced Value", "Index", 1, 1),
#"Added Conditional Column" = Table.AddColumn(#"Added Index", "Custom", each if [Index] = 1 then "1" else if [Column1] = null then [Index] else null ),
#"Filled Down" = Table.FillDown(#"Added Conditional Column",{"Custom"}),
#"Changed Type" = Table.TransformColumnTypes(#"Filled Down",{{"Custom", Int64.Type}}),
#"Inserted Subtraction" = Table.AddColumn(#"Changed Type", "Inserted Subtraction", each [Index] - [Custom], type number),
#"Removed Columns" = Table.RemoveColumns(#"Inserted Subtraction",{"Index"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Column1] null)),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Filtered Rows", {{"Inserted Subtraction", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Filtered Rows", {{"Inserted Subtraction", type text}}, "en-US")[#"Inserted Subtraction"]), "Inserted Subtraction", "Column1"),
#"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Custom"})
in
#"Removed Columns1"
Also checkout video on my youtube channel.
There's an error in your script, #"Filtered Rows" should read:
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Column1] null)),
Chandoo's comment tool is filtering out our code. There should be a "not equal" sign between "[Column1]" and "null".
Hmm, interesting, the "" ("not equal") sign is getting filtered out of our code when we submit our comments. Chandoo - just a heads up, this seems like a bug.
Imagine a new data set like this
name app date
may 1-Jan-16
Abhay 7-Jan-16
justin 1-Jan-16
john 9-Jan-16
hugh 4-Jan-16
Abhay 5-Jan-16
will 1-Jan-16
justin 1-Feb-16
jackie 5-Jan-16
jackie 7-Jan-16
john 1-Jan-16
Abhay 4-Jan-16
Abhay 1-Jan-16
may 3-Jan-16
john 2-Jan-16
No blank rows in between. The final output will be similar to above case study like this...
name 1 2 3 4
Abhay 1/4/2016 1/1/2016 1/5/2016 1/7/2016
hugh 1/4/2016
jackie 1/5/2016 1/7/2016
john 1/1/2016 1/2/2016 1/9/2016
justin 1/1/2016 2/1/2016
may 1/1/2016 1/3/2016
will 1/1/2016
Seems to be one of the challenging example for power query
See below for a solution:
let
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"Count", each Text.Combine(List.Transform([Date], each Text.From(_)), "|"), type text}}),
MaxColumns = List.Accumulate(#"Grouped Rows"[Count], 0, (state, current) => let count = List.Count(Text.Split(current, "|")) in if count > state then count else state),
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "Count", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), MaxColumns)
in
#"Split Column by Delimiter"
Here's another solution completely from left field. I decided to challenge myself and do the whole thing using a variety of List functions. Enjoy!
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content][Data],
Transform = List.Transform(Source, each _ & "|" ),
ReplaceValue = List.ReplaceValue(Transform, null, "#(lf)", Replacer.ReplaceValue),
Combine = Text.Combine(ReplaceValue),
Lines = Lines.FromText(Combine),
TransformAgain = List.Transform(Lines, each try Text.Start(_,Text.Length(_)-1) otherwise ""), // "try" catches nulls
MaxColumns = List.Accumulate(TransformAgain, 0, (state, current) => let count = List.Count(Text.Split(current, "|")) in if count > state then count else state),
#"Converted to Table" = Table.FromList(TransformAgain, Splitter.SplitTextByDelimiter("|"), MaxColumns)
in
#"Converted to Table"
very cool solution 🙂
I have been working into Excel 2007. How can I use it?
I am very late but wanted to post my solution 🙂
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
ProgCol = Table.AddColumn(Source,
"Prog",
each if
[Data] = null
then
null
else
"Data",
type text
),
Grp = Table.Group(ProgCol,
{"Prog"},
{{"Tbl", each Text.Combine([Data],"|"), type text},
{"ItemsCount", each List.Count([Data]), Int64.Type}
},
GroupKind.Local),
SplitCol = Table.SplitColumn(Grp,
"Tbl",
Splitter.SplitTextByDelimiter("|"),
List.Max(Grp[ItemsCount])
),
SelRows = Table.RemoveColumns(Table.SelectRows(SplitCol, each [Prog] null),{"Prog","ItemsCount"})
in
SelRows