Say you want to combine multiple Excel files, but there is a twist. Each file has a few tabs (worksheets), and you want to combine like for like, i.e., all Sheet1s into one dataset, all Sheet2s into another dataset, and so on.
To make matters more interesting, each sheet has a different format.
Of course, Power Query comes to the rescue.
This is an advanced example of Power Query. If you are a beginner, start with these pages.
Combine multiple Excel files – the problem
Imagine you work in Finance. Your job involves paying employees for their business travel expenses. Every time someone goes on a business trip, they submit a trip expense report. This is an Excel template with two tabs.
- Travel details tab: for gathering personal and travel details
- Expense details tab: for itemized expense details
As you have a lot of employees, you don’t want to manually scan the files and combine the data. Here is a sample of how these files look.
You want to combine all the expense files into one big, consolidated and refreshable travel expense workbook.
Using Power Query to combine files
Some of you may already know Power Query’s “Get data from Folder” feature. It lets us easily get and combine multiple Excel files in a folder. Unfortunately, this alone won’t do the job here, because each file has two different tabs and we need to combine them separately 😉
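To make the “like for like” goal concrete, here is a minimal Python sketch (a rough analogue, not Power Query M) of grouping sheets by tab name across workbooks. The in-memory `workbooks` dictionary, its file names and the helper `group_sheets_by_name` are hypothetical stand-ins; reading real .xlsx files would need a library such as openpyxl.

```python
from collections import defaultdict

# Hypothetical stand-in for a folder of workbooks:
# file name -> {sheet name -> list of rows}.
workbooks = {
    "trip_anna.xlsx": {
        "Travel details": [["Name", "Anna"], ["Destination", "Oslo"]],
        "Expense details": [["Hotel", 420.0], ["Taxi", 35.5]],
    },
    "trip_ben.xlsx": {
        "Travel details": [["Name", "Ben"], ["Destination", "Lyon"]],
        "Expense details": [["Flight", 210.0]],
    },
}

def group_sheets_by_name(books):
    """Collect like-for-like sheets: sheet name -> list of (file, rows)."""
    grouped = defaultdict(list)
    for file_name, sheets in books.items():
        for sheet_name, rows in sheets.items():
            grouped[sheet_name].append((file_name, rows))
    return dict(grouped)

groups = group_sheets_by_name(workbooks)
for sheet_name, members in groups.items():
    # Each group is combined on its own, because each tab has its own layout.
    print(sheet_name, [file_name for file_name, _ in members])
```

Each group still needs its own cleanup before it can be stacked, which is exactly why the two tabs get two separate folder queries below.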
Here is the process we need to follow.
Start by placing all the expense reports into one folder. This can be a folder on your computer or on a network / shared drive.
Now go to “Get Data > From File > Folder”
Point to the folder path and Power Query will show all the files in that folder.
Once you are satisfied with the list of files, click “Combine & Edit”. (Don’t worry if you need to exclude some files; you can do that later by applying filters while editing the query.)
Now you will get another screen asking you to choose which tabs / tables you want to bring in. As we have two sets of consolidations, let’s start with the first one: the travel details tab. Select it and proceed.
At this point, Power Query creates a folder called “Transform sample” and places a few things in it. PQ also creates a query for all the merged data. This is how your Power Query window could look.
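Filtering the file list, mentioned in the steps above, is easy to picture outside Power Query too. This Python sketch lists a folder and keeps only `.xlsx` files, skipping names that start with `~$` (the temporary lock files Excel leaves next to open workbooks). The function `list_report_files`, its parameters and the throwaway demo folder are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

def list_report_files(folder, extension=".xlsx", exclude_prefixes=("~$",)):
    """Return sorted file names in `folder` that have the given extension
    and do not start with any of the excluded prefixes."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == extension
        and not p.name.startswith(exclude_prefixes)
    )

# Demo with a temporary folder standing in for the "reports" folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ["trip_anna.xlsx", "trip_ben.xlsx", "~$trip_ben.xlsx", "notes.txt"]:
        (Path(folder) / name).touch()
    kept = list_report_files(folder)
    print(kept)
```

Filtering before any other transformation keeps stray files from ever reaching the combine step.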
Editing the Transform sample query
As you can see, the default combined query data is not very useful in our situation. So let’s proceed by editing the “Transform sample file from reports” query.
What is Transform sample really?
You can make any changes you like in this sample query, and PQ will apply them to every file in the folder before combining the results into one data set.
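The idea behind the transform sample is “define the cleanup once, run it on every file, then stack the results”. Below is a minimal Python sketch of that pattern, not Power Query’s actual implementation. `transform_sample`, `combine_folder` and the sample `files` data are hypothetical; here the transform only drops fully empty rows.

```python
def transform_sample(rows):
    """Stand-in for the transform sample: cleanup defined once.
    Here it only drops rows where every cell is empty (None)."""
    return [row for row in rows if any(cell is not None for cell in row)]

def combine_folder(files, transform):
    """Apply the same transform to every file, then stack the results,
    prefixing each row with the name of the file it came from."""
    combined = []
    for file_name, rows in files.items():
        for row in transform(rows):
            combined.append([file_name, *row])
    return combined

files = {
    "trip_anna.xlsx": [["Hotel", 420.0], [None, None], ["Taxi", 35.5]],
    "trip_ben.xlsx": [["Flight", 210.0]],
}
combined = combine_folder(files, transform_sample)
for row in combined:
    print(row)
```

Because the transform is passed in as a function, changing it changes the output for every file at once, just as editing the transform sample does.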
Steps to turn travel details into a table
Our travel details sample needs to become a one-row table so that we can stack the rows from multiple files. To do so, follow these steps:
- Remove blank / heading rows at the top.
- Remove rows with nulls or unnecessary values in column 1.
- Transpose the table.
- Promote the first row to headers.
This is how the output would look after the process.
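The four steps above can be sketched in Python as a rough analogue of the Power Query steps (not M code). The sample rows and the `header_rows=2` count are assumptions about the template’s layout, not taken from the real files; in Power Query you would pick the number of top rows to remove yourself.

```python
def travel_details_to_row(rows, header_rows=2):
    """Turn a label/value block into a one-row table (list of one dict)."""
    # 1. Remove the blank / heading rows at the top (count is an assumption).
    body = rows[header_rows:]
    # 2. Remove rows whose first column is empty (null).
    body = [row for row in body if row[0] is not None]
    # 3. Transpose: column 1 (labels) and column 2 (values) become rows.
    transposed = [list(column) for column in zip(*body)]
    # 4. Promote the first row to headers; the remaining row is the data.
    headers, *data = transposed
    return [dict(zip(headers, record)) for record in data]

sample = [
    ["Trip expense report", None],
    [None, None],
    ["Name", "Anna"],
    ["Destination", "Oslo"],
    [None, None],
    ["Purpose", "Client visit"],
]
result = travel_details_to_row(sample)
print(result)
```

Once every file produces one row with the same headers, stacking them gives one row per trip.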
Combine all files
Now that we have edited the transform sample, it’s time to go back to the “reports” query to see the output. If you are happy with it, rename the query and load it into Excel (or Power BI).
Combined travel details
Combining expense details
The process is the same for the expense details consolidation. Start by creating a fresh “from folder” query. As the expense details are already in a table, there is no need to make any additional changes to the transform sample. Simply combine everything from the “expenses” tables and you are done.
Combined expense details
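When the data is already in proper tables, combining is just appending them. This Python sketch appends tables by matching column names and fills a column that is missing from one file with `None`, which roughly mirrors how appended tables behave in Power Query (blank/null where a column is absent). The `append_tables` helper, the `Source` column and the sample expenses are hypothetical.

```python
def append_tables(tables):
    """Append tables (file name -> list of row dicts) into one table.
    Columns are matched by name; a missing column becomes None.
    A 'Source' column records which file each row came from."""
    columns = ["Source"]
    for rows in tables.values():
        for row in rows:
            for column in row:
                if column not in columns:
                    columns.append(column)
    combined = []
    for file_name, rows in tables.items():
        for row in rows:
            record = {"Source": file_name, **row}
            combined.append([record.get(column) for column in columns])
    return columns, combined

expenses = {
    "trip_anna.xlsx": [
        {"Item": "Hotel", "Amount": 420.0},
        {"Item": "Taxi", "Amount": 35.5},
    ],
    "trip_ben.xlsx": [
        {"Item": "Flight", "Amount": 210.0, "Currency": "EUR"},
    ],
}
columns, rows = append_tables(expenses)
print(columns)
for row in rows:
    print(row)
```

Matching by name rather than by position means a file with an extra or reordered column still lines up correctly.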
Download sample files to practice this
Power Query can be tricky to explain with blog posts alone. That is why I made a few sample files and a consolidated workbook. Click here to download everything.
Try to merge the files in the “reports” folder using your own logic / transformation steps. Share your story / tips in the comments.
I get an error when merging data from files
There are many reasons why Power Query may show an error when connecting to a folder. Here is a checklist to help you.
- Make sure the folder path is valid and accessible. If you created the query on one computer and try to refresh it from another, chances are the path doesn’t exist there and the refresh will fail. Use a shared network drive, or change the path in the Power Query steps before refreshing.
- Files are loaded, but the merged query shows errors. This can happen if you edited the transform sample. Power Query usually adds “Changed Type” steps automatically after many of your actions. These steps refer to columns by name and change their data types. If you edit the transform sample and alter the column structure of the table, those names no longer exist and the query fails. The solution? Simple: delete all the automatically added Changed Type steps.
- Some files should not be loaded, but they load anyway and mess up the results. Before making any transformations, set up filters based on file type or file name. This way you can prevent unnecessary files from loading.
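The second check above is easier to see with a toy model. This Python sketch mimics an auto-generated Changed Type step that was recorded against generic column names (`Column1`, `Column2`, as a raw sheet has before headers are promoted) and shows it failing once the transform sample reshapes the table. `changed_type_step`, `recorded_types` and the sample rows are hypothetical; this models the failure, it is not Power Query’s code.

```python
def changed_type_step(rows, types):
    """Mimic a recorded 'Changed Type' step: convert each named column.
    The names are hard-coded, so a missing column raises KeyError."""
    converted = []
    for row in rows:
        new_row = dict(row)
        for column, cast in types.items():
            new_row[column] = cast(row[column])  # fails if column is gone
        converted.append(new_row)
    return converted

# Step recorded while the sheet still had generic column names.
recorded_types = {"Column1": str, "Column2": str}

raw = [{"Column1": "Name", "Column2": "Anna"}]
print(changed_type_step(raw, recorded_types))  # works on the old shape

# After transposing and promoting headers, the columns are renamed,
# so the recorded step can no longer find the columns it refers to.
reshaped = [{"Name": "Anna", "Destination": "Oslo"}]
try:
    changed_type_step(reshaped, recorded_types)
    failed = False
except KeyError as err:
    failed = True
    print("Changed Type step failed, missing column:", err)
```

Deleting the recorded step, as suggested above, removes the hard-coded dependency on the old column names.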
Do you merge / combine files with Power Query?
I do this all the time. My most recent win was merging 24 PDF credit card statements (two cards over the last 12 months) into one big table of data so that I could see trends and find out where I spend the most.
What is your experience with combining multiple Excel files or the folder query feature? What are some of your favorite tricks with it? Please post them in the comments section.
This article is inspired by a comment from Sourav.