Say you want to combine multiple Excel files, but there is a twist. Each file has a few tabs (worksheets) and you want to combine like for like, i.e., all Sheet1s to one dataset, all Sheet2s to another dataset…
To make matters interesting, each sheet has a different format.
What now?
Power Query to the rescue, of course.
This is an advanced example of Power Query. If you are a beginner, start with these pages.
Combine multiple Excel files – the problem
Imagine you work in Finance. Your job involves paying employees for their business travel expenses. Every time someone goes on a business trip, they submit a trip expense report. This is an Excel template with two tabs.
- Travel details tab: for gathering personal and travel details
- Expense details tab: for itemized expense details
As you have a lot of employees, you don’t want to manually scan the files and combine the data. Here is a sample of how these files look.
You want to combine all the expense files into one big, consolidated & refreshable travel expense workbook.
Using Power Query to combine files
Some of you may already know Power Query’s “Get data from Folder” feature. This helps us easily get & combine multiple Excel files in a folder. Unfortunately, this alone will not be enough for us, as each file has two different tabs and we need to combine them separately 😉
Here is the process we need to follow.
Start by placing all the expense reports into one folder. This can be a folder on your computer or on a network / shared drive.
Now go to “Get Data > From File > Folder”
Point to the folder path and Power Query will show all the files in that folder.
Once satisfied with the list of files (don’t worry if you need to exclude some files, you can do that while editing the query by applying filters), click on “Combine & Edit”.
Now you will get another screen asking you to choose which tabs / tables you want to bring in. As we have two sets of consolidations, let’s start with the first one – travel details tab. Select that and proceed.
At this point, Power Query will create a folder called “Transform sample” and place a few things in it. PQ will also create a query for all the merged data. This is how your Power Query window could look.
Editing the Transform sample query
As you can see, the default combined query data can be useless for our situation. So let’s proceed by editing “Transform sample file from reports” query.
What is Transform sample really?
The transform sample is a template query built on one example file from the folder. You can make any changes in it, and PQ will apply the same changes to every file in the folder before combining them into one data set.
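If the idea feels abstract, here is a rough sketch of it in plain Python (not Power Query code). It assumes each file has already been read into a list of rows; the file names, the `combine_files` helper and the `drop_blank_rows` transform are all made up for illustration.

```python
from typing import Callable, Dict, List

Table = List[list]  # a table is a list of rows, each row a list of cells


def drop_blank_rows(table: Table) -> Table:
    """Example 'transform sample': keep only rows that have at least one value."""
    return [row for row in table if any(cell not in (None, "") for cell in row)]


def combine_files(files: Dict[str, Table], transform: Callable[[Table], Table]) -> Table:
    """Apply the same transform to every file, then stack the results."""
    combined: Table = []
    for name in sorted(files):
        for row in transform(files[name]):
            # Prefix each row with the file it came from
            combined.append([name] + row)
    return combined


# Hypothetical folder contents, already loaded as rows
files = {
    "report_a.xlsx": [["Name", "Asha"], [None, None], ["City", "Pune"]],
    "report_b.xlsx": [[None, None], ["Name", "Ben"], ["City", "Perth"]],
}

combined = combine_files(files, drop_blank_rows)
for row in combined:
    print(row)
```

The point is that you only design the transformation once, on one file. Everything else is the same function applied again and again.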
Steps to turn travel details to a table
Our travel details sample needs to become a one-row table so that we can effectively merge multiple files. To do so, follow these steps:
- Remove blank / heading rows on the top.
- Remove any nulls or unnecessary rows from column 1
- Transpose the table
- Promote first row to headers
This is how the output would look after the process.
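To see why these four steps give one row per file, here is the same logic sketched in plain Python (again, not Power Query code). The field names and values in the sample sheet are invented, and the sketch assumes the sheet has a single heading row on top.

```python
def travel_details_to_row(sheet, heading_rows=1):
    """Turn a two-column label/value sheet into a list with one record."""
    # Step 1: remove the heading rows on the top
    body = sheet[heading_rows:]
    # Step 2: remove rows where column 1 is null or empty
    body = [row for row in body if row[0] not in (None, "")]
    # Step 3: transpose, so labels become the first row and values the second
    transposed = [list(column) for column in zip(*body)]
    # Step 4: promote the first row to headers
    headers, *records = transposed
    return [dict(zip(headers, record)) for record in records]


# Hypothetical travel details tab (labels in column 1, values in column 2)
sheet = [
    ["ACME Inc. Employee Travel Expense Report", None],
    [None, None],
    ["Employee name", "Asha"],
    ["Destination", "Sydney"],
    [None, None],
    ["Purpose", "Client visit"],
]

rows = travel_details_to_row(sheet)
print(rows)
```

Each file now contributes exactly one record with the same keys, which is what makes the files easy to stack on top of each other.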
Combine all files
Now that we have edited the transform sample, it is time to go back to the “reports” query to see the output. If you are happy with it, rename the query and load it into Excel (or Power BI).
Combined travel details
Combining expense details
The process is the same for expense details consolidation. Start by creating a fresh “from folder” query. As expense details are already in a table, there is no need to make any additional changes to the transform sample. Simply combine everything from the “expenses” tables and you are done.
Combined expense details
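Because the expense tables already share the same headers, combining them is just an append. Here is a small plain Python sketch of that idea. The dates, categories and amounts are made up, and the `Source.Name` key is only there to mimic the file name column you typically see in a combined folder query.

```python
def append_tables(tables):
    """Stack tables that share the same header row into one list of records."""
    combined = []
    for source, table in tables.items():
        header, *rows = table
        for row in rows:
            record = dict(zip(header, row))
            # Remember which file each row came from
            record["Source.Name"] = source
            combined.append(record)
    return combined


# Hypothetical "expenses" tables from two reports
expense_tables = {
    "report_a.xlsx": [
        ["Date", "Category", "Amount"],
        ["2019-03-01", "Taxi", 42.0],
        ["2019-03-02", "Hotel", 180.0],
    ],
    "report_b.xlsx": [
        ["Date", "Category", "Amount"],
        ["2019-03-05", "Meals", 35.5],
    ],
}

combined_expenses = append_tables(expense_tables)
total = sum(record["Amount"] for record in combined_expenses)
print(len(combined_expenses), total)
```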
Download sample files to practice this
Power Query can be tricky to explain with blog posts alone. That is why I made a few sample files and a consolidated workbook. Click here to download everything.
Try to merge the files in “reports” folder using your own logic / transformation steps. Share your story / tips in the comments.
I get an error when merging data from files
There are many reasons why Power Query may show an error when connecting to a folder. Here is a checklist to help you.
- Make sure the folder path is valid and accessible. If you created the query on one computer and try to refresh it from another, chances are it won’t work. Use shared network drives or change path in Power Query steps before refreshing.
- Files are loaded, but the merged query errors. This can happen if you edited the transform sample. Usually Power Query adds “Changed type” steps automatically after you do something. These changed type steps refer to column names in the query and change data types. If you edit the transform sample and alter the column structure of the table, those steps will look for columns that no longer exist and the query will fail. The solution? Simple, delete all the automatically added changed type steps.
- Some files should not be loaded, but they load and mess up the results. Before making any transformations, set up filters based on file type or names. This way you can prevent loading unnecessary files.
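For the last point, the filter logic is simple to reason about. Here is a plain Python sketch of the kind of rule you might set up (in Power Query you would do this with filters on the file name and extension columns). The `keep_file` helper and the file names are hypothetical. The `~$` check is there because Excel creates temporary lock files with that prefix while a workbook is open, and they can end up in the folder listing.

```python
from pathlib import PurePath


def keep_file(name, extensions=(".xlsx",), must_contain=""):
    """Decide whether a file from the folder should be combined."""
    path = PurePath(name)
    # Skip Excel's temporary lock files (created while a workbook is open)
    if path.name.startswith("~$"):
        return False
    # Keep only the expected file types, ignoring case
    if path.suffix.lower() not in extensions:
        return False
    # Optional rule on the file name itself
    return must_contain.lower() in path.name.lower()


# Hypothetical folder listing
names = [
    "trip-asha.xlsx",
    "~$trip-asha.xlsx",
    "notes.txt",
    "Trip-Ben.XLSX",
    "summary.xlsx",
]

selected = [name for name in names if keep_file(name, must_contain="trip")]
print(selected)
```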
Do you merge / combine files with Power Query?
I do this all the time. My recent win was to merge 24 PDF credit card statements (2 types of cards over last 12 months) to one big table of data so that I can see trends and find out where I spend most.
What is your experience with the combine multiple Excel files / folder query feature? What are some of your favorite tricks with this? Please post them in the comments section.
This article is inspired by a comment from Sourav.
21 Responses to “Combine multiple Excel files using Power Query [Full example + download]”
Chandoo- can you share your solution for combining 24 PDF documents as mentioned above.
Do you merge / combine files with Power Query?
I do this all the time. My recent win was to merge 24 PDF credit card statements (2 types of cards over last 12 months) to one big table of data so that I can see trends and find out where I spend most.
I have a similar problem that I am solving for. Thanks
Hmm, I will need to find some generic PDF dataset as I don't want to share my credit card statements online 😀
But here is the process, if you want to try.
1) You need to use Power BI Power Query (as Excel PQ doesn't yet support PDF import)
2) Place all PDFs in a folder and connect with "From Folder" query
3) PQ will detect structured tables in your PDFs. Select the correct one.
4) Edit transform sample so you can change the results or multi-select tables and append them to one big table.
Once the final query has what you need, just select entire table in PQ, press CTRL+C and paste it in Excel for further analysis or leave in PBI for visualizing.
Give it a try
Is there a macro language that can launch the Power Query?
Hi Chandoo, My problem with Power Query is that my individual sheets are password protected and hence will not append. How can you solve this?
Great tutorial!
I managed to upload the files, but if I move them - I can't make it work after changing the source. Any specific steps to follow?
Not sure I understand the problem. If you move the folder location, you need to go to "Source" step and change it. Then refresh the query.
This is a very useful post. However, I am trying to do something similar, but unfortunately all my files are in different locations, not a single folder. I can produce a table with a list of all the file paths. How could I use this to pull in all the relevant spreadsheets?
My data has alternate blank rows, but Power Query did not import all the rows.
Hi Chandoo.
Thank you for great website.
I am wondering whether Power Query can get data from workbooks located in different folders and combine them together.
I cannot copy them over to the same folder
Regards
Ahad
I have a Data Model created using the folder query method. Worked great to create the initial model. Now, every day a new file is added to the folder. How do I update the data model with just the new file (takes too much time to update all files every time).
This can't be a unique question if people are using Power Query on folders, but can't find this online.
Thanks
Hi Dan...
This process is called "incremental refresh" and it is not available in Excel as yet. It is part of Power BI premium functionality. One way to mimic this is...
Split your query in to two - old files and new files.
Set up such that all the old files are in old files folder. Build your queries.
Now, use Append queries to combine both sets of data.
Add files to new files folder every day and refresh that query alone.
Once a month, move all new files to old files folder and refresh all queries.
I know it is tedious, but I can't think of another way to apply this idea.
@Dan
Why not have an import Worksheet
And have some code that saves today's values as values at the base of a Data storage Worksheet
Hi, How did you get the 'From Folder' function in Excel Power Query? I do not see this as an option. This example looks like Power BI, not Power Query. Right?
Sorry i was not able to get this to work at all. From the very beginning I could not get the 2 files I had to show up to be combined- All that show up was only the first tab. Not sure if you have a visual step by step approach that might be easier to follow.
Which is better, Power Query or Power Pivot?
I’m trying to combine folder items which have different page ranges - but the exact same column headers - depending on the file. Any thoughts on how to set that up?
Hi Chandoo,
I need to combine some files which are saved in different folders on the network. I was wondering if you can suggest a solution for that. The files will have an almost similar structure, however I cannot move them to the same location.
Appreciate your help.
Kind regards,
ND
How many Excel files in a folder can Power Query combine into one dataset?
I am practicing with the example worksheets that you have provided.
At some point, under applied steps, PQ adds "promoted headers" automatically. If I remove "promoted headers", the "reports" query in the query pane gets an error. It says
Expression.Error: The column 'ACME Inc. Employee Travel Expense Report' of the table wasn't found.
Details:
ACME Inc. Employee Travel Expense Report
I'd love to practice this example, but I can't get very far. I think the problem stems from the very beginning before any steps or transformations have taken place. I can see on your "Combine Files" photo, which shows the sample "travel details" file, that you have two columns (Column1 and Column2), with Acme Inc Employee in row 1.
I don't have that. My sample "travel details" file shows Acme Inc Employee as the column 1 header; the column 2 header is Column2. As soon as I move on to any step that removes Acme Inc Employee as the column 1 header, I get this error message:
Expression.Error: The column 'ACME Inc. Employee Travel Expense Report' of the table wasn't found.
I don't see that type of error addressed in your error list. What would be the solution to this error? Thx.
Hello Chandoo,
I successfully loaded my multiple Excel files into Power Query. I selected the combine and edit option, and then I clicked on the double down arrow to combine the files, but I can't get the combined file yet. Please help me.