• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Combining files using a list that has file names, and column names (from the same sheet)

jtrt1962

New Member
Hi

Each file has multiple columns in Sheet("Data1") with a different column name for each. The number of columns in each file can be different. There is one common column in each file named "Date". I want to combine the files by using the Date as the join column. Ideally this can be dynamic if the file names and column names are included in a list as not all columns from each file are required. In the Result file columns Orange & Green are excluded.

Thank you in advance.
 

Attachments

  • Source1.xlsx
    8.9 KB · Views: 3
  • Source2.xlsx
    9 KB · Views: 4
  • Result.xlsx
    9 KB · Views: 6
Hi,

Try this solution

Instructions
  1. Create a folder, e.g., C:\Test\zz\
  2. Copy all the files you want to join into that folder.
  3. Open the included file "List.xlsx".
  4. Update the path in cell A2.
  5. Update the list of files and their column headers. The list table starts at cell A4.
  6. Right-click the green table at cell N4 and select Refresh.
And that's it — all the files are now merged.

If you like it, don't forget to click the thumbs up
 

Attachments

  • List.xlsx
    20.6 KB · Views: 3
Hi Frank

Wow, so easy, though I am getting an error: '[Expression.Eooror] The column 'Date' of the table wasn't found.' on the Merged line of the query. The Date column is in both source files. I also tried it with some real data.

Thanks
John
 

Attachments

  • 8731001.xlsx
    19.4 KB · Views: 1
  • List.xlsx
    21 KB · Views: 1
  • 8731002.xlsx
    17.5 KB · Views: 1
Hi,

Most likely due to the testing I was doing, I forgot that the 'Date' column also needs to be included.
After all, we're performing a column filtering, and 'Date' is one of those columns as well.

Please give it another try.

If you like it, don't forget to click the thumbs up

Regards
 

Attachments

  • List.xlsx
    23 KB · Views: 4
Hi Frank

Thanks, that works perfectly.

Can I ask one last question? There needs to be a 'Remove Top 9 rows' and a change of the Date column to Date data type for each source file before the merge query is run. Would I do that first and then run the Merge query?

John
 
Hi,

I need you to send me the files so I can see how the data looks before removing the 9 rows and transforming the Date column to type date.
I'm heading to work now, I'll check it later.

Regards
 
Hi Frank

Great, thanks so much for helping me. I have attached two live files. We are interested only in the Data1 sheet in each file. Also, I assume the merge query will work with more than two files? I should have mentioned that the transformation also needs to keep the bottom 60 rows.

In another scenario, I tried using lists to filter the data, but found it to be very slow compared to hard coding the filters in the query. The data source is a 10 GB csv file. Would that be correct?

Thanks again
John
 

Attachments

  • 8731001.xlsx
    105.8 KB · Views: 1
  • 8731002.xlsx
    104.6 KB · Views: 1
Last edited:
Hi,

I'm sending the Excel file back to you. Now remove the top rows up to the header and work only with the "Data" sheet. Rename the first column from "Series ID" to "Date" and change its data type to date.

There shouldn’t be any issues with 60 rows or more. If there’s an error, it’s probably because one of the files doesn’t have the expected format or some column contains unexpected data.

Regarding the CSV files, if you want speed, you need to transform the data into a format that gives you that speed—in other words, convert the CSVs into databases, and filters will most likely be much faster, since those formats are designed to be queried and return results quickly.

Best regards.
 

Attachments

  • List_v2.xlsx
    42.5 KB · Views: 1
Back
Top