Extract data from PDF to Excel – Step by Step Tutorial


In this tutorial, learn how to:

  • Extract tabular data from one PDF to Excel
  • Combine and extract tables from multiple PDFs to Excel

We will be using Excel 365 & Power Query to do this. If you have a different version of Excel (2016, 2013 or older), read the FAQ section at the end for another way to do this.

How to extract PDF table to Excel

Optional:  If you need a sample PDF to practice these concepts, use the randomly made credit card statements I created. Download them from here.

Step 1: Go to the Data ribbon & click on Get Data > From File > From PDF

From the Data ribbon, use the PDF option and point to the location of the file on your computer (or its web address).

data from PDF option - power query get data excel

Step 2: Select the table(s) you want in the navigator screen

Power Query will open a Navigator screen that lists the tables and pages it found in the PDF. Just select the table(s) you want. Refer to the illustration below to know more about the Navigator screen.

navigator screen for pdf - power query

💡 Bonus tip: Use the composite table if you want to get a data table in your PDF that spans multiple pages. This is excellent for bank or credit card statements.

Step 3: Load or Transform data

If the preview in the Navigator looks satisfactory, just load it. Otherwise, click on “Transform data” to open the Query Editor and make any final adjustments.

Combine & Extract data from multiple PDFs

Step 0: Place all your PDFs in a folder

Step 1: Folder connection

Instead of the PDF option, use the Folder option in Get Data.

from folder option - get data - power query - excel

 

Step 2: Choose “Combine” in file listing screen

Power Query will show you a screen with a list of all files it found in the folder. Choose any of the combine options here to combine the data from all files to one table.

File listing screen - Power Query - Folder connection option

Step 3: Select the table you want from Transfer Sample Screen

Now, you will see another Navigator-like screen. Just select the table you want here. Power Query will go to each file in the folder, get the same table and combine them.

Step 4: Load or Edit the query

And enjoy.
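
If you are wondering what "combine" means here, the idea can be sketched in a few lines of Python. This is only an illustration of the concept, not what Power Query runs internally: the file names, the sample rows, the `combine_tables` helper and the `Source.Name` column label are all assumptions made up for this example.

```python
from typing import Dict, List

Row = Dict[str, str]


def combine_tables(tables_by_file: Dict[str, List[Row]]) -> List[Row]:
    """Append the same table from every file into one list of rows.

    Each output row is tagged with the file it came from, so the
    combined table can still be traced back to its source PDF.
    """
    combined: List[Row] = []
    for file_name in sorted(tables_by_file):
        for row in tables_by_file[file_name]:
            # "Source.Name" is an illustrative column label (assumption).
            combined.append({"Source.Name": file_name, **row})
    return combined


# Hypothetical tables, as if already extracted from two statement PDFs.
statements = {
    "jan.pdf": [{"Date": "2020-01-05", "Amount": "12.50"}],
    "feb.pdf": [
        {"Date": "2020-02-11", "Amount": "40.00"},
        {"Date": "2020-02-20", "Amount": "7.25"},
    ],
}

combined = combine_tables(statements)
for row in combined:
    print(row)
```

The key point is that every file must contain the same table layout. If one PDF has different columns, the combined result will have gaps for that file.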

Video - Convert PDF to Excel

Still not sure how to extract data tables from PDF to Excel? Watch this short video and get it. See it below or on my YouTube channel.

PDF to Excel - FAQs

I don’t have PDF option in my Excel. What do I do?

You can use the free Power BI Desktop to do the same. (Download Power BI for free here)

Once you have Power BI, open it, go to Get Data > PDF and follow the same steps as in the tutorial above.

Instead of loading the data, copy the entire table from Query Editor and paste it to Excel. See below illustration.

copy entire table - power query in Power BI

I have new files, how do I refresh?

Just place the files in the same folder.

Go to Excel and right click on the extracted table and select “Refresh”. Excel will update the details.

I want to exclude certain files in the folder when combining…

Open the Query Editor and go to the query that combines your PDFs. Select the Source step. This will show all the files in the folder.

Add a filter condition here. Power Query will warn you about inserting a step. Proceed and you will be able to exclude files based on conditions.

Examples:

  • Files whose names start with certain letters
  • Files created after a certain date
  • Files with a specific extension

Remember: Power Query is case sensitive.  
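
To see why case sensitivity matters when filtering the file list, here is a small Python sketch. It is not Power Query code. The `FileEntry` fields, the sample files and both helper functions are hypothetical and exist only to show how a case-sensitive filter can silently drop files you expected to keep.

```python
from datetime import datetime
from typing import NamedTuple


class FileEntry(NamedTuple):
    """One row of a folder listing (field names are illustrative)."""
    name: str
    extension: str
    date_created: datetime


def keep_file(entry: FileEntry, prefix: str, extension: str,
              created_after: datetime) -> bool:
    """Case-sensitive filter: name prefix, extension and creation date."""
    return (entry.name.startswith(prefix)
            and entry.extension == extension
            and entry.date_created > created_after)


def keep_file_ignore_case(entry: FileEntry, prefix: str, extension: str,
                          created_after: datetime) -> bool:
    """Same filter, but text is lower-cased before comparing."""
    return (entry.name.lower().startswith(prefix.lower())
            and entry.extension.lower() == extension.lower()
            and entry.date_created > created_after)


files = [
    FileEntry("Statement-Jan.pdf", ".pdf", datetime(2020, 1, 31)),
    FileEntry("statement-Feb.pdf", ".pdf", datetime(2020, 2, 29)),
    FileEntry("Statement-Mar.PDF", ".PDF", datetime(2020, 3, 31)),
    FileEntry("Notes.txt", ".txt", datetime(2020, 3, 1)),
]
cutoff = datetime(2020, 1, 1)

strict = [f.name for f in files if keep_file(f, "Statement", ".pdf", cutoff)]
relaxed = [f.name for f in files
           if keep_file_ignore_case(f, "Statement", ".pdf", cutoff)]

print(strict)
print(relaxed)
```

The strict filter keeps only the file whose name and extension match exactly, while the relaxed one keeps all three statements. If your folder has mixed-case names or extensions, converting the text to lower case before filtering is the safer choice.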

I want to pre-process or clean-up data before loading it into Excel

Open the query editor and add any necessary data transformation steps at the end. 

Examples:

  • Removing all foreign currency transactions from credit card statements
  • Cleaning up account codes
  • Rearranging columns in the PDF data table
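
As a rough picture of what these clean-up steps do to the data, here is a Python sketch under made-up assumptions. The column names, the sample rows, the home currency and the `clean_rows` helper are all hypothetical. In Power Query you would do the same with filter, transform and reorder steps.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def clean_rows(rows: List[Row], home_currency: str,
               column_order: Sequence[str]) -> List[Row]:
    """Drop foreign currency rows, tidy account codes, reorder columns."""
    cleaned: List[Row] = []
    for row in rows:
        # 1. Remove foreign currency transactions.
        if row["Currency"] != home_currency:
            continue
        # 2. Clean up the account code: trim spaces, use upper case.
        tidy = dict(row, Account=str(row["Account"]).strip().upper())
        # 3. Keep only the wanted columns, in the wanted order.
        cleaned.append({column: tidy[column] for column in column_order})
    return cleaned


# Hypothetical rows, as if extracted from a statement PDF.
raw_rows = [
    {"Date": "2020-01-05", "Account": " ac-001 ", "Currency": "USD", "Amount": 12.5},
    {"Date": "2020-01-09", "Account": "AC-002", "Currency": "EUR", "Amount": 30.0},
]

result = clean_rows(raw_rows, home_currency="USD",
                    column_order=("Account", "Date", "Amount"))
print(result)
```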

For more on what you can do with Power Query, check out this tutorial.

Other questions…

Post a comment and I will try to help you.


63 Responses to “Custom Chart Axis Formating – Part 2.”

  1. Stephen says:

    Hui, these are cool little tricks. Not one I need today, but well worth remembering for future dashboards

  2. Ed says:

    I recently learned what I thought was a really simple but useful number format. A custom format followed by ;;; will not display 0 values. Example format #,##0.00,,;;; will display 12,570,000 as 12.57 and display 0 as blank. I found that this really helped me reduce some of the clutter on dynamic charts. Thanks for another good article.

  3. Fred says:

    Like! 🙂

  4. Fred says:

    hi Hui,

    Once I have created a custom format, how do I remove/delete it from the list again? I tried a few methods such as right click (no option to remove). I tried highlighting the custom format and hitting the delete key. Nothing works.

  5. Hui... says:

    @Fred,
    Unlike the Custom Number Format dialog for cells, the Chart Number Formats dialog doesn't have a Delete button. Maybe next version?
    .
    If you don't want to use your Custom Format select one of the built in formats.

  6. davidlim says:

    hi chandoo and all,

    great tips on the formatting.

    1 curious answer: Is it possible to highlight Sat/Sun for DATES on x-axis?

    assuming i have 1 month of daily product sales, x-axis = dates, y-axis = sum of sales.

    thanks!

  7. Hui... says:

    @Davidlim
    .
    You have limited options here as you can only use 3 conditional ranges in the [ ] brackets
    So you can do something like
    [Green][<40787]ddd;[Blue][>40788]ddd;[Red]ddd
    This will make:
    Dates earlier than September 2011 Green
    Dates after September 2nd 2011 Blue
    Dates on September 1 or 2nd, 2011 Red
    .
    Otherwise you can use a Combination chart and fill the weekend columns with a highlight color to emphasize them
    Have a look at: http://chandoo.org/wp/2009/08/26/combo-charts-to-group-times/
    Download the file just below:
    Download this excel combo chart and play with it to learn more

    Select the hidden bars and apply a fill

  8. Fowmy says:

    Great post,

    I would like to know a way to apply custom formatting to the horizontal axis.
    Suppose, I want to highlight F,G & H in Red

  9. Hui... says:

    @Fowmy
    As far as I'm aware it can't be done using Custom Formats
    You can of course use cells lined up under the chart and do the Conditional Formatting in those cells

  10. Donald says:

    @Hui:
    How do I get the number formats to work on a Dynamic Chart.i.e: Chart with different scaling based on different data sources. For example, if I have five KPI and each have a Target, how do I get the chart to dynamically change number format based on the data selected?

  11. Hui... says:

    @Donald
    Have a read of this Forum and my comments and see if that helps you
    http://chandoo.org/forums/topic/making-vlookup-recieve-multiple-formats-of-data

  12. Donald says:

    @Hui: Thanx for the speedy comment, I've checked the link and your last comment is almost what I need but I can't get it right for my application. See below my problems. Data below is displayed on the dynamic graph. The Graph only shows two data lines, Target and the actual KPI data. On the data line I want to highlight the numbers based on the info below relative to the Target line.
    KPI Target GREEN ORANGE RED
    DCR 1 and 1.2
    BSS Setup 99 >99 95> and <99 <95
    TCH BLK 0.5 and 1
    SD BLK 0.5 and 1
    UL_TBF S_Rate 90 >85 85> and <90 85 85> and <90 <85

  13. Hui... says:

    @Donald
    Do you want to email me this file
    I'm struggling to visualise this
    add instructions please

  14. Jonas says:

    I remember seeing a blog post some time ago about the number format colors. The default green color is ugly, and there was some neat trick to change that into more dark green version. I think it had to do with assigning some code instead of [green].

  15. Donald says:

    @Hui: I just forwarded you a mail now. I've also noticed that the custom format only allows two conditions and I'm struggling to put more custom formats on the same chart. As indicated, the graph has different target formats, i.e. 1% and 95%.

  16. Oleksiy says:

    @Donald: I'm not sure what you want to get in your case, here is what I've used in my dashboard for different KPI values:
    [50000]$#,K;0
    I have %'s, monthly sales amounts (all > $50000) and invoice counts. However I didn't apply this formatting to the axis number format - it will always have 0 as 0.00% - any ideas how to avoid this?

  17. Oleksiy says:

    Formatting in my comment above should be as following: [50000]$#,K;0

  18. Oleksiy says:

    one more time: [50000]$#,K;0

  19. Donald says:

    @Oleksiy: Follow link on Hui comment (11). Looks like it might address your problem.

  20. Oleksiy says:

    @Donald: I have done similar for series values already, just for some reason Chandoo's website modified my comment from "/<1/0.00%; /50000/$#,K;0" where / - [ and ]. 🙂
    Problem is that I can't apply this to the axis format as it always has zero.

  21. Fred says:

    Thanks, Hui.

  22. Tamoghna Acharyya says:

    Hello Hui, Please suggest how can I highlight ( making it bold or colored) a particular month among 12 months that I put in X axis.

  23. Hui... says:

    You can use the same technique with Dates that are Dates, but not when they are Text.
    That is, if your X-Axis has dates, apply a custom number format like
    [Red][<=40790]d-mmm;[Black]d-mmm
    That is, Dates <= 4 Sept 2011 will be Red and the others will be Black, where 40790 is the serial number for 4 Sept 2011.
    You can change the Date Format d-mmm to whatever suits you.
    .
    With three ranges:
    [Red][<=40790]d-mmm;[Black][<40798]d-mmm;[Green]d-mmm
    Red <= 4 Sep
    Black < 12 Sep
    Green >= 12 Sep
    .
    The Date formats can change as well
    [Red][<=40790]d mm;[Black][<40798]d-mmm;[Green]d mmmm yy
    Red <= 4 Sept; displayed as 4 09
    Black < 12 Sept; displayed as 12-Sep
    Green >= 12 Sept; displayed as 12 September 11
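
The serial numbers used in these formats can be checked outside Excel. The Python sketch below is an illustration only: the helper names are made up, and it assumes Excel's default 1900 date system, where counting days from 30 December 1899 gives the right serial for any date from 1 March 1900 onward.

```python
from datetime import date, timedelta

# Day zero that reproduces Excel's 1900 date system serials
# for dates on or after 1 March 1900 (assumption stated above).
EXCEL_EPOCH = date(1899, 12, 30)


def to_excel_serial(d: date) -> int:
    """Convert a calendar date to its Excel serial number."""
    return (d - EXCEL_EPOCH).days


def from_excel_serial(serial: int) -> date:
    """Convert an Excel serial number back to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


print(to_excel_serial(date(2011, 9, 4)))   # serial to use in [<=...]
print(from_excel_serial(40798))            # date behind a serial
```

Inside Excel you can get the same number by typing the date in a cell and changing the cell format to General.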

  24. Tamoghna Acharyya says:

    Thanks a lot Hui for your great suggestion. So it is only possible for months not for any other texts!

  25. Hui... says:

    @Tamoghna
    It's possible for any Numbers, %, $, Dates or Times,
    Which are all numbers anyway.

    It is not possible for Text

    If you need to do text, you can consider using Text Boxes or cells behind the chart where you can apply conditional formats to.
    So instead of using the Built In axis labels, make the chart transparent and place a number of Columns behind the chart with the appropriate text and Conditional Formats in them
    A similar approach can be done using Text Boxes linked to cells

  26. Linda says:

    Hui,

    This is great and very timely because I suddenly have a need for labels that change format according to the values - so thank you.

    A quick question however, on a slightly different issue. Is it possible to format the markers so they don't show for a zero value but do show for any value above or below zero.

    Thanks,

    Linda

  27. Hui... says:

    @Linda
    try a format like
    [red]0;[green]-2;;
    .
    Note the custom format layout is
    Positive;Negative;0;Text
    .
    so by having a third parameter of ;;
    you get no format when it is 0
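
The Positive;Negative;0;Text layout can be pictured with a short Python sketch. This is a simplified model and not how Excel parses formats: the `pick_section` helper is hypothetical, it only handles numbers, it ignores [condition] sections, and it splits naively on semicolons, so a semicolon inside quotes would break it.

```python
def pick_section(number_format: str, value: float) -> str:
    """Return the section of a custom number format that applies to value.

    Simplified rules (no [condition] sections, numbers only):
      1 section   -> used for every number
      2 sections  -> first for positives and zero, second for negatives
      3 or more   -> positive ; negative ; zero
    """
    sections = number_format.split(";")
    if len(sections) == 1:
        return sections[0]
    if len(sections) == 2:
        return sections[0] if value >= 0 else sections[1]
    if value > 0:
        return sections[0]
    if value < 0:
        return sections[1]
    return sections[2]


fmt = "[red]0;[green]-2;;"
print(repr(pick_section(fmt, 5)))    # positive section
print(repr(pick_section(fmt, -5)))   # negative section
print(repr(pick_section(fmt, 0)))    # zero section is empty, so nothing shows
```

An empty section means the value is displayed as nothing at all, which is why the trailing ;; hides zeros.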

  28. Linda says:

    Hui,
    Thanks for the quick response. However, I don't seem to know where to type the format. I can see how to do this for the Labels but not for the actual graph marker itself. Essentially I want the marker to show if there is a value, but not if it is 0.

    Appreciate your help.

    Linda

  29. Hui... says:

    @Linda
    Sorry, I'd misread your requirements
    Where your data is, change the formula to be
    =if(my formula=0, na(), my Formula)
    .
    You may have to change the settings
    Select chart
    Right Click, Select data
    Hidden & Empty cells
    Adjust to suit

  30. Linda says:

    Hui,

    Thank you so much that worked well. I had a couple of problems at first because I had the graph type set as a line and the #NA had no effect. However, once I changed it to XY scatter, your suggestion worked like a treat!

    Thanks so much for your help

    Linda

  31. Aashtee says:

    Hello Hui,
    I have a data validation cell (A1) with a dropdown list for "Qty" and "$$$".
    My data set is values that I plot as a Pie Chart (In Column B1).

    These values are conditionally read from 2 different tables depending on the drop down list selection for $$$$ or Qty.
    I have conditionally formatted all cells in B1 to display number format as Number (0 decimal places) or Currency $ again dependent on selection made in A1.

    Now my pie chart is updating correctly based on my selections and data but the labels do not get formatted to Number or Currency automatically.
    How can I conditionally format the labels based on selection in A1?

  32. Hui... says:

    @Aashtee
    You can't conditionally format chart objects against another cell, only against their own values.
    If the values for Qty and price are different
    ie: Price $100-200
    Qty 1-20
    you can use a Custom Number format like
    [Blue][>=100]$#,###.00 ;[Red][<100]#,###;
    .
    But if they overlap it can't be done

  33. Annie says:

    Hi Hui,

    I'm trying to customize the x-axis from 0,1,2,3,4,5 to read: 0, KG, 1, 2, 3, 4, 5. How can I do this?

    Also, the x-axis figures are currently on top of my chart, how can I move these to be on the bottom?

    Thanks!

  34. Annie says:

    It's a clustered bar chart that I'm using to show when curriculum was developed for different subjects. The y-axis indicates the year the curriculum was developed and the x-axis corresponds to the grade level (KG is short for kindergarten, followed by Class 1, 2, 3, 4 and 5).

    • Hui... says:

      @Annie
      I'm struggling with an easy solution for this one
      One way would be to delete the axis altogether or use a Custom Number format like ;;;
      Then setup a manual set of cells with the 0 K 1 2 3 4 etc which would be located behind the chart and then resemble the Axis Labels
      or
      Setup a Text Box/es with the same Sequence 0 K 1 2 3 4 etc and place that where the axis would be
      Once properly located and sized, The Text Box could be grouped with the chart so that they remain fixed to each other.

    • Kyle McGhee says:

      Hi Annie,

      I think this might work for you...basically what Hui said but a couple small tweaks.

      use this custom format
      General;[<0]"0";"KG"
      It will make negatives appear as 0 and 0 appear as KG, positive numbers will remain as they are.

      Then select the x-axis and ctrl+1 to go to format axis.
      Axis Options
      1/Set Minimum to -1 (Fixed)
      2/Set Maximum to 5 (Fixed, optional)
      3/Vertical axis crosses; Axis Value = -1

      In your data, make sure that all data points relating to KG are 0.

      Your clustered bar chart should have 0 KG 1 2 3 4 5 for the x-axis labels.

      Kyle

  35. Annie says:

    THANK YOU SO MUCH!

    This worked perfectly. I really appreciate all of your help.

    Phew!

    Annie

  36. Kyle McGhee says:

    minor note on the custom format I posted...it doesn't need the [<0] in General;[<0]"0";"KG". You can just use General;"0";"KG"

  37. Majid says:

    hi drea,
    thank you so much !
    i am from iran.
    this site is very good for me.
    this site has very good information from excel.
    by

  38. Russ Urquhart says:

    I need to do something like your highlight thousands as K, but to this degree:

    1µ      0.000001
    10µ     0.00001
    100µ    0.0001
    1m      0.001
    10m     0.010
    100m    0.100
    1       1.000
    10      10.000
    100     100.000
    1k      1000.000
    10k     10000.000
    100k    100000.000
    1M      1000000.000
    10M     10000000.000
    100M    100000000.000
    1G      1000000000.000
    10G     10000000000.000
    100G    100000000000.000

    From what i've been told i can not express all of that as a chart label number format, so i was looking at other options.

    Within VBA and Excel, how can i apply a NumberFormat like this to a chart?

    Any help is greatly appreciated!

    Russ 

  39. Russ Urquhart says:

    Actually the numbers got screwed up when i pasted.

    They should be like

    1K     1000.0
    10K    10000.0
    100K    100000.0

    etc.

     

  40. Yousuf says:

    My y axis goes from 0 to 1 with increments of 0.1
    I want it to be displayed in terms of p10,p20 all the way to p100
    For ex instead of 0.1 i want p90 and instead of 0.2 i want p80 all the way to p0. Is this possible?

    • Hui says:

      @Yousuf
      You can't do maths in Number Formatting apart from the Power of 10 tricks discussed here: http://chandoo.org/wp/2012/01/31/custom-number-formats-multiply-divide-by-any-power-of-10/
      However you can still do what you want
      Setup your chart
      Select the Y Axis and set Max to 1, Min to 0 and Major Unit to 0.1
      With the Chart selected Add text boxes and type the text you want for each Axis Point eg: p10, p20 etc
      Locate the text boxes in the correct locations using the Axis as a guide
      Set the text size, font, Bold etc to suit
      Select all the text boxes and group them
      Select the axis and set the text color to None
       
       

  41. lonchas says:

    How can I use an image instead of a text on chart axes? I would like to use companies logos instead of using the names on x-axis. Is it possible?

  42. Bisal Kumar Garg says:

    I have data labels in percentage format. Which custom format should I use to have a green font color if more than 100% and a red font color if less than 100%?

  43. Shelly says:

    I'm hoping you can help. I have a dynamic chart for financial data. Most of the charts have a y axis based on $ with a couple charts that are a %. I cannot use an option mentioned above since some of the $s have a negative value. I tried conditional formatting the source, but the 'Link to source' does not pick up the conditional formatting. Is there a way to have the y axis dynamically change from $ to %? I am using a combo box to change the data on the chart.

  44. ERik says:

    The “linked” data from my table is conditional formatted to be red based off of some criteria. I want my chart axis to be red too, but it only picks up the number format, not the conditional formatting.

    Is there any way to link conditional formatting of sourced data to axis labels?

  45. Shabbir says:

    You are awesome chandoo. Thanks

  46. Louie says:

    Hi Chandoo,

    Your posts are very helpful.

    Is there a way to conditionally format the data label position/location (in addition color, as you have shown in this post)?

    I have some line charts with markers showing the same measure from year to year. Each chart has two lines. One of the lines is an average of participants in the group and stays the same. The other line is for each participant and gets updated dynamically to produce about 50 unique charts total. If I put the data labels "above" or "below," they look good for about half the participants, then overlap or are confusing next to each other for the remaining half of the participants (given that the one line is the average of all participants). Right, left, and center do not look good, as they overlap the lines. I tried using the Chart Tools---> Design---> Style 2, which makes the markers bigger and places the data label inside the marker. However, for the 3-4 participants per year who have about average values, the marker for the participant overlaps with that for the average and makes the labels unreadable.

    Thank you for any help you can offer!
