PSA: Don’t let auto correct spoil your party

Posted on August 26th, 2016 in Learn Excel - 7 comments

So here is some news from the strange-but-true department: Microsoft Excel blamed for gene study errors [bbc.com].

Microsoft’s Excel has been blamed for errors in academic papers on genomics.
Researchers trying to raise awareness of the issue claim that the spreadsheet software automatically converts the names of certain genes into dates.
Gene symbols like SEPT2 (Septin 2) were found to be altered to “September 2”.

Aah, classic!

This is what happens when you spend countless hours learning genome sequencing and very little time learning the software tools where your data goes. Maybe we need to bring Clippy back to warn people about such sticky situations.

All jokes aside, here is a public service announcement for you. Beware of "helpful" features in Excel like AutoCorrect, Flash Fill, AutoFill, scientific notation, etc.

Here are a few tips for you if you find yourself coding genomes in Excel (or doing something similar):

  • Use TEXT format for data that contains possible dates, values that start with =, etc. To set TEXT format, select the data entry range and use Home > Number > Text.
    • This can deal with cells that contain possible dates, credit card numbers, very long numbers, leading zeros, fractions, or values that start with = (which Excel treats as formulas).
  • When importing text files to Excel (like your genome sequence data or what have you), select text as data type for the columns that can be misinterpreted by Excel.
  • If a cell starts with = and should not be treated as a formula, prefix the cell value with an apostrophe (').
  • Disable features like Flash Fill, AutoComplete and automatic percentage entry if you must.
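
If your data passes through a script before it lands in Excel, you can flag the risky values up front. Here is a minimal Python sketch of that idea. The patterns are my own rough guess at what Excel might read as a date or as scientific notation, not Excel's actual rules, and the names `risky_for_excel` and `flag_risky` are made up for illustration.

```python
import re

# Month names and abbreviations that might be read as the start of a date.
# ASSUMPTION: this list is illustrative, not Excel's real rule set.
_MONTHS = (
    "JAN|JANUARY|FEB|FEBRUARY|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|"
    "JUL|JULY|AUG|AUGUST|SEP|SEPT|SEPTEMBER|OCT|OCTOBER|NOV|NOVEMBER|"
    "DEC|DECEMBER"
)

# A month name followed by a one- or two-digit number, e.g. SEPT2 or MARCH1.
DATE_LIKE = re.compile(rf"(?:{_MONTHS})[-/ ]?\d{{1,2}}", re.IGNORECASE)

# Digits, the letter E, then more digits, e.g. 2310009E13.
SCI_LIKE = re.compile(r"\d+(?:\.\d+)?E[+-]?\d+", re.IGNORECASE)


def risky_for_excel(value: str) -> bool:
    """Return True if the value looks like a date or scientific notation."""
    text = value.strip()
    return bool(DATE_LIKE.fullmatch(text) or SCI_LIKE.fullmatch(text))


def flag_risky(values):
    """Return the values that should be imported as text, in input order."""
    return [v for v in values if risky_for_excel(v)]


symbols = ["SEPT2", "MARCH1", "TP53", "2310009E13", "BRCA1", "DEC1"]
print(flag_risky(symbols))
```

Anything this flags is a candidate for a TEXT-formatted column. It will not catch everything, so still eyeball your data after the import.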

Help the hapless, share your tips

Now it's your turn. Please share your tips for handling situations like these. Post your tips in the comments box.

Written by Chandoo

7 Responses to “PSA: Don’t let auto correct spoil your party”

  1. Asel says:

    Thanks as always for your tips!
    My colleagues have already noticed that I'm getting awesome in Excel 😀

  2. Patrick O'Beirne says:

    #1 tip: Check your data. Don't assume that your software tool, whichever it is, matches your naive expectations of perfection, or that you have always used it correctly.
    There's a skills certification called "Spreadsheet Safe" and one of their points is "validate CSV file imports."
    http://www.spreadsheetsafe.com/us_main_page_section/services/

    Most, but not all of these data files are imported as text or csv formats from instrumentation such as DNA sequencers, gene microarrays or proteomics screens. So your second solution is the correct one in this case: "When importing text files to Excel (like your genome sequence data or what have you), select text as data type for the columns that can be misinterpreted by Excel."

  3. GraH says:

    Once you are too deep in the matter, you often don't see simple mistakes or errors any more. So ask a (critical) colleague to test or review what you have done. Or explain to him/her how you did it. Allow "why" questions, please.
    Alternative: apply some agile pair "programming" (also valid for VBA). 2 brains have more brainpower and 4 eyes see a hell of a lot more than 2.
    I learned the hard way to import all fields as text. Certainly for dates: where server settings and local settings (per user!) are different, trouble is guaranteed! (04/05/2016 is the 4th of May as a Belgian date; the same day is 05/04/2016 as an American date.) You end up calculating negative service levels or lead times, hey?

  4. daveycroc says:

    Another version of the date issue I experienced. Data was exported from SAS to a CSV, but the dates only had a two-digit year. Therefore dates that should have been 2050, etc. were imported from the CSV as 1950!

  5. Patrick O'Beirne says:

    Daveycroc, see the MS support pages for:

    • How Microsoft Excel works with two-digit year numbers
    • Change the date system, format, or two-digit year representation

    • daveycroc says:

      Thanks Patrick. I dealt with it but it took some head scratching to figure out why so many mortgages had passed their expiry date! lol
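
The two-digit year surprise in this thread comes down to a cutoff rule, which can be sketched in a few lines of Python. The function name `expand_two_digit_year` and its `pivot` parameter are hypothetical. The default cutoff of 29 (00 to 29 become 20xx, 30 to 99 become 19xx) is what I understand Excel's default to be, so treat it as an assumption and check the support pages mentioned above for your own setup.

```python
def expand_two_digit_year(yy: int, pivot: int = 29) -> int:
    """Expand a two-digit year: 00..pivot -> 20xx, everything above -> 19xx.

    ASSUMPTION: pivot=29 mirrors what is believed to be Excel's default
    cutoff; the real cutoff depends on the user's regional settings.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    return 2000 + yy if yy <= pivot else 1900 + yy


print(expand_two_digit_year(50))            # 1950 with the default cutoff
print(expand_two_digit_year(50, pivot=68))  # 2050 with a later cutoff
```

The safer fix is the one daveycroc's export was missing: write four-digit years into the CSV so no cutoff is needed at all.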
