Dirty Excel Secrets: CSV

While behind the scenes at Data-8 we have countless databases whirring, hundreds of bespoke workflows validating data, and what seems at times like unlimited storage, the input we often see and output we often provide are just humble spreadsheets and text files. pexels-photo-113850-large They have their own complexities, and you might find yourself struggling with what should otherwise be a simple job. If you’ve ever opened a file of phone numbers and wonder why a London number is now showing as 2.08E+09 then you know what we’re talking about.

Many databases such as MS Dynamics or Salesforce allow exports to a .csv – a Comma-Separated Values file. These are a very common ‘delimited’ file. This simply means that every column in a spreadsheet is replaced by a comma, and so a spreadsheet with:

name surname postcode

becomes “name,surname,postcode”

CSVs are great as they’re a bog standard format, are (mostly) human-readable and as text files can be opened in many different programs. Where they fall down is, funny enough, the default Windows application that wants to open them: Excel. The problem is, Excel has its own ideas of how to handle a text file. It likes to map incoming data to various categories so that the user can filter and manipulate dates as dates, numbers as numbers, and text as text. If those London phone numbers are formatted as ‘02079304832’ then Excel assumes it’s just a long number, and as numbers aren’t preceded by a zero, the field shortens to 2079304832. But there’s more: Excel spots that it’s an unusually long number, and use scientific notation to describe it. Scientific notation is great for abbreviating long numbers like the number of pigeons in London but not so useful for phone numbers there. While you can drag the phone number column in your data to show the full number, you’ve still lost that first zero. There’s a better solution: we’re going to lie to Excel.

Open up Excel as normal. In the ribbon (or equivalent), go to Data > Get External Data > From Text and point this at your extract from CRM. Your data is Delimited. It may have Headers. Click Next, and you’ll see a mess of text in the preview pane. Uncheck ‘Tab’ and check ‘Comma’ and now that mess will have separated out columns like in any spreadsheet. Next again, then choose each column in your data and change the data format to be Text. Excel leaves text columns alone, and doesn’t so any of that mapping. Click Finish and your data imports.

CSVs are just one type of delimited file – you can separate columns with tabs, spaces, pound signs and lots more.  Here at Data-8 we often supply cleansed data with appended fields in a csv format that might look odd if you just open it up in Excel. Try the instructions above and if you have any problems, let us know.

We’ll just keep that lying to Excel between us.

DataQualityReport (002)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s