Class 23: Structured data (Excel)

Methodology of Scientific Research

Andrés Aravena, PhD

May 21, 2024

How to use Excel

1. Do not use Excel

Use R or Python instead

You can track your computations steps by step

Let me show some examples

Greece economy crashed because of Austerity

In 2010 Carmen Reinhart and Kenneth Rogoff (U. Maryland and Harvard) published a paper saying

rising levels of government debt are associated with much weaker rates of economic growth, indeed negative ones

Reinhart, Carmen, and Kenneth Rogoff. 2010. ‘Growth in a Time of Debt’. w15639. Cambridge, MA: National Bureau of Economic Research. https://doi.org/10.3386/w15639.

Austerity

Smaller public sector, less hospitals, less schools/universities, less infrastructure

This policy was applied in many countries, and resulted in big crises in Greece and UK (among other places)

Many people lost they jobs, their savings, sometimes their lives

Some people think that austerity was one of the reasons for Brexit

Cassidy, John. 2013. ‘The Reinhart and Rogoff Controversy: A Summing Up’. The New Yorker, 26 April 2013. https://www.newyorker.com/news/john-cassidy/the-reinhart-and-rogoff-controversy-a-summing-up.

Austerity was an Excel error

“Reinhart and Rogoff allowed researchers at the University of Massachusetts to look at their original spreadsheet – and the mystery of the irreproducible results was solved. First, they omitted some data; second, they used unusual and highly questionable statistical procedures; and finally, yes, they made an Excel coding error.

It is hard to find errors in Excel

‘Excel Spreadsheet Error Helped Spark the Austerity Binge’. Sarasota Herald-Tribune. Accessed 21 May 2024. https://www.heraldtribune.com/story/news/2013/04/20/excel-spreadsheet-error-helped-spark-the-austerity-binge/29166815007/.

Not enough testing

“The underlying problem with spreadsheets is how easy it is to use them without applying adequate scrutiny, oversight and validation. They can include many complex customised algorithms but are not typically built by software engineers who are trained to create reliable software.

“Instead, research shows, they are often built without sufficient planning, any control process or testing. This results in lurking data integrity problems that, given the right circumstances, can suddenly cause catastrophe.

Thorne, Simon. 2020. ‘Excel Errors: The UK Government Has an Embarrassingly Long History of Spreadsheet Horror Stories’. The Conversation. 7 October 2020. http://theconversation.com/excel-errors-the-uk-government-has-an-embarrassingly-long-history-of-spreadsheet-horror-stories-147606.

Excel is bad for Molecular Biology

Official gene names were changed because of Excel

Vincent, James. 2020. ‘Scientists Rename Human Genes to Stop Microsoft Excel from Misreading Them as Dates’. The Verge. 6 August 2020. https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates.

Update

October 24th, 2023, 11:25AM ET:

Microsoft has updated Excel on Windows and macOS, adding a toggle to turn off automatic data conversion.

UK COVID data was lost

“the details of 16,000 positive test cases had been lost because of an error made with Microsoft Excel”

And there are many other stories like that

https://eusprig.org/research-info/horror-stories/

Thorne, Simon. 2020. ‘Excel Errors: The UK Government Has an Embarrassingly Long History of Spreadsheet Horror Stories’. The Conversation. 7 October 2020. http://theconversation.com/excel-errors-the-uk-government-has-an-embarrassingly-long-history-of-spreadsheet-horror-stories-147606.

Repeat after me

However

if you have to use it…

2. Use it only for small data

Excel works well with a few thousand rows

3. Data first, presentation later

Many people are tempted to draw nice tables

These are hard to use

Store data into tables, easy to analyze

Later, prepare reports from the tables
(automatically)

4. One sheet for each table

Data is organized in tables

  • One row for each individual/case
  • One column for each variable
  • One sheet for each table
  • Always start in cell A1

The columns are fixed. Data is inserted as new rows

5. One column, one datatype

This is like data frames in R or Python

Do not mix numbers and text

All values must be in the same units
(or use another column for the units)

One cell, one value

If you need multiple values, use more rows

6. Use Google Forms to insert data as a team

  • It allows you to enter data directly into a spreadsheet

  • Minimizes errors, using a controlled vocabulary

  • You can distribute the work among several team members

7. You can make quick analysis

Learn to use

  • Pivot tables
  • Plots