The first thing a data analyst trainee should learn is that playing with Excel’s functions and tools is a great way to get into trouble when you don’t have an underlying understanding of how the data behaves AND don’t understand the core assumptions of the functions and tools themselves. This is important. The second or third lesson a data analyst trainee will learn is to not use Excel at all, but that is advanced training.
Why does this matter?
It seems like the White House is using Excel without understanding the phenomenon it is trying to model.
To better visualize observed data, we also continually update a curve-fitting exercise to summarize COVID-19’s observed trajectory. Particularly with irregular data, curve fitting can improve data visualization. As shown, IHME’s mortality curves have matched the data fairly well. pic.twitter.com/NtJcOdA98R
— CEA (@WhiteHouseCEA) May 5, 2020
Eyeballing the data, there sure as hell seems to be day-of-the-week seasonality. But let’s go beyond that.
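One standard way to see past a day-of-the-week cycle is a trailing 7-day mean: each point averages a full week, so any pattern that repeats every seven days cancels out. Here is a minimal sketch in plain Python. The function name `trailing_mean` and the example numbers are hypothetical, not taken from the CEA chart.

```python
from typing import List, Sequence


def trailing_mean(values: Sequence[float], window: int = 7) -> List[float]:
    """Return the trailing moving average of `values`.

    The first `window - 1` points lack a full window and are skipped, so the
    output has len(values) - window + 1 entries.
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window forward one day: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Hypothetical series: a flat level of 2000 deaths/day plus a weekly
# reporting pattern whose seven offsets sum to zero.
weekly_pattern = [-300, -200, 150, 200, 150, 100, -100]
daily = [2000 + weekly_pattern[day % 7] for day in range(28)]
smoothed = trailing_mean(daily)
print(round(min(smoothed)), round(max(smoothed)))  # 2000 2000
```

The raw series swings by 500 deaths from day to day, yet every 7-day average is exactly 2000. That is the point: the weekly wiggle is a reporting artifact, not a change in the underlying trend.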
Suppose we assume that a cubic fit is an appropriate choice for modeling the data, and that we can extrapolate from the current data into the near future. The result is almost no deaths on May 15th, and that requires a What the Hell response.
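The trouble with extrapolating a polynomial is that it has no idea what it is modeling. Past the data, it just keeps bending in whatever direction its highest-order terms point. The sketch below fits a cubic by ordinary least squares in plain Python, solving the normal equations with Gaussian elimination. It fits a hypothetical peak-and-decline series and then extends the curve forward to find the first day it reaches zero. The data, function names, and day numbering are illustrative assumptions, not the CEA’s inputs or method.

```python
from typing import Callable, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_cubic(ts: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Least-squares cubic fit; returns a function that evaluates the fit at any t."""
    # Center and scale t so the normal equations stay well conditioned.
    center = sum(ts) / len(ts)
    scale = (max(ts) - min(ts)) / 2 or 1.0
    xs = [(t - center) / scale for t in ts]
    # Normal equations: sum_j (sum_k x_k^(i+j)) * c_j = sum_k y_k * x_k^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    coeffs = solve_linear(a, b)

    def predict(t: float) -> float:
        x = (t - center) / scale
        return sum(c * x ** i for i, c in enumerate(coeffs))

    return predict


# Hypothetical daily deaths for days 0..34: a peak at day 20, then a slow decline.
days = list(range(35))
deaths = [2000 - 2 * (d - 20) ** 2 for d in days]
curve = fit_cubic(days, deaths)

# Extend the fit past the last observed day until it predicts zero deaths.
first_zero_day = next(d for d in range(35, 200) if curve(d) <= 0)
print(first_zero_day)  # 52
```

The curve hits zero on day 52 purely because of its shape. Nothing in the fit knows whether the people who would die that day are already infected, and that is the gap the rest of this post is about.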
A person infected today is unlikely to show up as a death until Memorial Day.
If we can safely add the median time from infection to symptom onset to the median time from symptom onset to hospitalization, we get a back-of-the-envelope span of 12 to 13 days.
We still have to add the time from hospitalization to death. But this morbid math has a point. The next 7 to 10 days of deaths are almost entirely baked into the cake, because these are individuals who were infected before states started trying to reopen. We’re not going to get a reliable mortality signal from policy changes for at least another two weeks in the states that have been early and aggressive in reopening.
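The “baked into the cake” argument can be made concrete with a simple lag model. Deaths on day t are a weighted sum of infections on earlier days, with each weight giving the chance that an infection leads to death after that many days (a discrete convolution). If nobody dies sooner than some minimum lag after infection, then deaths that many days ahead are already fixed by infections that have happened, whatever policy does next. The sketch below assumes a hypothetical 1% fatality rate with deaths spread evenly over days 14 to 30 after infection. Neither number comes from the data discussed here.

```python
from typing import Dict, List, Sequence


def expected_deaths(infections: Sequence[float], lag_weights: Dict[int, float]) -> List[float]:
    """Deaths on day t = sum over lags d of infections[t - d] * lag_weights[d].

    lag_weights[d] is the (hypothetical) fraction of people infected on a given
    day who die exactly d days later.
    """
    deaths = []
    for t in range(len(infections)):
        total = 0.0
        for lag, weight in lag_weights.items():
            if t - lag >= 0:
                total += infections[t - lag] * weight
        deaths.append(total)
    return deaths


# Hypothetical assumptions: 1% of infections are fatal, with deaths spread
# evenly over 14 to 30 days after infection (so nobody dies before day 14).
MIN_LAG, MAX_LAG = 14, 30
lag_weights = {d: 0.01 / (MAX_LAG - MIN_LAG + 1) for d in range(MIN_LAG, MAX_LAG + 1)}

today = 60
history = [1000.0] * today  # hypothetical infections up to today
# Two futures that diverge from today on: infections stop, or they double.
stopped = history + [0.0] * 30
doubled = history + [2000.0] * 30

d_stop = expected_deaths(stopped, lag_weights)
d_double = expected_deaths(doubled, lag_weights)

# Deaths over the next MIN_LAG days are identical in both futures.
baked_in = all(d_stop[t] == d_double[t] for t in range(today, today + MIN_LAG))
print(baked_in)  # True
```

Stopping transmission cold and doubling it produce the same death counts for two weeks. The futures only start to diverge on day `today + MIN_LAG`, which is why a short-term mortality projection cannot reflect anything that happens after today.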
For a cubic curve-fitting exercise to be valid, it needs to account for this basic mechanical reality: the people who are highly likely to die on or before May 15th are already infected. For there to be no deaths on May 15th, we would basically need no one to have been infected after April 27th or so.
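The date arithmetic behind April 27th is just subtraction. The sketch below uses Python’s standard `datetime` module and assumes a total infection-to-death lag of 18 days. That total is the 12 days to hospitalization from the estimate above plus a hypothetical 6 days from hospitalization to death, chosen so the total matches the April 27th date. Adding medians this way is only an approximation, because the median of a sum of delays is generally not the sum of their medians.

```python
from datetime import date, timedelta
from typing import Sequence


def infection_date_for_death(death_day: date, median_lags_days: Sequence[int]) -> date:
    """Back-date a death to its likely infection date by subtracting summed median lags.

    Summing medians is a rough shortcut: the median of a sum of delays is
    generally not the sum of the individual medians.
    """
    return death_day - timedelta(days=sum(median_lags_days))


# Assumed lags in days: infection -> hospitalization (12, low end of the estimate above)
# and hospitalization -> death (6, a hypothetical value).
lags = [12, 6]
print(infection_date_for_death(date(2020, 5, 15), lags))  # 2020-04-27
```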
These are basic data analysis guidelines: understand the fundamental phenomenon you are trying to model, and understand the modeling assumptions of the tools you are using. The White House decision-making systems are disregarding both tenets of basic data analysis.