Self-selection and convenience samples

by David Anderson | April 20, 2020 7:39 am | 22 Comments

I have three sisters. They all live within 20 minutes of the I-495 loop that surrounds Boston.

My older sister had one hell of a nasty upper respiratory infection in late January and early February. She never went to a clinic and therefore was never diagnosed with anything.
My middle sister had a nasty two-week flu in mid-February. She self-isolated at home. She went to her PCP and got a flu swab that came back positive. She recovered and was about to go back to the office and see her friends just before her concerned older brother suggested that physical distancing would be a good idea for COVID-19.
My youngest sister had a great winter and early spring regarding her health.

Why do I mention this?

Let us imagine that a researcher wants to conduct a serology study with a convenience sample to identify the prevalence of COVID-19 in Eastern Massachusetts, and more importantly, the prevalence of individuals who are now immune after having a low-to-no-symptom infection. That is a damn good question, and we need good answers to it in order to inform policy responses. However, methodology matters.

If the researcher has a limited number of tests and wants a fast response on a small budget, they could put out ads announcing that COVID-19 immune/infection history tests are available. The researchers plan to use population weights to correct for observed demographic imbalances among the first 500 people who show up to get tested.

Is there a problem with this method?

Yeah….

My three sisters have very different probabilities of responding to that ad. My middle sister knows she has been physically isolated for almost two months now and that her notable winter illness was diagnosed as flu. My youngest sister is feeling great and has been physically isolated since the start of March.

However, my older sister had one hell of a nasty disease course this winter. There is a good probability that it was an undiagnosed flu. There is a decent probability it was not flu and not COVID-19 but some other viral infection. There is a non-zero but fairly small probability that she had COVID-19 in late January.

Which sister is most likely to respond to an advertisement for free serology testing to see if they had COVID-19?

People like my older sister could very plausibly be far more responsive to an ad for free COVID-19 infection history/immunity/serology testing than either of my younger sisters. They would have a higher prior value on the new information that a good (albeit imperfect) test could give them.

A convenience sample where the participants effectively self-recruit is highly likely to have lots of people who are systematically different from the general population. Self-selection means the generalizability of the results is extremely limited. A good researcher can say that whatever they saw in the sample is relevant for the self-selected sample but not the general population. It might establish a boundary of plausible estimates, but the point estimate is highly likely to be biased, and that bias cannot be corrected when the self-selection runs along unobserved characteristics.

If we want generalizability, we need either complete population sampling or random sampling of a population, so that each of my three sisters has the same probability of being selected for a test.
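To make the size of the problem concrete, here is a minimal simulation sketch comparing an ad-recruited sample with a simple random sample. Every number in it is invented for illustration: the 3% true prevalence, the chance of having had a memorable winter illness, and the chance of answering the ad are all assumptions, not estimates from any study. The function name `simulate_samples` is hypothetical as well.

```python
import random


def simulate_samples(pop_size=200_000, true_prevalence=0.03,
                     sample_size=500, seed=0):
    """Return (self_selected_estimate, random_sample_estimate).

    All probabilities below are made-up illustrations, not data.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        infected = rng.random() < true_prevalence
        # Assumed: a memorable winter illness is more common among the
        # infected, but most such illnesses are flu or something else.
        had_illness = rng.random() < (0.60 if infected else 0.15)
        # Assumed: people who were sick are ten times as likely to
        # answer an ad for free infection-history testing.
        responds = rng.random() < (0.10 if had_illness else 0.01)
        population.append((infected, responds))

    # Self-selected sample: the first `sample_size` people who answer the ad.
    responders = [infected for infected, responds in population if responds]
    self_selected = responders[:sample_size]

    # Random sample: every person has the same chance of being picked.
    random_sample = [infected for infected, _ in
                     rng.sample(population, sample_size)]

    def prevalence(sample):
        return sum(sample) / len(sample)

    return prevalence(self_selected), prevalence(random_sample)


if __name__ == "__main__":
    self_est, random_est = simulate_samples()
    print(f"self-selected estimate: {self_est:.3f}")
    print(f"random-sample estimate: {random_est:.3f}")
```

Under these assumptions the ad-recruited sample should overstate prevalence, because illness history drives both infection status and willingness to respond. Demographic weights would not fix that, since "had a nasty undiagnosed illness this winter" is not something the weights observe.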


22 Comments

1.

JMG

April 20, 2020 at 7:47 am

This is an excellent explanation of the testing question and I wish I had been able to give it to my wife when she asked me about it at dinner rather than the somewhat, OK, almost totally incoherent answer I did give. I will recommend this to her this morning.
2.

debbie

April 20, 2020 at 7:54 am

Everyone should be tested. Next time I see my doctor, I intend to ask for testing. I had a 13-week something that coughed like pneumonia but didn’t sound like pneumonia when she listened. I’ve also been on plaquenil forever and I’m curious about antibodies.
3.

Pete Mack

April 20, 2020 at 7:56 am

It’s worse than that. The antibody tests have unknown false positive rates, but existing good tests have false positive rates of at least a couple of percent and as high as 5%. So results in places with low infection rates (and by that I mean below 25%) will be extremely inaccurate. Below 10% and it is no better than flipping a coin…and likely worse.
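The arithmetic behind this point can be sketched with Bayes' rule. The block below computes the share of positive results that are true positives (the positive predictive value) at several infection rates. The 5% false positive rate comes from the upper figure mentioned above; the 90% sensitivity is an assumption added purely for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person who tests positive was actually infected."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    sensitivity = 0.90   # assumed for illustration, not from any real test
    specificity = 0.95   # i.e. a 5% false positive rate
    for prevalence in (0.01, 0.05, 0.10, 0.25):
        ppv = positive_predictive_value(prevalence, sensitivity, specificity)
        print(f"prevalence {prevalence:.0%}: "
              f"{ppv:.0%} of positives are true positives")
```

With these assumed numbers, at 5% prevalence slightly fewer than half of all positive results are real, which is the coin-flip territory described above.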
4.

Cheryl Rofer

April 20, 2020 at 8:04 am

Thanks, Dave! This is one of the problems with the Stanford – Santa Clara study.

Could you do an explainer on the sensitivity/specificity problem, which also shows up in that study? This is what Pete Mack is talking about. I am not enough of a statistician to explain it, but back when I was doing environmental sampling, one of my statisticians explained it to me. I recognized it when the criticism was made against the Stanford study. I find it very counterintuitive, and it’s going to be a big problem now that FDA is not requiring approval for serological tests and many serological tests are appearing, with big error rates.
5.

scribbler

April 20, 2020 at 8:08 am

Yikes. Three sisters. That must have been rough.
6.

PST

April 20, 2020 at 8:10 am

@scribbler: Yeah, on the sisters.
7.

zzyzx

April 20, 2020 at 8:21 am

@debbie: I think that can only happen if we have a test that can be done without lab work and they’re still working on them. That would be the ideal, that there would be a Covid19 test along the lines of my blood glucose monitor, and everyone gets in the habit of testing themselves before leaving the house. I just don’t know if we can get to the point where those are reliable and affordable quickly enough to help us.
8.
David Anderson

April 20, 2020 at 8:34 am
@debbie: Given the limited number of tests available, not everyone should be tested. Randomly selected census blocks should have universal testing as part of a parameterization regime to actually assess what is happening.

But in general practice, given that we have limited testing capacity, tests should be reserved for people who have a higher than random probability of being infected. And right now that means the following:

people who are symptomatic

people who are in close contact with infected individuals

Healthcare workers

people living and working in congregate living situations, jails and prisons, etc.

As we go from 150,000-170,000 tests per day (~1 million tests per week) to 5-10 million tests per week, the criteria should change, but in a restricted testing environment, we need to test the folks who we have a strong reason to believe are more likely to be infected.

Right now, it makes no sense to test my two younger sisters as they are physically distancing and have no reason to suspect that they are bumping into currently infectious individuals. I should not be tested right now either for the same reasons. My older sister is a reasonable testing target because she has a critical job with fairly significant exposure risks.
9.

David Fud

April 20, 2020 at 8:35 am

Since there will be a proliferation of private (theoretically cheap) serological tests, maybe the answer is to test with multiple instruments to reduce the false positive rate. Might be more expensive, but should (if the tests function in different ways) help to address the error rate due to multiple methods.
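A rough sketch of what confirming with a second test would buy, under a strong assumption: the two tests make their errors independently of each other. That may not hold if both tests fail for the same underlying reason. The function name and the example accuracy figures are hypothetical.

```python
def confirm_with_second_test(sens1, spec1, sens2, spec2):
    """Combined accuracy when a result counts as positive only if BOTH
    tests are positive. Assumes the two tests err independently."""
    # A true case is caught only if both tests detect it.
    combined_sensitivity = sens1 * sens2
    # A false positive now requires both tests to be wrong at once.
    combined_false_positive_rate = (1 - spec1) * (1 - spec2)
    return combined_sensitivity, 1 - combined_false_positive_rate


if __name__ == "__main__":
    # Illustrative figures only: two tests, each 90% sensitive, 95% specific.
    sensitivity, specificity = confirm_with_second_test(0.90, 0.95, 0.90, 0.95)
    print(f"combined sensitivity: {sensitivity:.4f}")
    print(f"combined specificity: {specificity:.4f}")
```

The trade-off is that the false positive rate drops sharply while some real cases are missed, because a true positive now has to clear two tests instead of one.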
10.

David Fud

April 20, 2020 at 8:38 am

@David Anderson: This is the correct answer, but also introduces a high positivity rate that has been widely discussed lately. It points to the need for more tests. I know that you aren’t really addressing the quantity of tests, so I won’t bother you about that. The purpose for the test changes in your scenario from one of understanding the course of the disease in a particular population to protection of medical personnel. Which is fine, if you are doing triage, but isn’t great if you are trying to actually get ahead of the curve and manage the pandemic.
11.

lee

April 20, 2020 at 8:39 am

Here is a medium.com article about how poorly we are doing on testing.

Looks like our max output for tests is about 145k/day.

At that rate we will never have enough testing for it to be meaningful.
12.

MattF

April 20, 2020 at 8:41 am

One more goal— assess the meaning/reliability of testing that’s already been done, in this country and elsewhere. Or do you disregard everything up to now?

ETA: Also, I had that ‘coughing thing’ that was going around early this winter, so…
13.

David Anderson

April 20, 2020 at 9:00 am

@David Fud: Completely agree with everything you are saying. We need 5-10 million tests a week for months on end tied to an effective tracing regime and a quarantine system that won’t lead to economic, social and psychological ruin to be able to go back to 80% of normal with major social changes (masks, temperatures taken frequently, hand washing/sanitizing stations everywhere etc). But right now we’re at maybe 20% of the best case testing capacity, and most likely well below that, so given that reality, we’re still in triage and prioritization mode.
14.

oldster

April 20, 2020 at 9:03 am

You’re being very polite about this when you could say,

“The Stanford study was an appallingly dishonest hit piece with no scientific value, designed to advance a political agenda by people who had already written op-eds minimizing the pandemic.”
15.

Ten Bears

April 20, 2020 at 9:09 am

The question I beg answer: I experienced a flu-like episode in late November, early December, which without going into too great a detail about my pit-bull tenacity and all around bad attitude about the medical industrial complex I fought off as I ever fought them off – chicken soup. Took about a month for it to pass, manifesting itself mostly as shoulder pain not unlike a ripped bicep.

But there’s more: I experienced, my entire family, the law firm of my employment and university to which I was attached… hell, pretty much all of Western Montana, the Idaho panhandle and Eastern Washington were hit with a SARS like crowned virus, a viral walking pneumonia… in 1998. Pretty much shut everything down, the Kids had an extra two weeks of spring break.

And even more: across my five generations of memory Central Oregon has long experienced a reoccurring SARS like viral walking pneumonia, the ‘creeping crud’, running back through five or six of the hundred year families I am tied into.

Back to Boston, I am reminded of something I read a year or so ago in Albion’s Seed, something I picked up upon recommendation due to my interest as a Mad Scientist in things like building and engineering, and the somewhat unique architecture of the place. Germane to the topic was a brief mention of a propensity to The Fever, a viral walking pneumonia, in both the pilgrim and cavalier stock, stock my dear mother has so thoughtfully avoided all but the whitest of both back thirteen and fifteen generations.

The question: In the face of all of that where do we even begin to figure who, or how, are immune?

My spouse is pretty immunocompromised; our big scare has been the notification we were in the same building at Mass General the same day as the first positive. And had been twice earlier in the week. Lock-down around here has been pretty scientific, though not necessarily any more than our ordinary daily precautions. The masks, really, are the difference. Not to mention she has since October in preparation for brain surgery been on a literal cocktail of steroids and antibiotics, damn bug may never have had a chance.

Apologies for the disjointedness, but I am as deep into this as anyone, and it’s just such a simple question: how do I know?
16.

p.a.

April 20, 2020 at 9:41 am

Isn’t it still an open question whether infection leads to immunity, even temporarily?
17.

ziggy

April 20, 2020 at 10:41 am

We can’t just dismiss the Santa Clara study out of hand, though, because there are other studies worldwide which show a similar proportion of seroprevalence in those areas. I’m trying to find the tweet which listed these studies, but not having any luck–several were from Denmark. We really need to have well-conducted studies to get a handle on this, and I’m hopeful this will happen soon now that the antibody tests are in full production.

I’m very concerned that the proportion of asymptomatic victims (as shown on the Theodore Roosevelt and Iceland study) could be very much higher than we assume. And that these people do have significant transmission while asymptomatic or pre-symptomatic. So the proportion of the population that has been or is infected could be much higher than we know in some areas, especially if they are relatively young and healthy. In that case, trying to suppress the spread completely is going to be very difficult, if not impossible.

On the positive side, if there is a significant population of people with antibodies to Covid, perhaps we can do more plasma therapy, and begin it earlier, with better outcomes for severe cases.
18.

ziggy

April 20, 2020 at 10:51 am

@Ten Bears: Your illness in December was not Covid-19, and here is a twitter thread that explains why:

https://twitter.com/trvrb/status/1249414295042965504

Bear in mind that the vast majority of people that exhibit symptoms compatible with Covid, and are able to get testing because of that, test negative (currently 90% in my state).

Hopefully soon antibody tests will be widely available as well as accurate, and we will be able to know if we have been exposed, and perhaps get an idea of our immunity situation.
19.

narya

April 20, 2020 at 10:56 am

As soon as there’s a reliable serological test, I’m getting in line. I had a weird fever in February–NO other symptoms, no one around me got sick–that peaked at 103 one night. I took ibuprofen to get it down (and that worked), or I would have gone to the ER. When I went to the doc the next day, they tested me for every damn thing and did a chest x-ray, and . . . nothing. No flu. No strep. Liver enzymes elevated, but nearly normal by the next day. No cough. Found out I’d had CMV and Epstein Barr at some point in my life, but negative for them, too. Blood cultures all negative. Weirdest thing I ever had. And then another flu-like episode a month later, that acted like a normal winter cold/flu, gone in a couple of days. So, two things, roughly in the time frame, with no diagnosis? I should get in line, six feet away from your sister, wearing a mask.
20.

Villago Delenda Est

April 20, 2020 at 11:40 am

David, you are a terrible human being. You’re making Rethuglican heads hurt.
21.

Pablo

April 20, 2020 at 11:51 am

@David Fud: It’s not usually the instruments that are the problem. It’s that the test reagents lack exact specificity.
22.

pluky

April 20, 2020 at 1:25 pm

@Cheryl Rofer:

Just the thing that Wikipedia was made for!
https://en.wikipedia.org/wiki/Sensitivity_and_specificity

Comments are closed.
