May 8th, 2013

Of POPRES, LOLIPOP and "unexpected" findings

There's another one of those "hey, we're all more related than we think, so let's just get along like one big happy family" things out there in the news. Here is a sentence from the NYT/IHT coverage:

"The Californian study yielded some surprising conclusions, including that Britons share more recent common ancestors with people in Ireland than with others in Britain."

Let's allow for difficult wording -- which seems inevitable when describing this kind of research/analysis, whether done for fun or profit, by pros or amateurs -- and ask ourselves whether this makes sense. Does this make sense? No. So we'll follow the link, which leads us here, but doesn't really address the Briton/Ireland question.

I think this is the paragraph we are meant to be reading:

"We were quite surprised by this result. However, we think it can be explained by migration rates and population size changes between countries. For example, over the past few centuries there has been a great deal of migration between the UK and Ireland, particularly of people of Irish ancestry into the UK. Because of this, many UK individuals have recent Irish ancestors (see for example). This was happening at the same time as population growth in both places. Since the population size of Ireland was smaller than the UK, an Irish ancestor is more likely to be related to a modern-day Irish person than an ancestor in the UK is to be related to a modern-day person in the UK."

I thought about that for a while. I took a nap. I thought about it some more and I thought, That Is a Stupid Non-Explanation (<-- note gratuitous and aggressive hostility). What dataset are they using anyway?

Back to IHT/NYT, which says: "Peter Ralph and Graham Coop of the University of California used genomic data for 2,257 Europeans to conduct the first such study of an entire continent."

Well that is Not Helpful. But I think this is:

So what is "POPRES"? It is a Rollup. What went into the "Briton" component? LOLIPOP. What is in LOLIPOP?

This is LOLIPOP:

Oh, fuck it. Try this:

London Life Sciences Population Study

"The LOLIPOP study is a population-based study of Indian Asians and European whites, aged 35–75 years, identified from the lists of 58 general practitioners in West London.14 To date, 938 northern Europeans and 431 Indian Asians from this collection are included in POPRES. Although extensive cardiovascular-related phenotypic data were collected on these participants, the POPRES database only includes nonidentifying demographic information: age at collection, self-identified race and/or ethnicity, and country of birth."

If a third of your Britons are "Indian Asian", then I think your what's-up-with-this-weird-Briton/Ireland-result question has been answered, and the answer has more to do with relocation post Partition than any fucked up speculation about historical migrations. In fact, one has to conclude that the result isn't weird or unexpected at all, but quite normal and expected, _if you read the description of your data set_.
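For anyone who wants to check my "a third", here is a back-of-envelope sketch using only the two counts in the POPRES description quoted above. The function and variable names are mine, made up for illustration, and this assumes those two groups are the whole LOLIPOP contribution.

```python
# Counts taken from the POPRES description of LOLIPOP quoted above.
northern_europeans = 938
indian_asians = 431


def share(part: int, *others: int) -> float:
    """Fraction of the combined total (part + others) made up by part."""
    return part / (part + sum(others))


indian_asian_share = share(indian_asians, northern_europeans)
print(f"Indian Asian share of the LOLIPOP contribution: {indian_asian_share:.1%}")
```

So "a third" is rounding up a little, but not by much.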

It should not be this easy to find this big a problem. It really should not. However, when I saw that they described their sample as "random" but also that they had removed half of each sibling pair in the group, I just rolled my eyes and gave up.

POPRES, btw, is kind of cool, altho too limited to justify the kind of work that Ralph and Coop have attempted to do on it. Which is kinda crystal clear, if you read this:

Really, reference population is almost all you need to ask about to predict where things went wrong with this kind of research.

ETA: See 4.1 in the data cleaning explanation. They yanked out more than I thought -- not just halves of sibling pairs but people related at a level expected from first cousins. They attempted to assign people based on their grandparents' country of origin, with this interesting sentence: "Most samples from the United Kingdom reported this as their country of origin; however, the few that reported "England" or "Scotland" were assigned this label." I _think_ they tried to scrape all the "Indian Asian" participants out of LOLIPOP. I'm still trying to figure out what I think of this. On the one hand, they've essentially turned "European" into "people in Europe with European grandparents", which is creepy as fuck anyway, especially if your whole point is to try to blow up genetic underpinnings of racial theory.

But at least it isn't as ridiculous as I thought it was.

ETAYA: Oh, no! It's worse! They wound up with n=22 for "England" and n=60 for "Ireland".

Would _you_ feel happy generalizing about anything based on those numbers? I kinda like the 1 Bulgarian. I assume that person showed up in Lausanne.
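To put a rough shape on why those numbers make me unhappy, here is a minimal sketch of the generic survey rule of thumb for the 95% margin of error on an estimated proportion, at the worst case p = 0.5. This is an assumption-laden illustration only: it is _not_ whatever statistic the paper actually uses, and the function name is my own invention.

```python
import math


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion, assuming p = 0.5.

    Generic normal-approximation rule of thumb; illustrative only.
    """
    return z * math.sqrt(0.25 / n)


# Sample sizes as reported above for the two labels.
for label, n in [("England", 22), ("Ireland", 60)]:
    print(f"{label}: n={n}, margin of roughly +/-{worst_case_margin(n):.0%}")
```

The margin shrinks only with the square root of n, so tiny groups stay fuzzy no matter how clever the analysis on top of them is.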

I would _so much rather_ just wait until we had better datasets. It's very difficult to believe in any of this.

ETA Still More: Does n=358 for "United Kingdom" make you feel better? The more people identify as UK, the more likely their grandparents came from all over, at least all over the UK, if not further afield. The difference with the numbers from the Lausanne component of POPRES is striking -- primarily because they didn't have to throw away almost all their data.