walkitout (walkitout) wrote,

MRCA debate, what a good simulation needs, etc.

I realize the last few posts have been quite horrible and I apologize. On top of that, there's been some offline conversation with Ebeth that is therefore missing from whatever this blog is a record of. I'm going to attempt to summarize what I've been thinking about, why it matters, and what I think is worth looking for on this topic in the future.

There are a couple of mathematical constructs used when thinking about population genetics. One is MRCA (most recent common ancestor) -- the most recent person who shows up in _everyone living's_ family tree. Another -- less commonly used -- is ACA or All Common Ancestors, which I'm still ignoring (I don't disbelieve it, per se, because it's quite obvious that MRCA and ACA are both real, within historical times, and well documented for subpopulations such as Iceland). Mitochondrial MRCA provides a time frame of about 200K years ago. Y-Chromosomal MRCA provides a time frame of 60-90K years ago. If you are a grad student or postdoc in population genetics, you could potentially get a big, juicy chunk of fame if you could pull this date in a ways.

A math professor named Jack Lee is apparently something of a genealogy buff and was initially pleased to discover he was descended from Charlemagne, until he did some probability calculations that suggested to him that an awful lot of people are descended from Charlemagne. His essay on the topic is widely reproduced and has within it the basic problem that shows up in a lot of initial attempts to model lineages: it assumes random mating. Jack Lee recognized that as an issue, and also pointed out that while his scribble doesn't definitively show that everyone within a certain area is descended from a certain person, it does show that it is quite wildly improbable that any arbitrarily chosen person within that area is _not_ descended from that certain person. The argument about Charlemagne can be made about anyone in the same time frame and the same location who can be shown to have living descendants today.
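To make the shape of that kind of argument concrete, here is a rough sketch. It is not Lee's actual calculation, just the simplest random-mating version: every one of the 2**g slots in a family tree g generations back is assumed to be filled independently and uniformly at random from a fixed population. The function name and all the numbers are made up for illustration.

```python
import math

def prob_not_descended(generations, population):
    """Chance that one given person fills NONE of the 2**generations
    ancestor slots, assuming (unrealistically) that each slot is drawn
    independently and uniformly from `population` people."""
    slots = 2 ** generations
    # (1 - 1/N) ** slots, computed in log space so huge exponents are safe
    return math.exp(slots * math.log1p(-1.0 / population))

# Hypothetical numbers: ~40 generations back, 10 million people in the region.
print(prob_not_descended(10, 10_000_000))  # still close to 1
print(prob_not_descended(40, 10_000_000))  # effectively 0
```

The whole result hangs on the independence assumption, which is exactly the random-mating problem: endogamy and geography mean the slots are nowhere near independent draws.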

The next approach tried (AFAIK) was simulation. Basically, try to model lineages and find out where they meet up in the distant past. The initial attempt at this would appear to be Chang in 1999, the same Chang who appears as an author on Rohde's 2004 paper. I have not read Chang's first paper on the topic, but it, too, used random mating. I did eventually find a .pdf of a draft of Rohde's paper; I can't speak to how closely it resembles what was published in Nature. The draft is what I was poking at in the last two posts.

One of the reasons Lee's argument is so simple and compelling is that the area he is describing is accessible by walking within a comparatively short period of time, and in fact we know people _did_ move around on foot throughout the generations in question. Mixing really was happening. The problem tackled by Chang and Rohde is much, much harder: they have geographic bottlenecks to confront. How far back do we have to trace to find someone whose lineage touches every part of the globe today? Rohde, Chang and people like Humphrys behave as if they are ideologically committed to a within-history time frame. They build into their assumptions that there are no truly isolated populations today. They assume a very high migration rate in very implausible locations over very broad ranges of time, and claim they are being conservative by pointing out how low (by comparison) they made the migration rate in plausible places like Morocco to Spain. They allow recent arrivals unhindered access to the interior. They fail to model centuries-long communities practicing significant endogamy within an otherwise mixing region. They have too high a rate of people leaving their village to marry someone from another region.

Migration from one village to another. Migration from one region to another. Migration from one continent to another. Number of people migrating. Which port they would choose. Rohde et al made all of these variables too high, but they convinced themselves they _weren't_ too high, because their max population multiplied by the probability produced a number that "felt" small. They also assumed many things were independent which anyone who has done their own genealogy knows perfectly well are not -- and anyone who knows some history would not even know where to begin the criticism. Then they were surprised that the effects of these parameters didn't just add up or multiply, either.

And at no point did they factor in any technology or resource effects (either in terms of enabling migration or forcing it) and they treated the decision to migrate as one made by each "sim" independently.

If you corrected for these errors, you would shortly discover all kinds of very interesting things -- but none of them would be a MRCA in historical times. Thus, no way to bag the fame.

The world _is_ an island. We are all "cousins". I believe these things. When I belonged to a crazy religion and believed that God created The World and Every Living Thing In It that swims in the sea and so forth, I did _not_ believe He did so in six 24-hour days and I did _not_ believe that you could add up all the years in the genealogies in the Bible and figure out how long ago that creation event took place. I don't need for us all to have been here for only 6000 or so years. I didn't then. I don't now. I understand it's hard to differentiate yourself from the crowd these days as a scientist, but this is not a good game to be playing.

I would _love_ to see a SimLineage some day (when we have the computing power for it, because we clearly don't yet). But I'd actually be okay with running these models with better parameters. Specifically, parameters that better match my sense of the Laws of Migration:

(1) People don't move.
(2) If they do move, then they stay put.
(3) They go with people they know.

There probably should be a 4th one, about how they pick their destination, but I'm still kind of hazy on exactly how that works. Maybe, (2a) If they do move, they go to the nearest larger place that will accept them.
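As a toy illustration only, the laws above could be written as a per-generation update rule. Everything here is an assumption made for the sketch: the `step` function, the one-dimensional positions, the single `move_prob` knob, and the choice to make the moving unit a household (a crude stand-in for law 3). It is not how Rohde's model works.

```python
import random

def step(households, places, move_prob, rng):
    """Advance one generation.
    households: list of dicts with 'home' (index into places) and 'moved'.
    places: list of (position, size) tuples; position is a 1-D coordinate."""
    for h in households:
        if h["moved"]:
            continue                      # law 2: having moved, they stay put
        if rng.random() >= move_prob:
            continue                      # law 1: mostly, people don't move
        pos, size = places[h["home"]]
        bigger = [i for i, (p, s) in enumerate(places) if s > size]
        if not bigger:
            continue                      # nowhere larger to go
        # (2a): nearest larger place; law 3: the whole household goes together
        h["home"] = min(bigger, key=lambda i: abs(places[i][0] - pos))
        h["moved"] = True

places = [(0, 10), (1, 50), (5, 100)]     # hypothetical villages and a town
households = [{"home": 0, "moved": False} for _ in range(1000)]
step(households, places, move_prob=0.02, rng=random.Random(1))
print(sum(h["moved"] for h in households), "of 1000 households moved")
```

Even this crude version makes the point that moves are rare, one-time, and clustered, which is very different from every individual rolling independent dice each generation.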

ETA: I was never entirely certain whether Rohde's sim was a person or a household. I think it was a person. If Rohde et al had instead simulated family units, it might have made more sense.
Tags: genealogy
