Michael Cooley's Genetic Genealogy Blog
3 September 2018

Strother Group-01 Takes Shape

Recent Big Y studies for the Strother DNA Project have revealed what I call an anchor SNP (Single Nucleotide Polymorphism) — a SNP mutation that came into a lineage for the first time with the birth of an identified man, at an identified place, and on an identified date.1 We don't yet know the exact date for this one, but we now know that SNP A20343 claimed its place in genetic history with the birth of Francis Strother in Virginia (possibly Orange County) a bit after 1709. We can call this the Francis Strother SNP, and testing for it can prove or disprove descent from him.2

Those of us with only a rudimentary high school understanding of biology need to throw out most of what we understand when studying the Y chromosome. It plays by different rules. Characteristics on the Y do not "skip a generation." There are no dominant and recessive genes on it. The reason for this behavior is very specific: its most important component is the male sex-determining gene. Only men have the Y. It doesn't undergo the recombination we see with the other chromosomes, that is, the exchange of genetic information between the mother/father pairs of chromosomes. There is no "mother version" of the Y chromosome (and, no, the X chromosome is not that).3

The Y, then, isn't altered by contributions from the mother's side of the family. It passes directly from father to son as is. It's a clone, but one that does alter a bit through an occasional mutation, each change passed down to succeeding generations, like moss on a rolling stone. (Averaging the mutation rates among tens of thousands of testers tells us that SNP mutations occur at a rate of about one every 144 years.)
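To get a feel for what that rate implies, here is a back-of-the-envelope sketch in Python. It assumes the 144-year average applies evenly to a single line, which real lineages won't obey exactly, and it uses 1709 and 2018 only as round illustrative endpoints (Francis was born "a bit after" 1709). The function name is my own invention.

```python
# Average quoted in the text: about one SNP mutation every 144 years.
YEARS_PER_SNP = 144


def expected_snps(start_year, end_year, years_per_snp=YEARS_PER_SNP):
    """Rough expected number of SNP mutations along one paternal line.

    Assumes a constant average rate; any single lineage can easily
    show more or fewer mutations than this.
    """
    return (end_year - start_year) / years_per_snp


# Illustrative endpoints only: roughly Francis's birth to the date of this post.
estimate = expected_snps(1709, 2018)
print(f"About {estimate:.1f} SNPs expected over {2018 - 1709} years")
```

On those assumptions we'd expect only two or so new SNPs in a line since Francis's day, which is why not every generation, or even every lineage, gets a marker of its own.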

The following diagram summarizes our present knowledge about the SNP tree for the Strother family. We see that Francis Strother's descendants have the Francis SNP but that the descendants of his brothers do not, which, in turn, instructs us that their father, Jeremiah, lacked it. It simply didn't come into being until Francis was born.

The two testers at the far right have surnames other than Strother, hence the difficulty in placing them in the known Strother tree. For now, the genetics tells us only, due to their lack of A20343, that neither man is descended from Francis. A bit more on that later.

Note that the white boxes indicate SNPs discovered in the descent from the ancestor to the tester, but that their origins haven't been determined. We cannot yet say, for example, whether Christopher was born with A22909, only that it emerged in the line sometime between Christopher's birth and that of the tester. We need to triangulate in on it, as we did with the Francis SNP, by testing other Christopher descendants. However, Y133708 in the William line can be placed. The testers descended from William are father and son. The father half of the team doesn't have Y133708. Therefore, the son was born with the mutation.
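The triangulation logic can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a tool anyone actually uses: the function name is made up, the paternal lines are a simplified stand-in for the diagram, and every name other than Jeremiah, Francis, William, and Christopher is a hypothetical placeholder (including where Christopher hangs in the tree). A man is a candidate for the origin of a SNP if he is an ancestor of every tester who has it and an ancestor of no tester who lacks it.

```python
def candidate_origins(lines, positive, negative):
    """Return the men at whose birth the SNP could have arisen.

    lines    -- tester label -> paternal line, oldest ancestor first,
                ending with the tester himself
    positive -- testers who have the SNP
    negative -- testers who lack it
    """
    positive_lines = [lines[t] for t in positive]
    # Everyone in a negative tester's line (the tester included) lacked the SNP.
    excluded = {name for t in negative for name in lines[t]}
    return [
        name
        for name in positive_lines[0]
        if all(name in line for line in positive_lines) and name not in excluded
    ]


# Simplified, partly hypothetical paternal lines (an assumption, not project data).
lines = {
    "tester_A": ["Jeremiah", "Francis", "son_1", "tester_A"],
    "tester_B": ["Jeremiah", "Francis", "son_2", "tester_B"],
    "father": ["Jeremiah", "William", "father"],
    "son": ["Jeremiah", "William", "father", "son"],
    "tester_D": ["Jeremiah", "Christopher", "grandson", "tester_D"],
}

# A20343: two Francis descendants have it, the William line does not.
a20343 = candidate_origins(lines, ["tester_A", "tester_B"], ["father", "son"])
print(a20343)

# Y133708: the son has it, his own father does not.
y133708 = candidate_origins(lines, ["son"], ["father", "tester_A", "tester_B"])
print(y133708)

# A22909: a single positive tester leaves several candidates, so no anchor yet.
others = ["tester_A", "tester_B", "father", "son"]
a22909 = candidate_origins(lines, ["tester_D"], others)
print(a22909)
```

When the list narrows to one man, we have an anchor. When it doesn't, as with the lone Christopher descendant, only more testers on that branch can shorten it.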

Here's a further simplification of the A20343 descent. Follow the blue.

Because this works, SNPs are perfect for defining whole classes of male descent. All Strother males in Group 01 are, to date, of the haplogroup R1b-BY23988, shown at the top of the first diagram. We've now found a new subclade, a separate tribe, if you will, of the Strother family: R1b-A20343.4 For all we know, in a million years or so it could evolve to a separate subspecies, perhaps Homo francispithecus! But I'm getting a bit ahead of myself — and of evolution.

Strother STRs

Short Tandem Repeats (STRs) allow us to arrange individual testers into broad groups, but they lack the specificity of SNPs. It's SNPs, not STRs, that define haplogroups, those reliable points onto which we can hang our trees. This is because the number of repeats in STRs (short strings of genetic letters) can mutate upwards and, in the next generation, back down again (additions or deletions), which obscures the probable STR configurations of a group's Most Recent Common Ancestor (MRCA). STRs are, however, useful for statistical analysis. I explore the possibilities below.

Some STRs are found to be reasonably stable over several hundred years. I've found a couple in the CF01 Cooley group to which I belong. Because of the nature of STRs, however, I don't expect them to appear consistently among their respective testers. If we peruse the results of Group 01 at the Strother DNA Project we see that the testers who spell the name Struthers have the values of 15-15-17-17 at DYS464 whereas most Strother testers have either 15-15-15-17 or 15-15-16-17. It's too early to know whether that really means anything, but it might help as more testers come forward.

The following tables help identify STR variation among the seven Big Y testers in Group 01. I'm using only Big Y data in this analysis because the test (which looks at more than ten million locations on the Y chromosome) includes up to 561 STRs.5 If we're going to observe a trend, it'll be best found in large samples.

It's a complex task to compare STRs over several testers and determine meaning. Instead, geneticists calculate a modal haplotype, a hypothetical set of the values for the MRCA. For five of the seven testers listed in the first graphic, the modal would represent Jeremiah. But two testers (the first two listed below) share a yet-to-be-determined MRCA with the others. Our modal, then, stands in for someone who probably sits some unknown number of generations above Jeremiah and possibly sits in the middle of the A12273 set of SNPs, that is, haplogroup R1b-A12273 (known as R1b-BY23497 at FTDNA.com).

The modal is the average (and/or majority value) of all results rounded to the nearest whole number. For example, the modal for STR marker DYS458 is 17. You can see why:

Kit #83607, the last listed, shows a variation of 1 from the modal. We call this the genetic distance (GD). If the result were 19, the GD would be 2. The six other testers have a GD of 0, meaning they all match one another at that position.
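For anyone who prefers to see the arithmetic, here is a minimal Python sketch of the modal and the GD at a single marker. It takes the "majority value" reading of the modal. The six matching testers are given placeholder labels, and the value of 18 for kit #83607 is inferred from the description above (a GD of 1, where 19 would give a GD of 2) rather than copied from the results table.

```python
from collections import Counter


def modal_value(repeats):
    """Most common repeat count (ties go to the value seen first)."""
    return Counter(repeats).most_common(1)[0][0]


def genetic_distance(value, modal):
    """Number of repeats by which a tester differs from the modal."""
    return abs(value - modal)


# DYS458: six testers at the modal (placeholder labels) and kit #83607 one off.
dys458 = {
    "tester_1": 17,
    "tester_2": 17,
    "tester_3": 17,
    "tester_4": 17,
    "tester_5": 17,
    "tester_6": 17,
    "83607": 18,  # inferred value, see the note above
}

modal = modal_value(list(dys458.values()))
distances = {kit: genetic_distance(v, modal) for kit, v in dys458.items()}
print(modal)
print(distances["83607"])
```

Averaging and rounding gives the same answer here: the seven values sum to 120, and 120 divided by 7 is about 17.1, which rounds to 17.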

Of the 561 markers sequenced, I've selected only those markers among our testers that have at least one mismatch with the modal (the hypothetical MRCA). The GD — the number of differences against the modal — is determined for each person.
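The selection and the per-person totals could be sketched like this. The kit labels and repeat values are invented for illustration and are not the project's data. The marker names are real STR names, but their values here are placeholders, and the function names are my own.

```python
from collections import Counter


def modal_haplotype(results):
    """Majority value at every marker, standing in for the hypothetical MRCA."""
    markers = list(next(iter(results.values())))
    return {
        m: Counter(r[m] for r in results.values()).most_common(1)[0][0]
        for m in markers
    }


def total_gd(results):
    """Return the markers with at least one mismatch and each kit's total GD."""
    modal = modal_haplotype(results)
    variable = [
        m for m in modal if any(r[m] != modal[m] for r in results.values())
    ]
    totals = {
        kit: sum(abs(r[m] - modal[m]) for m in variable)
        for kit, r in results.items()
    }
    return variable, totals


# Invented sample values (an assumption for illustration only).
results = {
    "kit_A": {"DYS393": 13, "DYS458": 17, "FTY194": 10},
    "kit_B": {"DYS393": 13, "DYS458": 17, "FTY194": 11},
    "kit_C": {"DYS393": 13, "DYS458": 18, "FTY194": 11},
    "kit_D": {"DYS393": 13, "DYS458": 17, "FTY194": 10},
    "kit_E": {"DYS393": 13, "DYS458": 17, "FTY194": 10},
}

variable, totals = total_gd(results)
print(variable)
print(totals)
```

Markers on which everyone agrees drop out of the picture, and a kit with a total of 0 matches the hypothetical MRCA at every remaining position.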

The first two testers in this diagram, those who do not have the Strother surname, are among the most interesting of these results. Kit #126422 has a GD of 0 from the MRCA! That's rather remarkable, and tells me there's a good chance the tester's ancestor sits squarely in the Strother tree. Kit #16988, on the other hand, has a total GD of 7, greater than all the others. That signals he's likely the most distantly related of these seven testers.

We can further simplify the diagram by removing what I call one-offs, those values that differ from the modal (a GD greater than zero) for only one tester in our sample. The problem with one-offs is that it's impossible to determine (with current test results) just where the mutation came into play and, hence, whether it holds any significance. For example, the GD at DYS458 for #83607 (second row, last column, above) might have occurred at his birth, in which case it would describe only the tester and nothing about his immediate forebears and cousins, which is what we're looking for. We just don't know because we have nothing with which to compare it, and, until we do, the result is meaningless for our study. After all, genetic genealogy (and its grandfather, population genetics) works by finding differences and similarities between target populations, a process that susses out pertinent patterns. Comparing one sample to itself doesn't help us. We might as well subtract zero from zero in hopes of finding meaning in the result. Yes, we do want to see matches (and the more the merrier), but differences reveal where branches occur.

So, let's remove the one-offs and replace the GD with the actual number of repeats for each marker.
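In code, that pruning step might look like the following sketch. The rule it implements is my reading of "one-off": a marker is dropped when only one tester differs from the modal, and kept when two or more do. The kits and repeat values are invented for illustration, with kit_B and kit_C playing the part of a father and son.

```python
from collections import Counter


def drop_one_offs(results):
    """Keep only markers at which two or more testers differ from the modal."""
    markers = list(next(iter(results.values())))
    kept = []
    for marker in markers:
        values = [r[marker] for r in results.values()]
        modal = Counter(values).most_common(1)[0][0]
        off_modal = sum(1 for v in values if v != modal)
        if off_modal >= 2:
            kept.append(marker)
    # Report actual repeat counts, not GDs, for the surviving markers.
    return {kit: {m: r[m] for m in kept} for kit, r in results.items()}


# Invented sample values (an assumption for illustration only).
results = {
    "kit_A": {"DYS458": 17, "FTY194": 10},
    "kit_B": {"DYS458": 17, "FTY194": 11},
    "kit_C": {"DYS458": 18, "FTY194": 11},
    "kit_D": {"DYS458": 17, "FTY194": 10},
    "kit_E": {"DYS458": 17, "FTY194": 10},
}

pruned = drop_one_offs(results)
print(pruned)
```

In this toy sample DYS458 disappears because only kit_C differs there, while FTY194 survives because two kits share a value off the modal.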

Again, we're looking for patterns. We find matching STR values at only two positions: DYS464c and FTY194. However, the two testers who match are father and son, so the match is not surprising: Y-chromosomal inheritance patterns tell us that the son will have what the father has.

Additional testing will show whether these markers reveal something about the lineage of William Strother, the father/son ancestor and the brother of Francis. My earlier discussion points a finger at DYS464 as having merit to the larger study, and the 15 repeats there might help us down the line. Although the total GD figures are indicative of the degree of relationship between the testers, particularly with regard to our two mystery testers, for now, the current STR study is largely a wash.

Upcoming Tests

This study is limited to the seven Big Y testers. It doesn't take into account the remaining 37- and 67-marker tests found on the project page; in particular, the possible tell-tale DYS464 marker. So far, then, we have nothing to sink our teeth into. But the SNPs are nicely mirroring the Strother genealogical tree, and that means something. First, it announces, in no uncertain terms, that this whole genetics thing works, even to the point that we can recognize specific markers belonging to specific individuals. The Francis SNP, A20343, for example, tells us that the unknown ancestor for kits #126422 and #169888 is not Francis; otherwise, they'd have the mutation. Any of the four William mutations might reveal at least one additional anchor SNP, as might that SNP hanging out in the Christopher lineage.

Because SNP mutations do not sprout in every generation, we can't expect each lineage will have a defining SNP. And for those SNPs that do appear, it'll take some triangulation to determine at which generation they were introduced. But we have four more Big Y test results coming our way over the next two or three months: descendants of three of Jeremiah's brothers and a Struthers.

The Strother DNA data will greatly expand beyond the current study. A breakup of the A12273 block of six SNPs (first graphic) might be found, for example. We could develop an appreciation of where and when the Struthers of Lanarkshire separated from the Strother lineage. And we stand to learn more about our two mystery lineages, as well as others that are hanging in the balance.


There's certainly a whole lot more to Strother/Struthers DNA than discussed here. This is just a sliver, and one based on the slender thread of the Y chromosome. As I noted in The Man Who Would Be BY23988, the Allison and Reid families are distantly related. Even further back, we find the MRCA for a clan that includes modern O'Deas and Simpsons. If we tug harder at the thread, more families emerge and a pattern of the tribal movement of populations across Britain and beyond reveals itself. It takes us into the realm of population genetics, beyond ancient history and into prehistory. But not quite "pre." We're watching history unfold from the center of our cells, in our chromosomes, in the very stuff that holds the instructions for life.

The Strothers are not only related to the Allisons and Simpsons. Look closer and we encounter cousins who have traversed all the continents over tens of thousands of years and probably well beyond. We think we're learning about Strother genealogy, but this data serves a larger purpose. It's now part of the totality of information collected by population geneticists, archaeologists, anthropologists, and historians, data used to reveal glimpses of humanity going back to the origin of our species. Most importantly, it provides insights into the relatedness among each and every human individual, living and dead.

By way of these tests, we're not only lifting the veil on lost history, we're making real contributions. For example, the roughly 300-year-old SNP mutation we've just discovered, A20343, is only one of tens of thousands noted at ybrowse.org. Francis Strother, if only by means of an arbitrarily assigned alphanumeric label, is writ into the history of human genetics.

1 A20343 is a single point mutation, from a G to an A, at position 7917671 of the Y chromosome.

2 In fact, the very same mutation can occur in different lineages, just as one doesn't have to be a Cooley in order to be a legitimate Michael. So, before testing for a single SNP mutation, an STR panel should be taken to verify the general lineage to which the tester belongs. In many cases, a 12-marker test should be sufficient.

3 I'll one day write an article about the X. Suffice it to say for now that X chromosomes are odd critters with a mix of inheritance patterns. They can be profoundly useful but lack good analogs and defy predictability.

4 The left side of the hyphen in a haplogroup name specifies the major haplogroup, in this case R1b, which is the most common haplogroup now found in Western Europe. In contrast, I belong to haplogroup R1a-YP4491, R1a being the most common haplogroup in Eastern Europe, a subclade of which migrated some thousands of years ago to Scandinavia.

5 FTDNA guarantees the results for something over 300 positions, and some positions are generally not successfully sequenced.