Michael Cooley's Genetic Genealogy Blog
24 August 2020

Akins Group AF04-AF06 and SNP Counting

It seemed a good idea some months ago to report on these two groups as one. I had referenced them in earlier reports to show the deep genetic divide between them. In hindsight, it was an ill-conceived idea. Rather than speeding up the reporting, it slowed it down considerably. I did, however, gain personal benefit in that it motivated me to write several scripts to automate, to a degree, the creation of some of the graphics I typically use.

A SNP is a Single Nucleotide Polymorphism, a single-point mutation from one genetic letter to one of the other three. The letters (nucleotides or bases) are Cytosine, Guanine, Thymine, and Adenine, or CGTA.



Z304, of the major R1b haplogroup, at the top of the following tree, is estimated to be more than four thousand years old. This means the two subclades illustrated here have no socially relevant genetic connection to one another (at least not through the paternal lineages). Their connection exists well outside the genealogical time frame. (It's true the lineages might have derived from a common population and might have intermarried over thousands of years, even in the last several hundred years, but genetics isn't going to tell that story.)

Each of the SNPs listed in the table was born in a single man. I have come to refer to them as Z304 Man, DF96 Man, DF98 Man, and so on. Some of those men are presently clumped together in haplogroups and will, in time, become individualized once new testers come along. (Anyone who has followed Y-DNA for any length of time has observed this over and over.) Some of these haplogroups are quite large. PH2163, at the far right, has 33 SNPs. They emerged over the lives of at least 33 men, perhaps double that or more, and perhaps over two thousand years or more. High-count SNP sequencing of the Y chromosome, in larger and larger numbers of tests, is the only way to break large haplogroups into two or more smaller groups. And from this, a hierarchical tree can be constructed, with older groups above the younger. (We know, for example, that Z304 is older because, in this sample, it is the sum of two individual populations.)


SNP Tree for Akins AF04-06 (R1b-Z304)



This article, however, concerns a procedure by which we can estimate a time frame. It's important to understand that the results are nothing more than a temporary estimate. The resultant numbers are not hard facts and can't be counted on. But they do give the researcher an idea about the scope of the regular emergence of these powerful genetic markers.

I described in Anchor SNPs and the Strother DNA Project that SNPs can be identified with real names, birth places and dates, but only for those well within the genealogical time frame, and rarely at that. Otherwise, we need to rely on the law of large numbers to develop even a notion about where and when these lineages unfolded. (A knowledge of genetics and of the birds and the bees provides us with the how of the matter.)

Large-scale testing of FTDNA's Y-500 product informed the community that the average SNP creation rate is about one every 144 years. Again, that's an average calculated over thousands of tests, and it can vary greatly on a lineage-by-lineage basis. A Struthers tester, for example, has only one novel SNP over the same period of time in which another lineage gained fifteen. In other words, there are no facts to be gleaned from this exercise. And, in the end, we need to rely on a bigger dataset, such as that at YFull.com.


1446 / 34 = 42.5294117647059

FTDNA's Y-700 uses a much larger sample and, therefore, finds more SNPs. My personal experience with several small projects prompts the suggestion of a SNP rate of about one every fifty to eighty years. That's not yet a number worth getting into an argument over, as demonstrated here. But we can arrive at a number that seems appropriate for one's project by counting all the SNPs in every lineage and dividing the total by the number of testers. This provides the average number of SNPs birthed since the emergence of the target haplogroup, say R1b-Z304. And since we have an estimated age for Z304, four thousand years, we can divide that age by our SNP average to get an average rate of successful spawnings. (Fun word.)

A quick and dirty calculation shows a total of 1,446 SNPs and an average of a little better than 42.5 SNPs per lineage. If we accept the 4,000-year estimate, the average rate is about 94 years per SNP, a slower mutation rate than in my previous observations.
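For anyone who wants to repeat the arithmetic with their own project's numbers, here is a minimal Python sketch. The function name and its parameters are my own invention for illustration; the inputs are the figures used above (1,446 SNPs, the 34 lineages from the division shown earlier, and the 4,000-year estimate for Z304). It is the same back-of-the-envelope division, nothing more rigorous.

```python
def years_per_snp(total_snps, lineages, haplogroup_age_years):
    """Return (average SNPs per lineage, average years per SNP).

    Hypothetical helper: it assumes every counted SNP arose after the
    target haplogroup and that the age estimate is taken at face value.
    """
    avg_snps = total_snps / lineages
    return avg_snps, haplogroup_age_years / avg_snps


# Figures from this article: 1,446 SNPs across 34 lineages under Z304,
# with Z304 assumed to be about 4,000 years old.
avg, rate = years_per_snp(1446, 34, 4000)
print(f"{avg:.1f} SNPs per lineage, about {rate:.0f} years per SNP")
```

Swapping in a different age estimate or a different SNP total shows quickly how sensitive the rate is to either figure.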

I've stated some caveats. And statisticians might reasonably, even correctly, and probably successfully, argue with this method. But no matter the method, the two figures (the age of the target SNP or haplogroup and the average mutation rate), however lowball or highball, will inform one another up until we (never) arrive at a reasonable estimate.

What I illustrated with this small sampling of two branches derived from Z304 can be done with any of its subclades or with any project. The estimates can provide project members with added perspective. The challenge for administrators, however, is to have access to the greater numerical data, not only from a single company's database, but realistic aggregate data from all entities involved. (I'm an old, not-full-throated socialist but lean in that direction, and I strongly respect the motto E pluribus unum.)