Michael Cooley's Genetic Genealogy Blog
23 July 2019

Big Y-700 Results For Akins AF04 and AF05

Sometimes we just have to wait.

I have no idea what problems FTDNA encountered before the release of the latest results for the Akins DNA Project, except that a large number of their Y-700 customers needed to provide fresh samples. A ten-month wait is a long time but, finally, it was worth it. I'll conclude this article with a partial assessment of my Y-700 experience to date. First, the long-awaited results (starred).



Figure 1: Akins AF04/05/06 SNP Tree

Definitions are needed for the benefit of readers who have little knowledge of the subject. The Y chromosome passes virtually unchanged from father to son, generation after generation. Every now and then a mutation occurs (spontaneously, as far as anyone knows), and that change, along with any predecessor mutations, is in turn passed down to the next generation of males. This is what allows branching to be detected.

Genetic genealogists are interested in two kinds of mutations: Short Tandem Repeats (STRs), which I won't describe in this article, and Single Nucleotide Polymorphisms (SNPs), which occur when a single genetic letter (A, C, T, or G) flips to one of the other three. (When I speak here of upstream, I mean "up the tree." In other words, your second great-grandfather is upstream of your father.)

The above diagram is a SNP tree, a genealogical tree laying out the genetic changes found on the Y chromosome since the most recent mutation (Z304) common to all testers. (Just like distant cousins share the same second great-grandfather and his antecedents, the starred testers, above, share BY171114 and all its ancestral SNPs.) SNPs are important because they tend to live in highly stable regions of the Y and are passed down through male lineages for tens of thousands of years along with all accumulated (or antecedent) SNPs in the lineage. This pattern continues through every male descendant until the lineage itself becomes extinct. Furthermore, these SNPs are born (mutated) into the germ cells of specific men. Of course, we generally don't know the name of the man in whom a SNP originated, so we assign him (or the SNP, really) an alphanumeric identifier, such as A25027, which, in this case, is the 25,027th SNP named by Yseq.net (the "A" being for team member Astrid).

Haplogroup Ages

Three Akins groups (AF04, AF05, and AF06) in the Akins DNA Project belong to the broad R-U106 haplogroup, one of the largest branches of the R1b umbrella haplogroup that dominates Western Europe. More specifically, all three lineages share the Z304 marker (also known as S264). YFull.com estimates Z304 to be more than 4,000 years old. (Needless to say, the sharing of a surname between branches as distant as BY12480 and DF98 can only be a coincidence.)

Our new tester (#241547) is the first of the AF05 Akins group to have done advanced testing. He and AF04 share haplogroup R-BY71510 with a Tierney tester (#755074). That SNP, however, isn't on YFull's radar. With only nine testers in our lineage, any age estimate is going to be a shot in the dark. Suffice it to say for now that the haplogroup immediately upstream of it, R-Y13174, is estimated to be about 3,000 years old.

Our next step down the SNP tree brings us to the recently established R-BY171114. ("R-" is short for "R1b," the umbrella haplogroup.) This is a point of great interest because the marker is shared by AF04 and AF05, uniting the groups, if you will, under the same banner. The question is whether the person first born with the SNP lived in the genealogical timeframe or was nearly as ancient as his predecessors. The probabilities can be estimated.

To guesstimate the age of a SNP or haplogroup, we merely count the number of SNP mutations that have emerged since the target SNP itself was born. These mutations occur irregularly throughout the male lineage, and we can identify and name them only to the extent they've been detected. Keep in mind, then, that even if, say, three new SNPs are found during the cell-smashing, chemical-stripping, and software-filtering gauntlet that our spit samples are subjected to, we can't be certain that more won't be found in future testing. The aging technique, then, is necessarily skewed by what has not yet been discovered.

Still, we can take a stab at it. Let's look at BY171114, the parent SNP of the AF04/05 Akins group. Descended from it are eighteen private SNPs (those that are not presently shared) among the eight testers. By averaging SNP counts across its entire database, YFull estimates that a new SNP makes an appearance at the average rate of once every 144 years. Using this 144-year formula, we can calculate that the Most Recent Common Ancestor (MRCA) for AF04 and AF05 was born about 324 years "before present." (Present is defined as the year 1950.) In other words, the "founder" of BY171114 was likely born sometime after the year 1600, about the time of the founding of Jamestown. Likewise, with only one SNP between the BY172259 testers, their MRCA would have been born within the last two hundred years, which is in keeping with the known genealogy.
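The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch of the simplified averaging described in this article (private SNPs divided by testers, times a years-per-SNP rate), not YFull's full method. The function names are hypothetical; the 144-year rate and the 1950 "present" are the figures quoted above.

```python
# Simplified SNP-age arithmetic as described in the text.
# This is NOT YFull's actual algorithm; function names are hypothetical.

YEARS_PER_SNP = 144   # average rate quoted from YFull in the text
PRESENT = 1950        # "before present" reference year


def estimate_age(private_snps: int, testers: int,
                 years_per_snp: float = YEARS_PER_SNP) -> float:
    """Average the private SNPs per tester and convert the result to years."""
    if testers <= 0:
        raise ValueError("need at least one tester")
    return private_snps / testers * years_per_snp


def estimated_birth_year(age_bp: float, present: int = PRESENT) -> float:
    """Convert an age 'before present' into an approximate calendar year."""
    return present - age_bp


if __name__ == "__main__":
    age = estimate_age(18, 8)               # BY171114: 18 private SNPs, 8 testers
    print(age, estimated_birth_year(age))   # 324.0 1626.0
```

The rate is a parameter rather than a constant because it depends on how much of the Y chromosome a given test actually covers.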

Figure 2: SNP evaluation (data from Yseq.net)

According to the law of large numbers, these estimates are probably not too far off. But there's a lot to consider. First, we're interested only in SNPs located in regions of the Y chromosome that have shown staying power. STRs and the chromosomes' centromeres lack this characteristic: those regions experience a high degree of mutation and are considered unsuitable for phylogeny. Other areas look so much like areas elsewhere in the genome that the lab can't know for sure to which location a sequenced segment belongs.¹ (The three-billion-plus positions in our genomes don't come pre-numbered. Locations can be inferred only by understanding the surrounding data.) Unless any of these SNPs prove to be significant in future testing, they should be set aside.
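Setting aside SNPs that fall in unreliable regions amounts to an interval lookup. The sketch below assumes a list of non-overlapping excluded regions; the coordinates are placeholders invented for illustration, not real Y-chromosome positions, and in practice they would come from a published mask of repetitive and duplicated regions.

```python
import bisect
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive

# PLACEHOLDER regions for illustration only; these are NOT real Y coordinates.
EXCLUDED: List[Interval] = [
    (1_000, 1_999),    # e.g. a repetitive (STR-rich) stretch
    (5_000, 7_999),    # e.g. a centromere-like block
    (12_000, 12_499),  # e.g. a segment duplicated elsewhere in the genome
]


def keep_reliable(positions: Sequence[int],
                  excluded: Sequence[Interval]) -> List[int]:
    """Return the SNP positions that fall outside every excluded interval.

    Assumes the intervals do not overlap one another.
    """
    regions = sorted(excluded)
    starts = [start for start, _ in regions]
    kept = []
    for pos in positions:
        # Index of the last region starting at or before pos (-1 if none).
        i = bisect.bisect_right(starts, pos) - 1
        inside = i >= 0 and regions[i][0] <= pos <= regions[i][1]
        if not inside:
            kept.append(pos)
    return kept


if __name__ == "__main__":
    print(keep_reliable([500, 1_500, 6_000, 9_000, 12_499, 12_500], EXCLUDED))
    # [500, 9000, 12500]
```

Sorting the region starts once and using binary search keeps each lookup fast even with thousands of masked regions.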

I've also removed from the tree SNPs that likely belong further upstream. For example, kits #241547, #147812, and #B11295 have various combinations of SNPs FT51412, FT50625, and FT51423. At first I thought we had a new subclade, but the SNPs appear too inconsistently to know for sure. They probably belong upstream.²

These eliminations reduced the number of private SNPs reported by FTDNA from fourteen to six for kit #241547, and from fourteen to two for #B11295. The SNPs in green on the left are those that are likely upstream, and those in grey have duplicate segments elsewhere in the genome, making them difficult to assess.
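The "too inconsistent" test can be expressed as a simple all-or-nothing check: a genuine new subclade should show up as every SNP in the block or none of them in each kit. This is a minimal sketch with hypothetical kit and SNP names (not the actual Akins calls), and it cannot tell a truly absent SNP from a low-coverage no-call, which is exactly why a partial pattern calls for caution rather than a verdict.

```python
from typing import Dict, Set


def block_pattern(calls: Dict[str, Set[str]], block: Set[str]) -> Dict[str, str]:
    """Classify each kit as carrying 'all', 'none', or only 'partial' of a block."""
    pattern = {}
    for kit, snps in calls.items():
        found = snps & block
        if not found:
            pattern[kit] = "none"
        elif found == block:
            pattern[kit] = "all"
        else:
            pattern[kit] = "partial"
    return pattern


def looks_like_clean_subclade(calls: Dict[str, Set[str]], block: Set[str]) -> bool:
    """A new subclade should be all-or-nothing for every kit."""
    return "partial" not in block_pattern(calls, block).values()


if __name__ == "__main__":
    block = {"snp_x", "snp_y", "snp_z"}
    # Hypothetical kits; NOT the actual Akins results.
    calls = {
        "kit_a": {"snp_x", "snp_y"},
        "kit_b": {"snp_x", "snp_y", "snp_z"},
        "kit_c": {"snp_z"},
    }
    print(block_pattern(calls, block))
    print(looks_like_clean_subclade(calls, block))  # False
```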

Let's look at one more example. PH2163 (AF06) is believed to be about 432 years old (nine SNPs, divided by three testers, times 144 years). BY179781 is about 360 years old. But with so small a sample, these estimates are less likely to be accurate. It simply amounts to the difference between tossing a quarter ten, a hundred, or a thousand times.
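The coin-toss comparison can be made concrete. For a fair coin, the standard error of the fraction of heads is the square root of 0.25/n, so it shrinks from about 0.16 after ten tosses to 0.05 after a hundred and about 0.016 after a thousand. By analogy, an age built on a handful of SNPs from two or three testers is the ten-toss case. This is a minimal sketch; the function names are hypothetical, and the simulation is seeded so it repeats exactly.

```python
import math
import random


def heads_std_error(n: int, p: float = 0.5) -> float:
    """Standard error of the fraction of heads in n tosses."""
    return math.sqrt(p * (1 - p) / n)


def simulated_heads_fraction(n: int, seed: int = 0) -> float:
    """Toss a simulated fair coin n times and return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(heads_std_error(n), 3), simulated_heads_fraction(n))
```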

Finally, note that the first diagram has been dramatically simplified. For example, haplogroup R-BY71510 actually comprises twelve SNPs, and R-BY171114 has a total of six known markers. Kit #241547's route to R-Y13174 takes us through 24 SNPs (6 + 6 + 12). According to YFull's formula, that amounts to more than 3,400 years. But because we followed only one route, the law of large numbers tells us that the figure is likely not accurate. More routes need to be studied for a more accurate estimate. Again, the more testers we have, the more accurate the estimate.
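Following a route up the tree is a walk from a tester's private block to the target haplogroup, summing the SNPs in each block along the way. This is a minimal sketch assuming the block counts given above (six private SNPs, six in R-BY171114, twelve in R-BY71510); the function names and the "private (#241547)" label are hypothetical. The averaging helper shows where additional routes would go once more testers are available.

```python
from typing import Dict, List, Tuple

YEARS_PER_SNP = 144  # rate quoted in the text

# block name -> (number of SNPs in the block, parent block), per the text.
TREE: Dict[str, Tuple[int, str]] = {
    "private (#241547)": (6, "R-BY171114"),
    "R-BY171114": (6, "R-BY71510"),
    "R-BY71510": (12, "R-Y13174"),
}


def snps_on_route(start: str, stop: str, tree: Dict[str, Tuple[int, str]]) -> int:
    """Sum SNP counts walking from start up to (but not including) stop."""
    total, node = 0, start
    while node != stop:
        if node not in tree:
            raise ValueError(f"{stop!r} is not upstream of {start!r}")
        count, parent = tree[node]
        total += count
        node = parent
    return total


def mean_route_age(starts: List[str], stop: str,
                   tree: Dict[str, Tuple[int, str]],
                   years_per_snp: float = YEARS_PER_SNP) -> float:
    """Average the route lengths of several testers and convert to years."""
    counts = [snps_on_route(s, stop, tree) for s in starts]
    return sum(counts) / len(counts) * years_per_snp


if __name__ == "__main__":
    n = snps_on_route("private (#241547)", "R-Y13174", TREE)
    print(n, n * YEARS_PER_SNP)                                       # 24 3456
    print(mean_route_age(["private (#241547)"], "R-Y13174", TREE))   # 3456.0
```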

Akins Genealogy

Admittedly, I'm not an Akins genealogist. My interest is limited to my narrow thread of descent, which happens to be a collateral lineage to our AF05 tester. That line is proven back to William Eaken, possibly born in Ireland around 1700, who died in Bucks County, Pennsylvania, in 1766. There are some thoughts about his parentage but, as far as I know, nothing is proven. He is clearly, however, a genetic cousin of some degree to Henry Akin, born in Ireland in 1694, who died in Connecticut in 1788. (A couple of small STR differences appear to separate AF05 from AF04, but that topic is for another article.)

A clearly defined genealogical tree can be overlaid onto a genetic tree. For example, it's now clear that the MRCA for BY172259 (far left) was Baron Dekalb Akins (1861-1958). We don't know yet whether the mutation originated with him or with his father, John Calvin Akins. Any living male-line descendants of Baron's brothers could answer that question. If the genealogy is correct, Baron's grandfather could not have had the marker, as demonstrated by the fact that his nephews' lineages lack it. BY172259, then, is a near example of what I call an anchor SNP: a SNP that can be attributed to a specific man and, thereby, to a specific birth date and place. A better understanding of the degree of relationship to the other Akins will be acquired once the upstream SNPs in the R-BY171114 haplogroup are parsed out through subsequent testing.
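The anchor-SNP reasoning can be written as a small procedure on a paternal-line tree: start at the MRCA of the men who carry the SNP and move upstream, stopping at the first ancestor who also has a descendant who tested negative (had that ancestor carried the SNP, the negative tester would carry it too). This is a minimal sketch assuming a correct genealogy and no back-mutation. The tester names, "Son A", "Son B", "Uncle", and "Grandfather" are hypothetical placeholders, not the actual project members.

```python
from typing import Dict, Iterable, List, Optional


def paternal_line(person: str, father: Dict[str, str]) -> List[str]:
    """The person followed by his father, grandfather, and so on."""
    line = [person]
    while line[-1] in father:
        line.append(father[line[-1]])
    return line


def mrca(people: List[str], father: Dict[str, str]) -> Optional[str]:
    """Most recent common paternal-line ancestor of all the given people."""
    if not people:
        return None
    lines = [paternal_line(p, father) for p in people]
    common = set(lines[0]).intersection(*(set(line) for line in lines[1:]))
    return next((a for a in lines[0] if a in common), None)


def origin_candidates(positives: Iterable[str], negatives: Iterable[str],
                      father: Dict[str, str]) -> List[str]:
    """Men in whom the SNP could first have appeared, youngest first."""
    negatives = list(negatives)
    root = mrca(list(positives), father)
    if root is None:
        return []
    candidates = []
    for ancestor in paternal_line(root, father):
        # A carrier ancestor would have passed the SNP to every negative tester below him.
        if any(ancestor in paternal_line(n, father) for n in negatives):
            break
        candidates.append(ancestor)
    return candidates


# HYPOTHETICAL tree: two BY172259 testers via Baron, one negative cousin line.
FATHER = {
    "Tester A": "Son A",
    "Son A": "Baron Dekalb Akins",
    "Tester B": "Son B",
    "Son B": "Baron Dekalb Akins",
    "Baron Dekalb Akins": "John Calvin Akins",
    "John Calvin Akins": "Grandfather",
    "Cousin": "Uncle",          # tester who lacks BY172259
    "Uncle": "Grandfather",
}

if __name__ == "__main__":
    print(origin_candidates(["Tester A", "Tester B"], ["Cousin"], FATHER))
    # ['Baron Dekalb Akins', 'John Calvin Akins']
```

Adding a tested descendant of one of Baron's brothers settles the question either way: as a positive he moves the MRCA up to John Calvin Akins, leaving only him as the candidate; as a negative he rules John Calvin out, leaving only Baron.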

In the case of haplogroup R-BY179781 (far right), the two testers are father and son. All five "novel" SNPs virtually belong to both men and live upstream in or near the R-BY179781 block, which otherwise consists of only the one SNP. The next upstream haplogroup, PH2163, however, comprises 41 SNPs! Obviously, we need many more samples to sort that out. (The greater the marker difference, the more distant the relationship.)

Recommendations

With several Big Ys under the AF04/05 belt, single-SNP testing can provide members who haven't had advanced testing with very specific data. I would hazard a guess that all AF04/05 members will have BY171114. If anyone is in doubt, the matter can easily be settled. A single-SNP test costs $39 at FTDNA and $18 at YFull.

Figure 3: The Iceberg of Knowledge

Recent Problems

Being more impatient than smart, I wrote a computer program a few years ago that extracts all unnamed SNPs from a BAM file (the raw, partially processed data file). The idea was that most named SNPs are upstream SNPs, while unnamed SNPs have emerged more recently. I managed to identify new SNPs and haplogroups in forty or more BAMs. But the new Y-700 has presented some problems. FTDNA has stated that the increased sample size (up to 50% larger) will eventually bring the average rate of SNP detection closer to one every 80 years. The law of large numbers tells us that's good. But consider where those SNPs reside.
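My original program isn't reproduced here. As a stand-in, the same idea (keep the unnamed SNPs, skip the named ones) can be sketched against a VCF, the text file of variant calls that a variant caller produces from a BAM. This swapped-in sketch assumes that named SNPs carry their name in the VCF's ID column and unnamed ones show "."; the chromosome labels, positions, and the name "NAMED1" in the sample are hypothetical.

```python
from typing import Iterable, List, Tuple

Y_NAMES = {"chrY", "Y"}  # assumed chromosome labels for the Y


def unnamed_snps(vcf_lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (position, ref, alt) for passing, unnamed single-base Y variants."""
    found = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, snp_id, ref, alt, _qual, filt = fields[:7]
        if chrom not in Y_NAMES or filt != "PASS":
            continue
        if snp_id != ".":
            continue  # already named: likely an established (upstream) SNP
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multi-allelic sites
        found.append((int(pos), ref, alt))
    return found


# HYPOTHETICAL sample records (tab-separated VCF columns).
SAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrY\t100\t.\tA\tG\t50\tPASS\t.",
    "chrY\t200\tNAMED1\tC\tT\t50\tPASS\t.",
    "chrY\t300\t.\tG\tA\t50\tLowQual\t.",
    "chrY\t400\t.\tAT\tA\t50\tPASS\t.",
    "chr1\t500\t.\tC\tT\t50\tPASS\t.",
]

if __name__ == "__main__":
    print(unnamed_snps(SAMPLE))  # [(100, 'A', 'G')]
```

The weakness described below is visible in the design: "unnamed" is used as a proxy for "recent," which breaks down once a larger test turns up many old SNPs that simply haven't been named yet.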

Considerable testing of late on the Strother DNA Project has found that most lines appear to have approximately two to three SNPs going back to the MRCA, who died in 1702. But FTDNA's increased sample size is discovering upwards of two dozen novel SNPs. (A good case in point is the aforementioned 41 SNPs in R-PH2163.) That's okay; we want more data. But where did they come from? Eventually, FTDNA determined (at least in this Strother example) that the vast majority of the new SNPs belong substantially upstream of William Strother, the MRCA. Statistically, this makes sense:

There's a well-known graphic that shows internet genealogy as the very tip of the documentation that's out there to be found. When we recognize that Y-SNPs have been entering the human lineage for three hundred thousand years and more, we understand that the vast majority of discovered SNPs are ancient, and that only a relatively small number came to us within the genealogical timeframe. Until matches are found, these are considered private SNPs. Naturally, the more data gathered, the smaller that tip becomes by comparison, while the vast majority of SNPs lie well below sea level, so to speak. FTDNA's fifty-percent-larger sample translates to about fifteen million tested base positions. That's considerable, but it's still a fraction of the roughly fifty million nucleotides found on the Y chromosome.
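A quick back-of-the-envelope calculation shows how small the tip of the iceberg is. This sketch applies the 144-year rate uniformly along a single lineage, which is a simplification, and it treats roughly 400 years as the genealogical timeframe; that window is an assumption for illustration, not a figure from the project.

```python
YEARS_PER_SNP = 144          # average rate quoted earlier in the text
LINEAGE_YEARS = 300_000      # "three hundred thousand years and more"
GENEALOGICAL_YEARS = 400     # ASSUMPTION: rough span of documented records
TESTED_BASES = 15_000_000    # Y-700 tested positions, per the text
Y_LENGTH = 50_000_000        # "roughly fifty million" Y nucleotides, per the text


def expected_snps(years: float, years_per_snp: float = YEARS_PER_SNP) -> float:
    """Expected number of SNPs accumulated along one line over the given span."""
    return years / years_per_snp


if __name__ == "__main__":
    tip = expected_snps(GENEALOGICAL_YEARS)
    whole = expected_snps(LINEAGE_YEARS)
    print(round(tip, 1), round(whole), round(tip / whole, 4))  # 2.8 2083 0.0013
    print(TESTED_BASES / Y_LENGTH)                             # 0.3
```

Under these assumptions, only about one SNP in seven or eight hundred along a line falls within the genealogical window, and a Y-700 test examines roughly 30% of the chromosome.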

All of that's okay. Any data is good data — whether ancient or recent. But with all the new upstream SNPs being discovered for the first time, Y-700's larger sample size has rendered my not-so-old computer program inadequate. I guess I'll need to return to my former state of impatience, at least for now, while FTDNA sorts out this pile of data. (Even as admins, we have access only to slivers of the data.)

I've been a customer of FTDNA's since 2006 and a project administrator since 2012. It sometimes appears to us that whenever the company rolls out a new product, it does so without sufficient heads-up, with insufficient (it seems) testing, and with a degree of marketing not proportional to its readiness. This has resulted in ten-month waits, such as the one experienced by our new AF05 tester. But I also know from personal experience that, in the end, they largely get it right. I just wish they'd sit still long enough for some of us to adequately catch our collective breath and come up to speed.

The reader can see that there's a lot of fluidity to genetic testing. New tests literally provide new results. The Y-DNA SNP tree will not be settled until a few million people have tested their complete Y chromosomes. In the meantime, don't hold your breath; take things as they come.

(Sigh.)


1. An example of this is A25027 (grey in the diagram). The segment matches another segment elsewhere on the Y chromosome. Sanger short-read testing might not reveal it. Therefore, Yseq.net doesn't test for it.

2. BY207698 (in blue) could go either way. Note, however, that it was discovered last year on another tree. That doesn't mean a mutation cannot appear on two different branches, but, considering what we're now seeing with Y-700, we should remain skeptical for the time being.