Michael Cooley's Genetic Genealogy Blog
22 February 2022

A Little About Y-STRs

Two things first. The Y chromosome, because it carries the male sex-determining gene (SRY), is passed only from father to son and can be tested only by men. It holds genetic markers that are characteristic of the tester's paternal lineage. This allows two seemingly unrelated Johnson families, for example, to see just how distantly they're related. (The more markers that match, the closer the relationship.) For example, my Cooleys share a common ancestor going back 5,000 years with another Cooley clan, and 25,000 years with another. That all of us are Cooleys has nothing to do with biological inheritance.

Next, here's a description of the two basic types of genetic mutation. STR stands for Short Tandem Repeats. This means that short repetitions of a defined series of the genetic letters A, C, T, and G sit next to one another inside a defined region of the Y. I have 34 repeats of TTTC in a region called DYS449, whereas most of the others in my group have 33 repeats. (I suppose I can reasonably be called a mutant, a characterization my family and friends would likely give a nod to.) SNPs (Single Nucleotide Polymorphisms), on the other hand, are changes in which a single genetic letter has "morphed" into another, say an A to a G. More on that in a bit.
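To make the idea of a repeat count concrete, here is a minimal Python sketch that counts back-to-back copies of a motif in a string of genetic letters. The function name and the flanking letters are invented for illustration; only the counts (34 versus 33 repeats of TTTC) come from the text, and real STR calling from sequencing data is considerably more involved.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    # Try every starting position and count how many copies follow in a row.
    for start in range(len(sequence) - step + 1):
        run = 0
        pos = start
        while sequence.startswith(motif, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


# Invented flanking letters around the repeat region; only the counts
# (34 versus 33 repeats of TTTC) come from the text.
mine = "GATTACA" + "TTTC" * 34 + "GGAT"
group = "GATTACA" + "TTTC" * 33 + "GGAT"

print(longest_tandem_run(mine, "TTTC"))   # 34
print(longest_tandem_run(group, "TTTC"))  # 33
```

The difference of one between the two counts is what a testing report would show as a one-step mismatch at that marker.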





[Figure: 34 repeats of DYS449]

So, looking at STRs.


FTDNA's Y-12 ($59) is often enough to properly place results into a project. Typically, however, hundreds of matches might be found. But if the surname was passed down due to biological descent, the project administrator might be able to place the results into a specific surname group with a reasonable degree of confidence. Still, this test ferrets out only 12 markers. That's too little data to work with. In comparison, FTDNA's Big Y can sequence up to fifteen million bases.

More markers mean more resolution. STR-matching confidence goes up as one moves up to Y-37, Y-67, and Y-111 testing, with the Y-37 recommended for entry-level testers. The cost for 37 markers is reasonable ($119), and the test very often narrows the matches enough to assign the tester to the correct group. Personally, however, I consider the Y-111 largely useless except for one factor mentioned below. But first a bit about the behavior of STRs.
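As a rough illustration of why more markers give more resolution, the sketch below compares two hypothetical testers on a small panel and on a larger one. It uses a simple stepwise count (the sum of the differences in repeat counts), which is an assumption; a testing company's actual calculation may differ. The marker names are real STR names, but every value is invented except the 34 and 33 at DYS449.

```python
def restrict(haplotype: dict, panel: list) -> dict:
    """Keep only the markers that belong to the given panel."""
    return {m: haplotype[m] for m in panel if m in haplotype}


def genetic_distance(a: dict, b: dict) -> tuple:
    """Return (stepwise distance, number of markers compared)."""
    shared = sorted(a.keys() & b.keys())
    distance = sum(abs(a[m] - b[m]) for m in shared)
    return distance, len(shared)


# Hypothetical results; only the DYS449 values echo the text.
tester_a = {"DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS449": 34, "DYS439": 10}
tester_b = {"DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS449": 33, "DYS439": 11}

small_panel = ["DYS393", "DYS390", "DYS19"]

print(genetic_distance(restrict(tester_a, small_panel),
                       restrict(tester_b, small_panel)))  # (0, 3)
print(genetic_distance(tester_a, tester_b))               # (2, 5)
```

On the small panel the two look identical; the differences only show up once the extra markers are compared.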

Y chromosomal mutations occur at the creation of sperm. Of the millions produced, any number of differing mutations might appear among them. It's the luck of the draw as to which sperm and which mutations (if any) the child inherits. Thankfully, mutations are rare. (We'd be in a lot of trouble as a species otherwise.)

Still, STRs mutate more frequently than SNPs, and the number of repetitions can go either up or down. This greater frequency is why, somewhere over the last nine generations, I acquired an extra repeat above the usual 33 for my tribe. But there are some among my Cooleys who have lost a repeat over the same period of time, leaving them with only 32. Repeats are so variable that another, more distantly related Cooley managed to acquire three extra repeats in another STR region, DYS439, and it seems within the previous five or six generations. It's this volatility that makes STR results virtually useless for building a genetic tree. They just can't be trusted.1
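The trouble with up-and-down mutation can be shown with a toy example. The sketch below assumes each mutation changes the count by exactly one repeat (a simplification) and starts every line at 33, the usual DYS449 value mentioned above. The mutation histories themselves are invented.

```python
def apply_steps(start: int, steps: list) -> int:
    """Apply a list of +1/-1 mutation steps to a starting repeat count."""
    value = start
    for step in steps:
        if step not in (1, -1):
            raise ValueError("this sketch only models single-repeat steps")
        value += step
    return value


ancestor = 33  # usual DYS449 count for the group, per the text

line_a = apply_steps(ancestor, [+1, -1])  # gained a repeat, later lost it
line_b = apply_steps(ancestor, [])        # never mutated
line_c = apply_steps(ancestor, [+1])      # gained one repeat
line_d = apply_steps(ancestor, [-1])      # lost one repeat

print(line_a, line_b)        # 33 33
print(abs(line_c - line_d))  # 2
```

Lines A and B look identical even though A carries two mutation events, while C and D differ by two steps although each is only one mutation away from the same ancestor. Repeat counts alone cannot tell these histories apart.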

SNPs are different. Yes, they can also reside in volatile regions, such as within STRs, the chromosome's centromere, or the two pseudoautosomal regions, PAR1 and PAR2. Remaining, however, are vast stable regions (if something microscopic can be considered vast) that have remained virtually unchanged for thousands, even hundreds of thousands of years. In fact, some markers are known to have survived for at least 330,000 years, perhaps passed down from father to son over approximately 9,500 generations! The high degree of stability of SNPs makes them highly suitable for constructing trees and making rather solid judgments about the degree of relationship between one branch and another.

Provided there are a sufficient number of testers, SNP trees are very telling. The descendants of the MRCA for Group 1 of the Strother DNA Project, for example, are largely known. The SNPs perfectly overlay the genealogy. The genealogy for a large portion of the Pettit-Mellowes Project, however, is not so well known. Even so, the two kinds of tree look pretty much the same. Names known or not, the SNP tree is a near-mirror of the genealogical tree. Indeed, it needs even less interpretation.
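A rough way to picture how SNPs sort testers into branches: because a stable SNP, once it appears, is passed to every descendant in the male line, two testers who share more SNPs branched apart more recently. The Python sketch below is only an illustration of that idea. The tester labels and SNP names are hypothetical, and real tree building works from full sequencing results rather than hand-made sets.

```python
def shared_snps(a: set, b: set) -> set:
    """SNPs carried by both testers, inherited from their common ancestor."""
    return a & b


def closest_match(name: str, testers: dict) -> str:
    """Return the other tester who shares the most SNPs with `name`."""
    others = (n for n in testers if n != name)
    return max(others, key=lambda n: len(shared_snps(testers[name], testers[n])))


# Hypothetical testers and SNP names, invented for this example.
testers = {
    "T1": {"SNP_A", "SNP_B", "SNP_C", "SNP_D"},
    "T2": {"SNP_A", "SNP_B", "SNP_C", "SNP_E"},
    "T3": {"SNP_A", "SNP_B", "SNP_F"},
    "T4": {"SNP_A", "SNP_G"},
}

print(closest_match("T1", testers))                        # T2
print(sorted(shared_snps(testers["T1"], testers["T3"])))   # ['SNP_A', 'SNP_B']
```

Here T1 and T2 sit on the same twig (three shared SNPs), T3 joins one level up, and T4 joins at the base. The more testers there are, the more of these branch points can be filled in.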

Although the STRs gleaned from a Y-111 test can further narrow the matches, they often narrow them to zero, thereby reducing the test's value. On the other hand, upgrading to the Y-111 will help spread the costs should the tester intend to eventually upgrade to the Big Y, even if it costs more in the end.2

So, there it is. STRs for general grouping and SNPs for building a reliable tree of some accuracy, the accuracy depending, of course, on how many testers are involved.


1. Perhaps father and son can be traced with repeats that are otherwise mismatched to the group, perhaps even some generations upstream. But it's difficult to find such patterns over multiple centuries. Still, it happens. Using the Cooleys again as an example, close to two dozen members of my clan consistently show a mutation otherwise not found in the broader population of haplogroup R1a-YP4491. Somewhere along the line we lost two repeats at DYS464b. Furthermore, this is the only marker that clearly distinguishes most of us from the "distantly related Cooley tester(s)" mentioned above. The DYS464b mutation clearly came into the lineage some time prior to the birth of our MRCA (c1739), which implies that the "others" who have the version matching the broader YP4248 population are not from our MRCA but descend from some degrees above him.

2. In addition to the several million bases the Big Y sequences, it also identifies 700 STRs.