Michael Cooley's Genetic Genealogy Blog
25 June 2020

Overview for Cooley Y-DNA Group CF04

Regardless of the surname or the individual being tested, Y chromosomal analysis is the same. One man might be John Baker son of Samuel Baker while I'm Michael Cooley son of Jack Cooley. The Y-DNA markers might differ, but the inheritance pattern is exactly the same. Every man's Y matches his father's to the same degree that my Y matches my father's: virtually 100%. Whether I write about the Pettits, the Fisks, the Wrights, or the Ashenhursts, the essential content of any two articles is largely the same; only the particulars change.

When it comes to the Y chromosome, throw aside the common perceptions about DNA inheritance. A male doesn't get half of the Y from his mother, there are no recessive-dominant concepts to struggle over, and no generations are skipped. Simply, the Y chromosome is not inherited in the same way that the one-billion-plus other scraps of DNA are. It carries the male sex gene (the SRY gene). Instead of a Y, women inherit a second X chromosome from their fathers, which lacks the SRY. Therefore (no surprise), they are women. The Y passes as-is, whole hog, from father to son much like a baton passes in a relay. The carriers change, but the baton remains. And it doesn't matter whether the teams are Hogue, Foster, Strother, Hatfield or McCoy. The game is always played the same way.

It just so happens, however, that this article is rubber-stamped Cooley, and stamped with a specific brand of Cooley, what we at the Cooley DNA Project call CF04. It's a small group but somewhat genealogically diverse, if not so much genetically. The group's origins are undoubtedly English but none of the testers can be traced back any further than the mid-18th century.

CF04 is represented by these STR testers.

This color coding for location is used in the next three diagrams, which involve Short Tandem Repeats (STRs), those numbers represented on the results page at the Cooley DNA Project. Although I generally stress Single Nucleotide Polymorphisms (SNPs) in my recent articles, the group's low turnout for advanced SNP testing necessitates a review of STRs.


These numbers result from the number of times (or REPEATS) that a short string of genetic letters (i.e., SHORT) is found one after another (or in TANDEM) at a specific region of the Y chromosome. For example, the column marked A (above) represents a region called DYS449. It consists of X repeats of the sequence TTTC. The CF04 Cooleys, at least among those who have tested, range between 32 and 34 repeats.
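To make the idea of a repeat count concrete, here is a minimal Python sketch that finds the longest run of back-to-back copies of a motif. The toy sequence is invented for illustration and is not real DYS449 data, and the function name is my own rather than anything from a testing lab.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping one motif-length at a time while the copies continue.
        while sequence[pos:pos + step] == motif:
            run += 1
            pos += step
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Hypothetical flanking letters around 33 copies of TTTC.
    toy_region = "GGA" + "TTTC" * 33 + "ACG"
    print(longest_tandem_run(toy_region, "TTTC"))
```

A real lab pipeline has to locate the region on the chromosome first and cope with sequencing errors; this only shows what the reported number means.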

The first caution about STRs is that they are not terribly stable over the generations. A repeat can be added or deleted with a birth in any generation. In column A, five testers have 34 repeats of DYS449, one has 33 repeats and two have 32 repeats. We have little idea, then, what value their common ancestor had. It's a similar situation with the B and C columns. (In another of the Cooley groups, a tester has three more repeats at a position than all the others in his group. Their common ancestor was born only about 250 years ago, which makes it a bit of an anomaly.)

Even so, the first twelve positions tend to be stable over fifteen or more generations. The Maryland and Pennsylvania families differ at three of them in the sample, which doesn't comply with this "rule," whereas the Maryland and English Cooleys are gold in this regard. The parameters, however, can change on a case-by-case basis, meaning that it's not possible to determine an exact lineage using STRs. But by gaining a sense of their tendency to drift, the results can be arranged into groups of lineages that are likely to be related. In fact, looking at the Maryland and Pennsylvania examples alone, I would generally be forced to conclude they did not co-inhabit the genealogical timeframe. But the English testers certainly function as a bridge between them. This is the same group, even if parts might be distantly related.

STR patterns and anomalies.

The boxed markers in the above graphic are what I call one-offs in that they diverge from the vast majority of markers. This suggests they occurred somewhere in the tester's lineage after the Most Recent Common Ancestor (MRCA). How far back we can't know. But here's an example that has been resolved. I have 34 repeats of DYS449 (Cooley group CF01) whereas almost everyone else in the group has 33 repeats. It turns out that my dad was part of the majority with 33 repeats. The mutation, then, occurred at my birth. I'm the anomaly. Still, potential patterns do emerge, like the two seen in pink for the Daniell Cooleys. But it takes more than two testers to make a pattern. Additional descendants of Daniell's might confirm or deny the notion.
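As a rough illustration of spotting a one-off, the sketch below tallies the DYS449 values reported earlier for column A (five testers at 34, one at 33, two at 32) and flags any value held by a single tester. The function name is my own, and the most common value is only the group's mode; it should not be read as the ancestor's value.

```python
from collections import Counter

# DYS449 repeat counts for the eight CF04 testers, as described for column A.
dys449 = [34] * 5 + [33] + [32] * 2


def modal_and_one_offs(values):
    """Return the most common value and a sorted list of values seen only once."""
    counts = Counter(values)
    modal_value, _ = counts.most_common(1)[0]
    one_offs = sorted(v for v, n in counts.items() if n == 1)
    return modal_value, one_offs


if __name__ == "__main__":
    print(modal_and_one_offs(dys449))
```

The value 32 is shared by two testers, so it isn't flagged. Whether two testers amount to a pattern is exactly the judgment call described above.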

STR genetic distance

Patterns, when they do emerge, are more likely to show themselves over a range of values and for specific combinations of Y-DNA locations, which makes for complex algorithms containing any number of ifs, ands, or elses, something I've never delved into.1 But there is a helpful shortcut for considering degrees of relationship. It's called Genetic Distance (GD), which is simply the numeric count of marker differences between two testers. That's seen above with the GD of three over the first twelve markers (3 out of 12) between the Maryland and Pennsylvania testers. Furthermore, the GD between testers 53013 and 83087 is 8 out of 37 markers. Like the three-marker difference, this is a little much for consideration. However, in the context of the rest of the group, we can be confident about the grouping. Judgment through fudging makes STR analysis something of an art and provides another indication that we should not be too literal about STRs. There's simply too much randomness involved.2
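Taking GD at face value as a count of differing markers, a minimal Python sketch might look like this. The marker names are real STR labels, but the repeat values and the two testers are hypothetical and not taken from the project's results page.

```python
def genetic_distance(tester_a: dict, tester_b: dict) -> int:
    """Count the markers, among those both testers have, whose values differ."""
    shared_markers = tester_a.keys() & tester_b.keys()
    return sum(1 for m in shared_markers if tester_a[m] != tester_b[m])


# Hypothetical repeat values, for illustration only.
tester_x = {"DYS393": 13, "DYS390": 25, "DYS19": 16, "DYS449": 34}
tester_y = {"DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS449": 32}

if __name__ == "__main__":
    print(genetic_distance(tester_x, tester_y), "out of", len(tester_x))
```

Under this simple count, the two-repeat gap at DYS449 is scored the same as the one-repeat gap at DYS19. Some tools weight a difference by its size in repeats instead; that refinement is left out here.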

The law of large numbers helps us out. Two or three flips of a coin aren't going to tell us anything particularly meaningful about what to expect the next time the coin is flipped, but the average of, say, 500 tosses will. Testing hundreds of STR markers over hundreds of testers will provide invaluable insight into population genetic variation. We don't have nearly that many testers but we have hundreds of markers.
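The coin-toss point is easy to try out. This short simulation, a sketch using Python's standard random module with an arbitrary seed, prints the fraction of heads for a few sample sizes. The small samples can land almost anywhere, while the large ones settle near one half.

```python
import random


def heads_fraction(tosses: int, rng: random.Random) -> float:
    """Simulate fair coin tosses and return the fraction that came up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


if __name__ == "__main__":
    rng = random.Random(2020)  # arbitrary seed so the run is repeatable
    for n in (3, 500, 50_000):
        print(f"{n:>6} tosses: {heads_fraction(n, rng):.3f} heads")
```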

GD over hundreds of markers

The following diagrams illustrate how we can get a better handle on genetic distance by increasing the number of markers. The first box shows the GD over 37 markers using every possible pairing of testers. The Maryland pair has a GD of 2, the England pair a GD of 4, and the Pennsylvania testers have an average GD of about 2.37 within their subgroup. These values suggest that, indeed, the three subgroupings are reasonably accurate. But look at the boxes adjacent to the Maryland and England groups. They range in value from 6 to 8, an average of about 6.5. By continuing the comparisons, we can guess that Daniell Cooley (Maryland) was somewhat more closely related to the Pennsylvania group than the English group, which is somewhat contrary to what 12 markers implied.
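The "every possible pairing" comparison can be sketched in a few lines of Python. The kit labels and five-marker haplotypes below are made up for the example (the real comparison runs over 37 markers), and the helper names are my own.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def genetic_distance(hap_a, hap_b):
    """Count positions at which two equal-length haplotypes differ."""
    return sum(1 for x, y in zip(hap_a, hap_b) if x != y)


def average_gd_by_group(testers):
    """Average GD for every within-group and between-group pairing."""
    distances = defaultdict(list)
    for a, b in combinations(testers.values(), 2):
        (group_a, hap_a), (group_b, hap_b) = a, b
        key = tuple(sorted((group_a, group_b)))
        distances[key].append(genetic_distance(hap_a, hap_b))
    return {key: mean(values) for key, values in distances.items()}


# Hypothetical kits: label -> (subgroup, repeat values at five markers).
testers = {
    "MD-1": ("Maryland", (13, 25, 16, 11, 34)),
    "MD-2": ("Maryland", (13, 25, 16, 11, 33)),
    "EN-1": ("England", (13, 24, 15, 11, 32)),
    "EN-2": ("England", (13, 24, 15, 10, 32)),
}

if __name__ == "__main__":
    for pair, avg in sorted(average_gd_by_group(testers).items()):
        print(pair, avg)
```

Small averages inside a subgroup and larger ones between subgroups are the signature to look for, as in the boxes described above.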

But this is a small sampling. FTDNA's Big Y product looks at about fifteen million positions on the Y chromosome and extracts both STRs and SNPs (the latter being the next topic). I've extrapolated STR averages over 561 markers. Unfortunately, only the Pennsylvania group, descendants of brothers Robert and Francis, has done this test. Because of the lack of genealogical variety, I've pulled in the results of two members of the Davis family. They're not genealogically close, but close enough genetically to serve the point.



Davis Family

Like the graphic on the left, the genetic distance over 561 markers is relatively small within the subgroups, but considerably larger in the adjacent boxes. Indeed, the Most Recent Common Ancestor (MRCA) between the CF04 Cooleys and the Davises might have lived more than a thousand years ago.

The point here is that we can place STR results into generally-related groups having members possibly related to one another over dozens of generations. But SNPs can narrow that to a handful of generations. Still, the old refrain needs to be repeated: We need more testers.

Big Y SNP results

There's a strong element of randomness in biology (witness the current Covid-19 madness). But non-STR regions can be highly stable. A Single Nucleotide Polymorphism (SNP) results when a single (SINGLE) genetic letter (a NUCLEOTIDE or base) mutates to one of the other values (a POLYMORPHISM). When this occurs in a stable region of the Y we have genetic gold. Some years ago, FTDNA found about sixty Y-SNPs that are more than 300,000 years old. These markers passed down through a lineage for about ten thousand generations! Random mutations occurred around them for eons. Because of the nature of those SNPs that live in stable regions of the Y, a large number survived the gauntlet to the present. Just as importantly, the genealogy is inscribed in each marker in that we know the markers passed straight down the male line, a lineage that many of us can trace back, by name, through a half dozen, a dozen, and more generations. And once we get a fix on an ancient SNP, there's a chance it can be traced back further into the historical era.

For example, the identifying SNP for my group (CF01) is YP4491. All male Cooley descendants of John Cooley (c1737-1811) have that marker, as do testers in at least two collateral lineages. This can be determined with confidence because SNPs, unlike STRs, lack the inherent tendency to bounce around in value. It's true that Y-testing is sometimes inconclusive. Otherwise, it's a binary choice: yea or nay. These remarkable facts provide an accuracy not found in STR analysis, and the SNPs in the white box will be critical to future analysis. More on that in a moment.


Novel SNPs (listed at the bottom of the tree) are never-before-seen markers (much like the novel coronavirus). Patrilineally-related men have the same markers, sometimes with some light variation. But we won't know that until they test. Once a second tester comes along, they are no longer novel. They're shared.

The random elements involved with SNP emergence are at least three-fold: the value to which the "ancestral value" mutates, the position at which the mutation occurs, and the timing of the mutation. We can arrive at a tentative timeline by averaging the rate of mutation across the entire SNP database. For the obsolete Big Y-500 test, which tested about ten million positions, YFull.com came up with an average mutation rate of about one every 144 years. But the Y-700 looks at about 50% more sample and finds more SNPs. If a detailed study has been conducted for its SNP rate, I haven't seen it. But the Big Y-700 results I've looked at suggest a rate of one SNP per every 45 to 50 years. I would liberalize that to say that they generally occur about every two to five generations — on average.
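To see how the years-per-SNP figures translate into generations, here is a small worked conversion in Python. The 30-year generation length is my assumption for the sketch. It roughly matches the earlier figure of 300,000 years for about ten thousand generations, but other lengths are defensible and would shift the results.

```python
YEARS_PER_GENERATION = 30  # assumed generation length, not a measured value


def generations_per_snp(years_per_snp, years_per_generation=YEARS_PER_GENERATION):
    """Convert an average SNP interval in years into generations."""
    return years_per_snp / years_per_generation


if __name__ == "__main__":
    rates = (
        ("Big Y-500, YFull average", 144),
        ("Big Y-700, low estimate", 45),
        ("Big Y-700, high estimate", 50),
    )
    for label, years in rates:
        print(f"{label}: about {generations_per_snp(years):.1f} generations per SNP")
```

With that assumption the three rates span roughly 1.5 to 4.8 generations per SNP, which sits comfortably alongside the loose "two to five generations" rule of thumb.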

YFull estimates the age of the YP4945 haplogroup (the second block listed on the tree) to be about 1400 years old. Our sample complies with the law of large numbers to the extent that millions of positions have been sequenced. But the contributor size is quite small. Still, my calculations indicate that the Davis results (Y-500) average about 127 years per SNP (somewhat close to the 144-year prediction), and the Francis Cooley testers (Y-700) would have an average rate of about 82 years per SNP to take the lineage back 1400 years, to about the year 600. These rates are close enough to our expectations that we can accept them as reasonable. But there's nothing fixed about the year 600. More testing will adjust that estimate.
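The arithmetic behind those per-SNP rates can be run backwards as a sanity check. The sketch below divides the 1400-year age by each rate to get the number of SNPs it implies. These counts are back-calculated from the rounded figures in the text, not read off the tree, so treat them as approximate. The year 2020 is simply the date of this article.

```python
HAPLOGROUP_AGE_YEARS = 1400  # YFull's estimate for YP4945, as quoted above
ARTICLE_YEAR = 2020


def implied_snp_count(age_years, years_per_snp):
    """Number of SNPs a lineage would accumulate at a given average interval."""
    return age_years / years_per_snp


if __name__ == "__main__":
    for label, rate in (("Davis (Y-500)", 127), ("Francis Cooley (Y-700)", 82)):
        count = implied_snp_count(HAPLOGROUP_AGE_YEARS, rate)
        print(f"{label}: about {count:.0f} SNPs at {rate} years each")
    print("Approximate founding year:", ARTICLE_YEAR - HAPLOGROUP_AGE_YEARS)
```

Subtracting 1400 from 2020 gives 620, which is where "about the year 600" comes from.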

The Cooley/Davis example is useful for demonstration purposes, but the relationship appears to be so old that it doesn't help genealogically. In some era hence, however, these results, with the aid of many others, will triangulate to a specific place and tribe. This would be the stuff of population genetics and archaeology and would play into the very history of Britain itself. More about that in a bit. The essential point here is that SNPs are reliable landmarks, unlike STRs.

The white box

The fifteen SNPs in the white box in the Cooley column are where future action lies. The Maryland and English Cooleys are hiding out there. More Big Y testing can flesh them out and help determine something about the degree of relationship between the three groups.

Although our ancestors' markers are embedded in our genetic code, the names, of course, are not. But every SNP is born inside a specific man. For example, in the Strother Project Jeremiah Strother was born in 1655 with (then) novel SNP R1b-Y133705 and his son, Francis, was born in 1700 with SNP A20343. This was determined among triangulated results of multiple testers. In other words, what we once knew only as Y133705 now has a specific human name. One need not travel far into thought to see the terrific advantage provided to Strother researchers.

Although we may never learn the names of the "founders" buried in the white box, we can parse them out. And they will parse because we've already determined there's a degree of genetic distance between the three Cooley subgroups. Testing outside the CF04 Pennsylvania group should do just that.

Nordic heritage

By looking at matching testers far afield of Cooley and Davis, clues are found to their deep history. For example, there are several matches to testers with Scandinavian names, including Johansson, Jonsson, Sandberg, Månsson, Ambjörnsson, Averstad, Pettersson, and Martinsson. This isn't a surprise. The descent from a marker called R1a-S5301 or "Old Scandinavian" has long been known. The following diagram at FTDNA takes this long view, making it clear that the English group split from the Scandinavian group after the emergence of the YP4693 haplogroup. YFull estimates the MRCA to be about 1450 years old.

We can't take these timeline estimates too seriously. Look too closely and conflicting counts emerge. But the migration to Britain certainly occurred after the Roman era. And the CF04 founder (yes, a single man) likely stepped ashore on the British coast sometime between the latter era of the Saxon invasions and the Viking incursions.



The father of Francis and Robert was either Joseph or William, depending on which early accounting one accepts. (I believe they had a brother Joseph.) And it's believed the family came to America during the 1760s, perhaps about the time that John of Yorkshire married. Because Daniell of Maryland died in 1729, he was clearly of an earlier generation, perhaps two or more. That would explain the greater STR genetic distance. The post-Roman descent will be worked out at some date well into the future and with a great many more tests. But the white box can be well-sorted with two or three advanced tests.

I suspect some of the samples in FTDNA's vaults are becoming too old for Y-700 testing, requiring new samples for advanced testing. And it might be that some of the original contributors are no longer willing or available for providing a new sample. In that case, a Cooley cousin, uncle, nephew, etc., can serve in the same way. If the genealogy is correct, he will have the same Y-DNA markers. And considering that the membership for this group has become rather static, new recruits would be desirable. I'm of a different Cooley group, so my results won't help. Still, I'm more than willing to continue to work on this — as long as there are results to work with.

Finally, the Pennsylvania group is well situated. At this point, I see no reason for further Big Y testing among them. They've found a single, distinct SNP that defines their group, a unique marker for their founder, R1a-FT157756. If anyone wants to confirm descent from this man, whoever he turns out to be, FT157756 can be tested at YSEQ.net for $18 (+$6).

1 FTDNA does make broad predictions for ancient haplogroups based on Y-STRs. For example, the often-seen R-M269 might be seven thousand or more years old.

2 But there's a limit to fudging. There are far too many differences between CF01 and CF04 to suggest they're genealogically related, although they do both belong to the major R1a haplogroup.