Michael Cooley's Genetic Genealogy Blog
20 March 2021

The R1a-YP4248 Subclade Project

The R1a-YP4248 Subclade Project opened its doors about a year ago. Because I'd already been the administrator for several surname projects associated with it (Cochran, Cooley, Hackett, and Whitfield), it wasn't terribly difficult to get started. Although we keep a presence at FTDNA (of course), our real home is at http://dna.ancestraldata.com/YP4248/. There are fewer than 70 members, but the project has turned out to be highly rewarding.

Surname projects are great. But by definition they're scattered among any number of haplogroups, unrelated families, far-flung regions, etc. This works well for me as co-admin of the Cooley DNA Project; after all, I have an obvious vested interest. I learned early on that the cleanest way to zero in on my brand of Cooleys was to rule out the other Cooley tribes. You know, eliminate the competition or cull the herd, so to speak. The Y chromosome was (and always is) highly efficient at accomplishing that: a genealogical AR-15. In the process, I clearly tagged the other Cooley families and was able to drive the final nail into the coffin of the ill-informed and poorly executed Dutch Cooley Myth. Because of this journey, I became well versed in several Cooley patrilineages. This hasn't been the case, however, for every project I maintain. I'm lucky, for example, to have highly motivated co-admins in the Akins DNA Project who do their very best to keep me on my toes.


Map legend:
1. Speculation. Landing of YP4252, 200 BC
2. Confirmed. Yorkshire 1655 (Story BY30796)
3. Confirmed. Derbyshire 1744 (Hackett YP4253)
4. Speculation. Birmingham c1750 (Cooley YP4491)
5. Confirmed. Ireland 1700s (Cochran YP5244)
6. Speculation. Renfrewshire 13th century (Barons Cochran)
7. Unconfirmed. Unknown location, Scotland, 19th century (Cummings BY27664)
8. Unconfirmed. Sutherland c1761 (Gray N/A)

Haplogroup projects are of enormous benefit to researchers, and I marvel at how the admins keep them current and moving forward. (Of course, to a degree, organization is baked into FTDNA's project interface.) But haplogroup projects have the advantage of a single, fixed objective: all members are descended from a specific haplogroup and, therefore, from a specific unknown, unnamed man who lived perhaps tens of thousands of years ago, or more. The path downstream might become rather fragmented, but each new test adds a bit of data, sharpening the resolution of the tree as a whole. Still, these projects are big and bulky.

I propose that subclade projects offer the perfect balance. Their memberships tend to be smaller than those of the major haplogroups upstream. There are not as many surnames to ponder, the geography is rather localized (until recent centuries, of course), and the MRCA lived much more recently. What I'm now calling the YP4248 Man, for example, was likely born a bit prior to 1000 AD, possibly in Britain. There's next to zero chance of ever identifying him, but given enough testers, we'll one day narrow down his location and general era.

To date, all markers downstream of YP4248 appear to have originated in Britain and spread out from there, at least to Canada and the U.S. We need to go a level or two above the haplogroup before the lineage branches into descendants having Scandinavian names. This is no surprise considering that YP4248 is descended from the "Young Scandinavian" marker, R1a-L448. Current speculation (and at this point it can be nothing but speculation) is that YP4248's parent haplogroup, YP4252, might have landed near present-day Yorkshire a couple of centuries before the common era, say about 200 BCE. If that's correct, it would be well before the classic Viking Age.

I've not familiarized myself with YP4248's sibling haplogroups beyond the haplotrees at FTDNA and YFull. But its two known "child" subclades, YP5007 and YP4253, are very much part of the project. They're shown on the map separated by an artificial line (probably soon to be proven inaccurate) that starts at the Humber Estuary on the east. Of course, the whole mapping is drawn from minimal data and is incomplete. But that's the beauty of science (and genetic genealogy does fall in that category): it's not immutable. The data is factual, but its interpretation, which is the art of science, changes as it responds to new data.

Until additional evidence comes forth, it's reasonable to refer to YP5007 as Scottish and YP4253 as English. Of course, a millennium or more is sufficient time for a haplogroup to spread throughout a smallish collection of islands. In the meantime, YP4253 is known to be composed of three surnames: Hackett, Whitfield, and Cooley. (I know that many consider Cooley to be Irish, but consider "coo ley": cow field.)1 And we're now waiting for SNP results for a Higdon whose STR results almost certainly place him in L448 and possibly in YP4248. Interestingly, his lineage has been traced to Yorkshire.

YP5007 is proving to be much more diverse than its "English" counterpart. The Cochrans themselves are revealing a number of genetic branches, all of them (so far) within YP5244. And it's there, among the two SNPs in YP5244, that we might one day find the first recorded Cochran of this lineage. Subclade BY30798 is proving to be predominantly Story, and Rankin appears to be a lineage collateral to both, with common origins possibly in an old Scottish tribe.

As inaccurate as this map might be, it does illustrate how genetics, genealogy, history, and even archaeology can be used to come up with a hypothesis which, in turn, can inform these same fields. (What we do here is not wholly inconsequential.) In fact, as genealogists it would behoove us to occasionally turn to these disciplines for information and inspiration. We mustn't go too far afield, however. We're after facts, not speculative conjecture.

Our facts sit on the Y chromosome, a molecule that passes strictly from father to son. Unlike autosomal DNA, none of its genetic data is lost from one generation to the next. One hundred percent of it is inherited. And because the Y passes patrilineally, the lineage is self-described, unlike autosomes, for which inference upon inference must be made. The Y merely lacks the names and birthplaces of each father. And it's those blanks we're trying to fill. In fact, the final bit of data that convinced me that my Edward Cooley was indeed the son of John Cooley (c1737-1811) was found in the Y chromosome.

Thanks to this inheritance pattern, I received the whole archive of my paternal markers from Dad and tested positive for YP4491, YP4253, YP4248, and beyond, including L448. And just as my dad had a brother, so does virtually every haplogroup. Just as Uncle Howard started a new branch on the Cooley tree, so did YP5007 and YP4253 on the YP4248 tree. And all our testers can trace their lines back through L448 and beyond, even 300,000 years or so, certainly not genealogically but genetically. (Is there a difference?) It's no coincidence, then, that the following tree looks like a genealogical tree. If we had been fortunate enough to know the full paternal lineage for our testers going back 1,200 years, the genealogical tree would neatly align with this one. This is the scaffolding onto which we can pin our ancestors and their descendants.


R1a-YP4248 SNP Tree


The bodies first inhabited by BY27664, BY30798, and YP5244 (the three known subclades of YP5007) were certainly not brothers, but I think you get the idea. And, as I worked on this article, new Big Y results came in for a Rankin member. A cursory study suggests that he presents us with a fourth subclade for YP5007 (the "Scottish" side of the tree). We'll need a matching Rankin to test before we can officially add it as such, but the SNP tree makes it a given. (I'm told there was a cluster of Rankins and Cochrans in the Scottish Lowlands.)
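For the programmers among us, the father-to-son logic behind that tree fits in a few lines of Python. This is only an illustrative sketch, not one of the project's tools: each tester is written as an abbreviated root-to-tip path built from SNPs mentioned in this post (intermediate branches are left out), and the function and variable names are made up for the example. Because a son inherits his father's entire path, the deepest SNP two testers share marks the branch on which their lines last met.

```python
from typing import List, Optional

# Abbreviated root-to-tip SNP paths (intermediate branches omitted; illustrative only).
# Each son inherits his father's whole path and may add new SNPs at the tip.
COOLEY = ["L448", "YP4252", "YP4248", "YP4253", "YP4491"]
HACKETT = ["L448", "YP4252", "YP4248", "YP4253"]
COCHRAN = ["L448", "YP4252", "YP4248", "YP5007", "YP5244"]


def shared_branch(path_a: List[str], path_b: List[str]) -> Optional[str]:
    """Return the deepest SNP two root-to-tip paths have in common, i.e. the
    branch carried by the two testers' most recent common paternal ancestor."""
    deepest = None
    for snp_a, snp_b in zip(path_a, path_b):
        if snp_a != snp_b:
            break  # the two lines part ways here
        deepest = snp_a
    return deepest


print(shared_branch(COOLEY, HACKETT))  # YP4253
print(shared_branch(COOLEY, COCHRAN))  # YP4248
```

The comparison only works because the paths are ordered from the root down, which is exactly the nesting the Y chromosome gives us.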


SNP 'birthing' and aging

Every SNP first arose in a specific man, so the first appearance of each alphanumeric designation (each SNP name) represents a specific man. For example, the Strother DNA Project now understands that R1b-Y133702 first appeared on the scene with the birth of Jeremiah Strother in 1655 and that, remarkably, A20343 first emerged in Jeremiah's son, Francis, in 1700. I call these anchor SNPs: markers that are anchored to a specific place, a specific time and, in this case, two fully identified men. These SNPs can then be used to provide context for nearby markers on the tree. And we're close to identifying such a SNP in our project for Cochran Y64331. The two testers have a known common ancestor, William Cochran (c1806-1889). This doesn't represent an anchor SNP, however, because there are two SNPs at that level. They first need to be parsed out. And we don't have test results for any of William's brothers. They could well have had both SNPs, and that would push the anchoring of the SNPs back at least another generation. Still, we can say this for now: any male Cochran descendant of William's will have those markers (assuming the genealogy is right!). But we can't yet say that anyone with the SNPs is descended from William, which my definition of an anchor SNP requires.
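The anchoring criteria above can be written down as a small checklist. Here's a hypothetical sketch in Python; the Branch record, its fields, and the placeholder SNP names are invented for illustration (the post doesn't name the two SNPs at the Cochran level), and the rule simply encodes the three conditions described: a documented common ancestor, a single SNP at that level, and collateral (brothers') lines that test negative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Branch:
    """Hypothetical record of one SNP-tree branch and what we know about it."""
    snps: List[str]                          # SNPs found at this level of the tree
    known_ancestor: Optional[str] = None     # documented common ancestor, if any
    collateral_lines_negative: bool = False  # have his brothers' lines tested negative?


def anchor_status(branch: Branch) -> str:
    """Apply the informal anchor-SNP criteria described above (not a standard)."""
    if branch.known_ancestor is None:
        return "not anchored: no documented common ancestor"
    if len(branch.snps) != 1:
        return f"not anchored: {len(branch.snps)} SNPs at this level still need to be parsed out"
    if not branch.collateral_lines_negative:
        return "not anchored: brothers' lines untested, so the SNP may be older"
    return f"anchor SNP {branch.snps[0]}: {branch.known_ancestor}"


# Cochran case: two SNPs at this level (placeholder names), no brothers tested.
cochran = Branch(["SNP_A", "SNP_B"], "William Cochran (c1806-1889)")
print(anchor_status(cochran))   # not anchored: 2 SNPs at this level still need to be parsed out

# Strother case as described above: one SNP tied to one identified man.
strother = Branch(["Y133702"], "Jeremiah Strother (1655)", collateral_lines_negative=True)
print(anchor_status(strother))  # anchor SNP Y133702: Jeremiah Strother (1655)
```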

SNP emergence is, of course, random. However, by counting the number of SNPs found in each tester's lineage and averaging that count across every tester in the worldwide SNP tree, we arrive at a possible interval between SNP births. Setting aside detail, I estimate one new SNP about every four generations in any one lineage. If we take that as fact (and, of course, it's not!), we can estimate that the eleven SNPs found sitting inside haplogroup YP4248 were brought into the world over about 44 generations, perhaps 1,100 years (depending on how we define a generation). But we don't know in which order those markers arrived. So it's a long way to go before determining which pair of boots, carrying whatever combination of SNPs, first trod across the eastern shoreline of the Isles. (Dream on, Michael.)
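The back-of-the-envelope arithmetic above is easy to reproduce. Here's a minimal sketch, assuming the four-generations-per-SNP estimate (which, as noted, is not a fact) and a 25-year generation, the length implied by 44 generations spanning about 1,100 years. The function name is hypothetical.

```python
from typing import Tuple

GENERATIONS_PER_SNP = 4    # the rough estimate above; explicitly not a fact
YEARS_PER_GENERATION = 25  # assumption: implied by 44 generations ~ 1,100 years


def snp_run_span(snp_count: int,
                 generations_per_snp: float = GENERATIONS_PER_SNP,
                 years_per_generation: float = YEARS_PER_GENERATION) -> Tuple[float, float]:
    """Estimate (generations, years) spanned by a run of SNPs on a single lineage."""
    generations = snp_count * generations_per_snp
    return generations, generations * years_per_generation


print(snp_run_span(11))                           # (44, 1100)
print(snp_run_span(11, years_per_generation=30))  # (44, 1320)
```

Stretching the generation to 30 years pushes the same eleven SNPs out to about 1,320 years, one more reason to hold these dates loosely.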

One hundred and five SNP markers have been found among our twenty Big Y testers, an average of just over five SNPs per person. (Three of the testers noted in the tree are single-SNP confirmations of YP4491.) This tells us that approximately 500 years passed (roughly within the genealogical timeframe) from the testers' average birth dates back to the end of the 1,100-year YP4248 era. And note that the 67 novel SNPs (novel variants, private variants, etc.) listed in the light blue box have not yet been placed in the tree. Buried in there are any number of new subclades and anchor SNPs. Find the right testers and we'll begin to tie things together better genealogically.

Yes. I know that's a mouthful, especially when you consider that approximate means "I dunno."
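For the curious, here's the arithmetic behind the roughly 500-year figure, as a sketch under the same assumptions (one SNP per four generations, an assumed 25-year generation). The function name is hypothetical.

```python
GENERATIONS_PER_SNP = 4    # rough estimate from the text, not a fact
YEARS_PER_GENERATION = 25  # assumption


def years_since_block(total_snps: int, testers: int,
                      generations_per_snp: float = GENERATIONS_PER_SNP,
                      years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Average SNP count per tester, converted to generations and then to years."""
    mean_snps = total_snps / testers
    return mean_snps * generations_per_snp * years_per_generation


print(105 / 20)                    # 5.25 SNPs per Big Y tester
print(years_since_block(105, 20))  # 525.0
```

About 525 years, which rounds to the 500 quoted above. Approximately, of course.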


Playing with one more toy before bidding adieu

While working on my CF01 group in the Cooley DNA Project, I soon noticed that 67 STR markers (Y-67) rarely told us anything more than what we had gleaned from Y-37. I don't know how well that dovetails with other admins' experiences, but I also learned that STR markers are fickle creatures; they can grow and then shrink. In other words, a marker's value might be 34 in one generation, then 35 in a later generation, and then 33 in yet another. They're useful for gleaning an overall trend but not so helpful in determining phylogenetic relationships.

For this reason, I rarely recommended Y-111, which was quite expensive, and instead suggested that the tester save up a bit more for the Big Y. (Better pricing now allows Y-111 to serve as a good stepping stone toward the Big Y.) But when doing the initial research for this project, I discovered that nearly all the Cochrans in the relevant group had upgraded to Y-111, data I had previously decided to largely ignore. With the project moving forward and only two or three Big Y testers among them, it was time to make better use of the data. What I found extended the usefulness of Y-111 while, at the same time, further highlighting STR shortcomings.

I'm sure there's nothing new about a matrix that illustrates the genetic distance between each tester and each of the others. But I found that the differing number shapes fill their respective cells in unique ways, forming a faint visual pattern. The next step was simple: turn the numeric values into color values. This is what emerged.



My small YP4491 Cooley/Whitfield clan (the Y-DNA being virtually identical) occupies that bright blue box at the lower right. The dark blue represents greater genetic distance (GD) and includes Hackett (at the very bottom), our YP4253 partner. The massive block above us (massive because of the high number of Cochran Y-111 testers) represents the whole of the present YP5007 subclade. Here we have two discretely drawn subclades in living color, if only blue. (I'm red-green colorblind, so don't expect anything more sophisticated!)

Notice how haplogroup names start to clump together. Although I might one day find a better way to sort these out (the whole process is automated), there will always remain some lines of interference due to the fickle nature of STRs. For example, if I remove lines 18 through 21, the topmost Y64331 nestles up closer to its brother and the collateral Tyrie lineage. We now see the trend I mentioned above rather than the clear delineation we're looking for in SNPs. Yet the graphic can still be used to help determine which testers might benefit from upgrading. For example, additional Big Y results among lines 8, 9, and 10 (the small white box in the upper left) will likely reveal a new subclade, which would include some of the six novel SNPs of kit 391280, the very last kit in the SNP tree. And we're now waiting for Big Y results for our Gray tester, line 17. Based on his placement, I'll dare to predict that he will end up in FT407422, or somewhere near it on our SNP tree. But I wouldn't place any bets on this yet!
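Here's a minimal sketch of how a colored GD matrix like this one could be produced; it is not the process used for the graphic above. It assumes a plain step-wise GD (the absolute difference in repeat counts, summed over the markers two testers share) and a simple bright-to-dark blue scale. The three testers and their marker values are invented, and FTDNA's published GD handles some markers differently, so numbers from this sketch won't always match the project page.

```python
from typing import Dict, List, Tuple

Haplotype = Dict[str, int]  # STR marker name -> repeat count


def genetic_distance(a: Haplotype, b: Haplotype) -> int:
    """Step-wise GD: sum of repeat-count differences over the markers both testers share."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def gd_matrix(testers: Dict[str, Haplotype]) -> Tuple[List[str], List[List[int]]]:
    """Return the tester names and the symmetric matrix of pairwise GDs."""
    names = list(testers)
    rows = [[genetic_distance(testers[r], testers[c]) for c in names] for r in names]
    return names, rows


def gd_to_hex(gd: int, max_gd: int) -> str:
    """Shade of blue for a GD: 0 is bright blue, max_gd (or more) is dark blue."""
    t = min(gd, max_gd) / max_gd if max_gd > 0 else 0.0
    bright, dark = (0x40, 0xA0, 0xFF), (0x00, 0x10, 0x50)
    r, g, b = (round(s + (e - s) * t) for s, e in zip(bright, dark))
    return f"#{r:02X}{g:02X}{b:02X}"


# Invented three-marker haplotypes, for illustration only.
testers = {
    "Tester A": {"DYS393": 13, "DYS390": 25, "DYS19": 15},
    "Tester B": {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    "Tester C": {"DYS393": 14, "DYS390": 23, "DYS19": 16},
}
names, matrix = gd_matrix(testers)
max_gd = max(max(row) for row in matrix)
for name, row in zip(names, matrix):
    print(name, row, [gd_to_hex(gd, max_gd) for gd in row])
```

Sorting the rows (say, by known haplogroup) before coloring is what makes the blocks stand out; the fickleness of STRs means some rows will still refuse to sit neatly.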

Finally, I've written a little YP4248 haplogroup tester that will check any STR results against the present YP4248 modal. By plugging in the 67 markers belonging to the Higdon tester (for whom we're waiting for SNP results), I get this:



Higdon is three markers outside the min-max values for YP4248, boundaries that will be stretched by future testers. But we need to wait for his SNP results before knowing whether he'll be one of those testers, or whether he belongs in the project at all. Furthermore, he has a GD of 12 out of 67 STRs compared against the modal. But that might be okay too. Hackett has a GD of 19 out of 111, which is roughly the same ratio.2 We'll see. But, again, I wouldn't place any money on it.
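A haplogroup tester along these lines can be sketched in a few lines of Python. This is a hypothetical version, not the tool described above: it assumes the modal and the observed min-max range for each marker are already known (the four-marker values here are invented) and uses a simple step-wise GD. The last two lines check the Higdon and Hackett ratios quoted above.

```python
from typing import Dict, List, Tuple

Haplotype = Dict[str, int]  # STR marker name -> repeat count


def check_against_modal(tester: Haplotype, modal: Haplotype,
                        ranges: Dict[str, Tuple[int, int]]) -> Tuple[List[str], int, int]:
    """Compare a tester with a haplogroup modal.

    Returns (markers outside the observed min-max range, step-wise GD from
    the modal, number of markers compared).
    """
    compared = [m for m in tester if m in modal]
    outside = [m for m in compared
               if m in ranges and not ranges[m][0] <= tester[m] <= ranges[m][1]]
    gd = sum(abs(tester[m] - modal[m]) for m in compared)
    return outside, gd, len(compared)


# Invented four-marker example; a real check would use all 67 or 111 markers.
modal = {"DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS391": 11}
ranges = {"DYS393": (13, 13), "DYS390": (24, 25), "DYS19": (15, 16), "DYS391": (10, 11)}
candidate = {"DYS393": 13, "DYS390": 23, "DYS19": 15, "DYS391": 12}
print(check_against_modal(candidate, modal, ranges))  # (['DYS390', 'DYS391'], 3, 4)

# GD per marker compared, using the figures quoted above.
print(f"Higdon  {12 / 67:.3f}")   # Higdon  0.179
print(f"Hackett {19 / 111:.3f}")  # Hackett 0.171
```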

However, I would recommend that a surefire bet be placed on the R1a-YP4248 Project by donating at our FTDNA project page! Ta ta!


1. Scottish: "Yer aywis at the coo's tail," meaning you're always late. (https://www.bbcamerica.com/anglophenia/2014/09/scottish-sayings-will-get-through-life). And ley, meaning field, is commonly used as a suffix in English surnames — Bailey, Hawley, etc. A common alternate to Cooley is Colley, generally thought to mean coal field.

2. These values can be found under "Genetic Distance over 111 Y-STRs" at the project page.