I had hoped to post this weeks ago. I'd been waiting for YFull.com to complete its analysis of the Big Y (and its associated BAM file) for kit #76628,[1] but I recently learned they're doing a massive overhaul of their system, so it might be a few more weeks. Because I'm confident about the discovery of a new haplogroup, however, I'll go ahead and present the latest findings.
In its initial report for Big Y tests, FTDNA lists the new SNPs it finds, but it takes a long time before those SNPs receive names. YFull is not only quicker, but it also searches the BAM file for SNPs that FTDNA did not find. Generally, I get a head start by ferreting out novel SNPs from FTDNA's original reports and submitting them to yseq.net for evaluation and naming. The last batch was returned in October.
Once FTDNA posted the BAM file for this kit, I submitted it to YFull and kept a copy for myself. I converted it to a human-readable text format known as SAM and began the arduous process of understanding it and writing computer scripts to parse it. I've just reached the point where I can pass the results on to the R1a DNA group at FTDNA.
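For readers curious about what parsing a SAM file involves: each alignment is one tab-separated line with eleven mandatory fields, and its CIGAR string describes how the read lines up against the reference. The sketch below is hypothetical and is not the scripts described above; it assumes plain-text SAM input and only finds which base (and base quality) a single read reports at a given reference position, the basic step behind checking any SNP position.

```python
import re
from dataclasses import dataclass

# CIGAR operations: M/=/X consume read and reference, I/S consume read only,
# D/N consume reference only, H/P consume neither.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Alignment:
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost reference position
    mapq: int
    cigar: str
    seq: str
    qual: str   # Phred+33 encoded, or "*" if absent


def parse_sam_line(line: str) -> Alignment:
    """Parse one SAM alignment line (mandatory columns only)."""
    f = line.rstrip("\n").split("\t")
    # Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    return Alignment(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9], f[10])


def base_at(aln: Alignment, ref_pos: int):
    """Return (base, phred_quality) at 1-based ref_pos, or None if the read
    does not cover that position or has a deletion there."""
    if aln.cigar == "*" or aln.seq == "*":
        return None
    r = aln.pos  # current reference position
    q = 0        # current index into SEQ
    for length_str, op in CIGAR_RE.findall(aln.cigar):
        n = int(length_str)
        if op in "M=X":
            if r <= ref_pos < r + n:
                i = q + (ref_pos - r)
                qual = ord(aln.qual[i]) - 33 if aln.qual != "*" else None
                return aln.seq[i], qual
            r += n
            q += n
        elif op in "IS":
            q += n
        elif op in "DN":
            if r <= ref_pos < r + n:
                return None
            r += n
        # H and P consume neither read nor reference
    return None


# Made-up example read: 3 matches, 2 inserted bases, 4 matches, 1 deletion, 3 matches.
aln = parse_sam_line("r1\t0\tchrY\t100\t60\t3M2I4M1D3M\t*\t0\t0\tACGTTGCAACGT\tIIIIIIIIIIII")
print(base_at(aln, 103))  # ('G', 40)
```

Walking the CIGAR string rather than assuming the read maps base-for-base is what keeps insertions and deletions from shifting every later position by one or two.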
Because FTDNA's reports include, among other things, SNPs that should not be reported, many of those I mentioned in an earlier article turned out to be bogus. (I've learned from yseq, for example, that SNPs found in STR regions should be discounted.)[2] Virtually all the SNPs that appeared in the Cooley/Davis MRCA group were found to be unacceptable. This is the newest version:
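Screening out candidates that fall inside STR regions amounts to an interval lookup. In this minimal sketch the region list and positions are made-up placeholders, not real Y-chromosome coordinates; in practice the regions would presumably come from a repeat annotation of the reference build. The helper names are hypothetical, and the regions are assumed not to overlap.

```python
from bisect import bisect_right


def build_index(regions):
    """Sort (start, end) intervals (1-based, inclusive, non-overlapping)."""
    regions = sorted(regions)
    starts = [s for s, _ in regions]
    return regions, starts


def in_regions(pos, index):
    """True if pos falls inside any indexed interval."""
    regions, starts = index
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and regions[i][0] <= pos <= regions[i][1]


def filter_candidates(positions, index):
    """Drop candidate SNP positions that lie inside an STR region."""
    return [p for p in positions if not in_regions(p, index)]


# Hypothetical STR regions and candidate positions, for illustration only.
str_index = build_index([(2000, 2060), (1000, 1040)])
kept = filter_candidates([999, 1000, 1020, 1041, 2061, 2030], str_index)
print(kept)  # [999, 1041, 2061]
```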
We now know that CF04 shares only two of the ten SNPs in the YP4944 block with the Davis tester. Because YP4945 and YP4949 are shared by the larger Cooley/Davis/etc. population, they are older and were present in the mysterious Cooley/Davis MRCA (Most Recent Common Ancestor). The YP4944 block (now smaller than previously supposed) branched off and became the Davis line. This means that YP4944 and A12227 (the Cooley side) are "sibling SNPs": they're collateral branches.
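The reasoning about the YP4944 block is essentially set arithmetic: any block SNP that CF04 also carries moves up to the shared ancestor, and what remains defines the Davis branch. In this sketch only YP4944, YP4945, YP4949, and A12227 come from the text; the other seven block members are hypothetical placeholders for SNP names not given here, and CF04's calls are simplified to the relevant few.

```python
def split_block(block, other_branch_calls):
    """Split a SNP block using calls from a tester on the other branch.

    SNPs the other tester also carries are older and belong to the common
    ancestor; the rest stay on this branch."""
    moved_up = block & other_branch_calls
    remaining = block - other_branch_calls
    return moved_up, remaining


# Hypothetical placeholder names for the seven block SNPs not named in the text.
placeholders = {f"davis_snp_{i}" for i in range(1, 8)}
yp4944_block = {"YP4944", "YP4945", "YP4949"} | placeholders  # ten SNPs
cf04_calls = {"YP4945", "YP4949", "A12227"}                     # simplified

shared, davis_only = split_block(yp4944_block, cf04_calls)
print(sorted(shared))   # ['YP4945', 'YP4949']
print(len(davis_only))  # 8
```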
And this reveals the central problem with trying to date SNPs, which at present are considered to arise at an average rate of one mutation every 144 years. Average, of course, is the operative word. (In a list of one hundred items, only one, if any, might represent "average.") If we count the eight SNPs now left in Davis's YP4944 block plus his six "private" SNPs, 14 SNPs times 144 gives an "average" of 2016 years to the MRCA, which puts him in the year 1 B.C. (since there is no year 0!). Yet the Cooley A12227 block gives only 6 × 144 = 864 years.[3] But have no fear, average is here. By averaging the two branches, we can state that the MRCA lived about 1440 years ago, or about the year 600 A.D. This is certainly not accurate, but it's a lot better.
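The arithmetic above can be written out as a short sketch. The 144-years-per-SNP rate and the SNP counts come from the text. The reference year of 2016 passed to the hypothetical `calendar_year` helper is an assumption (it is the year implied by "2016 years ... the year 1 B.C."), and the helper uses astronomical year numbering internally, in which year 0 corresponds to 1 B.C.

```python
YEARS_PER_SNP = 144  # average rate quoted in the text


def years_to_mrca(snp_count, years_per_snp=YEARS_PER_SNP):
    """Naive age estimate: SNP count times the average years per SNP."""
    return snp_count * years_per_snp


def calendar_year(reference_year, years_back):
    """Convert 'years before reference_year' to an A.D./B.C. label.

    Astronomical numbering (0 == 1 B.C., -1 == 2 B.C.) handles the missing
    year 0 of the A.D./B.C. system."""
    y = reference_year - years_back
    return f"{y} A.D." if y > 0 else f"{1 - y} B.C."


davis = years_to_mrca(8 + 6)   # 8 block SNPs + 6 private SNPs
cooley = years_to_mrca(6)      # 6 SNPs in the A12227 block
average = (davis + cooley) // 2

REFERENCE_YEAR = 2016  # assumption: year implied by the text's arithmetic
print(davis, cooley, average)                  # 2016 864 1440
print(calendar_year(REFERENCE_YEAR, davis))    # 1 B.C.
print(calendar_year(REFERENCE_YEAR, average))  # 576 A.D.
```

The 576 A.D. result is what the text rounds to "about the year 600 A.D."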
Of course, this seems preposterous. But the more testers come on board, the more accurate the figures will become. In the meantime, it is accurate to say that the SNPs on both branches point to an MRCA: an actual, breathing, and specific man (if unidentified) who lived at a specific time in history. The estimated date of his birth may change time and time again, but the fact of his existence is solid.
Unless YFull discovers new SNPs, however, this means that the CF04 Cooley line mutated very slowly. If so, the lineage may not be as robust as previously thought. But not all the data is in yet.
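One way to see why a slow Cooley branch need not mean anything unusual is to treat mutations as a Poisson process with a constant rate, a common simplifying assumption that is not stated in the text. If the MRCA lived about 1440 years ago and a SNP appears on average every 144 years, each branch would be expected to carry 10 SNPs. This sketch computes how often chance alone would produce six or fewer, or fourteen or more.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


lam = 1440 / 144  # expected SNPs per branch under the averaged estimate (10.0)

p_six_or_fewer = poisson_cdf(6, lam)            # Cooley-like count
p_fourteen_or_more = 1 - poisson_cdf(13, lam)   # Davis-like count

print(round(p_six_or_fewer, 3))      # 0.13
print(round(p_fourteen_or_more, 3))  # 0.136
```

Under these assumptions each outcome turns up roughly one time in seven or eight, which is the point about averages made above.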
Will I discover these new SNPs (if any) myself? My initial run shows far too many candidates. At least part of the reason is that the BAM reports mismatches (a base, A, C, G, or T, that is at variance with the rest of the population) where there are none. Now that I'm confident the underlying algorithms in my scripts are correct, I can concentrate on writing a routine that will look for new SNPs. (In the meantime, I'm confident that the scripts can determine the presence of any known SNP.)
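One common way to cut down false candidates, sketched here as an assumption rather than as the method used in the scripts described above, relies on the Y chromosome being present in a single copy in males: a genuine SNP should appear in nearly every read covering the position, while a sequencing error usually shows up in only one or two. The function names and thresholds below are hypothetical and would need tuning; the input is the list of (base, quality) pairs from the reads covering one position.

```python
from collections import Counter


def call_position(observations, min_depth=4, min_fraction=0.9, min_qual=20):
    """Return the majority base at one position if it is well supported.

    observations: list of (base, phred_quality) pairs from covering reads."""
    good = [b for b, q in observations if q is not None and q >= min_qual]
    if len(good) < min_depth:
        return None  # too few reliable reads to say anything
    base, count = Counter(good).most_common(1)[0]
    return base if count / len(good) >= min_fraction else None


def is_candidate(observations, ref_base, **thresholds):
    """A position is a SNP candidate only if a confident call differs from ref."""
    call = call_position(observations, **thresholds)
    return call is not None and call != ref_base


# Made-up pileups for illustration.
solid = [("G", 30)] * 9 + [("A", 30)]  # 90% of reads say G
noisy = [("A", 30)] * 8 + [("G", 30)] * 2
print(is_candidate(solid, "A"))  # True
print(is_candidate(noisy, "A"))  # False
```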
But that's neither here nor there. I'm not in a race with YFull; I just like parsing things. However it happens, sometime in the next few weeks we'll have a fuller picture. For now, at least, we have a much clearer view than we had several weeks ago.
I've received word that the descendant of John Cooley of Yorkshire may order the Big Y test before the end of the year. That would be key for this group, as it would provide some evidence of the place of origin of the U.S. CF04 Cooleys. A Big Y for the Daniel Cooley descendant would also be very helpful in sorting out how the members of the group are related. Because we already have a Big Y for Robert, I'd suggest that the second Robert tester and the Francis tester test only for the six A12227 SNPs.[4] This can be done at Yseq.net for only $105. Please contact me if you're interested. This test, however, will not discover any new "personal" SNPs.