This post may contain affiliate links, meaning I get a commission if you decide to make a purchase through my links, at no cost to you. The products that I advertise are the ones I believe in.
How do you compare DNA testing companies? Which DNA test is the best? Well, size does matter when comparing DNA testing companies, and here’s why: it’s all about their DATABASES, which include factors such as the number of USERS, FAMILY TREES, and, most importantly, the size of their REFERENCE POPULATIONS.
Reference Populations As Sample Sizes
When it comes to choosing DNA testing companies, size does matter. Not all of the companies I’m comparing (Ancestry, MyHeritage/Geni, and 23&Me) are equal in the features they share, and there are countless reviews online which strip those individual sites down point for point in terms of their services offered, but the ONE THING that is missing in most of those analyses is the SIZE of the database that you are adding your genetic material into and hoping to gain information from.
Your genetic results are based solely on the genetic REFERENCE POPULATIONS you are testing against, and each company’s reference populations are different in scope and size.
DNA testing companies are all different. Each DNA testing company tests against different databases. If they tested against the exact same database, you’d get the exact same results – assuming their testing methods are also identical. The results of your DNA ethnicity estimates could be severely distorted because, while you are looking to reconstruct the puzzle of YOU, the company you’ve chosen doesn’t have all the pieces!
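For the code-minded, here is a toy sketch in Python of why the same DNA can come back with a different label from a different company. Everything in it is made up for illustration: the three-number "marker profiles," the panel contents, and the `closest_population` helper are my own assumptions, and real companies use far more sophisticated statistics across hundreds of thousands of markers. The only point is that the answer depends on which reference populations are on the menu.

```python
import math

def closest_population(sample, panel):
    """Return the panel label whose marker profile is nearest to the sample."""
    return min(panel, key=lambda label: math.dist(sample, panel[label]))

# Hypothetical marker frequencies, invented purely for illustration.
my_sample = (0.62, 0.31, 0.80)

# Company A only knows two broad reference populations.
company_a_panel = {
    "Eastern European": (0.55, 0.40, 0.70),
    "Scandinavian": (0.20, 0.75, 0.35),
}

# Company B has the same two, plus one finer-grained population.
company_b_panel = dict(company_a_panel)
company_b_panel["Baltic"] = (0.60, 0.30, 0.82)

print(closest_population(my_sample, company_a_panel))  # Eastern European
print(closest_population(my_sample, company_b_panel))  # Baltic
```

Same sample, two panels, two different answers. Neither company is "wrong"; one simply has a piece of the puzzle the other lacks.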
Think of a DNA test as a REFERENCE TEST. If you asked 20 people who knew me in high school, they’d likely describe my character and tell you different stories about me based on how they knew me. People who knew me from the football and wrestling teams might tell you I was a stud, people who knew me from hanging out between classes might tell you I was a big goofball, and people who knew me in social settings would probably tell you I was kind of a weirdo.
Which comparison would be the most accurate? Well, the answer is that they are ALL accurate. So how can they be so conflicting? It all has to do with the SIZE of the REFERENCE POPULATIONS you were interviewing, which equates to a statistical, scientific sample size.
Increase your sample size and you increase resolution and clarity.
Adjust your sample size from 20 people that knew me in high school to 200 people who I’ve gone to college and worked with and, boom, you’ve not only increased the sample size of your database but you’ve also gained a more holistic understanding of who I am. Hopefully more pieces in that puzzle creates a better picture of me . . . well, let’s keep hoping.
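If you like to see the numbers, here is a minimal simulation of that idea in Python. It is only a sketch under my own assumptions: an invented trait that 30% of people share (`true_share`), and a hypothetical helper `spread_of_estimates` that repeatedly draws a sample and checks how much the estimated share bounces around. It has nothing to do with how any testing company actually computes results.

```python
import random
import statistics

def spread_of_estimates(sample_size, trials=500, true_share=0.30, seed=42):
    """Return how much an estimated share varies across repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Each sampled person either shows the trait or does not.
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        estimates.append(hits / sample_size)
    # Standard deviation of the estimates: smaller means a sharper picture.
    return statistics.stdev(estimates)

for n in (20, 200, 2000):
    print(n, round(spread_of_estimates(n), 3))
```

The spread shrinks as the sample grows, roughly in proportion to the square root of the sample size, so going from 20 people to 200 cuts the wobble to about a third, not a tenth. Bigger is better, but with diminishing returns.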
Our individual lives are a metaphor for our ancestors’ over time as we move from place to place, population to population. Sample a micro reference population that your ancestors came from and your sample size sucks, so to speak. Our genetic material has been raked together from all of the places and populations that all of our ancestors have been around over time. A sucky reference panel can be the equivalent to a wolf in sheep’s clothing.
Not All DNA Testing Companies Have The Same Reference Populations
Ancestry calls their reference populations their REFERENCE PANEL. MyHeritage calls their reference populations FOUNDER POPULATIONS. 23&Me calls their reference populations REFERENCE DATASETS. Here’s how they break down.
Ancestry’s ‘Reference Panel’
Of the 3 DNA testing companies I’m reviewing, Ancestry has the largest reference population (aka, Reference Panel) at approximately 16,000 people (at the writing of this article in March 2018). These 16,000 typical natives of each region that they’ve aggregated are what you get compared to, hence their disclaimer in the ethnicity results dialogue box.
UPDATE: As of November 2021, Ancestry now has a whopping 56,580 DNA samples as its reference panel size. It jumped from its original 3,000 to over 56,580! Compare screenshots below from the original to 2018 and then 2021.
I can attest to this in that my ethnicity estimates, now called a “DNA Story,” have changed dramatically from 2012 until now. What I do like about Ancestry is that they are extremely transparent in explaining to you how they arrived at the results they give you. There are also numerous links and ‘white papers’ available for reading on Ancestry’s website – this is a good thing.
It should also be noted that USERS and REFERENCE POPULATIONS are NOT the same thing. When you take a DNA test you are not receiving your ethnicity results based on all of the ‘users’ in the database, only against its ‘reference populations.’
On Ancestry, that meant your DNA was getting tested against a sample size of 16,000 individuals in 2018 (56,580 as of the November 2021 update).
Having tested with both FTDNA (FamilyTreeDNA) and Ancestry, I have found that FTDNA’s results were slightly more accurate back in the day. However, as Ancestry improves its database and match algorithms, it has continually gotten better. As of 2019, it has far surpassed FTDNA. My Ancestry “DNA Story” is like 1000x more accurate now than in 2012. FTDNA, to be honest, should be left for dead very soon.
The reference population that determines your Ethnicity Estimates on Ancestry is based on a relatively large sample size of over 56,580 regional genetic archetypes. Your DNA matches and ability to find genetic relatives, on the other hand, are compared to their entire user database of *4 million, which is, in fact, also relatively limited. No pun intended.
(*estimates might actually be as low as only 2 million paid subscribers, which greatly affects their viability)
MyHeritage’s and Geni’s ‘Founder Populations’
With MyHeritage and Geni, as a combined company, you get a reference population about 69% smaller than Ancestry’s 2018 panel (aka, Founder Populations @ 5,000 versus 16,000, so Ancestry’s is 3.2 times as large), yet a whopping 23 times as many users to compare to (92 million versus 4 million)! Jiminy Crickets!
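Percent comparisons like these are easy to get tangled, so here is the arithmetic spelled out as a small Python sketch. The four figures are the ones quoted in this article (16,000 and 5,000 for the reference populations, 4 million and 92 million for the user bases); the helper names `percent_smaller` and `times_as_large` are just my own labels. One thing worth keeping in mind: nothing can be more than 100% smaller than something else, so a gap this big is clearer stated as "times as large."

```python
def percent_smaller(small, large):
    """How much smaller `small` is than `large`, as a percentage of `large`."""
    return (large - small) / large * 100

def times_as_large(large, small):
    """How many times bigger `large` is than `small`."""
    return large / small

# Figures as quoted in this article.
ancestry_panel = 16_000         # Ancestry Reference Panel, March 2018
myheritage_panel = 5_000        # MyHeritage Founder Populations
ancestry_users = 4_000_000      # Ancestry user estimate
myheritage_users = 92_000_000   # MyHeritage members

print(round(percent_smaller(myheritage_panel, ancestry_panel)))  # 69
print(times_as_large(ancestry_panel, myheritage_panel))          # 3.2
print(round(percent_smaller(ancestry_users, myheritage_users)))  # 96
print(times_as_large(myheritage_users, ancestry_users))          # 23.0
```

So Ancestry’s reference panel is 3.2 times the size of MyHeritage’s, while Ancestry’s user base is about 96% smaller than MyHeritage’s, which is the same as saying MyHeritage has 23 times as many users.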
The rate of growth of MyHeritage and Geni is astounding. It’s even tempted me to test with them, as their primary database is European. MyHeritage’s reference population is approximately 5,000 people. In addition to a more robust sample size, MyHeritage’s partner site Geni.com allows you to connect with over 120 million users world-wide. They are growing so fast you can actually watch it in real time with this nifty counter! Wowzers!
The thing that is also attractive about MyHeritage is that while they have over 5,000 genetic regional representatives comprising their ‘Founder Populations’ today, that number is sure to grow in the future. Greater sample size equals greater resolution.
What also separates MyHeritage from Ancestry is the quality of their sample size. If you read their warranty below it states that their Founder Populations are “hand-picked for this project from MyHeritage’s 92 million members.” Statistically, that is awesome!
By comparison, Ancestry has created their 16,000 sample size ‘Reference Panel’ from a pool of only *4 million (possibly only 2 million). While my ethnicity percentages might not be much different from Ancestry’s results if I tested with MyHeritage, MyHeritage may provide a level of analysis that Ancestry can’t, given Ancestry’s more limited pool. However, while Ancestry used to show me as a percentage of “Eastern European,” it has now updated that to read “Baltic Nations.” One day it might be able to properly reanalyze that as “Lithuanian.” I wonder if MyHeritage would?
MyHeritage and Geni are fast becoming the DNA and genealogy behemoths. One can’t help but be enamored by the sheer size and opportunity of being a part of these sites. Although the user interface on Geni leaves much to be desired, that can be fixed pretty easily and is akin to renovating the ultimate beach-front property, because it’s all about “location, location, location” and, snakes alive, do MyHeritage and Geni have an amazingly strategic location in the DNA testing companies market!
Read my full article: Is Ancestry Losing Its Market Share to MyHeritage and Geni?
23&Me’s ‘Reference Datasets’
Of the 3 DNA testing companies, with respect to reference populations and sample sizes, 23&Me is the most intriguing. Although I haven’t tested with them I’ve heard a lot of good things about them. Let’s start with their reference population, what they call their ‘reference datasets,’ which is over 10,000 people.
23&Me used to have the largest sample size to test against, hands down. Its reference population is still about double that of MyHeritage, yet it now trails Ancestry’s. Statistically, 23&Me should offer you binoculars when looking at your ethnic composition and be able to pinpoint for you with a high degree of accuracy where your ancestry is ultimately from.
23&Me has a user database of over 5 million people, which is also quite impressive. Being kind of the upstart company in the shadow of Ancestry, 23&Me has really pushed the envelope in advertising to chase market share away from Ancestry, within the United States at least.
What makes Ancestry ultimately attractive is that it is still the king of digitized genealogical records; ironically, though, a lot of what you pay for on Ancestry can be found for free on Family Search. What impresses me about 23&Me is that they also have a nicely-crafted webpage which neatly explains your relationship to their methodology.
If you look closely, the actual number constituting their reference population is 11,091. I think they’ve done an excellent job in stripping away the statistical bias in their datasets by looking at reference populations in a purely endogamous manner. I believe testing at 23&Me would be highly beneficial.
In addition to their 5 million users, this site also includes in their reference datasets two very important research projects: the Human Genome Diversity Project, and the 1000 Genomes Project. In his book A Brief History of Everyone Who Ever Lived, author Adam Rutherford offers us his unique perspective on the origin of human genetics as a contributor to these projects. To have such scholarly weight supporting your DNA testing company’s website is something to be underscored!
As I’ve previously written, our human genome is like a book with our chromosomes being chapters and our genes being highlighted passages – actually I owe that metaphor to Adam Rutherford from his book mentioned above.
These DNA testing companies are attempting to unlock information deep within ourselves that is millions of years old to reveal a multitude of different faces, the faces of our ancestors. Not all companies function the same, therefore not all companies will deliver you equal results – nor equitable results for that matter.
The single, most important component in accurately analyzing your DNA is the ‘SAMPLE SIZE’ it is being tested against, what we call a ‘REFERENCE POPULATION.’ The bigger the size of the reference population the greater the degree of accuracy in the results of your ethnic admixture and origins. Arming yourself with this knowledge of which company carries which reference population sample size in their armament is what you should know in this genetic game of size does matter.
I’m curious about ethnicity estimates. When I first took the Ancestry test c2012 they had me mostly showing Scotland, some Ireland, and tiny percentages of Iberian Peninsula and Finland; one of my sisters who did the same test showed similar numbers but showed Sweden. Since, they have changed the estimate to only Scotland and Ireland, and a couple later estimates have me now as “gained some Ireland”. It’s almost like they discounted the significance of the earlier smaller percentages, but if you look at the history of how populations moved about, my small percentages make good sense.
Great question, and something every Ancestry DNA user deals with, myself included! Mine started out with the Scandinavian swathe, then that morphed into British Isles, then some of that drifted over to Scotland, etc. What is happening there is two-fold: (1) the testing companies are updating their reference populations; and, (2) they are adding in new reference populations. The net effect is that as they update their databases, our test kit results get resampled and reassigned. It’s like tuning in G# or A♭ haha. Yes, when you consider the deep history of the places our ancestors came from, there has been A LOT of movement throughout history, a lot of conquering, intermixing, and varying anthropological social systems, so, as you say, the small percentages do make sense. Just read the fine print, where it states +/- 5% may not make that “label” relevant.