Open-Source DNA Databases: What They Are and How to Use Them
If you have taken a DNA test with 23andMe, AncestryDNA, MyHeritage, FamilyTreeDNA, or any of the other major DNA testing companies, you have access to your raw DNA file. This file can be uploaded to several different open-source DNA databases, which also offer a number of tools for you to explore the database. Some open-source databases allow you to search for ancestry information, while others allow you to research health-related issues.
What is an Open-Source DNA Database?
If you Google “open-source DNA database”, you will find several different services and companies based entirely on open-source DNA databases. Some of these, like GEDmatch, allow you to do independent research on your DNA test results to find relatives and research your family tree. Others, like Promethease, allow you to search through a database connecting genetic variants to various health studies.
Unlike the private databases kept by large DNA testing companies, open-source DNA databases are available to all people. Researchers, law enforcement, and private individuals can easily access these databases to conduct research, and these platforms typically have various tools for analyzing their datasets.
Open-Source DNA Analysis
While DNA testing companies are the only places where you can actually get your DNA tested, there are many open-source DNA databases to which you can upload your genomic data. In fact, this Wiki page from the International Society of Genetic Genealogy details the numerous databases available. Most of these open-source databases also offer software tools for conducting your own bioinformatic research.
Many of these tools focus on single nucleotide polymorphisms (SNPs), the genetic data measured by DNA testing companies. This open-source software is typically available for free to users. In some cases, small service fees are imposed for the use of a database or a tool, though these fees are much lower than the cost of your original DNA test. Using these tools and sites, you can reanalyze your raw DNA data and learn for yourself what research can tell you about your family history or health issues.
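To give a sense of what these tools work with, here is a minimal Python sketch that reads a raw DNA file and counts the SNPs in it. It assumes a tab-separated layout with four columns (rsid, chromosome, position, genotype) and comment lines starting with "#", which is the layout 23andMe uses; files from other companies may be arranged differently. The function names and the sample records (including the rsIDs) are made up for illustration and are not real data.

```python
from collections import Counter

def parse_raw_dna(lines):
    """Yield (rsid, chromosome, position, genotype) from raw-data lines.

    Assumes four tab-separated columns; comment lines start with '#'.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip header rows or lines in an unexpected layout
        rsid, chromosome, position, genotype = fields
        yield rsid, chromosome, int(position), genotype

def summarize(lines):
    """Count SNPs, no-calls (genotype contains '-'), and SNPs per chromosome."""
    snps = list(parse_raw_dna(lines))
    no_calls = sum(1 for snp in snps if "-" in snp[3])
    per_chromosome = Counter(snp[1] for snp in snps)
    return {
        "total": len(snps),
        "no_calls": no_calls,
        "per_chromosome": dict(per_chromosome),
    }

# Placeholder records for illustration only (not real SNPs).
SAMPLE = """# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t1000\tAA
rs0000002\t1\t2000\tAG
rs0000003\tX\t3000\t--
"""

summary = summarize(SAMPLE.splitlines())
print(summary)
```

To try it on your own data, you could replace SAMPLE.splitlines() with an open file handle, since the parser only needs something that yields one line at a time.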
Is It Safe to Use an Open-Source DNA Database?
No, at least not entirely. The entire concept of an open database means that anyone has access to the genetic information you provide. While this is not as bad as giving out your Social Security Number to everyone on the internet, it can have negative consequences.
For example, family members of Joseph James DeAngelo uploaded their genetic information to an open-source DNA database. DeAngelo had no say in that decision and did not know about it, yet it led directly to his arrest.
DeAngelo, now indicted for being the rapist and serial killer known as the Golden State Killer, had evaded identification for decades. Though crime scene DNA was found, it was never matched to a profile in the FBI database. Using the DNA evidence, law enforcement officers searched an open-source DNA database without a court order. They didn’t need one because the database was open to anyone.
While this may not seem like a concern to you if you are not a serial killer, you should be a little concerned. The human genome is 99.9% the same across all humans, so only a small fraction of your whole genome distinguishes you from everyone else. With roughly 3 billion base pairs in the genome, though, that remaining 0.1% still amounts to millions of positions, which is more than enough to single out one person. As technology progresses and more people add their raw DNA data to databases, almost anyone will be identifiable based on their genetics. While solving cold cases and violent crimes is applauded, the potential applications of open-source databases are concerning. You can read more about these potential abuses in this article.
Are Open-Source DNA Databases Easy to Use?
“Open-source” is often a byword for “bare-bones”. Open-source applications and databases have enormous functionality, but they lack the user-friendly interfaces we are used to from large, for-profit companies. If you are interested in using these open-source tools, get ready for a steep learning curve.
If you are not up to the challenge, you may want to look into finding a genetic counselor or genetic genealogist to do the work for you. Genetic genealogists are trained in a number of DNA testing tools and have learned how to analyze DNA profiles, SNPs, and other genetic variants.
Another option is sticking with a genealogy website or for-profit genealogy database. While there are fees and subscriptions involved with this, it is often much cheaper than hiring a counselor and much easier than learning the entire science of genetics.
Other Options for Raw DNA Data Analysis
If you still want to research your DNA test results, there are many other options. Many companies will take the raw DNA data produced by another DNA testing company and use it to give you results in areas that your original test did not report. Many of these services are free and can give you information about your health risks, family history, or other topics simply by having you upload your raw DNA data to their site.
To find more DNA upload sites, check out our article "The Best DNA Upload Sites".