The Browse Raw Data tool may be of limited practical usefulness, depending on how much information you can extract from the data beyond what the 23andMe site already gives you. The raw data provided by 23andMe is an advanced view of all your uninterpreted genotype data, including data that is not used in 23andMe reports. This data has undergone a general quality review; however, only a subset of markers has been individually validated for accuracy. As such, the data from 23andMe's Browse Raw Data feature is suitable only for research, educational, and informational use and not for medical or other use. By contrast, all genetic markers used in our reports are evaluated for high data quality and accuracy.
This article will address the following questions:
- How Does 23andMe Report Genotypes?
- Which Reference Genome and Strand Does 23andMe Use?
- What Does Not Determined/Not Genotyped Mean?
- What Are RS Numbers (rsids)?
How 23andMe Reports Genotypes
The 23andMe genotyping platform detects single nucleotide polymorphisms (SNPs). A SNP is a DNA location, or "marker," in the genome that has been shown to vary among people in terms of DNA base. There are four DNA bases: adenine (A), thymine (T), guanine (G), and cytosine (C). So, for example, at the same genomic location, you might have a C and someone else might have a T. These DNA base differences are known as "variants."
For most SNPs on the 23andMe platform, the 23andMe Raw Data tool reports the marker name (usually a unique number), its exact genomic location, the possible variants at that marker (A, T, G, or C), and the specific variants you have, i.e. your genotype. Because you have two sets of autosomal chromosomes -- one from your mother and one from your father -- you usually have two variants at every location, and your genotype will be reported as a pair of variants, e.g. "G/A."
Some chromosomes don't come in pairs (i.e. the mitochondrial chromosome and, for the most part, the X and Y chromosomes in men), so your genotype will sometimes be reported as a single letter.
Occasionally, for some SNPs on the 23andMe platform, your genotype may be reported as an insertion or deletion (--) of DNA bases instead of just a simple variant pair. Depending on the genomic location, either an insertion or deletion could represent the typical version of the variant. In other words, there are some markers where having an extra base (insertion) is the typical variant and having a deletion is the less common variant. Conversely, there are some places in the genome where having an insertion is rare, making a "deletion" the typical variant at that location.
23andMe does not report on all possible insertions or deletions. In general, the ones reported on are small, spanning only one or a few bases.
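The genotype notation described above can be illustrated with a short Python sketch. It assumes the genotype is available as text such as "G/A", "GA", or a single letter. The function names `parse_genotype` and `describe` are hypothetical helpers for illustration, not part of any 23andMe tool. Insertions, deletions, and missing results are deliberately rejected here rather than interpreted.

```python
VALID_BASES = set("ATGC")


def parse_genotype(text):
    """Split a displayed genotype such as "G/A", "GA", or "G" into a tuple of bases."""
    alleles = tuple(text.strip().upper().replace("/", ""))
    # Accept only one or two plain DNA bases; anything else is not a simple SNP genotype.
    if len(alleles) not in (1, 2) or not set(alleles) <= VALID_BASES:
        raise ValueError(f"not a simple base genotype: {text!r}")
    return alleles


def describe(alleles):
    """Summarise how many copies were reported and whether they match."""
    if len(alleles) == 1:
        # Unpaired chromosomes (for example mitochondrial DNA) give a single letter.
        return "one copy"
    if alleles[0] == alleles[1]:
        return "two copies, same base"
    return "two copies, different bases"


print(parse_genotype("G/A"))            # ('G', 'A')
print(describe(parse_genotype("G/A")))  # two copies, different bases
print(describe(parse_genotype("T")))    # one copy
```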
Which Reference Genome and Strand Does 23andMe Use?
23andMe results indicate SNP positions and DNA bases based on the NCBI human reference genome (a standard version of the nucleotide sequence of the human genome). Both the raw data and the site features and reports currently use human genome assembly GRCh37 (build 37).
DNA consists of two strands that are complementary to each other. The DNA base "A" always pairs with "T," and "G" always pairs with "C" across these two strands. One strand is called the positive (+) strand, and the other is called the negative (-) strand.
The genotypes displayed on the 23andMe website, including in the Raw Data tool, always refer to the positive (+) strand on build 37 of the human reference genome. This is sometimes different from how other websites or publications refer to a genotype.
If the possible genotypes reported by 23andMe and another source do not match, it is likely that we are referring to complementary DNA strands rather than the same strand. For example, 23andMe might report that a SNP has two versions, G and A. But other sources may report that the versions for that SNP are C and T. Because G pairs with C on the opposite DNA strand, while A pairs with T, both ways of reporting the SNP are correct.
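The strand comparison described above can be sketched in Python. This is a minimal illustration that assumes each source lists the possible versions of a SNP as a string of bases such as "GA". The function name `compare_versions` and its return labels are hypothetical. Note one limitation that follows from the pairing rules: a SNP whose versions are A and T (or G and C) looks identical on both strands, so the sketch reports those cases as ambiguous.

```python
# Base pairing across the two DNA strands: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def compare_versions(reported_here, reported_elsewhere):
    """Compare the possible versions of one SNP as listed by two sources."""
    here = set(reported_here.upper())
    there = set(reported_elsewhere.upper())
    flipped = {COMPLEMENT[base] for base in here}

    if here == there and flipped == there:
        # For example A/T: the complement is the same pair, so strand cannot be inferred.
        return "ambiguous"
    if here == there:
        return "same strand"
    if flipped == there:
        return "opposite strand"
    return "mismatch"


print(compare_versions("GA", "CT"))  # opposite strand
print(compare_versions("GA", "AG"))  # same strand
print(compare_versions("AT", "TA"))  # ambiguous
```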
What Does Not Determined/Not Genotyped Mean?
In some cases, you will not have data at a SNP location. This is reported in one of two ways: variant not determined or variant not genotyped.
Not determined: In some cases, we are not able to provide a result for a particular SNP, and you will see "variant not determined." In the Raw Data tool, the entry for any uncalled SNP displays "--" instead of a two-letter genotype. If you see this result, our algorithm may not have been able to confidently determine your genotype at that marker. This can be caused by random test error or other factors that interfere with the test. Some "not determined" variants are expected in the raw data and are not a cause for concern.
Not genotyped: 23andMe periodically updates its DNA genotyping platform to take advantage of improvements in technology. The platform used in the analysis of your sample dictates which markers, or SNPs, we are able to provide data for. While many SNPs included in the Raw Data tool are available for all customers, some customers tested on different platforms will have some different markers available in the raw data. If a particular SNP was not included on the platform you were genotyped on, your results for this variant will appear as "not genotyped" in the Raw Data tool.
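The difference between the two cases can be made concrete with a small Python sketch. It assumes, purely for illustration, that your results are held in a dictionary mapping marker names to genotype text, that an uncalled SNP is stored as "--", and that a marker missing from the dictionary was not on your platform. The function name `marker_status` and the marker names are placeholders, not real markers.

```python
NO_CALL = "--"  # how an uncalled SNP is displayed, per the description above


def marker_status(raw_data, marker_id):
    """Classify a marker as called, not determined, or not genotyped."""
    if marker_id not in raw_data:
        # The marker was not included on the platform used for this sample.
        return "not genotyped"
    if raw_data[marker_id] == NO_CALL:
        # The marker was tested, but no confident genotype was produced.
        return "not determined"
    return "called"


# Placeholder data: these marker names are made up for the example.
raw_data = {"rs0001": "GA", "rs0002": "--", "i0003": "C"}

for marker in ("rs0001", "rs0002", "rs0004"):
    print(marker, marker_status(raw_data, marker))
```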
What Are RS Numbers (rsids)?
The rsID number is a unique name ("rs" followed by a number) used by researchers and databases to refer to a specific SNP. It stands for Reference SNP cluster ID and is the naming convention used for most SNPs.
If a probe on our genotyping platform doesn't correspond to a SNP with a clear rsID, or the probe is assaying a DNA change that is not a SNP, that SNP or change is usually assigned an "internal" id ("i" followed by a number). Our researchers may have included some of these "custom" SNPs on our genotyping platform in order to maximize the number of actionable 23andMe features available to customers, as well as to offer flexibility for future research.
In general, most SNPs labeled with an "internal" id in the Raw Data tool will not have a corresponding rsID in outside scientific literature or other third party services.
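The two naming patterns described above ("rs" followed by a number, and "i" followed by a number) can be told apart with a simple pattern match. The Python sketch below is an illustration only; the function name `marker_id_kind` and the example identifiers are hypothetical, and any name that fits neither pattern is simply labeled unknown.

```python
import re

RSID_PATTERN = re.compile(r"rs\d+")      # "rs" followed by a number
INTERNAL_PATTERN = re.compile(r"i\d+")   # "i" followed by a number


def marker_id_kind(marker_id):
    """Return which naming convention a marker name follows."""
    if RSID_PATTERN.fullmatch(marker_id):
        return "rsID"
    if INTERNAL_PATTERN.fullmatch(marker_id):
        return "internal"
    return "unknown"


print(marker_id_kind("rs12345"))  # rsID
print(marker_id_kind("i67890"))   # internal
print(marker_id_kind("abc"))      # unknown
```

A marker labeled "internal" by this check is the kind that generally will not be found under an rsID in outside literature or third-party services.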