Upload raw data from AncestryDNA, 23andMe, MyHeritage, FTDNA, WeGene (China) and others, or upload a Whole Genome Sequencing (WGS/WES) file in .vcf.gz or .vcf format (1 GB max file size). If you are uploading WGS/WES data and have both SNP and indel files, please upload the SNP file. No information about uploaded files is saved or shared. Uploaded files are de-identified on our server and deleted within 24 hours.
Frequently Asked Questions
23andMe, AncestryDNA, MyHeritage, FTDNA, WeGene (China) and Whole Genome/Exome Sequence (WGS/WES) VCF data is supported. Other companies that format their data like 23andMe data are also compatible. Virtually any WGS/WES data is compatible with this service as long as the data is aligned to hg19/GRCh37 or hg38/GRCh38. Low-Pass Whole Genome Sequencing (Nebula Genomics, Gencove) is also compatible.
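If you are unsure which reference build your VCF is aligned to, the file header often tells you. The following is a minimal sketch, not a description of how this service detects the build. It assumes the VCF header contains `##contig` lines that list `ID` before `length`, and it compares the chromosome 1 length against the published lengths for GRCh37 and GRCh38. The function name `guess_build` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Chromosome 1 length in each supported reference build.
CHR1_LENGTH = {
    249250621: "hg19/GRCh37",
    248956422: "hg38/GRCh38",
}

# Matches e.g. ##contig=<ID=chr1,length=248956422> or ##contig=<ID=1,length=249250621>
# Assumption: ID appears before length inside the contig line.
CONTIG_CHR1 = re.compile(r"^##contig=<.*\bID=(?:chr)?1\b.*\blength=(\d+)")


def guess_build(header_lines: Iterable[str]) -> Optional[str]:
    """Return the build name if the chr1 contig length is recognized, else None."""
    for line in header_lines:
        match = CONTIG_CHR1.match(line)
        if match:
            return CHR1_LENGTH.get(int(match.group(1)))
    return None


header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
]
print(guess_build(header))
```

A result of `None` means the header did not carry a recognizable chromosome 1 length, in which case the provider of the VCF is the best source for the build.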
If choosing a consumer genomics company, we currently recommend AncestryDNA as they have the most clinically-relevant data.
Generally speaking, the accuracy of the reports reflects the accuracy of the raw data in combination with the accuracy of our interpretation of that data. While consumer genomic data definitely has its share of accuracy issues, accuracy is much better than some news articles make it out to be. If you go by reproducibility measures, the reproducibility of BeadChip arrays is around 99.99% or greater. Reproducibility is not the only factor in determining accuracy, but high reproducibility is very important for high accuracy.
The accuracy of Whole Genome and Whole Exome Sequencing (WGS/WES) data can vary. Accuracy of non-low-pass WGS/WES data should exceed the accuracy of any consumer genomic BeadChip array as long as the provider of the VCF uses a good variant caller with good filtering strategies. Data with higher depth (30x and greater) is going to be more accurate and more representative of the whole genome. However, since VCF files don't typically contain reference variants, we have to assume that a variant missing from the VCF file matches the reference. This can be a faulty assumption. Read "What are the known bugs or issues?" for more information about this issue. Low-pass sequencing from companies such as Nebula Genomics does not have this issue since it reports all reference variants. However, more genotypes may be missing with low-pass sequencing, and results may not be as accurate as clinical-grade sequencing.
When interpreting variants, we sometimes intentionally swap the reference and alternate alleles if the reference allele is the risk allele. However, there are no guarantees of report accuracy or of freedom from programming errors in the interpretation of the data.
Consumer genomic data (23andMe and Ancestry) represents a very small amount of the genome, roughly 0.02%, while Whole Genome and Whole Exome Sequencing (WGS/WES) represent (nearly) 100% of the genome and exome, respectively. While the consumer genomic companies use a customized BeadChip array and try to select SNPs that are more important and/or have more variance, each SNP chip version will produce raw data that has a different set of SNPs, and this set of SNPs can vary between companies. Not every chip will have every variant in the panels. Variants that are missing from the raw data will be represented as "variants not found in your file" since they were not genotyped by the company that produced the raw data. Furthermore, the chips that the consumer genomics companies use don't accurately represent multi-allelic variants.
While Whole Genome and Whole Exome Sequencing (WGS/WES) don't have this same set of problems, they have problems of their own. VCF files typically don't report reference variants, to keep file sizes compact (hundreds of megabytes instead of tens of gigabytes). For this reason, we assume that if a variant doesn't exist in the VCF file, it matches the reference. Variants that match the reference will usually be reported as green, but if the reference allele is the risk allele, a reference variant may appear as red (homozygous for the reference allele).
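The assumption can be illustrated with a minimal sketch. This is not our actual implementation; the function name `classify` and the two-letter genotype format are assumptions made for illustration, and the sketch handles only biallelic SNPs. A site absent from the VCF is treated as homozygous reference, and the color depends on how many copies of the risk allele are present.

```python
from typing import Optional


def classify(genotype: Optional[str], ref: str, risk: str) -> str:
    """Return 'green', 'yellow', or 'red' for one biallelic SNP.

    genotype: two-letter call such as "CT", or None when the site is
              absent from the VCF (assumed to be homozygous reference).
    ref:      reference allele at the site.
    risk:     allele treated as the risk allele (may be the reference).
    """
    if genotype is None:
        genotype = ref + ref  # missing from the VCF: assume it matches reference
    risk_copies = genotype.count(risk)  # 0, 1, or 2 copies of the risk allele
    return ("green", "yellow", "red")[risk_copies]


print(classify(None, ref="C", risk="T"))  # green: absent, alternate allele is the risk
print(classify(None, ref="C", risk="C"))  # red: absent, reference allele is the risk
print(classify("CT", ref="C", risk="T"))  # yellow: heterozygous
```

The second call shows why a site that never appears in the VCF can still be reported as red.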
While this approach usually works, it is not error-free. If certain areas of the genome or exome are of poor mapping quality, or if certain variants are of low quality, this assumption doesn't always hold. Currently, in the latest reference genome (hg38/GRCh38), the CBS gene doesn't map properly. If you are using hg38/GRCh38, CBS variants may show incorrectly as green when there may be a heterozygous (yellow) or homozygous (red) variant. For these reasons, we currently recommend using hg19/GRCh37 with the Methylation Panel.
Furthermore, some variants are multi-allelic (they have three or more observed alleles), and 23andMe and Ancestry data doesn't report the multi-allelic nature of these sites. For the sake of data consistency between consumer and WGS/WES genetic data, we are not currently representing the multi-allelic nature of variants that can be observed in WGS/WES data. We realize that ultimately this isn't the best solution, so the way variants are represented may change in the future.
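In a VCF file, a multi-allelic site lists more than one alternate allele, separated by commas, in the ALT column. The following is a minimal sketch for spotting such sites in your own data; the function name `is_multiallelic` is hypothetical, the example records are made up, and the function expects tab-separated data lines rather than `#` header lines.

```python
def is_multiallelic(vcf_line: str) -> bool:
    """True when a VCF data line lists more than one ALT allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    return "," in alt


# Illustrative records only; positions and alleles are made up.
biallelic = "\t".join(["21", "1000", ".", "C", "T", ".", "PASS", "."])
triallelic = "\t".join(["21", "2000", ".", "G", "A,T", ".", "PASS", "."])

print(is_multiallelic(biallelic))   # False
print(is_multiallelic(triallelic))  # True
```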
Yes. But oddly, the hg38/GRCh38 reference has problems when it comes to mapping the CBS gene. It will not accurately represent variation in the CBS gene. We recommend using hg19/GRCh37 because of this issue. Read "What are the known bugs or issues?" for more information.