Introducing a New Way to Examine Your 23andMe & FTDNA Data
with a MINOR ALLELE PROGRAM
This simple program lets you examine your 23andMe & FTDNA data in a new way.
The 23andMe v3 chip looks at 960,000 SNPs in your sample and tries to:
- identify the presence of mutations associated with a number of diseases
- find close relatives
- show your ancestral roots
However, 23andMe does not specifically look for rare SNP results.
The FTDNA chip looks at about 500,000 of the same SNPs and an additional 200,000 SNPs.
So now by using this program you can search YOUR data for YOUR special results.
Or, if you are entering FTDNA results then using the older version is appropriate.
Please note that the program does work with '23andMe' data
for all chromosomes 1-23 and with 'FTDNA' data for Chromosomes 13-22.
More 'FTDNA' data files will appear in due course.
AN EXPLANATION OF THE THEORY BEHIND THE PROGRAM
Each SNP is bi-allelic; that is, you get 2 results when testing a SNP.
For example with:
rs6139074 found on Chromosome 20 at coordinates '11244' the test is for 'A' and for 'C'
for this SNP the commonest result to get is 'AA' and less common are 'AC' and 'CC'
In this case 'C' is termed the MINOR ALLELE and has a frequency (MAF) of 0.23446, i.e. 23% approx.
which means, roughly (assuming the two alleles pair up at random):
about 3 in 5 persons can be expected to be 'AA'
about 1 in 3 persons 'AC'
and only about 1 in 18 'CC'
For full details on this SNP look at: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6139074
In respect of this program rs6139074 with its MAF of 0.23446 is not very special
and the program concentrates on SNPs with MAF values of under 0.02
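The arithmetic linking a MAF to expected genotype proportions can be sketched in Python. This is only an illustration and not part of the program: it assumes Hardy-Weinberg proportions (the two alleles pair up at random), which is my assumption, and the function name genotype_frequencies is made up for the example.

```python
def genotype_frequencies(maf):
    """Expected genotype proportions for a bi-allelic SNP.

    Assumes Hardy-Weinberg proportions (random pairing of alleles);
    real populations can deviate from this.
    """
    q = maf        # frequency of the minor allele
    p = 1.0 - q    # frequency of the major allele
    return {
        "major/major": p * p,      # e.g. 'AA' for rs6139074
        "major/minor": 2 * p * q,  # e.g. 'AC'
        "minor/minor": q * q,      # e.g. 'CC'
    }


if __name__ == "__main__":
    # The example SNP from the text, and the program's 0.02 cut-off.
    for maf in (0.23446, 0.02):
        freqs = genotype_frequencies(maf)
        print(maf, {name: round(value, 4) for name, value in freqs.items()})
```

At the cut-off MAF of 0.02 this gives roughly 4% of persons carrying one copy of the minor allele and about 0.04% carrying two, which is why such SNPs count as unusual.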
THE CHOICE OF SNPs MADE BY 23andMe and FTDNA
The SNPs selected for use by 23andMe have been chosen because they have been connected with diseases
or because their mapping technique needs about 2,000 SNPs for every million base of a chromosome.
Nevertheless, there are quite a few 'rare' SNPs - and this program concentrates on these.
It is interesting to note there are also a few erroneous SNPs - for example where the wrong alleles have been tested!
And, there are some i-SNPs ('i' for 'Illumina'), 160 in the case of Chromosome 20, for which there is no published data.
WHERE DOES THE DATA FOR THIS PROGRAM COME FROM ?
Most of the MAF values used by this program come from the 1000 Genomes Project,
and are available from dbSNP (at: http://www.ncbi.nlm.nih.gov/snp)
For those SNPs without specific MAF values, I have used 'European' results and in consequence persons
with non-European ancestry may find the program gives an elevated number of results.
FETCHING A DATA FILE
The current version for 23andMe users is pre-loaded with a set of data files for the SNPs with a MAF <0.01
FTDNA users will have to download the appropriate FTDNA data files.
The download links are found below the actual program form.
GETTING YOUR RAW DATA
In some ways the trickiest part of using this program is the making of your own data files.
Firstly you need to download your raw data from 23andMe or FTDNA.
Do this by:
Going to the 23andMe website at www.23andme.com (or the FTDNA site) and logging in.
For '23andMe' the procedure is:
Click on Account
Click on Browse Raw Data
Click on Download Raw Data (at upper right)
Enter your password and your secret data
and proceed to download the zipped file to your computer.
Usually you just need to double click on the .zip filename to 'unzip' the file
and finally use SAVE AS to put your .txt file in a suitable directory
MAKING YOUR FILE
The raw data file is very large and cannot be handled easily by the average home computer
so smaller files for each chromosome need to be made.
Do this by:
OPENing your raw data .txt file using NOTEPAD - this program is my preference, and I keep a shortcut
on my desktop, and drag a filename from a directory to the shortcut whenever I need to OPEN a .txt file.
On my computer OPENing the raw data .txt file takes about 2 minutes (it may take even longer on a laptop);
and you can reassure yourself all is going well by looking at notepad.exe ticking over in the Task Manager.
Use CTRL, ALT, DELETE, choose Start Task Manager, click on Processes, click on CPU (descending order).
Do not worry if your NOTEPAD window has the heading - not responding - as long as the Process is ticking over
all is well and eventually the file will load.
When the file appears, you need to COPY the relevant parts of the file to new files.
Do this CAREFULLY - for example for Chromosome 20 - by:
Using CTRL-F to find rs28971224 - the first SNP of Chromosome 21.
Leave rs28971224 highlighted
Using the righthand slider, go to the end of the file
Depress a SHIFT key
and with your finger still on the SHIFT key, position the cursor at the end of the file
and click the mouse - which should highlight the file after Chromosome 20
Now press DELETE to remove this unwanted part of the file.
Next find the start of Chromosome 20
Maybe using CTRL-HOME, and CTRL-F with rs6139074
And highlight rs6139074
Again go to the end of the file, and highlight the block,
by positioning the cursor at the end of the file,
depressing a SHIFT key and Clicking the mouse
Next use CTRL-C to copy this block,
OPEN a new NOTEPAD window
Position the cursor in the new window, click the mouse,
Use CTRL-V to paste the block for Chromosome 20 in the window
and use SAVE AS, using a filename such as 'your_chr20_data.txt'
to put your new file in the required directory.
FINALLY, close down the raw data .txt file. Use 'Don't Save'
to keep the whole of the .txt file unchanged for another time.
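If you are comfortable running a small script, the same split can be done without NOTEPAD. The following is a minimal sketch, not part of the program: it assumes the 23andMe raw data layout of tab-separated columns (rsid, chromosome, position, genotype) with comment lines starting with '#'. The function names are made up for the example, and an FTDNA file would need its own delimiter handling.

```python
from collections import defaultdict


def split_by_chromosome(lines):
    """Group raw-data rows by chromosome, skipping comments and blank lines.

    Assumes tab-separated rows: rsid, chromosome, position, genotype.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a data row in the assumed 4-column layout
        by_chrom[fields[1]].append(line)
    return dict(by_chrom)


def write_chromosome_files(raw_path, prefix="your_chr"):
    """Write one file per chromosome, e.g. 'your_chr20_data.txt'."""
    with open(raw_path, encoding="utf-8") as handle:
        groups = split_by_chromosome(handle)
    for chrom, rows in groups.items():
        with open(f"{prefix}{chrom}_data.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(rows) + "\n")


# Example call (the file name is hypothetical):
# write_chromosome_files("my_raw_data.txt")
```

Because the script reads the file line by line, it never needs to display the whole file the way an editor does. The original raw data file is left untouched.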
USING THE MINOR ALLELE PROGRAM
Instructions for using the program are to be found below the program form.
But putting it simply:
The 'data file' for a particular chromosome goes in the upper box,
And, YOUR results go in the lower box.
RUNning the program gives you a list of the SNPs found to be of interest.
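In outline, the comparison between the two boxes might look like the sketch below. It is an illustration of the idea and not the program's actual code. The reference layout (a dictionary from rs number to minor allele and MAF) is hypothetical, the raw rows are assumed to be tab-separated 23andMe rows, and no strand complementing is attempted.

```python
def find_minor_allele_hits(reference, raw_rows):
    """List the SNPs whose genotype contains the listed minor allele.

    reference: dict mapping rsid -> (minor_allele, maf)  [hypothetical layout]
    raw_rows:  iterable of tab-separated rows: rsid, chromosome, position, genotype
    """
    hits = []
    for row in raw_rows:
        fields = row.strip().split("\t")
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip comments, blanks and malformed rows
        rsid, chrom, pos, genotype = fields[:4]
        if rsid not in reference:
            continue  # not one of the rare SNPs in the data file
        minor, maf = reference[rsid]
        if minor in genotype:
            hits.append(
                f"{rsid} at Chr{chrom}:{pos} gave '{genotype}' "
                f"The minor allele is '{minor}' with an MAF = {maf}"
            )
    return hits
```

A SNP is reported whether the minor allele appears once or twice in the genotype, so both 'AG' and 'AA' would be listed for a minor allele of 'A'.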
LOOKING AT YOUR RESULTS
The results appear as a number of lines of text.
Such as:
rs13043752 at Chr20:32346969 gave 'AG' The minor allele is 'A' with an MAF = 0.00502
rs17219643 at Chr20:40139346 gave 'CT' The minor allele is 'T' with an MAF = 0.00594
These can be simply highlighted and copied (using CTRL-C) from the lower box,
and pasted (using CTRL-V) into a suitable NOTEPAD file.
For each chromosome there is a webpage showing a selection of results.
E.g. For chromosome 20 CLICK HERE
SNP LOOKUP PROGRAM
In order to make it easier to look up the results, try
CLICK HERE FOR THE SNP LOOKUP PROGRAM
And enter a block of results of any size, such as:
rs33981382 at Chr1:17541475 gave 'AG' The minor allele is 'G' with a MAF = 0.00639
.....more....
rs28532243 at Chr22:40855896 gave 'AC' The minor allele is 'A' with a MAF = 0.00999
to get the results:
<<<<<<
rs33981382 at Chr1:17541475 gave 'AG' The minor allele is 'G' with a MAF = 0.00639
rs33981382 PADI4 peptidyl arginine deiminase, type IV - missense Y309C
('b') 'AG' MAF = 0.00639 Found in 13 per 1000 persons worldwide.
The change Y<>C is fairly common and probably not harmful.
----------------------------------------------------------------------------------------------------------------
rs28532243 at Chr22:40855896 gave 'AC' The minor allele is 'A' with a MAF = 0.00999
rs28532243 CYP2D6 cytochrome P450, family 2, subfamily D, polypeptide 6 - intron
('b') 'AC' MAF = 0.00999 No published population data.
>>>>>>
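Result lines have a regular shape, so they can also be read by a script. The sketch below is an illustration only and not the lookup program itself. The pattern is inferred from the example lines shown here, and the per-1000 estimate assumes Hardy-Weinberg proportions, which happens to reproduce the '13 per 1000' figure for a MAF of 0.00639; how the lookup program actually derives that figure is not stated.

```python
import re

# Pattern inferred from the example lines; accepts both "a MAF" and "an MAF".
RESULT_LINE = re.compile(
    r"^(?P<rsid>rs\d+) at Chr(?P<chrom>\w+):(?P<pos>\d+) gave '(?P<genotype>[A-Z-]+)' "
    r"The minor allele is '(?P<minor>[A-Z])' with an? MAF = (?P<maf>[\d.]+)$"
)


def parse_result_line(line):
    """Return the fields of one result line as a dict, or None if it does not match."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["pos"] = int(record["pos"])
    record["maf"] = float(record["maf"])
    return record


def carriers_per_thousand(maf):
    """Expected heterozygotes per 1000 persons, assuming Hardy-Weinberg proportions."""
    return round(2 * maf * (1 - maf) * 1000)
```

Lines that do not fit the pattern, such as the '.....more....' placeholder or the dashed separators, simply return None and can be skipped.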
UNDERSTANDING YOUR RESULTS
The premise behind this program is that everyone has a few 'rare' SNPs
and these have not been identified as such by 23andMe.
For example, in the above data this SNP was found in a set of results:
rs13043752 at Chr20:32346969 gave 'AG' The minor allele is 'A' with an MAF = 0.00502
Looking at the data page for this SNP on
www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=13043752
it can be seen that 'AG' has a worldwide population frequency of about 2%
and also that, the mutation may affect the gene AHCY.
The change in the protein is described as R38W
i.e. an arginine (R) amino acid changes to a tryptophan (W) amino acid at position 38.
A 'heterozygous' result, such as 'AG', indicates a 'carrier' status,
which would not be expected to give any symptoms
whereas,
a 'homozygous' result, such as 'AA', indicates a complete change in the protein,
and a pathological change may occur.
This program has been written to find such examples of 'homozygous-recessive' SNPs.
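The distinction between the two kinds of result comes down to counting copies of the minor allele in the genotype. A minimal sketch, with a made-up function name and labels chosen for this example:

```python
def classify_genotype(genotype, minor_allele):
    """Describe a two-letter genotype by how many copies of the minor allele it holds."""
    copies = genotype.count(minor_allele)
    if copies == 0:
        return "no minor allele"
    if copies == 1:
        return "heterozygous (carrier)"
    return "homozygous for the minor allele"
```

For rs13043752 with minor allele 'A', a result of 'AG' is classed as heterozygous and 'AA' as homozygous for the minor allele.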
SUBMITTING YOUR RESULTS FOR INCLUSION
I am very happy to add interesting results to a results page.
Please send me the details and say if you wish your email address to be published on the page.
I will only include SNP details which have a MAF<0.01
DATA INTEGRITY
The data used in the data files comes in the most part from the '1000 Genomes Project' as found
on the dbSNP website; and is in the public domain.
In the files a SNP with a MAF of X=0 implies there is insufficient population data to assign a meaningful value,
or the data available is contradictory.
And, a MAF value of A [C|G|T|D|I] = 0 implies the population diversity for the SNP is very low.
In about 97% of SNPs there is full agreement between the '23andMe' data and the matching dbSNP page.
However, in about 3% of SNPs it is necessary to 'complement' the dbSNP data, changing A<>T and C<>G as needed.
And, in a few cases, 'directionality' also has to be considered; meaning that 'C/G' may need to be considered as 'G/C'.
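Complementing is a simple letter swap and can be sketched as below. This is an illustration with made-up function names, not the program's own code; it only swaps A<>T and C<>G and leaves any other characters (such as D or I) unchanged.

```python
# Translation table for the complementary strand: A<>T and C<>G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(genotype):
    """Return the genotype as read on the opposite strand."""
    return genotype.translate(COMPLEMENT)


def matches_either_strand(genotype, alleles):
    """True if the genotype fits the allele pair directly or after complementing."""
    allowed = set(alleles)
    return set(genotype) <= allowed or set(complement(genotype)) <= allowed
```

Note that this check cannot tell the strands apart for A/T or C/G SNPs, since complementing those allele pairs gives the same pair back; such cases need the extra care described above.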
DATA CONVENTIONS
'23andMe' and 'FTDNA' continue to use the Genome Project Build 36.3 coordinates throughout.
Also a number of SNPs have been merged, so on occasions 2 'rs' numbers refer to the same location.
But in some cases the merging is not totally successful and it is unclear just what is being tested.
RARE AND UNCOMMON SNP SUMMARY
The data files identify the 'rare' and 'uncommon' SNPs;
where 'uncommon' indicates a frequency of the minor allele of less than 1 in 100 people worldwide.
Please note that the frequencies are not exact.
'23andMe' Data
          RARE  UNCOMMON  TOTAL OF RS SNPs To Do
               MAF <0.01
Chr 1a     175       691    24984 } 76130      0    Build 36/37 compatible
Chr 1b     257      1062    24961 }            0    Build 36/37 compatible
Chr 1c      67       353    26185 }            0    Build 36/37 compatible
Chr 2a      57       659    24839 } 76686      0    Build 36/37 compatible
Chr 2b      68       338    24994 }            0    Build 36/37 compatible
Chr 2c     132       467    26853 }            0    Build 36/37 compatible
Chr 3a     133       580    24805 } 62623      0    Build 36/37 compatible
Chr 3b      70       317    24985 }            0    Build 36/37 compatible
Chr 3c      37       111    12833 }            0    Build 36/37 compatible
Chr 4a      35       184    19991 } 54718      0    Build 36/37 compatible
Chr 4b     125       287    19991 }            0    Build 36/37 compatible
Chr 4c      22       106    14736 }            0    Build 36/37 compatible
Chr 5a      26       124    19995 } 55720      0    Build 36/37 compatible
Chr 5b      62       233    19992 }            0    Build 36/37 compatible
Chr 5c      78       317    15733 }            0    Build 36/37 compatible
Chr 6a     343       900    24952 } 62869      0    Build 36/37 compatible
Chr 6b      65       240    24995 }            0    Build 36/37 compatible
Chr 6c      60       193    12922 }            0    Build 36/37 compatible
Chr 7a      79       349    24989 } 50489      0    Build 36/37 compatible
Chr 7b     256       499    25500 }            0    Build 36/37 compatible
Chr 8a      93       341    24994 } 48978      0    Build 36/37 compatible
Chr 8b      46       234    23984 }            0    Build 36/37 compatible
Chr 9a      74       241    24995 } 42689      0    Build 36/37 compatible
Chr 9b      67       281    17684 }            0    Build 36/37 compatible
Chr 10a     54       225    24992 } 49988      0    Build 36/37 compatible
Chr 10b    118       462    24996 }            0    Build 36/37 compatible
Chr 11a    169       915    24979 } 47347      0    Build 36/37 compatible
Chr 11b     67       311    22368 }            0    Build 36/37 compatible
Chr 12a    131       414    24971 } 46698      0    Build 36/37 compatible
Chr 12b     84       483    21727 }            0    Build 36/37 compatible
Chr 13a     52       236    19997 } 35928      0    Build 36/37 compatible
Chr 13b     83       192    15931 }            0    Build 36/37 compatible
Chr 14      97       490    30616              0    Build 36/37 compatible
Chr 15     159       395    28152              0    Build 36/37 compatible
Chr 16     215      1343    29855              0    Build 36/37 compatible
Chr 17     220      1377    26259              0    Build 36/37 compatible
Chr 18     145       215    27848              0    Build 36/37 compatible
Chr 19     172       670    18201              0    Build 36/37 compatible
Chr 20     141       326    23671              0    Build 36/37 compatible
Chr 21      51       142    13304              0    Build 36/37 compatible
Chr 22     110       319    13895              0    Build 36/37 compatible
So do you have any of these 'rare' or 'uncommon' SNPs ?
'FTDNA' data - and still being refined
          RARE  UNCOMMON  TOTAL OF RS SNPs
               MAF <0.01  (used by 23andMe)
Chr 13a     33       135    14828
Chr 13b     67       165    12037
Chr 14      67       229    22662
Chr 15     118       265    21055
Chr 16     176       387    22851    (data from Bob May)
Chr 17     163       432    20327    (data from Bob May)
Chr 18     136       177    20960
Chr 19     135       453    14488
Chr 20     131       275    17882
Chr 21      25       104     9951
Chr 22      70       229    10139
PROBLEMS
The program was developed using both 'Internet Explorer' and 'FIREFOX'; and will run on both platforms.
However, online 'FIREFOX' is preferred as 'Internet Explorer' can give trouble - possibly because of 'permissions'
being inappropriate and the expected prompt messages failing to appear, or a general dislike of large data sets.
If you need to stop Internet Explorer, use CTRL ALT DELETE, click on Start Task Manager, click Applications,
highlight the Internet Explorer line, use End Program.
Please report any problems to me.
DISCLAIMER
The intellectual rights of 'Illumina' and '23andMe' to their data are accepted.
No liability can be accepted for assumptions drawn from using this program.
Care has been taken to ensure the integrity and interpretation of the data.
However, there will be errors and for these I apologise.
Quo Vadis ?
I do not know where results obtained from this program may lead, but the studying of 'rare' results
in other fields has led to some interesting discoveries.
If you do find you have a 'rare' mutation then it can be very useful for tracking through your pedigree,
and you may be able to show links that do not show up on your 'Relative Finder'.
Also, some 'rare' mutations affect gene function altering the sequence of amino acids,
i.e. the mutation has a 'missense' or 'non-synonymous' effect.
Mutations which are intergenic, ie. occur between genes, or are found in 'introns', ie. extra pieces of DNA
that split up the various parts of a gene, are not usually pathological.
FUTURE PLANS
Firstly, to prepare data files for every chromosome.
And, subsequently, to look at the SNP lists used by other companies.