Introducing a New Way to Examine Your 23andMe & FTDNA Data

              with a MINOR ALLELE PROGRAM


This simple program lets you examine your 23andMe & FTDNA data in a new way. The 23andMe v3 chip looks at 960,000 SNPs in your sample and tries to:
- identify the presence of mutations associated with a number of diseases
- find close relatives
- show your ancestral roots
However, 23andMe does not specifically look for rare SNP results. The FTDNA chip looks at about 500,000 of the same SNPs and an additional 200,000 SNPs. By using this program you can search YOUR data for YOUR special results.

CLICK HERE FOR THE 23andMe PROGRAM

Or, if you are entering FTDNA results, the older version is appropriate.

CLICK HERE FOR THE FTDNA PROGRAM

Please note that the program does work with '23andMe' data for all chromosomes 1-23 and with 'FTDNA' data for Chromosomes 13-22. More 'FTDNA' data files will appear in due course.

Links to 'openSNP' data sets

CLICK HERE TO LOOK AT EXAMPLE REPORTS

FAQS

AN EXPLANATION OF THE THEORY BEHIND THE PROGRAM

Each SNP is bi-allelic, that is, you get 2 results when testing an SNP. For example, with rs6139074, found on Chromosome 20 at coordinates '11244', the test is for 'A' and for 'C'. For this SNP the commonest result to get is 'AA', and less common are 'AC' and 'CC'. In this case 'C' is termed the MINOR ALLELE and has a frequency (MAF) of 0.23446, i.e. 23% approx., which means, roughly (assuming the two alleles pair up at random):
- about 6 in 10 persons can be expected to be 'AA'
- about 1 in 3 persons 'AC'
- and only 1 in 20 'CC'
For full details on this SNP look at: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6139074
In respect of this program, rs6139074 with its MAF of 0.23446 is not very special, and the program concentrates on SNPs with MAF values of under 0.02.
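As a rough check on how a MAF translates into genotype frequencies, here is a minimal Python sketch. It assumes the population is in Hardy-Weinberg equilibrium (alleles pairing at random), which is an assumption of this sketch and not something the program itself depends on. The function name `genotype_frequencies` is purely illustrative.

```python
def genotype_frequencies(maf):
    """Return expected (major/major, heterozygous, minor/minor) frequencies.

    Assumes Hardy-Weinberg equilibrium: p^2, 2pq and q^2, with q = MAF.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency lies between 0 and 0.5")
    q = maf          # frequency of the minor allele
    p = 1.0 - q      # frequency of the major allele
    return p * p, 2.0 * p * q, q * q


# Example: rs6139074, minor allele 'C' with MAF = 0.23446
aa, ac, cc = genotype_frequencies(0.23446)
print(f"AA {aa:.1%}  AC {ac:.1%}  CC {cc:.1%}")
```

For a MAF of 0.23446 this works out at roughly 59% 'AA', 36% 'AC' and 5% 'CC'. For a 'rare' SNP with a MAF under 0.01, the homozygous minor result is expected in fewer than 1 in 10,000 persons.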

THE CHOICE OF SNPs MADE BY 23andMe and FTDNA

The SNPs selected for use by 23andMe have been chosen because they have been connected with diseases or because their mapping technique needs about 2,000 SNPs for every million bases of a chromosome. Nevertheless, there are quite a few 'rare' SNPs - and this program concentrates on these. It is interesting to note there are also a few erroneous SNPs - for example where the wrong alleles have been tested! And there are some i-SNPs ('i' for 'Illumina'), 160 in the case of Chromosome 20, for which there is no published data.

WHERE DOES THE DATA FOR THIS PROGRAM COME FROM ?

Most of the MAF values used by this program come from the 1000 Genomes Project, and are available from dbSNP (at: http://www.ncbi.nlm.nih.gov/snp). For those SNPs without specific MAF values, I have used 'European' results; in consequence, persons with non-European ancestry may find the program gives an elevated number of results.

FETCHING A DATA FILE

The current version for 23andMe users is pre-loaded with a set of data files for the SNPs with a MAF <0.01. FTDNA users will have to download the appropriate FTDNA data files. The download links are found below the actual program form.

GETTING YOUR RAW DATA

In some ways the trickiest part of using this program is the making of your own data files. Firstly you need to download your raw data from 23andMe or FTDNA. Do this by going to the 23andMe website at www.23andme.com (or the FTDNA site) and logging in. For '23andMe' the procedure is:
- Click on Account
- Click on Browse Raw Data
- Click on Download Raw Data (at upper right)
- Enter your password and your secret data and proceed to download the zipped file to your computer.
Usually you just need to double click on the .zip filename to 'unzip' the file, and finally use SAVE AS to put your .txt file in a suitable directory.

MAKING YOUR FILE

The raw data file is very large and cannot be handled easily by the average home computer, so smaller files for each chromosome need to be made. Do this by OPENing your raw data .txt file using NOTEPAD. This program is my preference; I keep a shortcut on my desktop, and drag a filename from a directory to the shortcut whenever I need to OPEN a .txt file.

On my computer OPENing the raw data .txt file takes about 2 minutes (it may take even longer on a laptop), and you can reassure yourself all is going well by looking at notepad.exe ticking over in the Task Manager: use CTRL, ALT, DELETE, choose Start Task Manager, click on Processes, click on CPU (descending order). Do not worry if your NOTEPAD window has the heading 'not responding'; as long as the Process is ticking over all is well, and eventually the file will load.

When the file appears, you need to COPY the relevant parts of the file to new files. Do this CAREFULLY - for example for Chromosome 20 - by:
- Using CTRL-F to find rs28971224, the first SNP of Chromosome 21. Leave rs28971224 highlighted.
- Using the righthand slider, go to the end of the file.
- Depress a SHIFT key and, with your finger still on the SHIFT key, position the cursor at the end of the file and click the mouse. This should highlight the file after Chromosome 20.
- Now press DELETE to remove this unwanted part of the file.
- Next find the start of Chromosome 20, maybe using CTRL-HOME, and CTRL-F with rs6139074, and highlight rs6139074.
- Again go to the end of the file and highlight the block, by positioning the cursor at the end of the file, depressing a SHIFT key and clicking the mouse.
- Next use CTRL-C to copy this block.
- OPEN a new NOTEPAD window, position the cursor in the new window and click the mouse.
- Use CTRL-V to paste the block for Chromosome 20 in the window, and use SAVE AS, with a filename such as 'your_chr20_data.txt', to put your new file in the required directory.
- FINALLY, close down the raw data .txt file. Use 'Don't Save' to keep the whole of the .txt for another time.
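For anyone comfortable running a script, the same per-chromosome file can be made without NOTEPAD. The following is only a sketch: it assumes the 23andMe raw data layout of tab-separated lines (rsid, chromosome, position, genotype) with comment lines starting with '#', and the function name and filenames are illustrative. FTDNA files are laid out differently and are not handled here.

```python
def extract_chromosome(raw_path, out_path, chromosome):
    """Copy the lines for one chromosome from a 23andMe raw data file.

    Assumed layout (not checked beyond the field count): tab-separated
    rsid, chromosome, position, genotype; comment lines start with '#'.
    Returns the number of lines written.
    """
    kept = 0
    wanted = str(chromosome)
    with open(raw_path, encoding="utf-8") as raw, \
            open(out_path, "w", encoding="utf-8") as out:
        for line in raw:
            if line.startswith("#"):
                continue  # skip the explanatory header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 4 and fields[1] == wanted:
                out.write(line)
                kept += 1
    return kept


# Hypothetical usage, with your own filenames:
#   extract_chromosome("your_raw_data.txt", "your_chr20_data.txt", 20)
```

The file is read one line at a time, so the whole raw data file never has to be held in memory, and the original .txt is left untouched.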

USING THE MINOR ALLELE PROGRAM

Instructions for using the program are to be found below the program form. But putting it simply: the 'data file' for a particular chromosome goes in the upper box, and YOUR results go in the lower box. RUNning the program gives you a list of the SNPs found to be of interest.

LOOKING AT YOUR RESULTS

The results appear as a number of lines of text, such as:
rs13043752 at Chr20:32346969 gave 'AG' The minor allele is 'A' with an MAF = 0.00502
rs17219643 at Chr20:40139346 gave 'CT' The minor allele is 'T' with an MAF = 0.00594
These can be simply highlighted and copied (using CTRL-C) from the lower box, and pasted (using CTRL-V) into a suitable NOTEPAD file. For each chromosome there is a webpage showing a selection of results. E.g. for chromosome 20 CLICK HERE
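The kind of comparison that produces such lines can be sketched as follows. This is not the program's actual code: the layout of the data files is not described here, so the two dictionaries, the function name `find_rare_results` and the `max_maf` parameter are all assumptions made for illustration.

```python
def find_rare_results(reference, genotypes, max_maf=0.01):
    """List result lines for genotypes that carry a low-frequency minor allele.

    reference: {rsid: (minor_allele, maf)}                 (hypothetical layout)
    genotypes: {rsid: (chromosome, position, genotype)}    (hypothetical layout)
    """
    lines = []
    for rsid, (chrom, pos, genotype) in genotypes.items():
        if rsid not in reference:
            continue  # no MAF known for this SNP
        minor, maf = reference[rsid]
        # Report the SNP only if it is rare enough and the minor allele is present
        if maf < max_maf and minor in genotype:
            lines.append(
                f"{rsid} at Chr{chrom}:{pos} gave '{genotype}' "
                f"The minor allele is '{minor}' with an MAF = {maf:.5f}"
            )
    return lines


reference = {"rs13043752": ("A", 0.00502), "rs6139074": ("C", 0.23446)}
genotypes = {
    "rs13043752": ("20", 32346969, "AG"),
    "rs6139074": ("20", 11244, "AC"),
}
for line in find_rare_results(reference, genotypes):
    print(line)
```

With the example values above, only rs13043752 is reported, because the MAF of rs6139074 is far above the threshold.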

SNP LOOKUP PROGRAM

In order to make it easier to look up the results, try

CLICK HERE FOR THE SNP LOOKUP PROGRAM

and enter a block of results of any size, such as:
rs33981382 at Chr1:17541475 gave 'AG' The minor allele is 'G' with a MAF = 0.00639
.....more....
rs28532243 at Chr22:40855896 gave 'AC' The minor allele is 'A' with a MAF = 0.00999
to get the results:
<<<<<<
rs33981382 at Chr1:17541475 gave 'AG' The minor allele is 'G' with a MAF = 0.00639
rs33981382 PADI4 peptidyl arginine deiminase, type IV - missense Y309C ('b') 'AG' MAF = 0.00639
Found in 13 per 1000 persons worldwide. The change Y<>C is fairly common and probably not harmful.
----------------------------------------------------------------------------------------------------------------
rs28532243 at Chr22:40855896 gave 'AC' The minor allele is 'A' with a MAF = 0.00999
rs28532243 CYP2D6 cytochrome P450, family 2, subfamily D, polypeptide 6 - intron ('b') 'AC' MAF = 0.00999
No published population data.
>>>>>>
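If you keep your results in a NOTEPAD file and want to work with them elsewhere, the result lines are regular enough to be read back by a short script. The sketch below is an illustration only and is not part of the lookup program; the pattern is inferred from the example lines shown on this page (it accepts both 'a MAF' and 'an MAF'), and the names `RESULT_LINE` and `parse_results` are hypothetical.

```python
import re

# Pattern inferred from the example result lines; whitespace may be
# spaces or line breaks, so a wrapped line is still recognised.
RESULT_LINE = re.compile(
    r"(rs\d+)\s+at\s+Chr(\w+):(\d+)\s+gave\s+'([^']+)'\s+"
    r"The\s+minor\s+allele\s+is\s+'([^']+)'\s+with\s+an?\s+MAF\s+=\s+(\d+(?:\.\d+)?)"
)


def parse_results(text):
    """Return one dictionary per result line found in a pasted block of text."""
    results = []
    for match in RESULT_LINE.finditer(text):
        rsid, chrom, pos, genotype, minor, maf = match.groups()
        results.append({
            "rsid": rsid,
            "chromosome": chrom,
            "position": int(pos),
            "genotype": genotype,
            "minor_allele": minor,
            "maf": float(maf),
        })
    return results


block = """rs33981382 at Chr1:17541475 gave 'AG' The minor allele is 'G' with a MAF = 0.00639
rs28532243 at Chr22:40855896 gave 'AC' The minor allele is 'A' with a MAF = 0.00999"""
for entry in parse_results(block):
    print(entry["rsid"], entry["genotype"], entry["maf"])
```

Lines that do not match the pattern, such as the gene descriptions and the separator rows in the lookup output, are simply ignored.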

UNDERSTANDING YOUR RESULTS

The premise behind this program is that everyone has a few 'rare' SNPs and these have not been identified as such by 23andMe. For example, in the above data this SNP has been found to be present in a set of results:
rs13043752 at Chr20:32346969 gave 'AG' The minor allele is 'A' with an MAF = 0.00502
Looking at the data page for this SNP on www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=13043752 it can be seen that 'AG' has a worldwide population frequency of about 2% and also that the mutation may affect the gene AHCY. The change in the protein is described as R38W, i.e. an arginine (R) amino acid changes to a tryptophan (W) amino acid at position 38. A 'heterozygous' result, such as 'AG', indicates a 'carrier' status, which would not be expected to give any symptoms, whereas a 'homozygous' result, such as 'AA', indicates a complete change in the protein, and a pathological change may occur. This program has been written to find such examples of 'homozygous-recessive' SNPs.
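The distinction between a 'carrier' result and a homozygous result comes down to counting copies of the minor allele in the two-letter genotype. A small sketch, with an illustrative function name and labels that describe the genotype only and say nothing about clinical effect:

```python
def zygosity(genotype, minor_allele):
    """Describe a two-letter genotype by how many minor alleles it carries."""
    if len(genotype) != 2 or "-" in genotype:
        return "unknown"  # a no-call ('--') or a single-letter result
    labels = {
        0: "no minor allele",
        1: "heterozygous (carrier)",
        2: "homozygous for the minor allele",
    }
    return labels[genotype.count(minor_allele)]


print(zygosity("AG", "A"))  # the rs13043752 example above
print(zygosity("AA", "A"))
```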

SUBMITTING YOUR RESULTS FOR INCLUSION

I am very happy to add interesting results to a results page. Please send me the details and say if you wish your email address to be published on the page. I will only include SNP details which have a MAF <0.01.

DATA INTEGRITY

The data used in the data files comes for the most part from the '1000 Genomes Project' as found on the dbSNP website, and is in the public domain. In the files a SNP with a MAF of X=0 implies there is insufficient population data to assign a meaningful value, or the data available is contradictory. And a MAF value of A [C|G|T|D|I] = 0 implies the population diversity for the SNP is very low. In about 97% of SNPs there is full agreement between the '23andMe' data and the matching dbSNP page. However, in about 3% of SNPs it is necessary to 'complement' the dbSNP data, changing A<>T and C<>G as needed. And, in a few cases, 'directionality' also has to be considered, meaning that 'C/G' may need to be considered as 'G/C'.
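The 'complement' step can be illustrated with a short sketch. It is not the method used to build the data files; the function names and the four return labels are assumptions chosen for this example. It also shows why A/T and C/G SNPs are awkward: their complement is the same pair of letters, so the strand cannot be told from the letters alone.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(bases):
    """Swap A<>T and C<>G; other characters (D, I, -) pass through unchanged."""
    return bases.translate(_COMPLEMENT)


def strand_check(genotype, dbsnp_alleles):
    """Compare a chip genotype with the two alleles listed on a dbSNP page.

    genotype: e.g. "AG"; dbsnp_alleles: e.g. "CT".
    """
    observed = set(genotype)
    forward = set(dbsnp_alleles)
    reverse = set(complement(dbsnp_alleles))
    if forward == reverse:
        return "ambiguous"    # A/T or C/G: strand cannot be inferred
    if observed <= forward:
        return "same strand"
    if observed <= reverse:
        return "complement"   # dbSNP alleles need complementing to match
    return "mismatch"


print(strand_check("AG", "CT"))
print(strand_check("CG", "CG"))
```

In the first example the chip reports 'AG' while the page lists C/T, so the page's alleles must be complemented; the second is a C/G SNP, where a further check of 'directionality' is needed.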

DATA CONVENTIONS

'23andMe' and 'FTDNA' continue to use the Genome Project Build 36.3 coordinates throughout. Also a number of SNPs have been merged, so on occasions 2 'rs' numbers refer to the same location. But in some cases the merging is not totally successful and it is unclear just what is being tested.

RARE AND UNCOMMON SNP SUMMARY

The data files identify the 'rare' and 'uncommon' SNPs, where 'uncommon' indicates a frequency of the minor allele of less than 1 in 100 people worldwide. Please note that the frequencies are not exact.

'23andMe' Data
            RARE   UNCOMMON  TOTAL OF RS SNPs      To Do
                  MAF <0.01

Chr 1a       175        691      24984 }  76130        0   Build 36/37 compatible
Chr 1b       257       1062      24961 }               0   Build 36/37 compatible
Chr 1c        67        353      26185 }               0   Build 36/37 compatible
Chr 2a        57        659      24839 }  76686        0   Build 36/37 compatible
Chr 2b        68        338      24994 }               0   Build 36/37 compatible
Chr 2c       132        467      26853 }               0   Build 36/37 compatible
Chr 3a       133        580      24805 }  62623        0   Build 36/37 compatible
Chr 3b        70        317      24985 }               0   Build 36/37 compatible
Chr 3c        37        111      12833 }               0   Build 36/37 compatible
Chr 4a        35        184      19991 }  54718        0   Build 36/37 compatible
Chr 4b       125        287      19991 }               0   Build 36/37 compatible
Chr 4c        22        106      14736 }               0   Build 36/37 compatible
Chr 5a        26        124      19995 }  55720        0   Build 36/37 compatible
Chr 5b        62        233      19992 }               0   Build 36/37 compatible
Chr 5c        78        317      15733 }               0   Build 36/37 compatible
Chr 6a       343        900      24952 }  62869        0   Build 36/37 compatible
Chr 6b        65        240      24995 }               0   Build 36/37 compatible
Chr 6c        60        193      12922 }               0   Build 36/37 compatible
Chr 7a        79        349      24989 }  50489        0   Build 36/37 compatible
Chr 7b       256        499      25500 }               0   Build 36/37 compatible
Chr 8a        93        341      24994 }  48978        0   Build 36/37 compatible
Chr 8b        46        234      23984 }               0   Build 36/37 compatible
Chr 9a        74        241      24995 }  42689        0   Build 36/37 compatible
Chr 9b        67        281      17684 }               0   Build 36/37 compatible
Chr 10a       54        225      24992 }  49988        0   Build 36/37 compatible
Chr 10b      118        462      24996 }               0   Build 36/37 compatible
Chr 11a      169        915      24979 }  47347        0   Build 36/37 compatible
Chr 11b       67        311      22368 }               0   Build 36/37 compatible
Chr 12a      131        414      24971 }  46698        0   Build 36/37 compatible
Chr 12b       84        483      21727 }               0   Build 36/37 compatible
Chr 13a       52        236      19997 }  35928        0   Build 36/37 compatible
Chr 13b       83        192      15931 }               0   Build 36/37 compatible
Chr 14        97        490      30616                 0   Build 36/37 compatible
Chr 15       159        395      28152                 0   Build 36/37 compatible
Chr 16       215       1343      29855                 0   Build 36/37 compatible
Chr 17       220       1377      26259                 0   Build 36/37 compatible
Chr 18       145        215      27848                 0   Build 36/37 compatible
Chr 19       172        670      18201                 0   Build 36/37 compatible
Chr 20       141        326      23671                 0   Build 36/37 compatible
Chr 21        51        142      13304                 0   Build 36/37 compatible
Chr 22       110        319      13895                 0   Build 36/37 compatible

So do you have any of these 'rare' or 'uncommon' SNPs ?

'FTDNA' data - and still being refined
                 RARE   UNCOMMON  TOTAL OF RS SNPs
                       MAF <0.01  (used by 23andMe)
Chr 13a            33        135      14828
Chr 13b            67        165      12037
Chr 14             67        229      22662
Chr 15            118        265      21055
Chr 16            176        387      22851  (data from Bob May)
Chr 17            163        432      20327  (data from Bob May)
Chr 18            136        177      20960
Chr 19            135        453      14488
Chr 20            131        275      17882
Chr 21             25        104       9951
Chr 22             70        229      10139

PROBLEMS

The program was developed using both 'Internet Explorer' and 'FIREFOX'; and will run on both platforms. However, online 'FIREFOX' is preferred as 'Internet Explorer' can give trouble - possibly because of 'permissions' being inappropriate and the expected prompt messages failing to appear, or a general dislike of large data sets. If you need to stop Internet Explorer, use CTRL ALT DELETE, click on Start Task Manager, click Applications, highlight the Internet Explorer line, use End Program. Please report any problems to me.

DISCLAIMER

The intellectual rights of 'Illumina' and '23andMe' to their data are accepted. No liability can be accepted for assumptions drawn from using this program. Care has been taken to ensure the integrity and interpretation of the data. However, there will be errors and for these I apologise.

Quo Vadis ?

I do not know where results obtained from this program may lead, but the studying of 'rare' results in other fields has led to some interesting discoveries. If you do find you have a 'rare' mutation then it can be very useful for tracking through your pedigree, and you may be able to show links that do not show up on your 'Relative Finder'. Also, some 'rare' mutations affect gene function by altering the sequence of amino acids, i.e. the mutation has a 'missense' or 'non-synonymous' effect. Mutations which are intergenic, i.e. occur between genes, or are found in 'introns', i.e. extra pieces of DNA that split up the various parts of a gene, are not usually pathological.

FUTURE PLANS

Firstly, to prepare data files for every chromosome. And, subsequently, to look at the SNP lists used by other companies.