Statistics
Using genetic markers, we can estimate how closely people are related to each other. We do this by counting the genetic differences (mutations) between two individuals. The particular DNA markers we analyse are called ‘short tandem repeats’: sections of DNA where a short piece of code is repeated several times in a row. At any one of these short tandem repeats, the number of repeats can increase or decrease, usually by one repeat at a time. For example, 9 repeats of the code GTCA may be copied incorrectly within the body and change to 10 repeats.
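As a toy illustration of what "number of repeats" means, the sketch below counts the longest uninterrupted run of GTCA in a made-up piece of sequence. The flanking letters and the helper name `count_repeats` are hypothetical; this only illustrates the idea and is not how a testing laboratory reads a marker.

```python
import re

def count_repeats(sequence: str, motif: str = "GTCA") -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    pattern = f"(?:{re.escape(motif)})+"          # one or more consecutive copies
    runs = re.findall(pattern, sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Made-up flanking DNA around a 9-repeat marker, then the same marker
# after a one-step increase (9 -> 10 repeats), as in the example above.
before = "TTAC" + "GTCA" * 9 + "AGGT"
after = "TTAC" + "GTCA" * 10 + "AGGT"
print(count_repeats(before), count_repeats(after))  # 9 10
```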
If we compare two known cousins with their Most Recent Common Ancestor (MRCA), we might see that the code found in the ancestor is the same as in cousin 1 but slightly different in cousin 2. The diagram above shows this example.
This type of ‘mutation’ occurs randomly, but it is estimated to happen roughly once every 500 generations for any single marker. These mutations are caused by an enzyme in the body miscopying the DNA code and inserting or deleting repeat units. They are fairly infrequent; however, the more markers we look at, the higher the likelihood of observing a mutation. It is worth noting that the ‘1 in 500’ figure is an average: some markers may mutate more slowly, some faster. Since the Y-DNA test uses 21 markers, we can expect a mutation somewhere among them roughly once every 24 generations (500 ÷ 21 ≈ 24).
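The "once every 24 generations" figure can be checked with a couple of lines of arithmetic. This is a sketch assuming the average rate of 1 in 500 applies to every one of the 21 markers and that markers mutate independently of each other.

```python
RATE_PER_MARKER = 1 / 500   # average mutations per marker per generation (from the text)
MARKERS = 21

# Simple version: 21 markers, each mutating once per 500 generations on average.
generations_between_mutations = 1 / (RATE_PER_MARKER * MARKERS)

# Slightly more careful version: chance that at least one of the 21 markers
# mutates in a single father-to-son transmission, assuming independence.
p_any = 1 - (1 - RATE_PER_MARKER) ** MARKERS

print(round(generations_between_mutations, 1))  # 23.8
print(round(p_any, 4), round(1 / p_any, 1))     # 0.0412 24.3
```

Both ways of counting land at roughly 24 generations, in line with the estimate above.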
To estimate when the MRCA lived, we have to use certain statistical methods. The model that best represents the actual biochemical process is the ‘step-wise mutation model’. A simple form of this model assumes that each mutation at a particular marker is either a ‘one-step increase’ or a ‘one-step decrease’, and that both are equally likely. However, the calculations for this step-wise mutation model are complicated, especially when two-step increases or decreases are possible (roughly 1 in 30 to 50 mutations), so we fall back on the far simpler ‘infinite alleles model’, which is based on the following rules:
1. Each new mutation gives rise to a new allele never seen in the population before.
2. Every time a mutation occurs, it creates a new allele.
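One reason these rules are a simplification: under the step-wise model, a later mutation can undo an earlier one, which the infinite alleles model never allows. The sketch below is a hypothetical simulation, assuming each mutation is a one-step change up or down with equal chance, and a two-step change 1 time in 40 (a value picked from the "1 in 30 to 50" range quoted above). It estimates how often a marker that has mutated twice ends up back at its starting repeat count.

```python
import random

def smm_mutation(repeats: int, rng: random.Random, p_two_step: float = 1 / 40) -> int:
    """Apply one mutation under a simple symmetric step-wise model."""
    size = 2 if rng.random() < p_two_step else 1   # occasional two-step change
    return repeats + rng.choice((-1, 1)) * size     # up or down with equal chance

def fraction_back_to_start(n_mutations: int, trials: int = 100_000, seed: int = 1) -> float:
    """Share of simulated markers that end at their original repeat count."""
    rng = random.Random(seed)
    start = 9                                       # e.g. 9 copies of GTCA
    back = 0
    for _ in range(trials):
        repeats = start
        for _ in range(n_mutations):
            repeats = smm_mutation(repeats, rng)
        back += repeats == start                    # True counts as 1
    return back / trials

print(fraction_back_to_start(2))
```

Under these assumptions, roughly half of the markers hit by two mutations come back to 9 repeats, so a match/no-match count would not see those mutations at all.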
Simply put, when comparing two people, each marker is counted only as a match or a mismatch. So if you have 21 markers and 20 of them match, it doesn’t matter whether the remaining marker is off by one or by two: it still counts as a single mismatch. The infinite alleles model closely approximates the step-wise mutation model as long as the number of matching markers is high. This over-simplification will likely underestimate the time to the MRCA, but only slightly when most markers match. That is acceptable for genealogists, who are usually comparing two people thought to be fairly closely related, who wouldn’t have picked up many mutations since their MRCA.
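The two ways of counting can be compared on a pair of hypothetical 21-marker haplotypes (the repeat counts below are made up, not real test results). The infinite alleles model counts the markers that differ; the step-wise model would also take into account how far apart the repeat counts are.

```python
def compare_haplotypes(h1: list[int], h2: list[int]) -> tuple[int, int]:
    """Return (mismatched markers, total repeat-count steps) for two haplotypes."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same markers")
    mismatches = sum(a != b for a, b in zip(h1, h2))    # infinite alleles: match / no match
    steps = sum(abs(a - b) for a, b in zip(h1, h2))     # step-wise: size of each difference
    return mismatches, steps

# Hypothetical repeat counts for 21 markers (not real data).
cousin_1 = [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13,
            29, 17, 9, 10, 11, 11, 25, 15, 19, 29]
cousin_2 = list(cousin_1)
cousin_2[5] += 2            # one marker differs by two repeats

print(compare_haplotypes(cousin_1, cousin_2))  # (1, 2)
```

A 20/21 match with one marker off by two is counted as one mismatch, even though at least two mutation steps must have happened.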
Using this infinite alleles model, we can estimate how long ago the MRCA lived. We essentially need to know three things:

1. How many markers are used: 21 markers.
2. How frequently mutations occur: once every 500 generations on each marker, i.e. a rate of 0.002 (0.2%) per marker per generation.
3. How many mutations (mismatches) are observed: in this case, just one.
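The three inputs above can be combined into a probability model. As a sketch, assume that when two men's common ancestor lived t generations ago, each marker ends up mismatched with probability 1 - exp(-2μt) (2t transmissions separate them, and under the infinite alleles model any mutation produces a mismatch), independently across markers. The chance of seeing k mismatches among n markers is then binomial. The function name `mismatch_probability` is hypothetical.

```python
import math

def mismatch_probability(k: int, n: int, mu: float, t: float) -> float:
    """P(exactly k of n markers mismatch | MRCA lived t generations ago)."""
    # Chance that at least one mutation hit this marker on either line (Poisson approx.).
    p = 1 - math.exp(-2 * mu * t)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, mu = 21, 1 / 500   # 21 markers; once per 500 generations = 0.002 (0.2%)
for t in (5, 20, 50):
    print(t, "generations:", round(mismatch_probability(1, n, mu, t), 3))
```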
For the diagram above, we observe one mutation (difference) between the two cousins. Using the three variables above with the infinite alleles model, we can calculate the Time to the MRCA (often termed TMRCA and given in number of transmission events, i.e. generations).
For 21 STR markers:

| Number of mismatches | Median time to the MRCA (in generations) | 95% confidence interval (in generations) |
|---|---|---|
| 0 | 8.3 | 0.3 to 43.9 |
| 1 | 20.5 | 3.0 to 68.0 |
| 2 | 33.2 | 7.7 to 90.5 |
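The figures in the table can be approximated with a short calculation. This sketch assumes that each of the n markers is mismatched with probability 1 - exp(-2μt) independently, puts a flat prior on t, and takes the central estimate as the median and the interval as the 2.5% and 97.5% points of the resulting distribution. Under those assumptions it gives 8.3 (0.3 to 43.9) for 0 mismatches and 20.5 (3.0 to 68.0) for 1 mismatch; the 2-mismatch row does not come out exactly the same, so the original calculation presumably differs in some detail. The function names are hypothetical.

```python
import math

def tmrca_cdf(t: float, n: int, k: int, mu: float) -> float:
    """P(TMRCA <= t | k mismatches at n markers), with a flat prior on t.

    The posterior density is proportional to (1 - e^(-a t))^k * e^(-a (n - k) t)
    with a = 2*mu; expanding the first factor binomially gives a sum of
    exponentials that can be integrated in closed form.
    """
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    a = 2 * mu
    below = total = 0.0
    for j in range(k + 1):
        rate = a * (n - k + j)
        coef = math.comb(k, j) * (-1) ** j / rate
        below += coef * (1 - math.exp(-rate * t))   # integral from 0 to t
        total += coef                               # integral from 0 to infinity
    return below / total

def tmrca_quantile(q: float, n: int, k: int, mu: float) -> float:
    """Find the t with CDF(t) = q by bisection."""
    lo, hi = 0.0, 10_000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if tmrca_cdf(mid, n, k, mu) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for k in range(3):
    median, low, high = (tmrca_quantile(q, 21, k, 1 / 500) for q in (0.5, 0.025, 0.975))
    print(k, round(median, 1), round(low, 1), round(high, 1))
```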
In other words, with one mismatch, the median time to the MRCA is 20.5 generations. If each generation is roughly 25 years, this is approximately 500 years. It is impossible to pin this down to an exact year because we are dealing with random events, so the lower and upper bounds (given by the 95% confidence interval) are about 75 and 1700 years. However, the equations used assume that the individuals are picked at random. This isn’t the case: they are usually chosen because of their presumed relatedness (e.g. they share a surname). This factor cannot be quantified, but it would likely reduce the estimated time to the MRCA considerably.

Consider the case where the cousins had an exact match. This would bring the median time to the MRCA down to just 8.3 generations (just over 200 years), with a 95% confidence interval of 0.3 to 43.9 generations.
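Converting generations to years is a single multiplication. This sketch uses the rule of thumb of 25 years per generation from the text, which is itself only an approximation.

```python
YEARS_PER_GENERATION = 25   # rule of thumb from the text; real generation lengths vary

def generations_to_years(generations: float) -> float:
    return generations * YEARS_PER_GENERATION

# One mismatch: the central estimate and the 95% bounds from the table, in years.
for g in (20.5, 3.0, 68.0):
    print(g, "generations ~", generations_to_years(g), "years")
```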
The graph below ('No. of generations to the MRCA vs. No. of markers') compares how well 10, 12, 21 and 25 markers estimate the time to the MRCA. It shows that once you reach the bottom of the curve (i.e. after about 20 markers), testing more markers no longer gives the same gain in accuracy.
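The diminishing return can be illustrated for the simplest case, an exact match. Assuming each marker mutates once per 500 generations, a flat prior on t and no mismatches, the time to the MRCA follows an exponential distribution with rate 2μn, so its median and upper 95% bound have closed forms (for 21 markers this gives the 8.3 and 43.9 generations in the table). The helper name `exact_match_tmrca` is hypothetical, and this covers only the exact-match case, not the graph itself.

```python
import math

MU = 1 / 500   # mutations per marker per generation

def exact_match_tmrca(n_markers: int) -> tuple[float, float]:
    """Median and 97.5% point of the TMRCA (in generations) for an n/n match."""
    rate = 2 * MU * n_markers                         # both lines carry mutations
    return math.log(2) / rate, math.log(40) / rate    # 1 - 0.975 = 1/40

prev = None
for n in (10, 12, 21, 25):
    median, upper = exact_match_tmrca(n)
    line = f"{n:2d} markers: median {median:5.1f}, 95% upper bound {upper:5.1f}"
    if prev is not None:
        prev_n, prev_upper = prev
        line += f", bound shrinks {(prev_upper - upper) / (n - prev_n):.1f} per extra marker"
    print(line)
    prev = (n, upper)
```

Each extra marker narrows the upper bound by less than the one before, which is the flattening curve described above.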
Many people who share a surname will also share their haplotype (i.e. have a 21/21 match). The graph below ('Matches against 21 markers') shows that, mathematically, the person most likely to have your haplotype is zero generations away, i.e. you (look at the line 21*). This of course makes perfect sense. It also means that as the number of generations increases, the probability of matching someone else becomes lower, which also makes sense: there is a higher chance that mutations have occurred.

If you match someone at 20 out of 21 markers, you’ll get a slightly different probability curve: the most likely MRCA is no longer at zero generations, but further back.

Using 21 markers, related individuals usually share an exact haplotype (a 21/21 match), although 20/21 and 19/21 matches should also be considered. With more mismatches than this, the times to the MRCA are simply too long for a connection to be considered, as most surnames began much more recently.
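The shape of these curves can be reproduced from a simple likelihood. As a sketch, assume each of the 21 markers is mismatched with probability 1 - exp(-2μt) for an MRCA t generations back (μ = 1/500), independently across markers. The grid search below finds the number of generations at which each match level is most probable; the helper names are hypothetical, and this is not necessarily how the graph was produced.

```python
import math

MU, N = 1 / 500, 21

def match_likelihood(matches: int, t: float) -> float:
    """P(exactly `matches` of N markers agree | MRCA t generations ago)."""
    p_mismatch = 1 - math.exp(-2 * MU * t)
    k = N - matches
    return math.comb(N, k) * p_mismatch**k * (1 - p_mismatch) ** matches

def most_likely_generation(matches: int, max_t: float = 200.0, step: float = 0.1) -> float:
    """Grid search for the t that maximises the likelihood of this match level."""
    grid = [i * step for i in range(int(max_t / step) + 1)]
    return max(grid, key=lambda t: match_likelihood(matches, t))

for matches in (21, 20, 19):
    print(f"{matches}/21: most likely MRCA about "
          f"{most_likely_generation(matches):.1f} generations back")
```

Under these assumptions the 21/21 curve peaks at zero generations, while the 20/21 curve peaks around 12 generations back, matching the behaviour described above.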
North American office: P.O. Box 1028, Richmond, TX 77406-1028 USA, tel/fax: Toll free 866-7-DNA-DNA
European office: 40 Preston Road, Weymouth, Dorset, DT3 6PZ, UK, tel: +44 (0) 1305 834936, fax: +44 (0) 1305 835925