Match statistics can be misleading when searching large DNA databases

Analysts at an Arizona state crime laboratory noticed something strange: two individuals in their 65,493-person database had the same two markers at nine of the 13 places (loci) listed on their DNA profiles. For a person picked at random from the non-Hispanic population, the chance of sharing that profile is about 1 in 754 million, yet two matching profiles turned up in a database of only 65,493 people.

“The simple explanation for the seemingly improbable matches—which a forensic or statistical expert would see straight away, but police, prosecutors, and testifying lab analysts would not—lies in a mathematical parable known as the birthday problem: How many people must there be in a group to have more than a 50 percent chance that two of them will have the same birthday? Despite the intuitive answer (a very large group), the correct answer is that it takes only 23 people. It’s key to note that the question of the birthday problem is different than asking what the likelihood is that, picking a person at random on the street, that person would have a particular birthday. Similarly, the difference between “Does anyone in the database match anyone else?” and “Does anyone in the database match this evidence?” explains why nine-locus matches were likely to be common in a large database like Arizona’s. ”
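The arithmetic behind the quoted explanation can be checked directly. The sketch below is only an illustration: the function names are made up for this post, and the second calculation makes a simplifying assumption by treating the 1 in 754 million figure as the match probability for any single pair of profiles in the Arizona database. It ignores relatives in the database and the many different ways two profiles can agree at nine of 13 loci, so the result is a rough order of magnitude, not a forensic estimate.

```python
from math import comb


def birthday_collision_probability(group_size: int, days: int = 365) -> float:
    """Probability that at least two people in the group share a birthday.

    Assumes birthdays are independent and uniformly spread over `days`.
    """
    p_all_distinct = 1.0
    for k in range(group_size):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def expected_matching_pairs(database_size: int, pair_match_probability: float) -> float:
    """Expected number of matching pairs when every profile is compared with every other.

    Simplifying assumption: each pair matches independently with the same probability.
    """
    return comb(database_size, 2) * pair_match_probability


# Birthday problem: 23 people is the smallest group that crosses 50 percent.
print(round(birthday_collision_probability(22), 3))  # 0.476
print(round(birthday_collision_probability(23), 3))  # 0.507

# Arizona database: number of distinct pairs, and the expected number of
# matching pairs under the simplifying assumption described above.
print(comb(65_493, 2))
print(expected_matching_pairs(65_493, 1 / 754_000_000))
```

The point is that a 65,493-person database contains more than two billion distinct pairs, so a match that is very unlikely for any one pair can still be expected to show up somewhere in the database.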

For John Pluckett, a 70-year-old suspect in a decades-old cold case, the difference between these two questions had serious consequences. Investigators re-opened an old sexual assault and murder case to analyze a DNA sample, and a search of a government DNA database turned up Pluckett as a match. No other evidence ties Mr. Pluckett to the crime, and statistically speaking, two other people in the area could have matched the DNA profile from the crime. How the statistics were framed was crucial at Mr. Pluckett’s trial. The prosecution was allowed to present the figure “1 in 1.1 million,” which is the chance that a person pulled at random off the street would match the DNA profile. The defense was not allowed to present the figure “1 in 3,” which is the chance of finding a match somewhere in the entire government database. Nor was the defense allowed to tell the jury that about 40 other people living in the state at the time likely matched the DNA profile. John Pluckett was convicted of the crime and is currently serving a sentence of life without parole.
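The gap between “1 in 1.1 million” and “1 in 3” comes from how many profiles are compared against the evidence. The sketch below is an illustration under stated assumptions: this post does not give the size of the database that was searched, so `DATABASE_SIZE` is a hypothetical round number chosen only to show how a figure of that order can arise, and the function names are made up. It also assumes every profile in the database is an independent chance to match.

```python
import math


def expected_coincidental_matches(random_match_probability: float, database_size: int) -> float:
    """Expected number of innocent people in the database who match by chance."""
    return random_match_probability * database_size


def probability_of_any_match(random_match_probability: float, database_size: int) -> float:
    """Probability that at least one profile in the database matches by chance.

    Computes 1 - (1 - p)^n using log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(database_size * math.log1p(-random_match_probability))


RANDOM_MATCH_PROBABILITY = 1 / 1_100_000  # figure presented by the prosecution
DATABASE_SIZE = 350_000  # hypothetical: not stated in this post

print(expected_coincidental_matches(RANDOM_MATCH_PROBABILITY, DATABASE_SIZE))
print(probability_of_any_match(RANDOM_MATCH_PROBABILITY, DATABASE_SIZE))
```

With a database of a few hundred thousand profiles, both quantities come out at roughly a quarter to a third, even though the chance for any single random person stays at 1 in 1.1 million.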

http://www.theatlantic.com/science/archive/2015/10/the-dark-side-of-dna-databases/408709/
