Forensic Science International: Genetics
Forensic Science International: Genetics
The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA Susan Walsh a , Fan Liu a , Andreas Wollstein a , Leda Kovatsi b , Arwin Ralf a , Agnieszka Kosiniak-Kamysz c , Wojciech Branicki d,e , Manfred Kayser a, * a Department of Forensic Molecular Biology, Erasmus MC University Medical Centre Rotterdam, The Netherlands b Laboratory of Forensic Medicine & Toxicology, School of Medicine, Aristotle University of Thessaloniki, Greece c Department of Dermatology, Collegium Medicum of the Jagiellonian University, Kraków, Poland d Section of Forensic Genetics, Institute of Forensic Research, Kraków, Poland e Department of Genetics and Evolution, Institute of Zoology, Jagiellonian University, Kraków, Poland 1.
Introduction Over the last few years, the prediction of externally visible characteristics (EVCs) from DNA has been an interesting topic of study for many reasons, in particular, its anticipated use within forensic genetics [1–3] resulting in the chosen term Forensic DNA Phenotyping (FDP). The ability to predict the physical appearance of an individual directly from crime scene material can in principle help police investigations by limiting a large number of potential suspects in cases where perpetrators unknown to the investigating authorities are involved. These include cases where conventional STR proﬁling could not provide a hit within the forensic DNA (proﬁle) database, or could not provide a match with a suspect singled-out by police investigation, or cases where an STR proﬁle could simply not be generated due to low quality and/or quantity of DNA available.
Forensic Science International: Genetics 7 (2013) 98–115 A R T I C L E I N F O Article history: Received 2 May 2012 Received in revised form 25 June 2012 Accepted 23 July 2012 Keywords: Hair colour Eye colour Forensic DNA Phenotyping DNA intelligence Externally visible characteristics A B S T R A C T Recently, the ﬁeld of predicting phenotypes of externally visible characteristics (EVCs) from DNA genotypes with the ﬁnal aim of concentrating police investigations to ﬁnd persons completely unknown to investigating authorities, also referred to as Forensic DNA Phenotyping (FDP), has started to become established in forensic biology.
We previously developed and forensically validated the IrisPlex system for accurate prediction of blue and brown eye colour from DNA, and recently showed that all major hair colour categories are predictable from carefully selected DNA markers. Here, we introduce the newly developed HIrisPlex system, which is capable of simultaneously predicting both hair and eye colour from DNA. HIrisPlex consists of a single multiplex assay targeting 24 eye and hair colour predictive DNA variants including all 6 IrisPlex SNPs, as well as two prediction models, a newly developed model for hair colour categories and shade, and the previously developed IrisPlex model for eye colour.
The HIrisPlex assay was designed to cope with low amounts of template DNA, as well as degraded DNA, and preliminary sensitivity testing revealed full DNA proﬁles down to 63 pg input DNA. The power of the HIrisPlex system to predict hair colour was assessed in 1551 individuals from three different parts of Europe showing different hair colour frequencies. Using a 20% subset of individuals, while 80% were used for model building, the individual-based prediction accuracies employing a prediction-guided approach were 69.5% for blond, 78.5% for brown, 80% for red and 87.5% for black hair colour on average.
Results from HIrisPlex analysis on worldwide DNA samples imply that HIrisPlex hair colour prediction is reliable independent of bio-geographic ancestry (similar to previous IrisPlex ﬁndings for eye colour). We furthermore demonstrate that it is possible to infer with a prediction accuracy of >86% if a brown-eyed, black-haired individual is of non-European (excluding regions nearby Europe) versus European (including nearby regions) bio-geographic origin solely from the strength of HIrisPlex eye and hair colour probabilities, which can provide extra intelligence for future forensic applications.
The HIrisPlex system introduced here, including a single multiplex test assay, an interactive tool and prediction guide, and recommendations for reporting ﬁnal outcomes, represents the ﬁrst tool for simultaneously establishing categorical eye and hair colour of a person from DNA. The practical forensic application of the HIrisPlex system is expected to beneﬁt cases where other avenues of investigation, including STR proﬁling, provide no leads on who the unknown crime scene sample donor or the unknown missing person might be. ß 2012 Elsevier Ireland Ltd.
* Corresponding author. Tel.: +31 10 7038073; fax: +31 10 7044575. E-mail address: email@example.com (M. Kayser). Contents lists available at SciVerse ScienceDirect Forensic Science International: Genetics journal homepage: www.elsevier.com/locate/fsig 1872-4973 ß 2012 Elsevier Ireland Ltd. http://dx.doi.org/10.1016/j.fsigen.2012.07.005 Open access under CC BY-NC-ND license. Open access under CC BY-NC-ND license.
Using EVC information obtained from the crime scene material via FDP, police would then proceed with more concentrated enquires, and ﬁnally request standard forensic STR proﬁling only for the reduced number of EVC matching suspects aiming DNA individualisation for court room use.
Obviously, the more EVCs that are predictable from crime scene material, the better a person’s appearance can be described, and in turn the smaller the number of appearance-matching potential suspects for subsequent forensic STR proﬁling. Also in missing person cases where a body was found decomposed with no EVC information discernable from visual inspection, or body parts that do not provide EVC information including bones, FDP is expected to provide leads for ﬁnding the right antemortem samples or family members for ﬁnal STR-based identiﬁcation. The use of DNA (or other biomarkers) for investigative purposes termed ‘DNA intelligence’, rather than for identiﬁcation purposes in the court room as currently applied in forensics, marks a completely new application of DNA in forensics and is currently at the early stages of development.
At present there is only one FDP tool available that has already been developmentally validated for forensic use and that is the IrisPlex system, capable of predicting eye colour from DNA [4,5]. Although other studies have suggested DNA markers and methods for predicting externally visible traits, most notably eye colour [6–18] none of them introduced a tool that had undergone systematic forensic developmental validation testing as of yet. The IrisPlex system allows the prediction of eye colour from minute amounts of DNA (31 pg DNA input full proﬁles) and has proven to be 94% accurate for predicting blue and brown eye colour when tested on a European set of >3800 individuals .
However, work is on-going with regards to identifying the underlying genes and developing predictive DNA markers for several other EVCs  such as skin colour [8,20,21], hair colour [6,8], body height [22,23], male baldness , and hair morphology [25,26].
The previous progress on categorical eye colour DNA predict- ability together with the strong genetic and phenotypic relation- ship between eye and hair colour variation, as well as the increased understanding of the genetic basis of hair colour, all suggest that hair colour may represent the next-promising candidate EVC for DNA prediction after eye colour. Hair colour (as well as eye colour), is generally known to be highly variable in people of (at least partial) European descent and those from nearby regions such as the Middle East and parts of Western Asia , with individuals displaying numerous variations of hair colour shade that are usually summarised in four main categories of colour such as red, blond, brown and black.
In contrast, people from any other parts of the world (and without European/nearby genetic admixture) usually display the ancestral black hair colour (together with the ancestral brown eye colour) phenotype. Variation in hair (and eye) colour is assumed to be of European origin and is thought to have reached their currently observed frequencies via sexual selection (i.e. mate choice preferences) . The genetic basis of human hair colour variation has been studied considerably in the last few years. Recent studies either employing the candidate gene approach or genome-wide association and/or linkage analysis have identiﬁed genes and DNA variants likely to be involved in human hair colour variation [6–8,12,14,29–33].
Some preliminary attempts have already been made towards the prediction of hair colour from informative DNA variants. In fact, an early red hair prediction protocol based on a combination of non-synonymous single nucleotide polymorphisms (SNPs) in the MC1R gene that incur the red hair phenotype effect was already developed for forensic use more than ten years ago  and its accuracy was 84% in the prediction of red hair individuals. Sulem et al.  in their genome-wide association study for European pigmentation traits developed a hair colour prediction tool, which was capable of excluding red and either blond or brown hair colour in its prediction for many of their individuals.
More recently, Valenzuela et al.  assessed 75 SNPs from 24 genes previously implicated in hair, skin and eye colour in samples of various bio-geographic origins (Europe and elsewhere) and found that three of them, i.e. rs12913832 (HERC2), rs16891982 (SLC45A2) and rs1426654 (SLC24A5) combined gave the best prediction for light and dark hair colour.
Armed with previous knowledge on hair colour associated DNA variants and in considering the most up-to-date list of DNA variants related to human hair colour variation available at the time, we recently performed an evaluation of 46 SNPs from 13 genes  for model-based population-wise hair colour predic- tion aiming to ﬁnd a set of most hair colour predictive DNA variants. In this previous study we identiﬁed a set of 13 DNA markers (2 MC1R combined marker sets and 11 single DNA markers) from 11 genes (MC1R, HERC2, OCA2, SLC45A2 (MATP), KITLG, EXOC2, TYR, SLC24A4, IRF4, PIGU/ASIP and TYRP1) containing most hair colour predictive information.
This DNA marker set provided a high degree of population-based, prevalence-adjusted overall prediction accuracy as expressed by the area under the curve of a receiver operating characteristic curve (AUC) with estimates at 0.93 for red, 0.87 for black, 0.82 for brown, and 0.81 for blond hair colour, where 1 means completely accurate prediction. However, the genotyping methodology used in this previous screening study did not allow simultaneous genotyping of all 22 identiﬁed hair colour predictive DNA markers in a single reaction as would be appreciated in forensic DNA analysis where there can be limited amounts of starting material.
Furthermore, in the previous study, only samples with hair colour genotypes and phenotypes from a single country in Eastern Europe, i.e. Poland, were available, whereas the inclusion of individuals from other European regions, such as Western and Southern parts, would be beneﬁcial in order to enrich with individuals displaying hair colours such as brown and black that are more common in these parts of Europe.
In the present study, we developed and evaluated the sensitivity of a single-tube multiplex assay targeting the 22 previously recognised hair colour predictive DNA variants as well as the six eye colour predictive SNPs from our previously developed IrisPlex system (four of which are overlapping). We employed the SNaPshot technology because it can be easily implemented in forensic DNA laboratories as no additional equipment or serious interference with protocols is needed to apply it. Furthermore, we assessed the power of the 22 DNA variants to predict hair colour categories, as well as hair colour shade, via model-based prediction studies using an expanded database of hair colour genotype and phenotype data for >1500 individuals from Eastern, Western and Southern parts of Europe that displayed varying degrees of hair colouration.
Moreover, we investigated via analysing a worldwide set of individuals from 51 populations (HGDP-CEPH), whether or not the reliability of hair colour prediction available with these 22 DNA variants depends on knowledge of bio-geographic ancestry. We present and make available for future use, the ﬁrst system for parallel prediction of hair and eye colour from DNA we termed HIrisPlex, consisting of a single multiplex assay for 24 eye and/or hair colour predictive DNA variants and two prediction models, i.e. a newly developed model for hair colour and shade prediction and the previously developed IrisPlex model for eye colour prediction.
An interactive spread- sheet tool for obtaining individual hair colour, hair colour shade, and eye colour prediction probabilities from HIrisPlex genotypes as well as a prediction guide for accurate interpretation of individual hair colour and shade probabilities are made available to enhance the practical use of the HIrisPlex system in future applications such as forensics.
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 99
2. Materials and methods 2.1. Subjects, imagery and hair and eye colour classiﬁcation DNA samples and hair colour information was collected from 1551 European subjects living in Poland (n = 1093), the Republic of Ireland (n = 339) and Greece (n = 119). All participants gave informed consent. The study was approved in part by the Ethics Committee of the Jagiellonian University, number KBET/17/B/2005 and the Commission on Bioethics of the Regional Board of Medical Doctors in Krakow number 48 KBL/OIL/2008.
Hair and eye colour phenotypes were collected by a combination of self-assessment and professional single observer grading (Polish data). The professional grader (AKK) for the polish dataset is a medical doctor (dermatologist) who evaluated hair colour upon observa- tion, and questioning of individuals in circumstances where hair was dyed or grey. For hair colour phenotype self-assessment, individuals were asked to ﬁll into the questionnaire, the colour of their hair during their 20s, and at what age grey/white hairs started to appear (Irish collection), this avoided the effects of hair greying and whitening on phenotyping.
Sample collection in Ireland included high-resolution eye and hair photographic imagery. In a brief description, hair and eye images were taken using a Nikon D3100 with an AF-S Micro Nikkor 60 mm macro lens, the aperture, shutterspeed and ISO were ﬁxed to f = 22, 1/125, and 200 respectively. A ring ﬂash (model Speedlight SB-R200) and an average distance of 7 cm was used from the eye and from the back of the head for hair imagery. This ensured consistent sampling and regulated lighting conditions, including lens settings of a 0.2 and 0.23 ﬁxed focal length. All individuals were asked to ﬁll in a questionnaire that included basic information, such as gender and age as well as data concerning eye and hair pigmentation phenotype.
However, due to many Irish individuals having dyed or grey hair, self-reported hair colour classiﬁcations were used for this set in model training. For the Greek collection, a buccal swab was taken from each individual and a self-reported questionnaire regarding hair and eye colour information was collected. For both the Irish and Greek set, hair colour was classiﬁed into 7 categories: blond (5.9%), light-brown (34%), dark-brown (45.2%), auburn (5.7%), blond-red (1.3%), red (2.2%), and black (5.7%). For the Polish dataset, this data was collected as previously reported  and hair colour was classiﬁed into 7 categories: blond (13.7%), dark- blond (44.2%), brown (22.6%), auburn (1%), blond-red (3.9%), red (3.8%), and black (10.8%)).
For hair colour prediction analyses, we grouped blond and dark-blond into one blond category (42.6%), light brown and dark brown into one brown category (39.3%) and auburn, blond-red, and red into one red category (8.8%) with black as an additional fourth category (9.3%). Eye colour was classiﬁed into 3 categories blue, brown and intermediate (including green). The term category in this context refers to the grouping of similar phenotypic colours into one group to separate them from another colour group, i.e. blond category, black category. Table 1 displays the numbers of hair and eye colour phenotypes including sex, within all 3 populations sampled.
Notably red hair in the Polish population and green eye colour in the Irish population were intentionally enriched due to their rare occurrence, therefore both phenotypes do not reﬂect natural population frequencies. 2.2. DNA samples and HIrisPlex genotyping DNA from the Polish samples was extracted as described previously . Saliva samples collected from individuals in Ireland were extracted using the Puregene DNA isolation kit (Qiagen, Hilden, Germany). Buccal swabs collected from individu- als in Greece were extracted using an in-house organic extraction protocol. DNA from the H952 subset of the HGDP-CEPH panel that represents 952 individuals from 51 worldwide populations [36,37] were purchased from CEPH.
Due to lack of DNA in some samples belonging to the HGDP-CEPH 952 set, 7 individuals could not be genotyped by the HIrisPlex assay, and therefore the ﬁnal number of worldwide samples was 945.
All samples were genotyped using the HIrisPlex assay. The assay includes 23 SNPs and 1 insertion/deletion (INDEL) polymor- phism, altogether 24 DNA variants, from 11 genes: MC1R, HERC2, OCA2, SLC24A4, SLC45A2, IRF4, EXOC2, TRYP1, TYR, KITLG, and PIGU/ ASIP. Further information on these 24 markers can be found in Table 2, including primer sequences. The 24 PCR primer pairs were designed using the default parameters of the program Primer3Plus , which is a free web-based design software. PCR fragments were designed to be as short as possible to cater for degraded DNA, and therefore all are less than 160 bp in length.
To reduce the possibility of primer pairs interacting with each other, the program Autodimer  was used to analyse primer sequences. Surround- ing sequence regions were also searched with BLAST  against dbSNP  to reduce the chance of a primers location covering a known interfering SNP site for efﬁcient primer binding. For the population genotyping, genomic DNA quantities ranging from 300 pg to 3 ng in 1 ml formats were ampliﬁed per individual in a 10 ml reaction volume consisting of 1 PCR buffer, 2.5 mM MgCl2, 220 mM of each dNTP, and 1.75 U AmpliTaq Gold DNA polymerase (Applied Biosystems Inc., Foster City, CA) including PCR primer concentrations found in Table 2.
Thermo- cycling was performed on the 96-well GeneAmp1 PCR system 9700 (Applied Biosystems) under the following conditions (1) 95 8C for 10 min, (2) 33 cycles of 95 8C for 30 s and 61 8C for 30 s, (3) 5 min at 61 8C. PCR products were cleaned with ExoSAP-IT (USB Corp., Cleveland, OH), as recommended by the manufacturer. Following removal of unincorporated dNTPs and primers. The multiplex SBE (single base extension) assay was performed using 2 ml of product with 1 ml of ABI SNaPshot kit (Applied Biosystems, Foster City, CA) reaction mix in a total reaction volume of 5 ml. Single base extension (SBE) primer sequences and concentrations used in the assay can be found in Table 2.
Thermocycling conditions were as follows: 96 8C for 2 min and 25 cycles of 96 8C for 10 s, 50 8C for 5 s and 60 8C for 30 s. Products were cleaned using SAP (USB Corp.), following manufacturers guidelines and 1 ml of cleaned product was run on the ABI 3130xl Genetic Analyser (Applied Biosystems) with POP-7 on a 36 cm capillary Table 1 Phenotype frequencies according to hair and eye colour categories (including sex) for the full combined set of individuals from Poland, Ireland and Greece. Hair colour Blond Dark blond* /light brown Dark brown Brown red/auburn Blond red Red Black Total Eye colour – blue Intermediate (green, heterochromia) Brown Total Male Female Total Poland 150 483* 247 11 43 41 118 1093 590 164 339 1093 449 644 1093 Ireland 16 111 158 23 6 10 15 339 172 90 77 339 77 262 339 Greece 11 45 49 3 0 0 11 119 13 15 91 119 51 68 119 Total 177 639 454 37 49 51 144 1551 775 269 507 1551 577 974 1551 * represents individuals who were reported as dark blond in the dark blond/light brown category.
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 100
Table 2 Information about the 24 DNA variants of the HIrisPlex assay, including PCR and single base extension (SBE) primer sequences and concentrations. Assay position SNP CHR Position Gene Major Allele Minor Allele PCR primers Concentration Product size SBE primers Concentration 1 N29insA 16 89985753 Exonic MC1R C insA MC1Rset1F Set1 0.55 mm GCAGGGATCCCAGAGAAGAC 117bp CCCCAGCTGGGGCTGGCTGCCAA 1.3 mm 2 rs11547464 16 89986091 Exonic MC1R G A MC1Rset1R 0.55 mm TCAGAGATGGACACCTCCAG ttttttttttttGCCATCGCCGTGGACC 0.1 mm 3 rs885479 16 89986154 Exonic MC1R C T MC1Rset2F Set2 0.5 mm CTGGTGAGCTTGGTGGAGA 158bp ttttttttttttttttttGATGGCCGCAACGGCT 1.25 mm 4 rs1805008 16 89986144 Exonic MC1R C T MC1Rset2R 0.5 mm TCCAGCAGGAGGATGACG tttttttttttttACAGCATCGTGACCCTGCCG 0.375 mm 5 rs1805005 16 89985844 Exonic MC1R G T MC1Rset3F Set3 0.5 mm GTCCAGCCTCTGCTTCCTG 147bp tttttttttttttttTGGTGGAGAACG- CGCTGGTG 0.75 mm 6 rs1805006 16 89985918 Exonic MC1R C A MC1Rset3R 0.5 mm AGCGTGCTGAAGACGACAC ttttttttttttttttttttCTGCCTGGC- CTTGTCGGA 0.75 mm 7 rs1805007 16 89986117 Exonic MC1R C T MC1Rset4F Set4 0.4 mm CAAGAACTTCAACCTCTTTCTCG 106bp tttttttttttttttttttttttttCTCCATCTTC- TACGCACTG 1 mm 8 rs1805009 16 89986546 Exonic MC1R G C MC1Rset4R 0.4 mm CACCTCCTTGAGCGTCCTG ttttttttttttttttttttttttttttttATCTGC- AATGCCATCATC 0.4 mm 9 Y152OCH 16 89986122 Exonic MC1R C A ttttttttttttttttttttttttttttttCATCTT- CTACGCACTGCGCTA 0.6 mm 10 rs2228479 16 89985940 Exonic MC1R G A ttttttttttttttttttttttttttttttttttttC- TGGTGAGCGGGAGCAAC 0.375 mm 11 rs1110400 16 89986130 Exonic MC1R T C ttttttttttttttttttttttttttttttCTTCTA- CGCACTGCGCTACCACAGCA 0.3 mm 12 rs28777 5 33994716 Intronic SLC45A2 A C rs28777_F Set5 0.4 mm TACTCGTGTGGGAGTTCCAT 150bp tttttttttttttttttttttttttttttttttttttttCA- TGTGATCCTCACAGCAG 1.2 mm rs28777_R 0.4 mm TCTTTGATGTCCCCTTCGAT 13 rs16891982 5 33987450 Exonic SLC45A2 G C Rs16891982_F Set6 0.4 mm TCCAAGTTGTGCTAGACCAGA 128bp tttttttttttttttttttttttttttttttttttttttttttt- AAACACGGAGTTGATGCA 1 mm Rs16891982_R 0.4 mm CGAAAGAGGAGTCGAGGTTG 14 rs12821256 12 87852466 Intergenic KITLG A G rs12821256_F Set7 0.4 mm ATGCCCAAAGGATAAGGAAT 118bp ttttttttttttttttttttttttttttttttttttttt- GGAGCCAAGGGCATGTTACTACGGCAC 0.1 mm rs12821256_R 0.4 mm GGAGCCAAGGGCATGTTACT 15 rs4959270 6 402748 Intergenic EXOC2 C A Rs4959270_F Set8 0.4 mm TGAGAAATCTACCCCCACGA 140bp ttttttttttttttttttttttttttttttttttttttttt- GGAACACATCCAAACTATGACACTATG 0.375 mm Rs4959270_R 0.4 mm GTGTTCTTACCCCCTGTGGA 16 rs12203592 6 341321 Intronic IRF4 C T rs12203592_F Set9 0.4 mm AGGGCAGCTGATCTCTTCAG 126bp tttttttttttttttttttttttttttttttttttttttttttt- tTCCACTTTGGTGGGTAAAAGAAGG 0.3 mm rs12203592_R 0.4 mm GCTTCGTCATATGGCTAAACCT 17 rs1042602 11 88551344 Exonic TYR G T rs1042602 _F Set10 0.4 mm CAACACCCATGTTTAACGACA 124bp ttttttttttttttttttttttttttttttttttttttttttttt- tttttttTCAATGTCTCTCCAGATTTCA 1.25 mm rs1042602 _R 0.4 mm GCTTCATGGGCAAAATCAAT 18 rs1800407 15 25903913 Exonic OCA2 G A rs1800407_F Set11 0.4 mm AAGGCTGCCTCTGTTCTACG 124bp tttttttttttttttttttttttttttttttttttttttttttttt- tttttttttttttttGCATACCGGCTCTCCC 0.1 mm rs1800407_R 0.4 mm CGATGAGACAGAGCATGATGA 19 rs2402130 14 91870956 Intronic SLC24A4 A G rs2402130_F Set12 0.4 mm ACCTGTCTCACAGTGCTGCT 150bp tttttttttttttttttttttttttttttttttttttttttttttttt- ttttttttttttTGAACCATACGGAGCCCGTG 0.75 mm rs2402130_R 0.4 mm TTCACCTCGATGACGATGAT 20 rs12913832 15 26039213 Intronic HERC2 C T rs12913832_F Set13 0.4 mm TCAACATCAGGGTAAAA- ATCATGT 150bp ttttttttttttttttttttttttttttttttttttttttttttttt- tttttttttttttttttTAGCGTGCAGAACTTGACA 1.2 mm rs12913832_R 0.4 mm GGCCCCTGATGATGATAGC 21 rs2378249 20 32681751 Intronic ASIP/PIGU T C rs2378249_F Set14 0.4 mm CGCATAACCCATCCCTCTAA 136bp tttttttttttttttttttttttttttttttttttttttttttttt- ttttttttttttttttttttCCACACCTCTCCTCAGCCCA 0.18 mm rs2378249_R 0.4 mm CATTGCTTTTCAGCCCACAC S.
Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 101
array following the SNaPshot kit sample preparation guidelines, however run parameters of 2.5 kV for 10 s injection voltage and run time of 500 s at 60 8C were used for increased sensitivity. For assay sensitivity studies, genotyping results from two different individuals were assessed from serial dilutions of DNA input samples of 500 pg, 250 pg, 125 pg, 63 pg and 31 pg. Each result was investigated for allelic drop out, which includes peaks below the 50-rfu threshold that cannot be called.
The determina- tion of sensitivity was based on the production of a full proﬁle in every replicate at a particular DNA input level. 2.3. HIrisPlex DNA variants and their use for eye/hair colour prediction including in a worldwide sample The HIrisPlex assay consists of 24 DNA variants (23 SNPs and 1 INDEL), 6 of these markers, rs12913832 (HERC2), rs1800407 (OCA2), rs12896399 (SLC24A4), rs16891982 (SLC45A2 (MATP)), rs1393350 (TYR) and rs12203592 (IRF4) are taken from the IrisPlex system which has already been well established [4,5,19,42] and are used for the eye colour prediction part of the HIrisPlex system.
The results of these 6 SNPs when their minor allele is input into the HIrisPlex prediction tool are used to predict the eye colour of the individual using the IrisPlex model as previously published [5,19] with the highest probability of the three categories, brown, blue or intermediate being the predicted eye colour.
The 22 DNA variables used for hair colour prediction are Y152OCH, N29insA, rs1805006, rs11547464, rs1805007, rs1805008, rs1805009, rs1805005, rs2228479, rs1110400 and rs885479 from the MC1R gene, rs1042602 (TYR), rs4959270 (EXOC2), rs28777 (SLC45A2 (MATP)), rs683 (TYRP1), rs2402130 (SLC24A4), rs12821256 (KITLG), rs2378249 (PIGU/ASIP), rs12913832 (HERC2), rs1800407 (OCA2), rs16891982 (SLC45A2 (MATP)) and rs12203592 (IRF4) based on our previous publication for hair colour prediction . When their minor alleles are input into the HIrisPlex prediction tool, they are used to predict the hair colour of the individual using the HIrisPlex hair prediction model developed in this paper.
From the four hair colour categories of blond, brown, red and black, the highest probability value is indicative of the predicted hair colour following guidelines that are published within this paper and described in the next section. For worldwide hair colour prediction, we assessed the HIrisPlex assay performance on 945 samples from 51 populations of the HGDP-CEPH set. The MapViewer 7 (Golden Software, Inc., Golden, CO, USA) package was used to plot the predicted hair colour categories and the distribution of SNP genotypes on the world map. A non-metric multidimensional scaling (MDS) plot was produced to illustrate the pairwise FST distances  of the 24 eye and hair colour SNPs between populations, using SPSS 17.0.2 for Windows (SPSS Inc., Chicago, USA).
Analysis of molecular variance (AMOVA) (Excofﬁer 1992) was performed using Arlequin v3.11 . A threshold assessment of prediction probabilities for each hair colour category was also carried out including a combined eye and hair colour prediction probability threshold in the inference of a Non-European individual with Black hair and brown eyes. For the assessment of an age-dependent hair colour change, a Pearson correlation was calculated and the graph plotted using SPSS 17.0.2 for Windows (SPSS Inc., Chicago, USA).
2.4. Prediction modelling for hair colour To develop a hair colour prediction model using samples from several sites with varying levels of hair colour due to their position within Europe, central, western and southern Europe, we took a random subset of 80% of the samples from each site, Poland (n = 875), Ireland (n = 272) and Greece (n = 96). This 80% subset was used to train the model and was based on Multinomial Logistic Table 2 (Continued ) Assay position SNP CHR Position Gene Major Allele Minor Allele PCR primers Concentration Product size SBE primers Concentration 22 rs12896399 14 91843416 Intergenic SLC24A4 T G Rs12896399_F Set15 0.4 m m CTGGCGATCCAATTCTTTGT 125bp ttttttttttttttttttttttttttttttttttttttttttt- ttttttttttttttttttttttTCTTTAGGT- CAGTATATTTTGGG 1.125 m m Rs12896399_R 0.4 m m GACCCTGTGTGAGACCCAGT 23 rs1393350 11 88650694 Intronic TYR C T Rs1393350_F Set16 0.4 m m TTTCTTTATCCCCCTGATGC 124bp ttttttttttttttttttttttttttttttttttttttttt- ttttttttttttttttttttttttttCATTTGTA- AAAGACCACACAGATTT 1.1 m m Rs1393350_R 0.4 m m GGGAAGGTGAATGATAACACG 24 rs683 9 12699305 Exonic TYRP1 T G rs683_F Set17 0.4 m m CACAAAACCACCTGGTTGAA 138bp ttttttttttttttttttttttttttttttttttttttt- tttttttttttttttttttttttttttGCTTTG- AAAAGTATGCCTAGAACTTTAAT 0.175 m m rs683_R 0.4 m m TGAAAGGGTCTTCCCAGCTT S.
Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 102
Regression (MLR), as previously published by Liu et al. . In brief, individuals were categorised according to their hair phenotypes and were split into 4 categories, Blond (n = 529), Brown (n = 490), Red (n = 109) and Black (n = 115). For their genotypes, 22 of the 24 HIrisPlex DNA variations (as described above) were used to test for hair colour differentiation and use in the prediction model. By inputting the minor allele of each DNA variant, including its phenotype and applying MLR, alpha and beta values are generated that form the core of the prediction model. This model then allows the probabilistic prediction of an individuals hair colour category solely based on the input of the 22 variant minor alleles into the HIrisPlex hair colour prediction tool.
To assess the effect of the light and dark shades of hair colour that may be contributed from blond and black respectively, a similar approach was used that combined the individuals grouped in the light category (blond, n = 529) versus a dark category (black, n = 115). Red hair individuals were omitted (n = 109) from this analysis as their resulting colour is based upon an MC1R cumulative mutation and not on the continuous spectrum of light to dark (i.e. blond to black). Brown hair individuals (n = 490) were omitted, as only the extremes of light and dark were required. Therefore using this two-pronged model approach, a predicted hair colour is generated with an approximate indication of the colour being light or dark (i.e.
light brown, dark brown) due to the inﬂuence of the genotypes commonly associated with the light/dark categories, of blond and black respectively. The further 20% of the combined dataset (total n = 308), i.e. from Poland (n = 218), Ireland (n = 67) and Greece (n = 23), was used to assess the accuracy of the prediction model in terms of the ﬁnal hair colour prediction being correct or incorrect based on colour category, shade and use of the hair colour prediction guide that is described in detail in Section 3, and an assessment of optimal category thresholds was undertaken. The steps to take when acquiring a prediction based on colour and shade are outlined in a guide provided below.
3. Results and discussion 3.1. HIrisPlex genotyping assay – design and sensitivity The HIrisPlex assay was designed with the intention to cope with low template and degraded DNA, a standard concern when genotyping forensic casework samples. Therefore, care was taken Fig. 1. An assessment of the HIrisPlex assay’s sensitivity on two individuals ascertained to have high numbers of heterozygote alleles (7 and 11, respectively) for quantiﬁed DNA input at 500 pg, 125 pg, 63 pg and 31 pg. Full proﬁles were observed down to 63 pg DNA input with drop out occurring at 31 pg DNA input for insertion 1 (N29insA), SNP 15 (rs4959270), SNP 17 (rs1042602), SNP 18 (rs1800407), and SNP 23 (rs1393350) including a C allele drop in at SNP 9 (Y152OCH).
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 103
sensitive IrisPlex assay may be applied subsequently and may provide a full 6-SNP proﬁle for eye colour prediction on critical DNA samples. 3.2. HIrisPlex model-based hair colour prediction MC1R polymorphisms are largely recessive when considered individually, but also interact with each other through a genetic mechanism known as ‘‘compound heterozygosity’’ [47–49]. In our previous population-based hair colour prediction study , the MC1R variants Y152OCH, N29insA, rs1805006, rs11547464, rs1805007, rs1805008, rs1805009, rs1805005, rs2228479, rs1110400 and rs885479 were all collapsed into two markers, MC1R-R (R/R, R/wt, wt/wt) and MC1R-r (r/r, r/wt, wt/wt), depending on the penetrance of the mutant alleles.
Thus, the total 22 hair colour markers were considered as 13 markers in our previous prediction analysis, including, MC1R_R, MC1R_r, rs1042602 (TYR), rs4959270 (EXOC2), rs28777 (SLC45A2 (MATP)), rs683 (TYRP1), rs2402130 (SLC24A4), rs12821256 (KITLG), rs2378249 (PIGU/ASIP), rs12913832 (HERC2), rs1800407 (OCA2), rs16891982 (SLC45A2 (MATP)) and rs12203592 (IRF4). In the current study, we had two main reasons for the development of a new hair colour prediction model utilising a 22 DNA variant set without collapsing into MC1R-R and MC1R-r. First, we were able to produce a larger dataset that provides a broader representation of Europe and its highly variable hair colour regions.
Notably, we not only increased the sample size relative to our previous study  by 3-fold, but in addition to considering more Eastern Europeans from Poland (also used before) we also added individuals from Western Europe, i.e. Ireland and from Southern Europe, i.e. Greece. These three countries display very different hair colour phenotype frequencies (Table 1), which would also impact on the modelling. The use of samples from three European regions and countries provides an increase in overall sample size and also a better representation of the hair colour phenotype variation across Europe, but this also increases the different genotype combina- tions observable.
Second, some of the MC1R variants also contribute to hair colours other than red  (as seen in Table 3). As many individuals from Ireland display a higher frequency of MC1R mutations, up to 75% noted in a previous study with 30% of these being double mutations  and 78% in our own set contain at least one of the MC1R mutations without displaying the red hair phenotype, other Europeans from other regions to which our hair colour prediction tool may be applied in the future may also reﬂect this. Therefore, a new hair colour prediction model was developed to examine the input of each single DNA variation for hair colour categorical prediction, including the individual impact of all MC1R variants separately.
Fig. 2 shows a hypothesised tree model illustrating how each of the 22 DNA variants contributes towards a categorical hair colour prediction as inferred from our current data. This scenario represents the extreme of a 2 minor allele input for each single DNA variant and the largest single hair colour category effect that is seen on the models prediction, based on that input. However, it is important to note here (and as further outlined below) that it is the combination of all 22 DNA variants together in a single model that ﬁnally allows the prediction of hair colours as we suggest with this study.
Table 3 provides a measure of the strength of each DNA variant’s contribution towards each hair colour category prediction using beta values including p-values obtained from the MLR model. The analysis is based on the combined 80% model-building subset of 1243 Polish, Irish and Greek individuals assigned into a red versus non-red colour category which then displays each DNA variants contribution towards red hair colour within the model. For the other categories (i.e. blond versus non-blond, brown versus non-brown and black versus non-black), we used a total set of 1134 individuals representing the 80% model-building subset but now omitting the red hair individuals from the analyses due to their rare DNA variants and the fact that red hair is not a continuous colour but more a combined MC1R mutation effect on colour change [47,51].
As the probability values shown suggest, the results for hair colour variation from blond via brown to black (without red) are consistent with our previous ﬁndings  in several DNA variants, i.e. rs12913832 (HERC2) and rs12203592 (IRF4) with high statistical support (P 10 6 to 10 16 ) in the present enlarged dataset considering Poland, Ireland and Greece. Although less powerful, additional DNA variants also show signiﬁcant evidence (p < 0.05) for some hair colours, such as rs2402130 (SLC24A4), rs12821256 (KITLG), rs4959270 (EXOC2), rs1805006 (MC1R), rs1805007 (MC1R), rs1805008 (MC1R) for blond, rs1805006 (MC1R) and rs2402130 (SLC24A4) for brown, and rs1805007 (MC1R) for black.
Red hair colour prediction is observed with highest probability values (P 10 8 to 10 16 ) for several of the individually considered MC1R variants as expected, i.e. rs1805008, rs1805007, rs1805009 and rs11547464, and with somewhat less statistical strength (p < 0.05) for other MC1R variants, i.e. rs1805005, rs1805006 and rs1110400. However, due to the very low frequency in our set of individuals of the generally rare MC1R variant allele at N29insA (INDEL) and Y152OCH, their contribution towards red hair probabilities are particularly high (Table 3; red Fig. 2. Hypothesised scenario of the effect of each HIrisPlex SNPs minor allele input on the model for hair colour prediction as a homozygous genotype (the minor allele input is 2).
The highest effect in terms of probability for a certain hair colour category is noted and the SNP is named near that category within the ﬁgure. S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 105
Fig. 3. Illustrative example of 44 individuals on the model performance of the HIrisPlex system for hair colour prediction. The 44 individual set was taken from the Irish collection for which hair imagery was noted as neither grey nor dyed. These individuals are only a visual example of how the model performs. Probability values are given for all four hair-colour categories (black, brown, red and blond) with the highest probability value and the category for which a colour is called highlighted. Dark and light probability values are also indicated which show the amount of black and blond contribution and effect towards the ﬁnal colour prediction using the guide in Fig.
4. The individual hair ﬁgures are ordered from left to right, top to bottom, starting with the highest probabilities for black to the lowest in column one, for the remaining columns the order is the lowest to the highest probabilities for brown, red and blond, respectively.
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 106
hair beta values of 22 and 19.4 respectively), i.e. the presence of an A allele at N29insA or Y152OCH produces red hair prediction probabilities of 1. This effect is not mirrored in the other MC1R variants investigated and reﬂects the presence of these very rare alleles (heterozygote and homozygote state) within all individuals displaying a red hair phenotype in our model training set at a very low frequency (n = 6). Although this does not affect the ﬁnal prediction of red hair, it is important to note the abnormally high probability values for red when these rare variants are present.
Notably, some DNA variants outside the MC1R gene also show signiﬁcant red hair colour probabilities (p < 0.05), i.e. rs12913832 (HERC2), and rs2378249 (PIGU/ASIP).
Fig. 3 provides the results of HIrisPlex prediction for a subset of 44 Irish individuals where high-resolution non-dyed hair colour imagery was available to illustrate the model’s performance. The individuals natural hair colour images were ordered according to their predicted hair colour category probability values achieved via HIrisPlex analysis while the actual hair colour phenotypes were not considered in the ordering. From left to right, top to bottom, the images are ordered from the highest to lowest HIrisPlex prediction probabilities for black hair and then the lowest to highest prediction probabilities for brown, red and blond hair respectively.
As evident, there is a high correlation with the predicted hair colour category from HIrisPlex and the hair colour phenotype observed from visual inspection of these images. Table 4 shows the accuracy of hair colour prediction in the 20% model-testing subset of the Polish, Irish, and Greek individuals (n = 308). It is important to emphasise here that these individuals were not used for model building. The highest probability category approach (as opposed to the prediction-guide approach explained in the next paragraph) considers the colour category with the highest predicted probability as the ﬁnal predicted colour and does not take other categories into account for the ﬁnal prediction.
Using this approach, we tested various probability thresholds, from no threshold, to p > 0.7, which we previously recommended for eye colour prediction using the IrisPlex system [4,19]. As seen in Table 4, using the p > 0.7 (B) threshold increases the percentage of correct calls relative to the value obtained without using any threshold (A) for some hair colours such as red hair by 10% (i.e. from 89.5 without threshold to 100% with threshold), and for blond hair by 6.5% (i.e. from 57.2 to 63.6%), whereas no difference was seen for brown hair at 75%, and for black we saw a decrease by 8.5% (i.e.
from 28.6 to 20%). The low prediction accuracy obtained with this approach for black hair may reﬂect the difﬁculty of deﬁning the true black hair colour phenotype relative to the dark brown phenotype within this European dataset, where black hair is rare. Notably, the low correct call rate of 28.6% for black (without using a threshold) is mainly caused by 30 individuals with non- black self-reported phenotypes that were predicted as black by the HIrisPlex model. Of these, almost all (i.e. 90%) had the brown–dark brown phenotype. We could speculate that at least some of them may have been self-categorised as black if black hair colour would be more frequent in the sampled populations and therefore easier to differentiate from dark brown in the phenotyping procedure.
Although red hair is also rare in the European population (albeit in our Polish dataset it was enriched for) this problem is less expected for red hair as red is usually well differentiable from other hair colours, perhaps with the exception of the blond-red individuals. The prediction accuracy for blond hair, being lower than those for red and brown hair colour with and without threshold, is partly due to another phenomenon that will be discussed in detail in Section 3.3; age-dependent hair colour changes. As brown hair is the intermediary stage between blond and black, no prediction threshold for this category is required as can be seen in Table 4.
Even at the 75% correct call rate, the incorrect 5/8 deﬁned themselves as being dark blond. Since we know an overlap exists between light-brown and dark-blond in people’s perception and deﬁnition of colour, it is best to consider dark blond the same Table 4 HIrisPlex hair colour prediction accuracies obtained from a 308 separate model testing set of individuals from Poland, Ireland and Greece (individuals were not considered for prediction model building for which a different set of 1243 individuals was used) using two approaches: the highest probability category approach (with and without thresholds) and the prediction guide approach (see Fig.
4 for the prediction guide).
Highest probability category approach Observed categories No threshold p value Red Blond Brown Black Total prediction(n) Red predicted 17(89.5%) 1(5.3%) 1(5.3%) 0(0%) 19(100%) Blond predicted 8(3.7%) 123(57.2%) 68(31.6%) 16(7.4%) 215(100%) Brown predicted 2(6.3%) 5(15.6%) 24(75%) 1(3.1%) 32(100%) Black predicted 1(2.4%) 2(4.8%) 27(64.3%) 12(28.6%) 42(100%) 8 2 ) n ( e p y t o n e h p l a t o T 131 120 29308 Total Highest probability category approach Observed categories >0.7p threshold Red Blond Brown Black Total prediction(n) Red predicted 8(100%) 0(0%) 0(0%) 0(0%) 8(100%) Blond predicted 2(1.8%) 70(63.6%) 33(30%) 5(4.6%) 110(100%) Brown predicted 0(0%) 0(0%) 3(75%) 1(25%) 4(100%) Black predicted 0(0%) 0(0%) 4(80%) 1(20%) 5(100%) 1 ) n ( e p y t o n e h p l a t o T 70 40 7127 Total 1 ) % 9 .
5 7 ( 2 2 ) % 7 . 6 6 ( 8 ) % 6 . 6 4 ( 1 6 ) % 3 . 4 6 ( 8 1 d e n i f e d n U 81 (58.8%) Observed categories Prediction guide approach Red Blond d-blond/l-brown D-Brown Black Total prediction (n) Red predicted 24(80%) 06(20%) 0(0%) 0(0%) 30(100%) Blond/d blond predicted 2(1.7%) 32(26.4%) 52(43%) 30(24.8%) 5(4.1%) 121(100%) L-brown/d-brown predicted 2(1.3%) 9(6%) 59(39.6%) 58(38.9%) 21(14.1%) 149(100%) Black/d-brown predicted 0 01(12.5%) 4(50%) 3(37.5%) 8(100%) 8 2 ) n ( e p y t o n e h p l a t o T 41 118 92 29308 Total S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 107
colour as light brown. Therefore, brown hair colour may also be seen at black and blond category predictions 0.7 p threshold) or nearly all (89.5% without threshold) individuals for which the red hair category was the highest prediction probability were correctly predicted as seen in Table 4. Notably, the two individuals that were incorrectly predicted red without using a threshold deﬁned themselves as blond and brown, respectively; upon inspection of a hair image of the latter individual that was available to us, it did in fact display light red hints of colour. This reﬂects another example of how the phenotyping procedure, particularly self-reported hair colour grading as done in our Irish and Greek datasets, inﬂuences DNA prediction accuracy.
However, it is important to point out here that for 11(39%) individuals that had deﬁned themselves as having red hair, the red hair probability was not the highest, relative to probabilities for non-red hair colour, and these individuals were therefore missed out with HIrisPlex using this highest-probability approach. Furthermore, for 8 (6%) of the phenotypic blond, 96 (80%) of the phenotypic brown, and 17 (59%) of the phenotypic black hair individuals the highest predicted hair colour category did not correspond to the phenotypic hair colour category and hence these individuals were missed using this highest- probability approach.
This illustrates the limitation with the highest-probability approach that we aimed to overcome by developing and applying a prediction-guide approach as discussed next.
To take full advantage of the genotype–phenotype relationship for hair colour and the 22 hair-colour predictive DNA variants included in the HIrisPlex system we developed a hair colour prediction guide considering categorical hair colour probabilities in combination with light/dark hair colour shade probabilities as obtained from the HIrisPlex genotype data (Fig. 4, see also Section 3.5 for additional practical recommendations). The reason for considering light/dark shade prediction in addition to categorical hair colour prediction in the ﬁnal approach is that the 22 DNA variants not only impact on the main hair colour categories, but also on more detailed hair colour information, which is difﬁcult to measure; hence, we express in light/dark prediction probabilities.
For this, we took the individuals from the black category, now termed dark, and the individuals from the blond category, now termed light, and designed an additional prediction model for light and dark colour shade. Therefore, the HIrisPlex genotype input ﬁnally provides the core prediction colour category with an added level or shade, i.e. light or dark. This part of the prediction should be useful as additional information to the initial prediction category, e.g. to differentiate light blond from dark blond (light brown), or light brown from dark brown/black. It becomes particularly beneﬁcial in the lower hair colour category prediction probability levels (i.e.
genotype combinations. A >0.9 threshold is used for light versus dark shade prediction. As seen in Table 4(C), using the prediction guide approach the correct call percentages were for all hair colours considerably higher than using the highest probability category approach, except for red hair. In fact, using the prediction guide approach we obtained on average 69.5% correct calls for blond, 78.5% for brown and 87.5% for black. Particularly black hair prediction was strongly improved by using the prediction guide approach with an increase of almost 60% on average relative to the highest probability category approach without a threshold.
For an explanation of why blond is the least accurately predictable hair colour with currently available DNA markers, also after applying the prediction guide, see Section 3.3. Although we saw an apparent decrease of accurate prediction for red hair with the prediction guide approach (80% versus 89.5% with highest-probability approach without threshold), this can be explained by the total number of red predictions made by the models and if they were correct or not. In particular, for the highest probability approach the model was incorrect at predicting red only 2 times but missed out on 11 actual reds from our dataset.
The prediction guide approach, although was inaccurate for red hair prediction for 6 individuals, it managed to predict 24 out of the 28 actual red hair phenotypes from our test set. In summary, the number of individuals in our 308 model-test set that were missed by HIrisPlex hair colour prediction using the prediction guide approach were 4 (14%) of the phenotypic red, 8 (19.5%) of the phenotypic blond, 7 (6%) of the phenotypic d-blond/l-brown, 28 (31%) of the phenotypic d-brown and 26 (90%) of the phenotypic black, with an overall hair colour prediction accuracy of 76%. All are considerably less than what was missed when applying the highest probability category approach, apart from black hair where we believe phenotyping inaccuracy/perception of colour plays a role as discussed above already, as 21 of those individuals were predicted as having d-brown hair and may have in fact displayed d- brown hair that was perceived as black within Europe.
We therefore recommend using the prediction guide approach for properly interpreting HIrisPlex genotype data and the probability values derived from our prediction tool to infer the most likely hair colour phenotype in future practical applications. There are several important differences between eye and hair colour, both on the phenotypic as well as the genotypic levels, that may play a role in why some eye colours (i.e. blue and brown) appear to be currently predictable from DNA with higher accuracy than some hair colours (i.e. all non-red hair colours). Rs12913832 from the HERC2 gene plays a major role in the functional aspects of iris pigmentation [10,52] and its proposed model of action reﬂects a type of on/off switch from the absence of the T allele (and the homozygous presence of the C-allele) resulting in blue eye colour, to the presence of one or two T allele(s) reﬂecting brown eye colour .
Indeed, it has been shown recently  via a series of functional genetic experiments that the rs12913832 T-allele leads to binding of several transcription factors and a chromatin loop with the promoter of the neighbouring pigmentation gene OCA2 leading to elevated OCA2 expression and dark pigmentation. In contrast, when the rs12913832 C-allele is present, transcription factor binding, loop formation and OCA2 expression are all reduced leading to light pigmentation. Because of its strong functional involvement, HERC2 rs12913832 shows the strongest predictive power on categorical eye colour with an AUC of 0.877 for blue and 0.899 for brown alone for this SNP , and it also shows a strong impact on quantitative eye colour variance explaining on average 46% of the H and S spectrum .
When comparing this to its impact on hair colour, the percentage of residual variation from black to blond explained by HERC2 rs12913832 was 10.7% in a previous study . However, the effect of rs12913832 is considerably less on hair colour than it is on eye colour for reasons yet to be unveiled, and there are no other high impact hair colour SNP that take its place. For instance, in our full dataset using 1551 individuals, the correlation of rs12913832 with eye colour is nearly twice as high (Pearson correlation r2 = 0.46, p = 2.2e 16) as its correlation with hair colour (Pearson correlation r2 = 0.24, p = 2.2e 16).
Furthermore, the colour distribution of European hair appears much wider than that of European eyes, requiring the combination of several similar gene effects . Thus, categorical hair colour prediction is expected to be more error-prone especially when involving factors such as shade and intensity, etc. at least with the DNA markers known thus far. Additional effects such as environmental contributions particularly life time that are much stronger on certain hair colours than they are on all eye colours also inﬂuence hair colour prediction accuracy more so than eye colour prediction accuracy and will be discussed in the following chapter (see Section 3.3).
3.3. Age-dependent hair colour changes and consequences for hair colour prediction Age-dependent changes in hair colour are evident from anecdotal knowledge. The most often observed age-dependent hair colour changes occurs from light blond during childhood towards dark blond/light brown as an adult, but can also occur from light brown to dark brown/almost black. Suggestions of hormonal changes during adolescence have been advocated as a possible explanation , but the molecular basis are yet to be unveiled. In order to study the effect of age-dependent hair colour change on hair colour prediction from child to adulthood we recorded via questionnaires in the Irish sample set hair colour during childhood and adulthood separately, including the approx- imate age of the hair colour change.
Of the 339 Irish individuals, 157 contained current images in which the hair was not dyed and not grey, and from these the 8 individuals that were classiﬁed as blond in adulthood were 100% correctly predicted by the HIrisPlex system following the prediction guide approach. However, for 14 individuals with light brown to black phenotypes the HIrisPlex model had faltered and gave a high blond prediction probability (>0.7 p) with high light shade probabilities (>0.9 p). On further examination of these incorrectly predicted individuals, 8 (57%) of them noted that a change in hair colour regarding a darkening from blond to brown had occurred in their younger lives at ages ranging from 9 to 12 years.
Furthermore, we found a high and statistically signiﬁcant correlation (Pearson correlation r2 = 0.81, p < 0.01) between the increase in brown (darkening of hair) and the increase in age since the hair colour change occurred for those Irish individuals for whom such data were available to us (Fig. 7), which substantiates that the hair colour change observed is age dependent in these individuals. From this data we can see that our current HIrisPlex system works to a high degree of accuracy for hair colour prediction, but there may be processes that alter the hair colour over an individual’s lifetime (possibly molecular processes) without changing the HIrisPlex predicted hair colour of the individual.
For instance, an adult that had blond hair as a young child, but now displays light–dark brown/black hair colour is likely to display blond HIrisPlex genotypes and therefore a blond hair colour prediction will be obtained. This is due to the fact that the hair colour SNPs included in the HIrisPlex system, as well as any additional hair colour associated DNA variant available today, were identiﬁed in studies dealing with adults, and not in studies that particularly searched for bio-markers informative for the age- dependent hair colour change, which is still yet to be carried out. It is important to note therefore that the HIrisPlex model cannot decipher between these change-affected individuals and blonds who remain blonds from childhood to adulthood and thus a HIrisPlex prediction of blond hair may be inaccurate to a certain S.
Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115 109