Forensic Science International: Genetics

Forensic Science International: Genetics
Forensic Science International: Genetics 7 (2013) 98–115

                                                         Contents lists available at SciVerse ScienceDirect

                                             Forensic Science International: Genetics
                                                  journal homepage:

The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA
Susan Walsh a, Fan Liu a, Andreas Wollstein a, Leda Kovatsi b, Arwin Ralf a, Agnieszka Kosiniak-Kamysz c,
Wojciech Branicki d,e, Manfred Kayser a,*
  Department of Forensic Molecular Biology, Erasmus MC University Medical Centre Rotterdam, The Netherlands
  Laboratory of Forensic Medicine & Toxicology, School of Medicine, Aristotle University of Thessaloniki, Greece
  Department of Dermatology, Collegium Medicum of the Jagiellonian University, Kraków, Poland
  Section of Forensic Genetics, Institute of Forensic Research, Kraków, Poland
  Department of Genetics and Evolution, Institute of Zoology, Jagiellonian University, Kraków, Poland

A R T I C L E I N F O                                  A B S T R A C T

Article history:                                       Recently, the field of predicting phenotypes of externally visible characteristics (EVCs) from DNA
Received 2 May 2012                                    genotypes with the final aim of concentrating police investigations to find persons completely unknown
Received in revised form 25 June 2012                  to investigating authorities, also referred to as Forensic DNA Phenotyping (FDP), has started to become
Accepted 23 July 2012
                                                       established in forensic biology. We previously developed and forensically validated the IrisPlex system
                                                       for accurate prediction of blue and brown eye colour from DNA, and recently showed that all major hair
Keywords:                                              colour categories are predictable from carefully selected DNA markers. Here, we introduce the newly
Hair colour
                                                       developed HIrisPlex system, which is capable of simultaneously predicting both hair and eye colour from
Eye colour
                                                       DNA. HIrisPlex consists of a single multiplex assay targeting 24 eye and hair colour predictive DNA
Forensic DNA Phenotyping
DNA intelligence                                       variants including all 6 IrisPlex SNPs, as well as two prediction models, a newly developed model for hair
Externally visible characteristics                     colour categories and shade, and the previously developed IrisPlex model for eye colour. The HIrisPlex
                                                       assay was designed to cope with low amounts of template DNA, as well as degraded DNA, and
                                                       preliminary sensitivity testing revealed full DNA profiles down to 63 pg input DNA. The power of the
                                                       HIrisPlex system to predict hair colour was assessed in 1551 individuals from three different parts of
                                                       Europe showing different hair colour frequencies. Using a 20% subset of individuals, while 80% were used
                                                       for model building, the individual-based prediction accuracies employing a prediction-guided approach
                                                       were 69.5% for blond, 78.5% for brown, 80% for red and 87.5% for black hair colour on average. Results
                                                       from HIrisPlex analysis on worldwide DNA samples imply that HIrisPlex hair colour prediction is reliable
                                                       independent of bio-geographic ancestry (similar to previous IrisPlex findings for eye colour). We
                                                       furthermore demonstrate that it is possible to infer with a prediction accuracy of >86% if a brown-eyed,
                                                       black-haired individual is of non-European (excluding regions nearby Europe) versus European
                                                       (including nearby regions) bio-geographic origin solely from the strength of HIrisPlex eye and hair colour
                                                       probabilities, which can provide extra intelligence for future forensic applications. The HIrisPlex system
                                                       introduced here, including a single multiplex test assay, an interactive tool and prediction guide, and
                                                       recommendations for reporting final outcomes, represents the first tool for simultaneously establishing
                                                       categorical eye and hair colour of a person from DNA. The practical forensic application of the HIrisPlex
                                                       system is expected to benefit cases where other avenues of investigation, including STR profiling, provide
                                                       no leads on who the unknown crime scene sample donor or the unknown missing person might be.
                                                                                                    ß 2012 Elsevier Ireland Ltd. Open access under CC BY-NC-ND license.

1. Introduction                                                                            DNA Phenotyping (FDP). The ability to predict the physical
                                                                                           appearance of an individual directly from crime scene material
   Over the last few years, the prediction of externally visible                           can in principle help police investigations by limiting a large
characteristics (EVCs) from DNA has been an interesting topic of                           number of potential suspects in cases where perpetrators
study for many reasons, in particular, its anticipated use within                          unknown to the investigating authorities are involved. These
forensic genetics [1–3] resulting in the chosen term Forensic                              include cases where conventional STR profiling could not
                                                                                           provide a hit within the forensic DNA (profile) database, or
                                                                                           could not provide a match with a suspect singled-out by police
    * Corresponding author. Tel.: +31 10 7038073; fax: +31 10 7044575.
                                                                                           investigation, or cases where an STR profile could simply not be
      E-mail address: (M. Kayser).                                   generated due to low quality and/or quantity of DNA available.

1872-4973 ß 2012 Elsevier Ireland Ltd. Open access under CC BY-NC-ND license.
Forensic Science International: Genetics
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115                                99

Using EVC information obtained from the crime scene material                     excluding red and either blond or brown hair colour in its
via FDP, police would then proceed with more concentrated                        prediction for many of their individuals. More recently, Valenzuela
enquires, and finally request standard forensic STR profiling only                 et al. [16] assessed 75 SNPs from 24 genes previously implicated in
for the reduced number of EVC matching suspects aiming DNA                       hair, skin and eye colour in samples of various bio-geographic
individualisation for court room use. Obviously, the more EVCs                   origins (Europe and elsewhere) and found that three of them, i.e.
that are predictable from crime scene material, the better a                     rs12913832 (HERC2), rs16891982 (SLC45A2) and rs1426654
person’s appearance can be described, and in turn the smaller                    (SLC24A5) combined gave the best prediction for light and dark
the number of appearance-matching potential suspects for                         hair colour.
subsequent forensic STR profiling. Also in missing person cases                       Armed with previous knowledge on hair colour associated DNA
where a body was found decomposed with no EVC information                        variants and in considering the most up-to-date list of DNA
discernable from visual inspection, or body parts that do not                    variants related to human hair colour variation available at the
provide EVC information including bones, FDP is expected to                      time, we recently performed an evaluation of 46 SNPs from 13
provide leads for finding the right antemortem samples or                         genes [35] for model-based population-wise hair colour predic-
family members for final STR-based identification.                                 tion aiming to find a set of most hair colour predictive DNA
    The use of DNA (or other biomarkers) for investigative purposes              variants. In this previous study we identified a set of 13 DNA
termed ‘DNA intelligence’, rather than for identification purposes                markers (2 MC1R combined marker sets and 11 single DNA
in the court room as currently applied in forensics, marks a                     markers) from 11 genes (MC1R, HERC2, OCA2, SLC45A2 (MATP),
completely new application of DNA in forensics and is currently at               KITLG, EXOC2, TYR, SLC24A4, IRF4, PIGU/ASIP and TYRP1) containing
the early stages of development. At present there is only one FDP                most hair colour predictive information. This DNA marker set
tool available that has already been developmentally validated for               provided a high degree of population-based, prevalence-adjusted
forensic use and that is the IrisPlex system, capable of predicting              overall prediction accuracy as expressed by the area under the
eye colour from DNA [4,5]. Although other studies have suggested                 curve of a receiver operating characteristic curve (AUC) with
DNA markers and methods for predicting externally visible traits,                estimates at 0.93 for red, 0.87 for black, 0.82 for brown, and 0.81
most notably eye colour [6–18] none of them introduced a tool that               for blond hair colour, where 1 means completely accurate
had undergone systematic forensic developmental validation                       prediction. However, the genotyping methodology used in this
testing as of yet. The IrisPlex system allows the prediction of                  previous screening study did not allow simultaneous genotyping
eye colour from minute amounts of DNA (31 pg DNA input full                      of all 22 identified hair colour predictive DNA markers in a single
profiles) and has proven to be 94% accurate for predicting blue and               reaction as would be appreciated in forensic DNA analysis where
brown eye colour when tested on a European set of >3800                          there can be limited amounts of starting material. Furthermore, in
individuals [19]. However, work is on-going with regards to                      the previous study, only samples with hair colour genotypes and
identifying the underlying genes and developing predictive DNA                   phenotypes from a single country in Eastern Europe, i.e. Poland,
markers for several other EVCs [3] such as skin colour [8,20,21],                were available, whereas the inclusion of individuals from other
hair colour [6,8], body height [22,23], male baldness [24], and hair             European regions, such as Western and Southern parts, would be
morphology [25,26].                                                              beneficial in order to enrich with individuals displaying hair
    The previous progress on categorical eye colour DNA predict-                 colours such as brown and black that are more common in these
ability together with the strong genetic and phenotypic relation-                parts of Europe.
ship between eye and hair colour variation, as well as the increased                 In the present study, we developed and evaluated the
understanding of the genetic basis of hair colour, all suggest that              sensitivity of a single-tube multiplex assay targeting the 22
hair colour may represent the next-promising candidate EVC for                   previously recognised hair colour predictive DNA variants as well
DNA prediction after eye colour. Hair colour (as well as eye colour),            as the six eye colour predictive SNPs from our previously
is generally known to be highly variable in people of (at least                  developed IrisPlex system (four of which are overlapping). We
partial) European descent and those from nearby regions such as                  employed the SNaPshot technology because it can be easily
the Middle East and parts of Western Asia [27], with individuals                 implemented in forensic DNA laboratories as no additional
displaying numerous variations of hair colour shade that are                     equipment or serious interference with protocols is needed to
usually summarised in four main categories of colour such as red,                apply it. Furthermore, we assessed the power of the 22 DNA
blond, brown and black. In contrast, people from any other parts of              variants to predict hair colour categories, as well as hair colour
the world (and without European/nearby genetic admixture)                        shade, via model-based prediction studies using an expanded
usually display the ancestral black hair colour (together with the               database of hair colour genotype and phenotype data for >1500
ancestral brown eye colour) phenotype. Variation in hair (and eye)               individuals from Eastern, Western and Southern parts of Europe
colour is assumed to be of European origin and is thought to have                that displayed varying degrees of hair colouration. Moreover, we
reached their currently observed frequencies via sexual selection                investigated via analysing a worldwide set of individuals from 51
(i.e. mate choice preferences) [28]. The genetic basis of human hair             populations (HGDP-CEPH), whether or not the reliability of hair
colour variation has been studied considerably in the last few                   colour prediction available with these 22 DNA variants depends on
years. Recent studies either employing the candidate gene                        knowledge of bio-geographic ancestry. We present and make
approach or genome-wide association and/or linkage analysis                      available for future use, the first system for parallel prediction of
have identified genes and DNA variants likely to be involved in                   hair and eye colour from DNA we termed HIrisPlex, consisting of a
human hair colour variation [6–8,12,14,29–33]. Some preliminary                  single multiplex assay for 24 eye and/or hair colour predictive DNA
attempts have already been made towards the prediction of hair                   variants and two prediction models, i.e. a newly developed model
colour from informative DNA variants. In fact, an early red hair                 for hair colour and shade prediction and the previously developed
prediction protocol based on a combination of non-synonymous                     IrisPlex model for eye colour prediction. An interactive spread-
single nucleotide polymorphisms (SNPs) in the MC1R gene that                     sheet tool for obtaining individual hair colour, hair colour shade,
incur the red hair phenotype effect was already developed for                    and eye colour prediction probabilities from HIrisPlex genotypes as
forensic use more than ten years ago [34] and its accuracy was 84%               well as a prediction guide for accurate interpretation of individual
in the prediction of red hair individuals. Sulem et al. [6] in their             hair colour and shade probabilities are made available to enhance
genome-wide association study for European pigmentation traits                   the practical use of the HIrisPlex system in future applications such
developed a hair colour prediction tool, which was capable of                    as forensics.
Forensic Science International: Genetics
100                                               S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115

2. Materials and methods                                                                  population and green eye colour in the Irish population were
                                                                                          intentionally enriched due to their rare occurrence, therefore both
2.1. Subjects, imagery and hair and eye colour classification                              phenotypes do not reflect natural population frequencies.

    DNA samples and hair colour information was collected from                            2.2. DNA samples and HIrisPlex genotyping
1551 European subjects living in Poland (n = 1093), the Republic of
Ireland (n = 339) and Greece (n = 119). All participants gave                                 DNA from the Polish samples was extracted as described
informed consent. The study was approved in part by the Ethics                            previously [35]. Saliva samples collected from individuals in
Committee of the Jagiellonian University, number KBET/17/B/2005                           Ireland were extracted using the Puregene DNA isolation kit
and the Commission on Bioethics of the Regional Board of Medical                          (Qiagen, Hilden, Germany). Buccal swabs collected from individu-
Doctors in Krakow number 48 KBL/OIL/2008. Hair and eye colour                             als in Greece were extracted using an in-house organic extraction
phenotypes were collected by a combination of self-assessment                             protocol. DNA from the H952 subset of the HGDP-CEPH panel that
and professional single observer grading (Polish data). The                               represents 952 individuals from 51 worldwide populations [36,37]
professional grader (AKK) for the polish dataset is a medical                             were purchased from CEPH. Due to lack of DNA in some samples
doctor (dermatologist) who evaluated hair colour upon observa-                            belonging to the HGDP-CEPH 952 set, 7 individuals could not be
tion, and questioning of individuals in circumstances where hair                          genotyped by the HIrisPlex assay, and therefore the final number of
was dyed or grey. For hair colour phenotype self-assessment,                              worldwide samples was 945.
individuals were asked to fill into the questionnaire, the colour of                           All samples were genotyped using the HIrisPlex assay. The
their hair during their 20s, and at what age grey/white hairs started                     assay includes 23 SNPs and 1 insertion/deletion (INDEL) polymor-
to appear (Irish collection), this avoided the effects of hair greying                    phism, altogether 24 DNA variants, from 11 genes: MC1R, HERC2,
and whitening on phenotyping. Sample collection in Ireland                                OCA2, SLC24A4, SLC45A2, IRF4, EXOC2, TRYP1, TYR, KITLG, and PIGU/
included high-resolution eye and hair photographic imagery. In a                          ASIP. Further information on these 24 markers can be found in
brief description, hair and eye images were taken using a Nikon                           Table 2, including primer sequences. The 24 PCR primer pairs were
D3100 with an AF-S Micro Nikkor 60 mm macro lens, the aperture,                           designed using the default parameters of the program Primer3Plus
shutterspeed and ISO were fixed to f = 22, 1/125, and 200                                  [38], which is a free web-based design software. PCR fragments
respectively. A ring flash (model Speedlight SB-R200) and an                               were designed to be as short as possible to cater for degraded DNA,
average distance of 7 cm was used from the eye and from the back                          and therefore all are less than 160 bp in length. To reduce the
of the head for hair imagery. This ensured consistent sampling and                        possibility of primer pairs interacting with each other, the program
regulated lighting conditions, including lens settings of a 0.2 and                       Autodimer [39] was used to analyse primer sequences. Surround-
0.23 fixed focal length. All individuals were asked to fill in a                            ing sequence regions were also searched with BLAST [40] against
questionnaire that included basic information, such as gender and                         dbSNP [41] to reduce the chance of a primers location covering a
age as well as data concerning eye and hair pigmentation                                  known interfering SNP site for efficient primer binding.
phenotype. However, due to many Irish individuals having dyed                                 For the population genotyping, genomic DNA quantities
or grey hair, self-reported hair colour classifications were used for                      ranging from 300 pg to 3 ng in 1 ml formats were amplified per
this set in model training. For the Greek collection, a buccal swab                       individual in a 10 ml reaction volume consisting of 1 PCR buffer,
was taken from each individual and a self-reported questionnaire                          2.5 mM MgCl2, 220 mM of each dNTP, and 1.75 U AmpliTaq Gold
regarding hair and eye colour information was collected. For both                         DNA polymerase (Applied Biosystems Inc., Foster City, CA)
the Irish and Greek set, hair colour was classified into 7 categories:                     including PCR primer concentrations found in Table 2. Thermo-
blond (5.9%), light-brown (34%), dark-brown (45.2%), auburn                               cycling was performed on the 96-well GeneAmp1 PCR system
(5.7%), blond-red (1.3%), red (2.2%), and black (5.7%). For the Polish                    9700 (Applied Biosystems) under the following conditions (1)
dataset, this data was collected as previously reported [35] and                          95 8C for 10 min, (2) 33 cycles of 95 8C for 30 s and 61 8C for 30 s, (3)
hair colour was classified into 7 categories: blond (13.7%), dark-                         5 min at 61 8C. PCR products were cleaned with ExoSAP-IT (USB
blond (44.2%), brown (22.6%), auburn (1%), blond-red (3.9%), red                          Corp., Cleveland, OH), as recommended by the manufacturer.
(3.8%), and black (10.8%)). For hair colour prediction analyses, we                       Following removal of unincorporated dNTPs and primers. The
grouped blond and dark-blond into one blond category (42.6%),                             multiplex SBE (single base extension) assay was performed using
light brown and dark brown into one brown category (39.3%) and                            2 ml of product with 1 ml of ABI SNaPshot kit (Applied Biosystems,
auburn, blond-red, and red into one red category (8.8%) with black                        Foster City, CA) reaction mix in a total reaction volume of 5 ml.
as an additional fourth category (9.3%). Eye colour was classified                         Single base extension (SBE) primer sequences and concentrations
into 3 categories blue, brown and intermediate (including green).                         used in the assay can be found in Table 2. Thermocycling
The term category in this context refers to the grouping of similar                       conditions were as follows: 96 8C for 2 min and 25 cycles of
phenotypic colours into one group to separate them from another                           96 8C for 10 s, 50 8C for 5 s and 60 8C for 30 s. Products were cleaned
colour group, i.e. blond category, black category. Table 1 displays                       using SAP (USB Corp.), following manufacturers guidelines and
the numbers of hair and eye colour phenotypes including sex,                              1 ml of cleaned product was run on the ABI 3130xl Genetic
within all 3 populations sampled. Notably red hair in the Polish                          Analyser (Applied Biosystems) with POP-7 on a 36 cm capillary

Table 1
Phenotype frequencies according to hair and eye colour categories (including sex) for the full combined set of individuals from Poland, Ireland and Greece.

 Hair colour      Blond   Dark           Dark     Brown          Blond    Red     Black    Total   Eye         Intermediate     Brown   Total   Male   Female   Total
                          blond*/light   brown    red/auburn     red                               colour –    (green,
                          brown                                                                    blue        heterochromia)

 Poland           150     483*           247      11             43       41      118      1093    590         164              339     1093    449    644      1093
 Ireland           16      111           158      23              6       10       15       339    172          90               77      339     77    262       339
 Greece            11       45            49       3              0        0       11       119     13          15               91      119     51     68       119

 Total            177      639           454      37             49       51      144      1551    775         269              507     1551    577    974      1551
      represents individuals who were reported as dark blond in the dark blond/light brown category.
Forensic Science International: Genetics
Table 2
Information about the 24 DNA variants of the HIrisPlex assay, including PCR and single base extension (SBE) primer sequences and concentrations.

 Assay    SNP           CHR Position                 Gene       Major Minor PCR primers                Concentration                               Product SBE                                                 Concentration
 position                                                       Allele Allele                                                                      size    primers

   1       N29insA      16    89985753    Exonic     MC1R       C       insA    MC1Rset1F       Set1   0.55 mm         GCAGGGATCCCAGAGAAGAC        117bp   CCCCAGCTGGGGCTGGCTGCCAA                             1.3 mm
   2       rs11547464   16    89986091    Exonic     MC1R       G       A       MC1Rset1R              0.55 mm         TCAGAGATGGACACCTCCAG                ttttttttttttGCCATCGCCGTGGACC                        0.1 mm
   3       rs885479     16    89986154    Exonic     MC1R       C       T       MC1Rset2F       Set2   0.5 mm          CTGGTGAGCTTGGTGGAGA         158bp   ttttttttttttttttttGATGGCCGCAACGGCT                  1.25 mm
   4       rs1805008    16    89986144    Exonic     MC1R       C       T       MC1Rset2R              0.5 mm          TCCAGCAGGAGGATGACG                  tttttttttttttACAGCATCGTGACCCTGCCG                   0.375 mm
   5       rs1805005    16    89985844    Exonic     MC1R       G       T       MC1Rset3F       Set3   0.5 mm          GTCCAGCCTCTGCTTCCTG         147bp   tttttttttttttttTGGTGGAGAACG-                        0.75 mm
   6       rs1805006    16    89985918 Exonic        MC1R       C       A       MC1Rset3R              0.5 mm          AGCGTGCTGAAGACGACAC                 ttttttttttttttttttttCTGCCTGGC-                      0.75 mm
   7       rs1805007    16    89986117 Exonic        MC1R       C       T       MC1Rset4F       Set4   0.4 mm          CAAGAACTTCAACCTCTTTCTCG 106bp       tttttttttttttttttttttttttCTCCATCTTC-                1 mm
   8       rs1805009    16    89986546 Exonic        MC1R       G       C       MC1Rset4R              0.4 mm          CACCTCCTTGAGCGTCCTG                 ttttttttttttttttttttttttttttttATCTGC-               0.4 mm
   9       Y152OCH      16    89986122 Exonic        MC1R       C       A                                                                                  ttttttttttttttttttttttttttttttCATCTT-               0.6 mm

                                                                                                                                                                                                                               S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115
 10        rs2228479    16    89985940 Exonic        MC1R       G       A                                                                                  ttttttttttttttttttttttttttttttttttttC-              0.375 mm
 11        rs1110400    16    89986130 Exonic        MC1R       T       C                                                                                  ttttttttttttttttttttttttttttttCTTCTA-               0.3 mm
 12        rs28777      5     33994716 Intronic      SLC45A2    A       C       rs28777_F       Set5   0.4 mm          TACTCGTGTGGGAGTTCCAT        150bp   tttttttttttttttttttttttttttttttttttttttCA-          1.2 mm
                                                                                rs28777_R              0.4 mm          TCTTTGATGTCCCCTTCGAT
 13        rs16891982 5       33987450 Exonic        SLC45A2    G       C       Rs16891982_F Set6      0.4 mm          TCCAAGTTGTGCTAGACCAGA       128bp   tttttttttttttttttttttttttttttttttttttttttttt-       1 mm
                                                                                Rs16891982_R           0.4 mm          CGAAAGAGGAGTCGAGGTTG
 14        rs12821256 12      87852466 Intergenic KITLG         A       G       rs12821256_F Set7      0.4 mm          ATGCCCAAAGGATAAGGAAT        118bp   ttttttttttttttttttttttttttttttttttttttt-            0.1 mm
                                                                                rs12821256_R           0.4 mm          GGAGCCAAGGGCATGTTACT
 15        rs4959270    6     402748      Intergenic EXOC2      C       A       Rs4959270_F     Set8   0.4 mm          TGAGAAATCTACCCCCACGA        140bp   ttttttttttttttttttttttttttttttttttttttttt-          0.375 mm
                                                                                Rs4959270_R            0.4 mm          GTGTTCTTACCCCCTGTGGA
 16        rs12203592 6       341321      Intronic   IRF4       C       T       rs12203592_F    Set9   0.4 mm          AGGGCAGCTGATCTCTTCAG        126bp   tttttttttttttttttttttttttttttttttttttttttttt-       0.3 mm
                                                                                rs12203592_R          0.4 mm           GCTTCGTCATATGGCTAAACCT
 17        rs1042602    11    88551344 Exonic        TYR        G       T       rs1042602 _F    Set10 0.4 mm           CAACACCCATGTTTAACGACA       124bp   ttttttttttttttttttttttttttttttttttttttttttttt-      1.25 mm
                                                                                rs1042602 _R          0.4 mm           GCTTCATGGGCAAAATCAAT
 18        rs1800407    15    25903913 Exonic        OCA2       G       A       rs1800407_F     Set11 0.4 mm           AAGGCTGCCTCTGTTCTACG        124bp   tttttttttttttttttttttttttttttttttttttttttttttt-     0.1 mm
                                                                                rs1800407_R           0.4 mm           CGATGAGACAGAGCATGATGA
 19        rs2402130    14    91870956 Intronic      SLC24A4    A       G       rs2402130_F     Set12 0.4 mm           ACCTGTCTCACAGTGCTGCT        150bp   tttttttttttttttttttttttttttttttttttttttttttttttt-   0.75 mm
                                                                                rs2402130_R           0.4 mm           TTCACCTCGATGACGATGAT
 20        rs12913832 15      26039213 Intronic      HERC2      C       T       rs12913832_F    Set13 0.4 mm           TCAACATCAGGGTAAAA-          150bp   ttttttttttttttttttttttttttttttttttttttttttttttt-    1.2 mm
                                                                                                                       ATCATGT                             tttttttttttttttttTAGCGTGCAGAACTTGACA
                                                                                rs12913832_R          0.4 mm           GGCCCCTGATGATGATAGC
 21        rs2378249    20    32681751 Intronic      ASIP/PIGU T        C       rs2378249_F     Set14 0.4 mm           CGCATAACCCATCCCTCTAA        136bp   tttttttttttttttttttttttttttttttttttttttttttttt- 0.18 mm
                                                                                rs2378249_R            0.4 mm          CATTGCTTTTCAGCCCACAC

Forensic Science International: Genetics
102                                                                                                                                                                                                                                       S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115
                                                                                                                                                                                                                                                                                  array following the SNaPshot kit sample preparation guidelines,
                                                 1.125 mm                                                                                                                                                                                                                         however run parameters of 2.5 kV for 10 s injection voltage and

                                                                                                                                                                       0.175 mm
                                                                                                              1.1 mm
                                                                                                                                                                                                                                                                                  run time of 500 s at 60 8C were used for increased sensitivity.
                                                                                                                                                                                                                                                                                      For assay sensitivity studies, genotyping results from two
                                                                                                                                                                                                                                                                                  different individuals were assessed from serial dilutions of DNA
                                                                                                                                                                                                                                                                                  input samples of 500 pg, 250 pg, 125 pg, 63 pg and 31 pg. Each
                                                                                                                                                                                                                                                                                  result was investigated for allelic drop out, which includes peaks

                                                                                                                                                                                                                                                                                  below the 50-rfu threshold that cannot be called. The determina-


                                                                                                                                                                                                                                                                                  tion of sensitivity was based on the production of a full profile in

                                                                                                                                                                                                                                                                                  every replicate at a particular DNA input level.

                                                                                                                                                                                                                                                                                  2.3. HIrisPlex DNA variants and their use for eye/hair colour

                                                                                                                                                                                                                                                                                  prediction including in a worldwide sample

                                                                                                                                                                                                                                                                                      The HIrisPlex assay consists of 24 DNA variants (23 SNPs and 1

                                                                                                                                                                                                                                                                                  INDEL), 6 of these markers, rs12913832 (HERC2), rs1800407
                                                                                                                                                                                                                                                                                  (OCA2), rs12896399 (SLC24A4), rs16891982 (SLC45A2 (MATP)),
                       Product SBE

                                                                                                                                                                                                                                                                                  rs1393350 (TYR) and rs12203592 (IRF4) are taken from the IrisPlex
                                                                                                                                                                                                                                                                                  system which has already been well established [4,5,19,42] and are



                                                                                                                                                                                                                                                                                  used for the eye colour prediction part of the HIrisPlex system. The
                                                                                                                                                                                                                                                                                  results of these 6 SNPs when their minor allele is input into the

                                                                                                                                                                                                                                                                                  HIrisPlex prediction tool are used to predict the eye colour of the




                                                                                                                                                                                                                                                                                  individual using the IrisPlex model as previously published [5,19]
                                                                                                                                                                                                                                                                                  with the highest probability of the three categories, brown, blue or
                                                                                                                                                                                                                                                                                  intermediate being the predicted eye colour.
                                                                                                                                                                                                                                                                                      The 22 DNA variables used for hair colour prediction are
                                                                                                                                                                                                                                                                                  Y152OCH, N29insA, rs1805006, rs11547464, rs1805007,
                                                                                                                                                                                                                                                                                  rs1805008, rs1805009, rs1805005, rs2228479, rs1110400 and
                                                                                                                                                                                                                                                                                  rs885479 from the MC1R gene, rs1042602 (TYR), rs4959270
                                                                                                                                                                                                                                                                                  (EXOC2), rs28777 (SLC45A2 (MATP)), rs683 (TYRP1), rs2402130

                                                                                                                                                                                                                                                                                  (SLC24A4),     rs12821256      (KITLG),    rs2378249     (PIGU/ASIP),
                                                                                                                                                                                                                                                                                  rs12913832 (HERC2), rs1800407 (OCA2), rs16891982 (SLC45A2
                                                                                                                                                                                                                                                                                  (MATP)) and rs12203592 (IRF4) based on our previous publication
                                                 Rs12896399_F Set15 0.4 mm

                                                                                                                    0.4 mm
                                                                                                              Set16 0.4 mm

                                                                                                                                                                  0.4 mm
                                                                                                                                                            Set17 0.4 mm

                                                                                                                                                                                                                   0.4 mm

                                                                                                                                                                                                                                                                                  for hair colour prediction [35]. When their minor alleles are input
                                                                                                                                                                                                                                                                                  into the HIrisPlex prediction tool, they are used to predict the hair
                                                                                                                                                                                                                                                                                  colour of the individual using the HIrisPlex hair prediction model
                                                                                                                                                                                                                                                                                  developed in this paper. From the four hair colour categories of

                                                                                                                                                                                                                                                                                  blond, brown, red and black, the highest probability value is
                       Major Minor PCR primers

                                                                                                                                                                                                                                                                                  indicative of the predicted hair colour following guidelines that are

                                                                                                                                                                                                                                                                                  published within this paper and described in the next section.
                                                                                                                                                                                                                                                                                      For worldwide hair colour prediction, we assessed the HIrisPlex
                                                                                                                                                                                                                                                                                  assay performance on 945 samples from 51 populations of the
                       Allele Allele

                                                                                                                                                                                                                                                                                  HGDP-CEPH set. The MapViewer 7 (Golden Software, Inc., Golden,


                                                                                                                                                                                                                                                                                  CO, USA) package was used to plot the predicted hair colour

                                                                                                                                                                                                                                                                                  categories and the distribution of SNP genotypes on the world map.
                                                                                                                                                                                                                                                                                  A non-metric multidimensional scaling (MDS) plot was produced


                                                                                                                                                                                                                                                                                  to illustrate the pairwise FST distances [43] of the 24 eye and hair
                                                  91843416 Intergenic SLC24A4

                                                                                                                                                                                                                                                                                  colour SNPs between populations, using SPSS 17.0.2 for Windows


                                                                                                                                                                                                                                                                                  (SPSS Inc., Chicago, USA). Analysis of molecular variance (AMOVA)
                                                                                                                                                                                                                                                                                  (Excoffier 1992) was performed using Arlequin v3.11 [44]. A
                                                                                                                                                                                                                                                                                  threshold assessment of prediction probabilities for each hair
                                                                                                               88650694 Intronic

                                                                                                                                                                        12699305 Exonic

                                                                                                                                                                                                                                                                                  colour category was also carried out including a combined eye and
                                                                                                                                                                                                                                                                                  hair colour prediction probability threshold in the inference of a
                                                                                                                                                                                                                                                                                  Non-European individual with Black hair and brown eyes. For the
                       CHR Position

                                                                                                                                                                                                                                                                                  assessment of an age-dependent hair colour change, a Pearson
                                                                                                                                                                                                                                                                                  correlation was calculated and the graph plotted using SPSS 17.0.2
                                                                                                                                                                                                                                                                                  for Windows (SPSS Inc., Chicago, USA).
                                                  rs12896399 14



                                                                                                                                                                                                                                                                                  2.4. Prediction modelling for hair colour
Table 2 (Continued )

                                                                                                                                                                                                                                                                                     To develop a hair colour prediction model using samples from

                                                                                                                                                                                                                                                                                  several sites with varying levels of hair colour due to their position
                                                                                                                                                                                                                                                                                  within Europe, central, western and southern Europe, we took a

                                                                                                                                                                                                                                                                                  random subset of 80% of the samples from each site, Poland

                                                                                                                                                                                                                                                                                  (n = 875), Ireland (n = 272) and Greece (n = 96). This 80% subset



                                                                                                                                                                                                                                                                                  was used to train the model and was based on Multinomial Logistic
Forensic Science International: Genetics
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115                                             103

Regression (MLR), as previously published by Liu et al. [42]. In brief,                   model approach, a predicted hair colour is generated with an
individuals were categorised according to their hair phenotypes                           approximate indication of the colour being light or dark (i.e. light
and were split into 4 categories, Blond (n = 529), Brown (n = 490),                       brown, dark brown) due to the influence of the genotypes
Red (n = 109) and Black (n = 115). For their genotypes, 22 of the 24                      commonly associated with the light/dark categories, of blond
HIrisPlex DNA variations (as described above) were used to test for                       and black respectively. The further 20% of the combined dataset
hair colour differentiation and use in the prediction model. By                           (total n = 308), i.e. from Poland (n = 218), Ireland (n = 67) and
inputting the minor allele of each DNA variant, including its                             Greece (n = 23), was used to assess the accuracy of the prediction
phenotype and applying MLR, alpha and beta values are generated                           model in terms of the final hair colour prediction being correct or
that form the core of the prediction model. This model then allows                        incorrect based on colour category, shade and use of the hair colour
the probabilistic prediction of an individuals hair colour category                       prediction guide that is described in detail in Section 3, and an
solely based on the input of the 22 variant minor alleles into the                        assessment of optimal category thresholds was undertaken. The
HIrisPlex hair colour prediction tool. To assess the effect of the light                  steps to take when acquiring a prediction based on colour and
and dark shades of hair colour that may be contributed from blond                         shade are outlined in a guide provided below.
and black respectively, a similar approach was used that combined
the individuals grouped in the light category (blond, n = 529)                            3. Results and discussion
versus a dark category (black, n = 115). Red hair individuals were
omitted (n = 109) from this analysis as their resulting colour is                         3.1. HIrisPlex genotyping assay – design and sensitivity
based upon an MC1R cumulative mutation and not on the
continuous spectrum of light to dark (i.e. blond to black). Brown                            The HIrisPlex assay was designed with the intention to cope
hair individuals (n = 490) were omitted, as only the extremes of                          with low template and degraded DNA, a standard concern when
light and dark were required. Therefore using this two-pronged                            genotyping forensic casework samples. Therefore, care was taken

Fig. 1. An assessment of the HIrisPlex assay’s sensitivity on two individuals ascertained to have high numbers of heterozygote alleles (7 and 11, respectively) for quantified
DNA input at 500 pg, 125 pg, 63 pg and 31 pg. Full profiles were observed down to 63 pg DNA input with drop out occurring at 31 pg DNA input for insertion 1 (N29insA), SNP
15 (rs4959270), SNP 17 (rs1042602), SNP 18 (rs1800407), and SNP 23 (rs1393350) including a C allele drop in at SNP 9 (Y152OCH).
Forensic Science International: Genetics
104                                               S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115

to ensure small PCR amplicon sizes of
Forensic Science International: Genetics
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115                                               105

sensitive IrisPlex assay may be applied subsequently and may
provide a full 6-SNP profile for eye colour prediction on critical
DNA samples.

3.2. HIrisPlex model-based hair colour prediction

    MC1R polymorphisms are largely recessive when considered
individually, but also interact with each other through a genetic
mechanism known as ‘‘compound heterozygosity’’ [47–49]. In our
previous population-based hair colour prediction study [35], the
MC1R variants Y152OCH, N29insA, rs1805006, rs11547464,
rs1805007, rs1805008, rs1805009, rs1805005, rs2228479,
rs1110400 and rs885479 were all collapsed into two markers,
MC1R-R (R/R, R/wt, wt/wt) and MC1R-r (r/r, r/wt, wt/wt),
depending on the penetrance of the mutant alleles. Thus, the
total 22 hair colour markers were considered as 13 markers in our
previous prediction analysis, including, MC1R_R, MC1R_r,
rs1042602 (TYR), rs4959270 (EXOC2), rs28777 (SLC45A2 (MATP)),
rs683 (TYRP1), rs2402130 (SLC24A4), rs12821256 (KITLG),
rs2378249 (PIGU/ASIP), rs12913832 (HERC2), rs1800407 (OCA2),
rs16891982 (SLC45A2 (MATP)) and rs12203592 (IRF4). In the
current study, we had two main reasons for the development of a
new hair colour prediction model utilising a 22 DNA variant set
without collapsing into MC1R-R and MC1R-r. First, we were able to
produce a larger dataset that provides a broader representation of
Europe and its highly variable hair colour regions. Notably, we not
only increased the sample size relative to our previous study [35]
by 3-fold, but in addition to considering more Eastern Europeans
from Poland (also used before) we also added individuals from
Western Europe, i.e. Ireland and from Southern Europe, i.e. Greece.
These three countries display very different hair colour phenotype
frequencies (Table 1), which would also impact on the modelling.
The use of samples from three European regions and countries
provides an increase in overall sample size and also a better
representation of the hair colour phenotype variation across
Europe, but this also increases the different genotype combina-
                                                                                  Fig. 2. Hypothesised scenario of the effect of each HIrisPlex SNPs minor allele input
tions observable. Second, some of the MC1R variants also                          on the model for hair colour prediction as a homozygous genotype (the minor allele
contribute to hair colours other than red [35] (as seen in Table                  input is 2). The highest effect in terms of probability for a certain hair colour
3). As many individuals from Ireland display a higher frequency of                category is noted and the SNP is named near that category within the figure.
MC1R mutations, up to 75% noted in a previous study with 30% of
these being double mutations [50] and 78% in our own set contain
at least one of the MC1R mutations without displaying the red hair                non-brown and black versus non-black), we used a total set of
phenotype, other Europeans from other regions to which our hair                   1134 individuals representing the 80% model-building subset but
colour prediction tool may be applied in the future may also reflect               now omitting the red hair individuals from the analyses due to
this. Therefore, a new hair colour prediction model was developed                 their rare DNA variants and the fact that red hair is not a
to examine the input of each single DNA variation for hair colour                 continuous colour but more a combined MC1R mutation effect on
categorical prediction, including the individual impact of all MC1R               colour change [47,51]. As the probability values shown suggest, the
variants separately.                                                              results for hair colour variation from blond via brown to black
    Fig. 2 shows a hypothesised tree model illustrating how each of               (without red) are consistent with our previous findings [35] in
the 22 DNA variants contributes towards a categorical hair colour                 several DNA variants, i.e. rs12913832 (HERC2) and rs12203592
prediction as inferred from our current data. This scenario                       (IRF4) with high statistical support (P 10 6 to 10 16) in the present
represents the extreme of a 2 minor allele input for each single                  enlarged dataset considering Poland, Ireland and Greece. Although
DNA variant and the largest single hair colour category effect that is            less powerful, additional DNA variants also show significant
seen on the models prediction, based on that input. However, it is                evidence (p < 0.05) for some hair colours, such as rs2402130
important to note here (and as further outlined below) that it is the             (SLC24A4), rs12821256 (KITLG), rs4959270 (EXOC2), rs1805006
combination of all 22 DNA variants together in a single model that                (MC1R), rs1805007 (MC1R), rs1805008 (MC1R) for blond,
finally allows the prediction of hair colours as we suggest with this              rs1805006 (MC1R) and rs2402130 (SLC24A4) for brown, and
study.                                                                            rs1805007 (MC1R) for black. Red hair colour prediction is observed
    Table 3 provides a measure of the strength of each DNA variant’s              with highest probability values (P 10 8 to 10 16) for several of the
contribution towards each hair colour category prediction using                   individually considered MC1R variants as expected, i.e. rs1805008,
beta values including p-values obtained from the MLR model. The                   rs1805007, rs1805009 and rs11547464, and with somewhat less
analysis is based on the combined 80% model-building subset of                    statistical strength (p < 0.05) for other MC1R variants, i.e.
1243 Polish, Irish and Greek individuals assigned into a red versus               rs1805005, rs1805006 and rs1110400. However, due to the very
non-red colour category which then displays each DNA variants                     low frequency in our set of individuals of the generally rare MC1R
contribution towards red hair colour within the model. For the                    variant allele at N29insA (INDEL) and Y152OCH, their contribution
other categories (i.e. blond versus non-blond, brown versus                       towards red hair probabilities are particularly high (Table 3; red
106                                                 S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115

Fig. 3. Illustrative example of 44 individuals on the model performance of the HIrisPlex system for hair colour prediction. The 44 individual set was taken from the Irish
collection for which hair imagery was noted as neither grey nor dyed. These individuals are only a visual example of how the model performs. Probability values are given for
all four hair-colour categories (black, brown, red and blond) with the highest probability value and the category for which a colour is called highlighted. Dark and light
probability values are also indicated which show the amount of black and blond contribution and effect towards the final colour prediction using the guide in Fig. 4. The
individual hair figures are ordered from left to right, top to bottom, starting with the highest probabilities for black to the lowest in column one, for the remaining columns the
order is the lowest to the highest probabilities for brown, red and blond, respectively.
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115                                             107

hair beta values of 22 and 19.4 respectively), i.e. the presence of                        no threshold, to p > 0.7, which we previously recommended for
an A allele at N29insA or Y152OCH produces red hair prediction                             eye colour prediction using the IrisPlex system [4,19]. As seen in
probabilities of 1. This effect is not mirrored in the other MC1R                          Table 4, using the p > 0.7 (B) threshold increases the percentage of
variants investigated and reflects the presence of these very rare                          correct calls relative to the value obtained without using any
alleles (heterozygote and homozygote state) within all individuals                         threshold (A) for some hair colours such as red hair by 10% (i.e.
displaying a red hair phenotype in our model training set at a very                        from 89.5 without threshold to 100% with threshold), and for blond
low frequency (n = 6). Although this does not affect the final                              hair by 6.5% (i.e. from 57.2 to 63.6%), whereas no difference was
prediction of red hair, it is important to note the abnormally high                        seen for brown hair at 75%, and for black we saw a decrease by
probability values for red when these rare variants are present.                           8.5% (i.e. from 28.6 to 20%). The low prediction accuracy obtained
Notably, some DNA variants outside the MC1R gene also show                                 with this approach for black hair may reflect the difficulty of
significant red hair colour probabilities (p < 0.05), i.e. rs12913832                       defining the true black hair colour phenotype relative to the dark
(HERC2), and rs2378249 (PIGU/ASIP).                                                        brown phenotype within this European dataset, where black hair is
    Fig. 3 provides the results of HIrisPlex prediction for a subset of                    rare. Notably, the low correct call rate of 28.6% for black (without
44 Irish individuals where high-resolution non-dyed hair colour                            using a threshold) is mainly caused by 30 individuals with non-
imagery was available to illustrate the model’s performance. The                           black self-reported phenotypes that were predicted as black by the
individuals natural hair colour images were ordered according to                           HIrisPlex model. Of these, almost all (i.e. 90%) had the brown–dark
their predicted hair colour category probability values achieved via                       brown phenotype. We could speculate that at least some of them
HIrisPlex analysis while the actual hair colour phenotypes were                            may have been self-categorised as black if black hair colour would
not considered in the ordering. From left to right, top to bottom, the                     be more frequent in the sampled populations and therefore easier
images are ordered from the highest to lowest HIrisPlex prediction                         to differentiate from dark brown in the phenotyping procedure.
probabilities for black hair and then the lowest to highest                                Although red hair is also rare in the European population (albeit in
prediction probabilities for brown, red and blond hair respectively.                       our Polish dataset it was enriched for) this problem is less expected
As evident, there is a high correlation with the predicted hair                            for red hair as red is usually well differentiable from other hair
colour category from HIrisPlex and the hair colour phenotype                               colours, perhaps with the exception of the blond-red individuals.
observed from visual inspection of these images.                                           The prediction accuracy for blond hair, being lower than those for
    Table 4 shows the accuracy of hair colour prediction in the 20%                        red and brown hair colour with and without threshold, is partly
model-testing subset of the Polish, Irish, and Greek individuals                           due to another phenomenon that will be discussed in detail in
(n = 308). It is important to emphasise here that these individuals                        Section 3.3; age-dependent hair colour changes. As brown hair is
were not used for model building. The highest probability category                         the intermediary stage between blond and black, no prediction
approach (as opposed to the prediction-guide approach explained                            threshold for this category is required as can be seen in Table 4.
in the next paragraph) considers the colour category with the                              Even at the 75% correct call rate, the incorrect 5/8 defined
highest predicted probability as the final predicted colour and does                        themselves as being dark blond. Since we know an overlap exists
not take other categories into account for the final prediction.                            between light-brown and dark-blond in people’s perception and
Using this approach, we tested various probability thresholds, from                        definition of colour, it is best to consider dark blond the same

Table 4
HIrisPlex hair colour prediction accuracies obtained from a 308 separate model testing set of individuals from Poland, Ireland and Greece (individuals were not considered for
prediction model building for which a different set of 1243 individuals was used) using two approaches: the highest probability category approach (with and without
thresholds) and the prediction guide approach (see Fig. 4 for the prediction guide).

Highest probability category approach                          Observed categories
No threshold p value                               Red         Blond        Brown                      Black      Total prediction(n)
Red predicted                                      17(89.5%) 1(5.3%)        1(5.3%)                    0(0%)      19(100%)
Blond predicted                                    8(3.7%)     123(57.2%) 68(31.6%)                    16(7.4%) 215(100%)
Brown predicted                                    2(6.3%)     5(15.6%)     24(75%)                    1(3.1%)    32(100%)
Black predicted                                    1(2.4%)     2(4.8%)      27(64.3%)                  12(28.6%) 42(100%)
Total phenotype(n)                                          28          131                        120         29 308 Total

Highest probability category approach                         Observed categories
>0.7p threshold                                    Red        Blond        Brown                        Black      Total prediction(n)
Red predicted                                      8(100%)    0(0%)        0(0%)                        0(0%)      8(100%)
Blond predicted                                    2(1.8%)    70(63.6%)    33(30%)                      5(4.6%)    110(100%)
Brown predicted                                    0(0%)      0(0%)        3(75%)                       1(25%)     4(100%)
Black predicted                                    0(0%)      0(0%)        4(80%)                       1(20%)     5(100%)
Total phenotype(n)                                         10           70                           40          7 127 Total
Undefined                                          18(64.3%) 61(46.6%) 80(66.7%)                        22(75.9%) 181 (58.8%)

                                                                 Observed categories
Prediction guide approach                          Red           Blond         d-blond/l-brown D-Brown Black                               Total prediction (n)
Red predicted                                      24(80%)                   0 6(20%)             0(0%)      0(0%)                         30(100%)
Blond/d blond predicted                            2(1.7%)       32(26.4%)     52(43%)            30(24.8%) 5(4.1%)                        121(100%)
L-brown/d-brown predicted                          2(1.3%)       9(6%)         59(39.6%)          58(38.9%) 21(14.1%)                      149(100%)
Black/d-brown predicted                                        0             0 1(12.5%)           4(50%)     3(37.5%)                      8(100%)
Total phenotype (n)                                           28           41                 118         92                            29 308 Total
108                                             S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115

colour as light brown. Therefore, brown hair colour may also be                         highest-probability approach that we aimed to overcome by
seen at black and blond category predictions 0.7 p threshold) or nearly all (89.5%                  in combination with light/dark hair colour shade probabilities as
without threshold) individuals for which the red hair category was                      obtained from the HIrisPlex genotype data (Fig. 4, see also Section
the highest prediction probability were correctly predicted as seen                     3.5 for additional practical recommendations). The reason for
in Table 4. Notably, the two individuals that were incorrectly                          considering light/dark shade prediction in addition to categorical
predicted red without using a threshold defined themselves as                            hair colour prediction in the final approach is that the 22 DNA
blond and brown, respectively; upon inspection of a hair image of                       variants not only impact on the main hair colour categories, but
the latter individual that was available to us, it did in fact display                  also on more detailed hair colour information, which is difficult to
light red hints of colour. This reflects another example of how the                      measure; hence, we express in light/dark prediction probabilities.
phenotyping procedure, particularly self-reported hair colour                           For this, we took the individuals from the black category, now
grading as done in our Irish and Greek datasets, influences DNA                          termed dark, and the individuals from the blond category, now
prediction accuracy. However, it is important to point out here that                    termed light, and designed an additional prediction model for light
for 11(39%) individuals that had defined themselves as having red                        and dark colour shade. Therefore, the HIrisPlex genotype input
hair, the red hair probability was not the highest, relative to                         finally provides the core prediction colour category with an added
probabilities for non-red hair colour, and these individuals were                       level or shade, i.e. light or dark. This part of the prediction should
therefore missed out with HIrisPlex using this highest-probability                      be useful as additional information to the initial prediction
approach. Furthermore, for 8 (6%) of the phenotypic blond, 96                           category, e.g. to differentiate light blond from dark blond (light
(80%) of the phenotypic brown, and 17 (59%) of the phenotypic                           brown), or light brown from dark brown/black. It becomes
black hair individuals the highest predicted hair colour category                       particularly beneficial in the lower hair colour category prediction
did not correspond to the phenotypic hair colour category                               probability levels (i.e. category prediction
S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115                                 109

genotype combinations. A >0.9 threshold is used for light versus                  reasons yet to be unveiled, and there are no other high impact hair
dark shade prediction. As seen in Table 4(C), using the prediction                colour SNP that take its place. For instance, in our full dataset using
guide approach the correct call percentages were for all hair                     1551 individuals, the correlation of rs12913832 with eye colour is
colours considerably higher than using the highest probability                    nearly twice as high (Pearson correlation r2 = 0.46, p = 2.2e 16) as
category approach, except for red hair. In fact, using the prediction             its correlation with hair colour (Pearson correlation r2 = 0.24,
guide approach we obtained on average 69.5% correct calls for                     p = 2.2e 16). Furthermore, the colour distribution of European
blond, 78.5% for brown and 87.5% for black. Particularly black hair               hair appears much wider than that of European eyes, requiring the
prediction was strongly improved by using the prediction guide                    combination of several similar gene effects [54]. Thus, categorical
approach with an increase of almost 60% on average relative to the                hair colour prediction is expected to be more error-prone
highest probability category approach without a threshold. For an                 especially when involving factors such as shade and intensity,
explanation of why blond is the least accurately predictable hair                 etc. at least with the DNA markers known thus far. Additional
colour with currently available DNA markers, also after applying                  effects such as environmental contributions particularly life time
the prediction guide, see Section 3.3. Although we saw an apparent                that are much stronger on certain hair colours than they are on all
decrease of accurate prediction for red hair with the prediction                  eye colours also influence hair colour prediction accuracy more so
guide approach (80% versus 89.5% with highest-probability                         than eye colour prediction accuracy and will be discussed in the
approach without threshold), this can be explained by the total                   following chapter (see Section 3.3).
number of red predictions made by the models and if they were
correct or not. In particular, for the highest probability approach               3.3. Age-dependent hair colour changes and consequences for hair
the model was incorrect at predicting red only 2 times but missed                 colour prediction
out on 11 actual reds from our dataset. The prediction guide
approach, although was inaccurate for red hair prediction for 6                        Age-dependent changes in hair colour are evident from
individuals, it managed to predict 24 out of the 28 actual red hair               anecdotal knowledge. The most often observed age-dependent
phenotypes from our test set. In summary, the number of                           hair colour changes occurs from light blond during childhood
individuals in our 308 model-test set that were missed by                         towards dark blond/light brown as an adult, but can also occur
HIrisPlex hair colour prediction using the prediction guide                       from light brown to dark brown/almost black. Suggestions of
approach were 4 (14%) of the phenotypic red, 8 (19.5%) of the                     hormonal changes during adolescence have been advocated as a
phenotypic blond, 7 (6%) of the phenotypic d-blond/l-brown, 28                    possible explanation [55], but the molecular basis are yet to be
(31%) of the phenotypic d-brown and 26 (90%) of the phenotypic                    unveiled. In order to study the effect of age-dependent hair colour
black, with an overall hair colour prediction accuracy of 76%. All are            change on hair colour prediction from child to adulthood we
considerably less than what was missed when applying the highest                  recorded via questionnaires in the Irish sample set hair colour
probability category approach, apart from black hair where we                     during childhood and adulthood separately, including the approx-
believe phenotyping inaccuracy/perception of colour plays a role                  imate age of the hair colour change. Of the 339 Irish individuals,
as discussed above already, as 21 of those individuals were                       157 contained current images in which the hair was not dyed and
predicted as having d-brown hair and may have in fact displayed d-                not grey, and from these the 8 individuals that were classified as
brown hair that was perceived as black within Europe. We                          blond in adulthood were 100% correctly predicted by the HIrisPlex
therefore recommend using the prediction guide approach for                       system following the prediction guide approach. However, for 14
properly interpreting HIrisPlex genotype data and the probability                 individuals with light brown to black phenotypes the HIrisPlex
values derived from our prediction tool to infer the most likely hair             model had faltered and gave a high blond prediction probability
colour phenotype in future practical applications.                                (>0.7 p) with high light shade probabilities (>0.9 p). On further
    There are several important differences between eye and hair                  examination of these incorrectly predicted individuals, 8 (57%) of
colour, both on the phenotypic as well as the genotypic levels, that              them noted that a change in hair colour regarding a darkening from
may play a role in why some eye colours (i.e. blue and brown)                     blond to brown had occurred in their younger lives at ages ranging
appear to be currently predictable from DNA with higher accuracy                  from 9 to 12 years. Furthermore, we found a high and statistically
than some hair colours (i.e. all non-red hair colours). Rs12913832                significant correlation (Pearson correlation r2 = 0.81, p < 0.01)
from the HERC2 gene plays a major role in the functional aspects of               between the increase in brown (darkening of hair) and the increase
iris pigmentation [10,52] and its proposed model of action reflects                in age since the hair colour change occurred for those Irish
a type of on/off switch from the absence of the T allele (and the                 individuals for whom such data were available to us (Fig. 7), which
homozygous presence of the C-allele) resulting in blue eye colour,                substantiates that the hair colour change observed is age
to the presence of one or two T allele(s) reflecting brown eye colour              dependent in these individuals. From this data we can see that
[10]. Indeed, it has been shown recently [52] via a series of                     our current HIrisPlex system works to a high degree of accuracy for
functional genetic experiments that the rs12913832 T-allele leads                 hair colour prediction, but there may be processes that alter the
to binding of several transcription factors and a chromatin loop                  hair colour over an individual’s lifetime (possibly molecular
with the promoter of the neighbouring pigmentation gene OCA2                      processes) without changing the HIrisPlex predicted hair colour
leading to elevated OCA2 expression and dark pigmentation. In                     of the individual. For instance, an adult that had blond hair as a
contrast, when the rs12913832 C-allele is present, transcription                  young child, but now displays light–dark brown/black hair colour
factor binding, loop formation and OCA2 expression are all reduced                is likely to display blond HIrisPlex genotypes and therefore a blond
leading to light pigmentation. Because of its strong functional                   hair colour prediction will be obtained. This is due to the fact that
involvement, HERC2 rs12913832 shows the strongest predictive                      the hair colour SNPs included in the HIrisPlex system, as well as
power on categorical eye colour with an AUC of 0.877 for blue and                 any additional hair colour associated DNA variant available today,
0.899 for brown alone for this SNP [42], and it also shows a strong               were identified in studies dealing with adults, and not in studies
impact on quantitative eye colour variance explaining on average                  that particularly searched for bio-markers informative for the age-
46% of the H and S spectrum [53]. When comparing this to its                      dependent hair colour change, which is still yet to be carried out. It
impact on hair colour, the percentage of residual variation from                  is important to note therefore that the HIrisPlex model cannot
black to blond explained by HERC2 rs12913832 was 10.7% in a                       decipher between these change-affected individuals and blonds
previous study [8]. However, the effect of rs12913832 is                          who remain blonds from childhood to adulthood and thus a
considerably less on hair colour than it is on eye colour for                     HIrisPlex prediction of blond hair may be inaccurate to a certain
110                                                S. Walsh et al. / Forensic Science International: Genetics 7 (2013) 98–115

degree (30% (Supplementary Table 3) in our dataset). This                                  questionable individuals during a police inquiry. For differentiat-
limitation in DNA-based hair colour prediction will remain as                              ing whether a crime scene sample donor still had his/her natural
long as bio-markers informative for indicating age-dependent hair                          hair colour, or perhaps turned grey or white already, a molecular
colour changes are not identified. Furthermore, this age-dependent                          age estimation performed on crime scene samples such as blood
study was conducted using images solely taken from a small Irish                           would be useful in combination with the HIrisPlex application.
set (childhood hair colour was not available for the Polish and the                        Previously, our group developed a DNA test for chronological age,
Greek set); it is worth mentioning that this may reflect a trend in                         which allows age-group estimation on an accurate level [57].
other countries within Europe, however we do not have this                                 Obviously, any dyed hair colour, as long as it produces a hair colour
information as of present. Therefore more samples and increased                            different from the natural hair colour category, would not be
accuracy testing of the HIrisPlex system on a broader collection                           identifiable with HIrisplex or any other DNA-based hair colour
around Europe would be advantageous to get a better measure of                             prediction tool. However, in general it is believed that many people
this phenomenon. Furthermore, activities shall be placed for                               who dye their hair as a result of hair greying, and with the intention
finding the processes/genes responsible for age-dependent hair                              of hiding the fact that their hair has greyed, try to achieve their
colour changes and developing respective bio-markers that may                              natural hair colour category via dyeing, especially in the case of
increase hair colour prediction accuracy in the future.                                    men, to avoid stigmatisms associated with hair colouring. In such
   A different aspect of age-dependent hair colour change is the                           cases HIrisPlex hair colour prediction can still be useful even
loss of hair colour when turning grey and white at a more or less                          though the hair is dyed.
advanced age, which likely represents a different mechanism of
action [56] than changing from one hair colour to another. We                              3.4. HIrisPlex analysis on a worldwide scale
examined the Irish population of 339 individuals for which we had
questionnaire information on the age at which grey or white hairs                             Due to the fact that the HIrisPlex hair prediction model was
had started to grow. As shown in Supplementary Fig. 1, after the                           created using individuals solely from Europe, as it should be for a
age of 30 there are more individuals starting to produce grey or                           European trait, to verify its use outside of Europe we performed
white hairs relative to those who do not, confirming anecdotal                              HIrisPlex analysis on worldwide DNA samples from the H952
knowledge. However, we have no data on how long it will take for                           subset of the HGDP-CEPH panel that represents 952 individuals
those individuals who started to have grey hairs to turn grey to a                         from 51 populations [36,37]. Due to lack of DNA in some samples, a
substantially obvious phenotypic degree. For practical consider-                           final number of 945 worldwide samples were used. Fig. 5 displays
ations, knowing the natural hair colour for an individual during its                       the prediction of the four hair-colour categories blond, brown, red
youth that now at more advanced age displays an obvious grey or                            and black on a worldwide scale. This figure does not use any
white hair phenotype will not be directly useful in an investigative                       threshold parameters and therefore it is worthy to note that the
search, but this information can still be useful albeit less strongly,                     prediction levels of blond hair in Europe (especially with
when asked for natural hair colour prior to greying in these                               probability values
Next part ... Cancel