Virtual Screening of C. Sativa Constituents for the Identification of Selective Ligands for Cannabinoid Receptor 2 - MDPI

International Journal of
            Molecular Sciences

Article
Virtual Screening of C. Sativa Constituents for the
Identification of Selective Ligands for Cannabinoid
Receptor 2
Mikołaj Mizera 1, Dorota Latek 2 and Judyta Cielecka-Piontek 1,*
 1    Department of Pharmacognosy, Poznan University of Medical Sciences, 60-781 Poznań, Poland;
      mikolajmizera@gmail.com
 2    Faculty of Chemistry, University of Warsaw, 02-093 Warsaw, Poland; dlatek@chem.uw.edu.pl
 *    Correspondence: jpiontek@ump.edu.pl; Tel.: +48-854-67-10
                                                                                                      
 Received: 28 April 2020; Accepted: 24 July 2020; Published: 26 July 2020                             

 Abstract: The selective targeting of the cannabinoid receptor 2 (CB2) is crucial for the development of
 peripheral system-acting cannabinoid analgesics. This work aimed at computer-assisted identification
 of prospective CB2-selective compounds among the constituents of Cannabis Sativa. The molecular
 structures and corresponding binding affinities to CB1 and CB2 receptors were collected from ChEMBL.
 The molecular structures of Cannabis Sativa constituents were collected from a phytochemical database.
 The collected records were curated and applied for the development of quantitative structure-activity
 relationship (QSAR) models with a machine learning approach. The validated models predicted the
 affinities of Cannabis Sativa constituents. Four structures of CB2 were acquired from the Protein Data
 Bank (PDB), and their ability to discriminate CB2-selective ligands from two sets of decoys was tested.
 We succeeded in developing QSAR models with a five-fold cross-validated Q2 (Q2 5-CV) > 0.62. The QSAR
 models helped to identify three prospective CB2-selective molecules that are dissimilar to already
 tested compounds. In a complementary structure-based virtual screening study that used available PDB
 structures of CB2, the agonist-bound cryogenic electron microscopy structure of CB2 showed the best
 statistical performance in discriminating between CB2-active and non-active ligands. The same structure
 also performed best in discriminating CB2-selective ligands from non-selective ligands.

 Keywords: QSAR; endocannabinoid system; Cannabis Sativa

1. Introduction
      The medical effect of Cannabis sp. bioactive ingredients is the subject of extensive research [1–3].
It has been discovered that Cannabis sp. contains over 500 various compounds, with the cannabinoid
group itself having about 110 molecules [4]. Phytocannabinoids present in Cannabis sp. have slight
differences in their chemical structures (types: cannabidiol, cannabichromene, cannabitriol,
cannabicyclol, cannabinodiol, cannabinol, cannabielsoin, cannabigerol, ∆9-tetrahydrocannabinol,
∆8-tetrahydrocannabinate) and are not selective. The most well-known phytocannabinoids are THC
(∆9-tetrahydrocannabinol) and CBD (cannabidiol). Research to date reports that both THC and CBD
have an affinity for various types of endocannabinoid system receptors [5–7]. For example, cannabidiol
is a non-competitive CB1 antagonist, CB2 inverse agonist, GPR55 and GPR18 antagonist, peroxisome
proliferator-activated receptor γ (PPAR-γ) agonist, α1 and α3 glycine receptor agonist, TRPM8
antagonist, and both an agonist and antagonist of serotonin 1A (5-HT1A) receptors [8–11].
The pharmacological effects of the interaction of active pharmaceutical ingredients (APIs) with CB1
and CB2 receptors are the most investigated. The differences in the density of CB1 and CB2 receptor
distribution also determine the activities within the central nervous system [12]. The pharmacological

Int. J. Mol. Sci. 2020, 21, 5308; doi:10.3390/ijms21155308                           www.mdpi.com/journal/ijms
effect of cannabinoids results from modification of the signaling pathway of two G protein-coupled
receptors (GPCRs): cannabinoid receptors 1 and 2 (CB1 and CB2). Activation of the CB1 receptor,
which is expressed mainly in the central nervous system, is responsible for their psychotropic
action, while CB2 receptors are found mainly in the immune system [13]. This distinct distribution
of CB2 receptors in human tissues suggests that selective cannabinoids are promising APIs with
anti-inflammatory, analgesic, and anti-neuroinflammatory actions [14]. The discovery of new selective
cannabinoids can be streamlined with the application of in silico methods such as structure-based
virtual screening (VS) [15–17] or ligand-based (quantitative structure-activity relationship (QSAR))
VS [18].
      Machine learning-based QSAR models can be successfully applied in pharmaceutical research
related to drug discovery [19], drug formulation [20], and pharmaceutical analysis [21,22].
The applicability of QSAR modeling to predict the selectivity of CB receptors has also been reported in
the literature. The CB2-selective activity of 29 benzimidazole and benzothiophene derivatives was
investigated with comparative molecular field analysis (CoMFA) and comparative molecular similarity
indices analysis (CoMSIA) 3D-QSAR models [23] and showed an external predictive performance of R2
> 0.9. In a recent 4D-QSAR study involving molecular dynamics (MD), the modeling was performed
on 29 structurally similar CB2 receptor inverse agonists [24]. The 4D-QSAR approach based on partial
least squares (PLS) and multiple linear regression (MLR) resulted in Q2 = 0.719 and Q2 = 0.761 for the
PLS and MLR models, respectively. The selectivity of 29 arylpyrazole derivatives was investigated
with the application of 3D-QSAR/CoMFA analyses [25]. The QSAR model helped to identify the causes
of CB1 selectivity by producing contour maps of affinity for both CB receptor subtypes. Nevertheless,
the narrow applicability of these models due to similar scaffolds used as training examples may
be a significant limitation when they are used in the prediction of novel chemotypes. In contrast,
Floresta et al. conducted a 3D-QSAR study on a diverse dataset containing 312 molecules with reported
experimental CB1 affinities and 187 molecules with reported CB2 affinities [26]. These models showed
a predictive performance of Q2 = 0.62 and Q2 = 0.72 for the CB1 and CB2 QSAR models, respectively.
The diversity of molecular structures in the training set also allowed Floresta et al. to perform VS
for novel chemical scaffolds. The idea of applying QSAR to identify bioactive
compounds in plant extracts was also used by Labib et al. [27]. This VS study aimed to predict the
activities of CB1 and opioid receptors among compounds isolated from Pinus roxburghii bark extract.
The model of Labib et al. explained the synergistic anti-inflammatory action of Pinus roxburghii and
provided information on the activity of several bioactive molecules identified in the extract.
      Our study aimed to predict CB2 selectivity for molecules identified in Cannabis Sativa by using a
validated QSAR model. This goal was achieved in several steps: the collection of
diverse ligand structures with experimental data describing CB1 and CB2 affinity; data curation;
a QSAR study for CB1 and CB2; and the prediction of the CB1 and CB2 affinity of
phytochemicals in Cannabis Sativa. Until recently, the lack of crystal structures of cannabinoid receptors
was a major obstacle in searching for new, selective cannabinoids. However, there have been attempts
to predict binding modes of well-known actives of CB1 and CB2 using homology models of these
receptors and molecular dynamics [17,28,29]. Recent advances in structural studies of cannabinoid
receptors (PDB IDs: 6KPC [30], 6KPF [30], 6PT0 [31], and 5ZTY [32]) provided new data that supplemented
drug discovery studies [30]. In this study, we tested the applicability of available experimental structures of
CB2, solved in both active and inactive conformations, in structure-based VS to search for novel or
more CB2-selective ligands.

2. Results

2.1. Study Design
    Data on the experimentally evaluated affinity of diverse compounds to CB1 and CB2 receptors
and data on structures identified in Cannabis Sativa with unknown CB1/CB2 affinity were collected
from publicly accessible online resources. Next, the data with known affinity values were curated and
subsequently used for the development and validation of CB1 and CB2 QSAR models. The validated
models were used for the prediction of the CB1 and CB2 affinity of Cannabis Sativa ingredients and
the prediction of prospective CB2-selective Cannabis Sativa ingredients. The dataset collected for the
QSAR study was used to evaluate the statistical characteristics of the discriminatory ability of CB1 and
CB2 crystal structures in the docking study.

2.2. Data Curation
    Datasets of ligands with experimental affinities for CB1 and CB2 were joined with corresponding
records of assay metadata from ChEMBL. The resulting dataset was composed of six columns
containing: the structure of the molecule in simplified molecular-input line-entry system (SMILES)
format, a standard type of reported value (i.e., Ki, IC50, etc.), reported value, the mathematical relation
of the reported value to the experimentally measured value, the ID number of the source document
with reported study, and the confidence score. The initial dataset described above was subjected
to the removal of potentially unreliable data (Figure 1). The initial dataset included 14,126 records
for CB1 and 13,506 records for CB2 (Figure 1.1). Records without reported SMILES, which included
metal complexes and polymers, were excluded from the dataset (Figure 1.2). For the remaining
records, the respective molecular structures were standardized and 2D coordinates were generated.
We assumed that both the assays which were carried out on a target protein and those carried out on a
homologous protein were reliable. Records meeting these criteria were annotated in ChEMBL with a
confidence score of 9 and 8, respectively (Figure 1.3). Only activities measured in large and consistent
assays were picked from the dataset (Figure 1.4). That is, an assay was considered as suitable for
selection if it contained a consistent group of 10 or more molecules, all with the affinity measured.
These large assays were identified based on the document ID reported in ChEMBL. In the next step,
records with a reported affinity in any standard value type related to Ki were kept (Figure 1.5) and
subsequently converted to pKi. Duplicates analysis (Figure 1.6) was conducted by comparing the
InChIKeys of records. Duplicates were merged if the standard deviation of duplicate measurements was
lower than 10% of the entire range of measurements. As a result, only one record was kept, with the pKi
value averaged. Mordred descriptors and Morgan fingerprints were computed for each compound in
the dataset and then concatenated to create a feature vector. Records with duplicate feature vectors
were removed (Figure 1.7). These included, e.g., compounds differing only by chiral hydrogen atoms
and those that could not be differentiated by the used descriptors and fingerprints.
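
The filtering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the record field names (`smiles`, `inchikey`, `type`, `value_nm`, `doc_id`, `confidence`) are hypothetical, Ki values are assumed to be given in nM, the population standard deviation is assumed for the duplicate check, and duplicates that fail the 10% criterion are simply dropped, which the text does not specify.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def to_pki(ki_nm):
    """Convert a Ki given in nM to pKi = -log10(Ki in mol/L)."""
    return 9.0 - math.log10(ki_nm)


def curate(records, min_confidence=8, min_assay_size=10, max_rel_sd=0.10):
    """Return {inchikey: pKi} after the filtering steps (hypothetical field names)."""
    # Steps 2-3: drop records without SMILES or with a confidence score below 8.
    kept = [r for r in records
            if r.get("smiles") and r["confidence"] >= min_confidence]
    # Step 4: keep records from source documents reporting at least 10 records.
    sizes = Counter(r["doc_id"] for r in kept)
    kept = [r for r in kept if sizes[r["doc_id"]] >= min_assay_size]
    # Step 5: keep Ki-type values only and convert them to pKi.
    kept = [dict(r, pki=to_pki(r["value_nm"])) for r in kept if r["type"] == "Ki"]
    if not kept:
        return {}
    # Step 6: merge duplicates (same InChIKey) whose spread is below 10% of
    # the full pKi range; assumption: inconsistent duplicates are discarded.
    all_pki = [r["pki"] for r in kept]
    full_range = max(all_pki) - min(all_pki)
    groups = defaultdict(list)
    for r in kept:
        groups[r["inchikey"]].append(r["pki"])
    merged = {}
    for key, values in groups.items():
        spread = pstdev(values) if len(values) > 1 else 0.0
        if spread == 0.0 or spread < max_rel_sd * full_range:
            merged[key] = mean(values)
    return merged
```

Descriptor and fingerprint calculation (step 7) is left out here because it depends on external cheminformatics packages.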

      Figure 1. Data records that passed through the subsequent data curation steps: (1) data acquisition,
      (2) removing records with no SMILES included, (3) removing records with Confidence Score < 8,
      (4) keeping only records from large assays, (5) keeping only records with included Ki values, (6)
2.3. Model Description
    The independent CB1 and CB2 QSAR models were created according to the architecture presented
in Figure 2. The input feature vectors for the machine learning algorithm were molecular descriptors
and fingerprints of compounds from the training set (Figure 2.1). The feature vectors along with
experimental CB1 and CB2 pKi values were used as a training set for the CB1 and CB2 QSAR models
(Figure 2.2).

      Figure 2. Machine learning-based quantitative structure-activity relationship (QSAR) model.

    The value predicted in each leaf of decision trees in a gradient boosting (GB) ensemble was used
for the embedding of a descriptor space (Figure 2.3). The embedded representation of the descriptor
space was used to create a separate k-nearest neighbor (kNN) model for each receptor (Figure 2.4).
The kNN algorithm was modified to predict an average pKi value of a compound only if all nearest
neighbors of the query molecular structure were within a given distance. This maximal distance was
used as an applicability domain threshold related to the confidence of prediction (Figure 2.5).
Different thresholds were tested to assess their influence on the statistical characteristics of kNN
models. We selected 20 thresholds as percentiles 5 to 100 with a step equal to 5% of the distribution
of maximal Euclidean distances within kNN clusters obtained for the training set.

2.4. Model Validation
      In Figure 3, we present the dependence of cross-validated Q2 on the applicability domain threshold.
 The predictive performance showed consistent behavior, declining with an increased threshold. The best
 statistical characteristics, Q2 > 0.8 for the CB1 and CB2 models, were observed for the lowest thresholds
 ( 0.6, as suggested in studies of QSAR best practices [33].
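
As an illustration of how such a curve could be computed, the sketch below evaluates Q2 = 1 - PRESS/SS over the cumulative subset of compounds whose neighbor distance falls within each threshold. It is a hedged example rather than the authors' code: the function names are hypothetical, the mean used in SS is taken over the evaluated subset (other Q2 conventions use the training-set mean), and the predictions are assumed to be out-of-sample values from cross-validation.

```python
def q2(y_true, y_pred):
    """Q2 = 1 - PRESS / SS, with SS taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - press / ss_tot


def q2_by_threshold(y_true, y_pred, max_dist, thresholds):
    """Return (threshold, Q2) pairs; each Q2 uses only the compounds whose
    largest neighbor distance does not exceed the threshold."""
    curve = []
    for t in thresholds:
        idx = [i for i, d in enumerate(max_dist) if d <= t]
        subset_true = [y_true[i] for i in idx]
        # Q2 is undefined for fewer than two points or zero variance.
        if len(set(subset_true)) < 2:
            curve.append((t, None))
            continue
        curve.append((t, q2(subset_true, [y_pred[i] for i in idx])))
    return curve
```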
        An autocorrelation plot (Figure 4) shows good agreement between out-of-sample predictions and
 experimental values for the majority of compounds, and importantly, for the CB2 selectivity ranges of
 pKi (pKi < 6 for CB1 and pKi > 6.5 for CB2). The majority of data points presented in Figure 4 are
 distributed between 6 and 8. Because machine learning techniques used in this study are interpolating
 techniques, we expected our model to overestimate pKi predictions at lower values and underestimate
 predictions at large ones.
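
The selectivity ranges quoted above translate into a simple filter on predicted affinities. The sketch below is illustrative only; the function name is hypothetical, and treating a missing prediction (for example, a compound outside the applicability domain) as non-selective is an assumption.

```python
def is_cb2_selective(pki_cb1, pki_cb2, cb1_max=6.0, cb2_min=6.5):
    """True when predicted pKi < 6 for CB1 and pKi > 6.5 for CB2.
    A missing prediction (None) is treated as not selective."""
    if pki_cb1 is None or pki_cb2 is None:
        return False
    return pki_cb1 < cb1_max and pki_cb2 > cb2_min
```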

      Figure 3. Q2—applicability domain threshold dependence for CB1 and CB2 k-nearest neighbor (kNN)
      models. Each point on the curves represents a Q2 calculated for external predictions for a cumulative
      fraction of molecular structures in the training dataset within a step of 5%.
In Figure 3, we present the dependence of cross-validated Q2 on the applicability domain
threshold. The predictive performance showed consistent behavior, declining with an increased
threshold. The best statistical characteristics (Q2 > 0.8) for the CB1 and CB2 models were observed for
the lowest thresholds ( 0.6, as suggested in studies of QSAR best practices [33].
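
The kind of curve shown in Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the applicability domain measure is assumed to be a per-compound distance to the training set, Q2 is assumed to be 1 - PRESS/SS with SS taken around the mean of the observed values in each subset, and the function names and toy data are hypothetical.

```python
def q2(observed, predicted):
    """Q2 = 1 - PRESS / SS, with SS taken around the mean of the observed values."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss

def q2_by_coverage(ad_distance, observed, predicted, step=0.05):
    """Return (fraction, distance threshold, Q2) for growing cumulative fractions.

    Compounds are ranked by applicability-domain distance, so each fraction
    keeps the compounds closest to the training set.
    """
    order = sorted(range(len(observed)), key=lambda i: ad_distance[i])
    n_steps = round(1 / step)
    curve = []
    for s in range(1, n_steps + 1):
        # Keep at least two compounds so that SS is defined.
        n_keep = max(2, round(s * step * len(order)))
        kept = order[:n_keep]
        obs = [observed[i] for i in kept]
        pred = [predicted[i] for i in kept]
        curve.append((s * step, ad_distance[kept[-1]], q2(obs, pred)))
    return curve

# Toy data: the prediction error grows with the distance to the training set.
n = 20
ad_distance = [0.05 * (i + 1) for i in range(n)]
observed = [5.0 + 0.15 * ((7 * i) % n) for i in range(n)]
predicted = [o + d * (1 if i % 2 == 0 else -1)
             for i, (o, d) in enumerate(zip(observed, ad_distance))]

curve = q2_by_coverage(ad_distance, observed, predicted)
for fraction, threshold, score in curve[::5]:
    print(f"fraction {fraction:.2f}  threshold {threshold:.2f}  Q2 {score:.3f}")
```

Because the toy errors scale with the distance, Q2 is highest for the strictest threshold and falls as more distant compounds are admitted.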


      Figure 4. Autocorrelation of experimental and predicted pKi values for the CB1 model (blue) and the
      CB2 model (orange).

2.5. Virtual Screening of Cannabis Sativa Phytochemicals
      Sixty-eight Cannabis Sativa phytochemicals with a molecular weight between 250 and 500 Da were
subjected to VS using our validated CB1 and CB2 QSAR models. On average, the selectivity of compounds
from Cannabis Sativa was lower than that of the ChEMBL-derived training set. Still, we observed a
relatively high correlation between the predicted CB1 and CB2 activities of Cannabis Sativa
phytochemicals (see Figure 5) in comparison to the ChEMBL training set. The average absolute dpKi
between CB2 and CB1 was 0.69 for molecules in Cannabis Sativa and 1.26 for molecules in the training
set.
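
The average absolute dpKi reported above is a simple summary statistic. The sketch below shows the arithmetic for a set of paired predictions. The function name and the pKi values are hypothetical and are not data from this study.

```python
def mean_abs_dpki(pki_cb1, pki_cb2):
    """Mean absolute difference between paired CB2 and CB1 pKi values."""
    diffs = [abs(cb2 - cb1) for cb1, cb2 in zip(pki_cb1, pki_cb2)]
    return sum(diffs) / len(diffs)

# Hypothetical predicted pKi values for four compounds.
pred_cb1 = [6.0, 6.5, 7.0, 5.5]
pred_cb2 = [6.5, 7.5, 7.0, 7.0]

average_selectivity = mean_abs_dpki(pred_cb1, pred_cb2)
print(f"mean |dpKi| = {average_selectivity:.2f}")
```

A lower value means that the predicted CB1 and CB2 affinities of a set lie closer together, that is, the set is on average less selective.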

      Figure 5. The correlation between affinity against CB1 and CB2 for compounds in Cannabis Sativa
      and compounds reported in ChEMBL.

Int. J. Mol. Sci. 2020, 21, 5308                                                                                                                                                                 6 of 14

      Despite the relatively low average predicted selectivity of compounds in Cannabis Sativa, structures
C1–C3 showing dpKi > 1 and a Tanimoto coefficient (TC) < 0.5 were identified (Table 1). The low TC
values indicate a significant structural difference between the identified molecular structures in
Cannabis Sativa and the respective most similar molecular structures in the training set.
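
The selection criteria above can be sketched as a simple filter. This is an illustration under stated assumptions rather than the workflow of this study: fingerprints are represented as sets of on-bit indices, dpKi is taken as predicted pKi(CB2) minus predicted pKi(CB1), and all names, fingerprints and pKi values are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0

def select_novel_cb2_selective(candidates, training_fps, min_dpki=1.0, max_tc=0.5):
    """Keep candidates with dpKi > min_dpki whose nearest training neighbor has TC < max_tc."""
    hits = []
    for name, fp, pki_cb1, pki_cb2 in candidates:
        dpki = pki_cb2 - pki_cb1  # assumed sign convention: positive means CB2-selective
        nearest_tc = max(tanimoto(fp, t) for t in training_fps)
        if dpki > min_dpki and nearest_tc < max_tc:
            hits.append((name, round(dpki, 2), round(nearest_tc, 2)))
    return hits

# Hypothetical fingerprints of training-set molecules.
training_fps = [{1, 2, 3, 4, 5, 6}, {2, 3, 7, 8, 9}]

# Hypothetical candidates: (name, fingerprint, predicted CB1 pKi, predicted CB2 pKi).
candidates = [
    ("A", {1, 10, 11, 12}, 5.4, 6.9),   # selective and dissimilar to the training set
    ("B", {1, 2, 3, 4, 5}, 5.0, 6.8),   # selective but close to a training molecule
    ("C", {20, 21}, 6.5, 6.9),          # dissimilar but not selective
]

hits = select_novel_cb2_selective(candidates, training_fps)
print(hits)
```

Only candidate A passes both criteria. The Tanimoto coefficient reported for each hit is the one to its most similar training molecule, which corresponds to the last column of Table 1.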
      Table 1. Compounds from Cannabis Sativa that were predicted as CB2-selective and are also dissimilar
      to the ChEMBL training set.

      Structure                      Predicted Value        Similar Molecule for ChEMBL              Tanimoto
                                     CB1 pKi    CB2 pKi     Structure        CB1 pKi    CB2 pKi      Coefficient

      C1                             5.46       6.87        CHEMBL1201151    4.64       N/A          0.28
      (4-hydroxy-6-methoxyspiro[1,2-dihydroindene-3,4’-cyclohexane]-1’-yl)
                                                                     CHEMBL1201151
                                                                     CHEMBL1201151
          dihydroindene-3,4’-cyclohexane]-1’-yl)
      [1,2-dihydroindene-3,4’-cyclohexane]
         dihydroindene-3,4’-cyclohexane]-1’-yl)
     dihydroindene-3,4’-cyclohexane]-1’-yl)
     dihydroindene-3,4’-cyclohexane]-1’-yl)
                           acetate
                        acetate
                   -1’-yl)  acetate
                           acetate
                           acetate
                       acetate
                      acetate

                C2 C2                            5.575.57
                                                 5.57      6.596.59
                                                             6.59
                                                      5.576.59  6.59                                                                              6.956.95 8.03
                                                                                                                                                    6.95         8.03 0.180.18
                                                                                                                                                              8.03
                                                                                                                                                       6.95 8.03  8.030.18   0.18
                                                                                                                                                                           0.18
                    C2
                C2 C2
               C2                               5.57
                                                5.57 5.57 6.59 6.59                                                                              6.95
                                                                                                                                                 6.95 6.95  8.03 8.03 0.18 0.18
              C2
            2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,8-
        2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,8-
         2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,
             2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,8-
            2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,8-
       2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,8-
       2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,8-
                  dimethyl-1,2,3,4,5,6,7,8a-
               dimethyl-1,2,3,4,5,6,7,8a-
               8-dimethyl-1,2,3,4,5,6,7,                                CHEMBL256753
                                                                        CHEMBL256753
                                                                      CHEMBL256753
                   dimethyl-1,2,3,4,5,6,7,8a-
                  dimethyl-1,2,3,4,5,6,7,8a-
              dimethyl-1,2,3,4,5,6,7,8a-
             dimethyl-1,2,3,4,5,6,7,8a-                                  CHEMBL256753
                                                                        CHEMBL256753
                                                                     CHEMBL256753
                                                                     CHEMBL256753
         octahydronaphthalen-2-yl]prop-2-enoic
      octahydronaphthalen-2-yl]prop-2-enoic
           8a-octahydronaphthalen-2-yl]
          octahydronaphthalen-2-yl]prop-2-enoic
         octahydronaphthalen-2-yl]prop-2-enoic
     octahydronaphthalen-2-yl]prop-2-enoic
     octahydronaphthalen-2-yl]prop-2-enoic
                         acidacid
                  prop-2-enoic   acid
                              acid
                        acidacid
                        acid

                                                 6.416.41  7.29
                                                      6.417.29
                                                               7.29
                                                             7.29
                                                                7.29                                                                              7.337.33 6.04
                                                                                                                                                    7.33
                                                                                                                                                       7.33 6.04
                                                                                                                                                                 6.04 0.410.41
                                                                                                                                                              6.046.040.41   0.41
                                                                                                                                                                           0.41
                                                6.41 6.41
                                               6.41       7.29 7.29                                                                              7.33 7.33
                                                                                                                                                 7.33       6.04 6.04 0.41 0.41
      C3 C3
     C3
     C3    C3
          C3
       6-carboxy-2-methyl-2-(4-methylpent-
         6-carboxy-2-methyl-2-(4-methylpent-3-
      6-carboxy-2-methyl-2-(4-methylpent-3-
          6-carboxy-2-methyl-2-(4-methylpent-3-
          6-carboxy-2-methyl-2-(4-methylpent-3-
     6-carboxy-2-methyl-2-(4-methylpent-3-
     6-carboxy-2-methyl-2-(4-methylpent-3-
         3-enyl)-7-pentylchromen-5-olate
              enyl)-7-pentylchromen-5-olate
           enyl)-7-pentylchromen-5-olate
               enyl)-7-pentylchromen-5-olate
              enyl)-7-pentylchromen-5-olate
          enyl)-7-pentylchromen-5-olate
          enyl)-7-pentylchromen-5-olate


2.6. CB2 Structure-Based Virtual Screening Results

      We performed VS using four PDB structures of the CB2 receptor (PDB id: 6KPF, 6KPC, 6PT0,
and 5ZTY) and two compound libraries. The first compound library was derived from the curated
ChEMBL training dataset prepared for our QSAR models; it comprised CB2-selective ligands expanded
with CB2-non-selective ligands. For this compound library, the 6KPF and 5ZTY structures achieved
the highest area under the receiver operating characteristic curve (ROC AUC) values in the
discrimination between CB2-selective and CB2-non-selective ligands (see Table
                                                                                                                                                                                             Table AUC)
                                                                                                                                                                                             Table
  value
  value value
            in the
            in      in
                  the   the    discrimination
                        discrimination                   between
                                                   between
                                                  between                 CB2-selective
                                                                   CB2-selective
                                                                   CB2-selective             ligands
                                                                                            ligands  ligands  versus
                                                                                                              versus versus        CB2-non-selective
                                                                                                                            CB2-non-selective
                                                                                                                           CB2-non-selective                       onesones
                                                                                                                                                                   ones       (see(see
                                                                                                                                                                             (see     Table
                                                                                                                                                                                     Table   Table
value  2).
   2). 2).
        The
         2). The
          inThe the
              The     antagonist-bound,
                       discrimination
                 antagonist-bound,
                       antagonist-bound,          5ZTY5ZTY
                                                     between
                                                        5ZTY     structure
                                                            structure
                                                                   structure       slightly
                                                                       CB2-selective
                                                                              slightly
                                                                                    slightly       outperformed
                                                                                                     ligands versus
                                                                                             outperformed
                                                                                                    outperformed           6KPF 6KPF
                                                                                                                                 6KPF       (agonist-bound),
                                                                                                                                      CB2-non-selective
                                                                                                                                      (agonist-bound),
                                                                                                                                              (agonist-bound),                   as
                                                                                                                                                                                 ones observed
                                                                                                                                                                            as observed     (see Table 2).
                                                                                                                                                                                  as observed
                                                                                                                                                                                      observed
  2). The
  2).  The            antagonist-bound,
               antagonist-bound,                 5ZTY  5ZTY       structure
                                                           structure               slightly
                                                                            slightly               outperformed
                                                                                           outperformed                  6KPF    6KPF       (agonist-bound),
                                                                                                                                     (agonist-bound),                            as
                                                                                                                                                                          as observed
                                                                                                                                                                               observed
The    inin  aaantagonist-bound,
                  detailed
   in antagonist-bound,
        a   detailed
                   detailed      analysis
                             analysis
                                  analysis   of
                                             5ZTY
                                                 5ZTY
                                                  of
                                                  ROC
                                                  of  ROC  structure
                                                                 curves
                                                            curves
                                                         structure
                                                        ROC       curves
                                                                            slightly
                                                                               obtained
                                                                          obtained
                                                                            slightly
                                                                                obtained
                                                                                           outperformed
                                                                                            in    in
                                                                                                  our
                                                                                             outperformed
                                                                                                   in    ourenrichment
                                                                                                          our
                                                                                                                         6KPF
                                                                                                                 enrichment
                                                                                                                   enrichment 6KPF
                                                                                                                                    (agonist-bound),
                                                                                                                                    study study
                                                                                                                                           study         (Figure
                                                                                                                                                   (Figure
                                                                                                                                            (agonist-bound),
                                                                                                                                                          (Figure    6).
                                                                                                                                                                          as
                                                                                                                                                                           6).
                                                                                                                                                                            The
                                                                                                                                                                            6).  The     second
                                                                                                                                                                                    second
                                                                                                                                                                                    as
                                                                                                                                                                                  The     observed
                                                                                                                                                                                           second       in a
  in aaindetailed
  in         a detailed
           detailed               analysis
                            analysis
                           analysis         of ROC
                                            of    of ROC
                                                 ROC       curves
                                                          curves curves        obtained
                                                                         obtained
                                                                        obtained          in our
                                                                                          in      in our
                                                                                                 our              enrichment
                                                                                                          enrichment
                                                                                                          enrichment               study
                                                                                                                                  study   study  (Figure
                                                                                                                                                (Figure  (Figure   6). The
                                                                                                                                                                   6).     6). The
                                                                                                                                                                          The      second
                                                                                                                                                                                   second second
       compound’s
   compound’s
         compound’s            library
                          library
                                library used used
                                             usedin  in
                                                     our
                                                      in  our
                                                           our   structure-based
                                                            structure-based
                                                                  structure-based        VS    VS     included
                                                                                                included
                                                                                                VS      included   the   the
                                                                                                                          same
                                                                                                                          the   same
                                                                                                                                 same      CB2-selective
                                                                                                                                     CB2-selective
                                                                                                                                            CB2-selective                 ligands
                                                                                                                                                                    ligandsligands      derived
                                                                                                                                                                                   derived
                                                                                                                                                                                         derived
detailed
  compound’s
  compound’s    analysis
        compound’s       library
                         library of
                               libraryROC
                                      usedused
                                      used       curves
                                               in our
                                               in    in our
                                                    our        obtained
                                                                 structure-based
                                                          structure-based
                                                          structure-based        in   our      enrichment
                                                                                               VS included
                                                                                        VS included
                                                                                       VS     included            the same
                                                                                                                 the      study
                                                                                                                         the same
                                                                                                                         same           (Figure
                                                                                                                                            CB2-selective
                                                                                                                                    CB2-selective
                                                                                                                                    CB2-selective          6).     The
                                                                                                                                                                  ligands
                                                                                                                                                                  ligands    second
                                                                                                                                                                          ligands derived
                                                                                                                                                                                 derived    compound’s
                                                                                                                                                                                         derived
   fromfrom thethe
         from       the  ChEMBL
                    ChEMBLChEMBL          training
                                     training
                                           training      dataset,
                                                    dataset,
                                                         dataset, named named
                                                                         named  here herewith
                                                                                      here     with
                                                                                                withthethe  term
                                                                                                           the   term
                                                                                                                  term      “actives”,
                                                                                                                       “actives”,
                                                                                                                             “actives”,    andand and      general
                                                                                                                                                     generalgeneral        diverse
                                                                                                                                                                      diverse
                                                                                                                                                                            diverse       decoys
                                                                                                                                                                                     decoysdecoys
  from
library
  from  from
           the
             used
           the     the
                   ChEMBL
                  ChEMBL ChEMBL
                         in    our        training
                                   training
                                   training              dataset,
                                                  dataset,
                                        structure-based
                                                  dataset,       named
                                                                 named  named
                                                                         VS   here   here
                                                                               included
                                                                              here      withwith
                                                                                       with       the
                                                                                                  the the the
                                                                                                          term
                                                                                                          term   term
                                                                                                               same         “actives”,
                                                                                                                     “actives”,
                                                                                                                            CB2-selective
                                                                                                                    “actives”,           andand
                                                                                                                                         and       general
                                                                                                                                                  general  general
                                                                                                                                                              ligandsdiverse
                                                                                                                                                                    diverse diversedecoys
                                                                                                                                                                                 derived
                                                                                                                                                                                   decoys decoys  from the
   thatthatwere
         that   were
                 were     generated
                     generated
                           generated    with with
                                                DUD-E
                                             with    DUD-E
                                                     DUD-E    [34],[34],
                                                                       named
                                                                    [34],   named
                                                                            named   here hereas
                                                                                          here     as    “non-actives”.
                                                                                                  “non-actives”.
                                                                                                    as    “non-actives”.        In   In
                                                                                                                                     VSIn  VS     against
                                                                                                                                            against
                                                                                                                                            VS      against  this  this
                                                                                                                                                                      second
                                                                                                                                                                     this   second
                                                                                                                                                                             second      library,
                                                                                                                                                                                    library,
                                                                                                                                                                                          library,
  thatthat
ChEMBL
  that   were
         were   were      generated
                    generated
                   training
                    generated         withwith
                                   dataset,
                                      with     DUD-E
                                               DUD-E DUD-E
                                                    named   [34],  [34],
                                                            [34],here named
                                                                     named  named
                                                                            with  here
                                                                                  here   here
                                                                                       the asterm
                                                                                           as      as “non-actives”.
                                                                                                 “non-actives”.
                                                                                                            “actives”,
                                                                                                “non-actives”.                Inand
                                                                                                                              In   VS
                                                                                                                                   VS Inagainst
                                                                                                                                           VS
                                                                                                                                          against against
                                                                                                                                             general       this     this
                                                                                                                                                           thisdiverse
                                                                                                                                                                     second
                                                                                                                                                                    second  second       library,
                                                                                                                                                                                  library,
                                                                                                                                                                                  decoys
                                                                                                                                                                                  library,       that  were
   6KPF6KPF
         6KPFandand    5ZTY
                     and   5ZTY
                            5ZTY      structures
                                 structures
                                        structures      performed
generated with DUD-E [34], named here as “non-actives”. In VS against this second library, 6KPF and
5ZTY structures performed similarly, discriminating CB2-actives from non-actives at the level of 0.8
and 0.79, respectively (see Table 2, ROC AUC values). 6KPF slightly outperformed 5ZTY according to
the ROC AUC characteristics and 6KPC performed the worst; however, the values were very close, with
no explicit difference between active and inactive CB2 structures. Our results showed that even
slight differences between the crystal structures of the same receptor have an impact on the VS
results.
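To make the ROC AUC figures easier to interpret, the following minimal sketch computes ROC AUC
directly from two lists of docking scores. It is an illustration only and not the pipeline used in
this work: the function name `roc_auc`, the convention that a lower (more negative) score means
better predicted binding, and all score values are assumptions made for the example. Under these
assumptions, ROC AUC equals the probability that a randomly chosen active is ranked ahead of a
randomly chosen non-active, so a value of 0.8 means that this happens for about 80% of
active/non-active pairs.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], inactive_scores: Sequence[float]) -> float:
    """Return ROC AUC as the fraction of (active, non-active) pairs ranked correctly.

    Assumption: a lower (more negative) docking score means better predicted binding.
    A tie between an active and a non-active counts as half a correctly ranked pair.
    """
    if not active_scores or not inactive_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for n in inactive_scores:
            if a < n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical docking scores (arbitrary units); NOT taken from the paper.
actives = [-10.2, -9.8, -9.1, -8.7, -7.9]
non_actives = [-9.5, -8.9, -8.0, -7.5, -7.1, -6.8]

auc = roc_auc(actives, non_actives)
print(f"ROC AUC = {auc:.2f}")  # prints "ROC AUC = 0.80"
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for a toy example;
for libraries of realistic size a rank-based implementation from a statistics library would be the
more practical choice.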
A similar conclusion was derived from a recent study on glucagon receptors, where an ensemble of
receptor conformations generated in short MD simulations outperformed
                                                                                                                                                              outperformed
                                                                                                                                                             outperformed            crystal
                                                                                                                                                                                           crystal
                                                                                                                                                                                           crystal
slight
  ensemble
  ensembledifferences of         between
                           receptor
                     of receptor                   the
                                            conformations crystal        structures
                                                                        generated          in of     the
                                                                                                  short       same
                                            conformations generated in short MD simulations outperformed crystalMD        receptor
                                                                                                                          simulations         have         an
                                                                                                                                                     outperformed  impact         on     the
                                                                                                                                                                                    crystal     VS   results.
       structures
   structures
         structures   in VSin
                            in VS     [35,36].
                                [35,36].
                                VS     [35,36].
        structures
  structures
Astructures          in
    similar conclusion   VSin  VS     [35,36].
                               [35,36].
                     in VS [35,36].was derived from a recent study on glucagon receptors, where an ensemble
of receptor conformations generated in short MD simulations outperformed crystal structures in
VS [35,36].

                     Table 2. Results of the enrichment study for CB2-receptor structures.

                  CB2-Non-Selective Decoys              General Decoys
   Metric      6KPF    6KPC    6PT0    5ZTY      6KPF    6KPC    6PT0    5ZTY
   EF 2%       0       0       0       1.7       3.3     0       0       3.3
   EF 5%       0.67    0       0       1.3       4.7     1.3     1.3     3.3
   EF 10%      1.7     0       0.33    1         4.0     2       1.7     3
   ROC AUC     0.6     0.53    0.53    0.6       0.8     0.73    0.77    0.79

      Figure 6. The receiver operating characteristic (ROC) curves obtained in the enrichment study
      using libraries including (A) CB2-selective and CB2-non-selective ligands and (B) CB2-actives and
      CB2-non-active ligands.

3. Discussion
     The main goal of this study was to predict CB2-selective compounds among the phytochemicals
that constitute Cannabis Sativa. We achieved this goal through the following steps: collecting a large
set of experimental data from ChEMBL for the training set, data curation, model development and
validation, and predicting the selectivity of phytochemicals from Cannabis Sativa. In parallel, we
tested the applicability of recently released PDB structures of CB2 in structure-based VS by
conducting an enrichment study. The dataset collected by our group was curated, which resulted in
the removal of more than 90% of unreliable and inconsistent data. Despite the removal of this large
fraction of data, the final dataset included 1958 records for CB1 and 2616 records for CB2. Notably,
such a large dataset is sufficient for employing the machine learning approach. Much smaller datasets
have been successfully used to develop QSAR models for CB1/CB2 studies [23,24]. Nevertheless,
focusing on the dataset diversity in developing our QSAR model also created a challenge for
predicting the activity of outlier compounds. We solved this problem by determining an applicability
domain for each compound individually. Embedding the descriptor space of base GB models made
it possible to assess the prediction confidence based on Euclidean distances of k-nearest neighbors.
The GB algorithm was also successfully applied by Ancuceanu et al. [37] for cytotoxicity prediction.
The performance of GB in the modeling of various QSARs was compared to other ensemble methods
in a study by Kwon et al. [38].
      Our approach was validated with a five-fold cross-validation protocol, which yielded Q2 > 0.6
for the entire dataset and Q2 > 0.8 for the most confident predictions. For small subsets with the
highest-confidence predictions, our model outperformed much more complex 3D and 4D QSAR models
created for equally small sets of similar compounds [23,24]. We used our validated QSAR models to
predict the CB1 and CB2 affinities of phytochemicals from Cannabis Sativa. As a result, we identified
three compounds, C1–C3 (see Table 1), that were dissimilar to the training dataset (TC < 0.3 relative
to the respective most similar structure) and had a predicted pKi difference between CB2 and CB1
(dpKi) ≥ 1. C1, commonly known as cannabispirol, is a compound isolated
from Japanese domestic Cannabis Sativa [39]. C1 has been experimentally evaluated for antimicrobial
activity [40] and was also tested in a study on targeting multidrug resistant mouse lymphoma cells [41].
Ilicic acid (C2) showed activity in G2/M cell cycle arrest of tumor cells [42]. Cannabichromente (C3) is
a precursor for biosynthesis of various cannabinoids [43].
      As for structure-based VS using PDB structures of CB2, we observed that these structures were able
to discriminate not only CB2-actives from non-active ligands (DUD-E decoys) but also CB2-selective
from non-selective ligands (the ChEMBL-derived dataset). The best results were obtained for 5ZTY
and 6KPF structures, regardless of the activation state.
      In this study, we identified potential selective cannabinoids among the constituents of Cannabis sativa.
Our QSAR model identified three compounds of potential interest with significant dissimilarity to
compounds already evaluated experimentally and reported in ChEMBL. The statistical characteristics
of the developed QSAR models suggest a high probability of successful experimental validation,
which we hope will attract attention from experimental groups interested in searching for CB2-selective
ligands of natural origin.

4. Materials and Methods

4.1. Data Collection
     Chemical structures from the ChEMBL database [44] with experimentally measured affinities for
the CB1 and CB2 receptors were used as training data. The datasets were composed of 14,126 records
for CB1 and 13,506 records for CB2. Compounds constituting the phytochemical profile of Cannabis
Sativa were acquired from the Collective Molecular Activities of Useful Plants Database (CMAUP) [45].
The ChEMBL data was exported and downloaded in comma-separated values (CSV) format, while the
CMAUP data was downloaded as a Structure Data Format (SDF) file.

4.2. Data Curation
     The method of data curation followed a modified approach of Fourches et al. [46,47]. We modified
the curation method by adding steps relevant to data acquired from ChEMBL, which is accompanied
by additional metadata specific to this database. Namely, we grouped experimental values that came
from the same source study and then removed groups with fewer than 10 records. We also assessed
data reliability according to the source study confidence score provided by ChEMBL and removed
records with a confidence score lower than 8. The final dataset consisted of records with InChIKeys
as molecular structure identifiers, associated standardized 2D structures of compounds, and pKi
values. RDKit was used for standardization and InChIKey calculation [48]. The curated CB2 and CB1
datasets are available on the website [49].
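
The two ChEMBL-specific filters described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline. The record keys (`source_id`, `confidence`, `inchikey`, `pki`) and the function name `curate` are hypothetical, and applying the group-size filter before the confidence filter is an assumption based on the order in which the text lists them. The thresholds (10 records per source study, confidence score of 8) are taken from the text.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 10   # groups with fewer records are dropped (from the text)
MIN_CONFIDENCE = 8    # records scoring lower are dropped (from the text)


def curate(records):
    """Filter records by source-study group size, then by confidence score.

    Each record is a dict with hypothetical keys:
    'source_id', 'confidence', 'inchikey', 'pki'.
    """
    # Group records that come from the same source study.
    groups = defaultdict(list)
    for rec in records:
        groups[rec["source_id"]].append(rec)

    kept = []
    for group in groups.values():
        # Remove whole groups with fewer than MIN_GROUP_SIZE records.
        if len(group) < MIN_GROUP_SIZE:
            continue
        # Within surviving groups, remove low-confidence records.
        kept.extend(r for r in group if r["confidence"] >= MIN_CONFIDENCE)
    return kept
```

Structure standardization and InChIKey generation, which the study performed with RDKit, are deliberately left out of this sketch.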

4.3. Descriptor Calculation
    The Mordred Python library [50] was used to calculate 1613 standard 2D descriptors and 213
3D descriptors. The 3D descriptors were used to include information about chirality in ligand structures.
Descriptors with variance less than 0.05 or containing invalid values were removed. The descriptors
were concatenated with 1024-bit Morgan fingerprints with a radius equal to 3.
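
The descriptor filtering step can be illustrated with a short standard-library sketch. It assumes the descriptors are already available as a numeric matrix (one row per compound); the function name `filter_descriptors` and the column labels are hypothetical. The text does not say whether population or sample variance was used, so the population variance below is an assumption.

```python
import math
from statistics import pvariance

VARIANCE_THRESHOLD = 0.05  # from the text


def filter_descriptors(matrix, names):
    """Drop descriptor columns that contain invalid values or vary too little.

    matrix: list of rows (one per compound), each a list of values.
    names:  descriptor names, one per column (hypothetical labels).
    """
    kept_names, kept_cols = [], []
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        # Invalid here means: not a real number, NaN, or infinite.
        valid = all(
            isinstance(v, (int, float))
            and not isinstance(v, bool)
            and math.isfinite(v)
            for v in column
        )
        if not valid:
            continue
        # Assumption: population variance is compared with the threshold.
        if pvariance(column) < VARIANCE_THRESHOLD:
            continue
        kept_names.append(name)
        kept_cols.append(column)
    # Transpose the kept columns back into rows.
    rows = [list(r) for r in zip(*kept_cols)] if kept_cols else [[] for _ in matrix]
    return rows, kept_names
```

In the study, the surviving descriptors were then concatenated with the Morgan fingerprint bits; that step would amount to extending each returned row with the fingerprint of the same compound.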

4.4. Machine Learning
     A gradient boosting (GB) algorithm implemented in the Light Gradient Boosting Machines
library [51] was used to create base models. The GB algorithm and its parameters were selected in
an inner 5-fold cross-validation protocol based on grid search. Other algorithms including linear
regression, partial least squares, and support vector machines from the scikit-learn library [52] were
also tested. GB models for CB1 and CB2 were decision tree ensembles. The output of each tree in a
boosted ensemble was used to create an embedding of the descriptor space. A kNN algorithm with
k = 3 was used to determine the applicability domain of the model based on the embedding. The
applicability domain was defined by a threshold, which is the maximum allowed distance between a
query molecular structure and its nearest neighbors. Molecular structures with a Euclidean distance
to the nearest neighbors greater than a given threshold were considered outside the applicability
domain. Python code for the execution of trained models is provided in the Supplementary Materials.
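
The applicability domain check can be sketched in a few lines of standard-library Python. The sketch assumes that compounds have already been mapped to embedding vectors (in the study, these come from the outputs of the boosted trees). The text does not state how the distances to the three neighbors are aggregated, so taking their mean is an assumption, and the function names are hypothetical.

```python
import math

K = 3  # number of nearest neighbors (from the text)


def mean_knn_distance(query, training, k=K):
    """Mean Euclidean distance from `query` to its k nearest training points.

    Assumption: the k distances are averaged; the source does not specify
    the aggregation.
    """
    distances = sorted(math.dist(query, point) for point in training)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def in_applicability_domain(query, training, threshold, k=K):
    """True if the query lies within the distance threshold of the training set."""
    return mean_knn_distance(query, training, k) <= threshold
```

For example, with training points `[(0, 0), (1, 0), (0, 1), (10, 10)]` and a threshold of 1.0, a query at `(0, 0)` is inside the domain, while a query at `(50, 50)` is outside.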

4.5. Model Validation
    Models were validated using a 5-fold cross-validation protocol. Out-of-sample predictions were
used to calculate Q2 according to the following equation:

$$Q^2 = 1 - \frac{\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2} \tag{1}$$

       Here, n is the number of samples in the dataset, Y_i is the experimental pKi of the ith sample,
Ŷ_i is the predicted pKi of the ith sample, and Ȳ is the mean pKi value. We evaluated Q2 for
20 different applicability domain thresholds and considered Q2 as a confidence measure for a model
using a given threshold.
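
Equation (1) and the threshold sweep can be written out as a brief sketch. It assumes out-of-sample predictions and per-compound applicability domain distances are already available as plain lists. The function names are hypothetical, and skipping subsets that are too small or have zero variance is a choice made here, not something stated in the text.

```python
def q_squared(observed, predicted):
    """Q^2 = 1 - SS_res / SS_tot, following Equation (1)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def q_squared_by_threshold(observed, predicted, distances, thresholds):
    """Q^2 restricted to samples whose distance is within each threshold."""
    results = {}
    for t in thresholds:
        idx = [i for i, d in enumerate(distances) if d <= t]
        if len(idx) < 2:
            continue  # too few samples for a meaningful Q^2 (choice made here)
        obs = [observed[i] for i in idx]
        pred = [predicted[i] for i in idx]
        try:
            results[t] = q_squared(obs, pred)
        except ValueError:
            continue  # subset with zero variance
    return results
```

A stricter threshold keeps fewer compounds but usually those closest to the training data, which is why Q2 can serve as a confidence measure per threshold.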

4.6. Prediction of CB2-Selectivity of Cannabis Sativa Ingredients
     A curated dataset of Cannabis Sativa constituents reported in the CMAUP database [45] was
evaluated with our CB1 and CB2 QSAR models. In this way, we searched for selective molecules with
dpKi ≥ 1, where dpKi was defined as dpKi = pKi(CB2) − pKi(CB1). For each predicted selective
compound, a Tanimoto coefficient (TC) was calculated to assess its similarity to compounds already
tested, and this value was used to exclude hits that were not novel with regard to the training data.
TC was computed using 1024-bit Morgan fingerprints with a radius equal to 3.
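
The selection rule can be illustrated with a small sketch in which fingerprints are represented as sets of "on" bit indices. The dpKi cutoff of 1 comes from this section and the TC cutoff of 0.3 from the Discussion; the function names and the data layout are hypothetical, and generating the Morgan fingerprints themselves (done with a cheminformatics toolkit in practice) is outside the sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def select_novel_selective(candidates, training_fps, dpki_min=1.0, tc_max=0.3):
    """Return names of candidates with dpKi >= dpki_min and max TC < tc_max.

    candidates: dict name -> (predicted pKi for CB2, predicted pKi for CB1,
                fingerprint as a set of bit indices)  [hypothetical layout]
    training_fps: iterable of training-set fingerprints (sets of bit indices)
    """
    hits = []
    for name, (pki_cb2, pki_cb1, fp) in candidates.items():
        dpki = pki_cb2 - pki_cb1
        # Similarity to the most similar training structure.
        max_tc = max((tanimoto(fp, t) for t in training_fps), default=0.0)
        if dpki >= dpki_min and max_tc < tc_max:
            hits.append(name)
    return hits
```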

4.7. Structure-Based Virtual Screening
      CB2 receptor structures (PDB IDs: 6KPC [30], 6KPF [30], 6PT0 [31], and 5ZTY [32]) were obtained
from the Protein Data Bank (PDB) [53]. This set included agonist-bound (6KPC, 6KPF, 6PT0) and
antagonist-bound (5ZTY) structures corresponding to different activation states of the CB2 receptor
(active and inactive, respectively). The CB2 structures used in this study were very similar to each other
(average pairwise RMSD = 6.41, with standard deviation = 3.40), yet they differed in the conformation
of TMH6 (5ZTY and 6KPC vs. others), which bends while interacting with a G protein complex. In all
these structures, ligands were located in the orthosteric binding site of CB2, although allosteric
modulation has also been observed for these receptors [54]. The four PDB structures have nearly
identical binding sites except for a few residues: W258 (W6.48), S285, and, to a lesser extent, F87 (F2.57),
F91 (F2.61), H95 (H2.65), and F183 (EC2) (see Figure 7). The W258 and F117 (F3.36) residues form the
well-known Trp-Phe toggle switch, which changes its position upon receptor activation [17,30].
Interestingly, the position of W258 differs slightly not only between the active and inactive
conformations but also among the three active conformations, e.g., it is moved away from the ligand in
6PT0 compared with the two others.

      Figure 7. Comparison of CB2 structures available in the Protein Data Bank (PDB). Here, an inactive
      CB2 conformation (5ZTY) is shown in green and is the best performing in virtual screening (VS),
      active conformation (6KPF) is shown in yellow, and other CB2 active structures are shown in grey.
      Residues are labeled according to the 5ZTY numbering.

     Receptor structures were processed with the Protein Preparation tool 2017-4, Schrödinger LLC,
New York, NY, USA [55]. Ligands in PDB structures were used to determine the position of docking
grids spanning over the whole CB2 active site. The prepared CB2 structures were used for screening
against a CB2 actives library that included CB2-selective and CB2-non-selective compounds to assess
the ability of these CB2 structures to discriminate between selective and CB2-non-selective ligands.
This compound library was extracted from our ChEMBL training set (see Materials and Methods);
compounds with pKi [...] 7 for CB2 were classified as CB2-selective, while others were considered
CB2-non-selective. The second round of VS was performed to test to what extent experimental CB2
structures were able to discriminate between CB2-actives and non-actives. Here, we used a compound
library that was generated using DUD-E [34], with the ChEMBL-derived CB2-selective ligands (see
above) used as CB2 actives. CB2-non-selective ligands were discarded and not included in this
compound library. VS was performed with Glide 2017-4, Schrödinger LLC, New York, NY, USA [56],
following ligand preparation with LigPrep [57]. We computed enrichment factors (EF) and areas under
the curve (AUC) for receiver operating characteristic (ROC) curves using Maestro 2017-4,
Schrödinger LLC, New York, NY, USA.
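
The two enrichment metrics can be illustrated with a standard-library sketch. It uses common textbook definitions, which are not necessarily identical to what Maestro computes, and it assumes that lower docking scores indicate better-ranked ligands. The function names and the label encoding (1 for active, 0 for decoy) are hypothetical.

```python
def enrichment_factor(scores, labels, fraction):
    """EF at the given top fraction of the ranked library.

    EF = (actives in top fraction / size of top fraction)
         / (all actives / library size).
    Assumption: a lower score means a better rank.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    n_total = len(ranked)
    n_top = max(1, round(n_total * fraction))
    actives_total = sum(labels)
    if actives_total == 0:
        raise ValueError("no actives in the library")
    actives_top = sum(label for _, label in ranked[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)


def roc_auc(scores, labels):
    """ROC AUC as the probability that an active outranks a decoy (ties count 0.5)."""
    actives = [s for s, l in zip(scores, labels) if l]
    decoys = [s for s, l in zip(scores, labels) if not l]
    if not actives or not decoys:
        raise ValueError("need both actives and decoys")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))
```

Under these definitions, an EF of 1 and an AUC of 0.5 correspond to random ranking, which gives a reference point for reading Table 2.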

   Conclusions
5. Conclusions
     We would like to emphasize that our study aimed to perform an in silico study to identify
prospective CB2-selective compounds from C. Sativa that could be confirmed by experimental studies
in the future. This study shows the first robust CB1/CB2 QSAR models that are reproducible and
applicable to a large variety of chemical scaffolds. Our developed model was trained on a large and
diverse set of compounds that surpassed recent studies [26]. We based our method on machine
learning, which utilized data from thousands of experimental studies on CB1/CB2 activities. Our
study was conducted without making common mistakes such as, e.g., not conducting data
curation or its improper execution, lack of a validation protocol, applying inappropriate statistical
characteristics for estimation of predictive performance, and finally, lack of applicability domain
determination [58–60]. We also showed that the applicability domain estimation based on a gradient
boosted model latent space allows the accurate prediction of the model confidence. What is more, we
validated the estimation of the model confidence interval, which allows users to make conclusions