Redefining the H-NS protein family: a diversity of specialized core and accessory forms exhibit hierarchical transcriptional network integration ...

Page created by Hector Rivera
 
CONTINUE READING
10184–10198 Nucleic Acids Research, 2020, Vol. 48, No. 18                                                      Published online 7 September 2020
doi: 10.1093/nar/gkaa709

Redefining the H-NS protein family: a diversity of
specialized core and accessory forms exhibit
hierarchical transcriptional network integration
Stephen Fitzgerald1,2 , Stefani C. Kary1 , Ebtihal Y. Alshabib1,3 , Keith D. MacKenzie1,3 ,
Daniel M. Stoebel4 , Tzu-Chiao Chao1,5 and Andrew D.S. Cameron 1,3,*
1
 Department of Biology, University of Regina, Regina, Saskatchewan S4S 0A2, Canada, 2 Division of Immunity and
Infection, The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh EH25 9RG, UK, 3 Institute for

                                                                                                                                                              Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
Microbial Systems and Society, University of Regina, Regina, Saskatchewan S4S 0A2, Canada, 4 Department of
Biology, Harvey Mudd College, Claremont, CA 91711, USA and 5 Institute of Environmental Change and Society,
University of Regina, Regina, Saskatchewan S4S 0A2, Canada

Received April 29, 2020; Revised August 07, 2020; Editorial Decision August 12, 2020; Accepted August 23, 2020

ABSTRACT                                                                        INTRODUCTION
H-NS is a nucleoid structuring protein and global re-                           Bacterial nucleoid associated proteins (NAPs) are abun-
pressor of virulence and horizontally-acquired genes                            dant DNA-binding proteins that perform the dual func-
in bacteria. H-NS can interact with itself or with                              tions of regulating gene expression and shaping higher-
homologous proteins, but protein family diversity                               order chromosome structures. The heat-stable nucleoid-
                                                                                structuring protein (H-NS), also referred to as the histone-
and regulatory network overlap remain poorly de-
                                                                                like nucleoid-structuring protein, is an archetypal NAP best
fined. Here, we present a comprehensive phyloge-                                studied as a repressor of gene expression in Escherichia
netic analysis that revealed deep-branching clades,                             coli, Salmonella enterica and other Gram-negative bacteria
dispelling the presumption that H-NS is the progeni-                            (1). H-NS performs two core cellular functions: it represses
tor of varied molecular backups. Each clade is com-                             the expression of hundreds of gene targets, and it forms
posed exclusively of either chromosome-encoded or                               scaffolds that constrain chromosomal microdomains (2–5).
plasmid-encoded proteins. On chromosomes, stpA                                  This dual functionality allows H-NS to repress gene expres-
and newly discovered hlpP are core genes in spe-                                sion through a classical mechanism of competing with acti-
cific genera, whereas hfp and newly discovered                                  vator proteins for DNA binding sites in gene promoters, or
hlpC are sporadically distributed. Six clades of H-NS                           H-NS can repress transcription at a broader level by restruc-
plasmid proteins (Hpp) exhibit ancient and dedicated                            turing regions of DNA to sequester gene promoters (6–9).
associations with plasmids, including three clades                                 H-NS regulates a wide range of phenotypes in E. coli
                                                                                and Salmonella in response to environmental parameters,
with fidelity for plasmid incompatibility groups H, F                           including temperature, osmotic pressure, metal ion con-
or X. A proliferation of H-NS homologs in Erwini-                               centrations and pH (10–13). H-NS exerts global transcrip-
aceae includes the first observation of potentially                             tional control by polymerizing along extended stretches
co-dependent H-NS forms. Conversely, the observed                               of DNA and forming higher-order structures that con-
diversification of oligomerization domains may facil-                           nect disparate DNA loci through protein bridges. H-NS
itate stable co-existence of divergent homologs in                              responds to physico-chemical factors by undergoing con-
a genome. Transcriptomic and proteomic analysis                                 formational changes in filaments and bridges, thus trans-
in Salmonella revealed regulatory crosstalk and hi-                             ducing gene regulatory signals by modifying DNA topol-
erarchical control of H-NS homologs. We also dis-                               ogy and promoter activity (7,14–16). In addition to op-
covered that H-NS is both a repressor and activator                             erating as a regulatory nexus of transcriptional activity
of Salmonella Pathogenicity Island 1 gene expres-                               and cellular physiology, H-NS maintains cellular fitness by
                                                                                preventing intragenic transcription from promoter-like se-
sion, and both regulatory modes are restored by Sfh                             quences (17) and by silencing the expression of horizontally-
(HppH) in the absence of H-NS.                                                  acquired AT-rich genes (2,5,14,18–21). The ‘xenogeneic si-
                                                                                lencing’ of horizontally acquired genes has been observed

* To   whom correspondence should be addressed. Tel: +1 306 337 2568; Email: andrew.cameron@uregina.ca


C The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Nucleic Acids Research, 2020, Vol. 48, No. 18 10185

in S. enterica serovar Typhimurium, where H-NS represses        annotations according to top BLAST hits has resulted in
the horizontally-acquired Salmonella Pathogenicity Islands      H-NS homologs with multiple names in use (e.g. Hfp and
(SPI)-1 and SPI-2 that mediate invasion of host tissues dur-    HNS2) (26,40), or multiple homologs in a genome all being
ing infection (2,6,22).                                         classified as H-NS. Thus, gene and protein names may not
   H-NS belongs to a large family of functionally               accurately reflect evolutionary and functional relatedness.
and structurally similar proteins found in a diversity          Furthermore, the lack of rigorous phylogenetic classifica-
of Gram-negative bacteria, including all families of            tion of H-NS homologs raises the possibility that distinct
␥ -proteobacteria, and in some genera of ␣- and ␤-              clades of H-NS homologs have gone undetected.
proteobacteria, including Rhodobacter (␣), Bordetella (␤)          Enterobacterales presents an ideal bacterial order in
and Burkholderia (␤) (23,24). In genomes that encode two        which to study the evolution and interaction of H-NS fam-
or more H-NS homologs, amino acid sequence and struc-           ily proteins as all members appear to encode a chromo-
tural conservation allow homologs to form protein–protein       somal H-NS protein, with a diversity of homologs en-
contacts (25,26). Conversely, conservation of DNA binding       coded chromosomally and on large self-transmissible plas-

                                                                                                                               Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
properties causes homologs to compete for overlapping           mids (23,24,41). Here, we present a delineation of the H-
DNA binding sites (3,27,28). The resultant diversity of         NS protein family using model genomes from four bac-
hetero-oligomers and DNA binding modes creates inter-           terial families, Enterobacteriaceae, Erwiniaceae, Pectobac-
connected gene regulatory networks that are modulated by        teriaceae and Yersiniaceae. Three chromosomally-encoded
the relative concentrations of H-NS and its homologs (29).      clades correspond to the established family members H-
   The best characterized H-NS homolog, StpA                    NS, StpA and Hfp. The phylogeny adds two newly iden-
(Suppressor of mutant td phenotype), shares 58% amino           tified clades of chromosomal proteins that we have collec-
acid identity with H-NS and a common two-domain struc-          tively dubbed Hlp (H-NS-like protein) with a modifier to re-
ture (30). StpA binds at and regulates many of the same         flect localization on chromosomes (HlpC) or allegiance to
genes as H-NS, it can form homodimers, heterodimers             the predominant host genus Pantoea (HlpP). An additional
and oligomers with H-NS through N-terminal interac-             six clades are strictly constrained to plasmids, which we
tions, and StpA can form protein filaments and bridges          named H-NS plasmid proteins (Hpp) and classified accord-
like H-NS (27,28,31–34). Although initially considered a        ing to incompatibility groups F (HppF), H (HppH) and X
molecular back-up for H-NS, the StpA regulons in E. coli        (HppX), or according to taxonomic allegiance in the three
and Salmonella have since been shown to contain genes           clades where Inc groups could not be predicted, Pantoea
outside the H-NS regulons (27,28). StpA has additional          (HppP), Erwinia (HppE) and Rahnella (HppR). To exam-
RNA chaperone activity, is more thermostable, and binds         ine how homologs modulate H-NS function, we tested how
DNA with greater affinity than H-NS (31,35,36). Distinct        the presence or absence of discrete H-NS subtypes impact
and overlapping biological properties with H-NS have also       the proteome and gene expression in a strain of Salmonella
been described for another chromosomal homolog, Hfp             Enteritidis that contains up to four H-NS homologs. This
(H-NS family protein). Hfp was identified in uropathogenic      revealed hierarchical control of gene expression with H-NS
E. coli (UPEC), where it is required for normal growth and      at the apex as a strong repressor of virulence gene expres-
repression of the bgl and K5 capsular determinant genes,        sion and a repressor of hfp and stpA expression.
and is a negative regulator of hns expression (26). Little is
known about Hfp function as it is rare and sporadically
distributed in Escherichia species.                             MATERIALS AND METHODS
   Sfh (Shigella flexneri H-NS-like protein) is an H-NS
                                                                Genome sequence analysis
homolog encoded by pSf-R27, a large, self-transmissible,
IncHI1 plasmid (37,38). Three-way protein-protein inter-        The bioinformatic workflow used to identify H-NS ho-
actions between H-NS, StpA and Sfh have been observed           mologs in Enterobacterales genomes is outlined in Supple-
in Shigella flexneri, and Sfh was shown to restore motil-       mentary Figure S1. At the project start, plasmid-encoded
ity and maintain repression of proU and bgl in an hns mu-       H-NS proteins were initially identified in 2014 by BLASTP
tant background (25). Sfh was found to bind at and regu-        searching the NCBI Microbial Plasmid database using
late a defined sub-set of H-NS regulated genes with lower       E. coli K-12 strain W3110 H-NS (BAA36117.1) and
AT content than those preferred by H-NS (3). In S. Ty-          StpA (BAA16535.1) and the IncHI1 plasmid-encoded Sfh
phimurium, Sfh can act as a ‘stealth factor’ that protects      (AAN38840.1) as query sequences (Supplementary Table
host cells from uncontrolled expression of chromosomal          S2). For each strain identified to encode a plasmid-borne
and plasmid-encoded genes caused by H-NS binding to             H-NS homolog, the whole genome sequence was searched
newly arrived plasmid genes (39). Thus, stealth function is     for all additional homologs. A separate strategy was used
explained as an extension of the xenogeneic silencing exhib-    to find homologs of Hfp (ABG69928.1) from E. coli UPEC
ited by H-NS (20).                                              strain 536, which was identified on chromosomes of several
   A robust phylogenetic analysis is required to resolve the    non-E. coli species from our initial searches (Supplementary
diversity, origins, and functional relationships of H-NS        Figure S1). All Enterobacterales genomes were searched us-
family proteins. Despite an increasing number of mecha-         ing the protein sequence as a BLASTP query to identify any
nistic studies of H-NS homologs in Escherichia, Salmonella      Hfp homologs. Any genomes found to carry Hfp homologs
and other genera of Enterobacteriaceae, the phylogenetic        were subsequently searched using H-NS, StpA, and Sfh as
distribution of these homologs is largely unclear. Also prob-   query sequences to find any other H-NS homologs that oc-
lematic is that a mix of manual and automated genome            cur in Hfp-containing genomes, as described above.
10186 Nucleic Acids Research, 2020, Vol. 48, No. 18

   In all BLAST searches, truncated H-NS homologs such           Kanamycin, gentamicin, ampicillin and chloramphenicol
as Ler or Hha (7) that consist solely of an N- or C-terminal     were used at concentrations of 50, 50, 50 and 20 ␮g ml−1
domain were excluded from consideration by restricting           respectively.
searches to proteins with >50% amino acid identity to H-
NS and a BLAST cut-off value of E < 10−10 .                      Strain construction and DNA manipulation
   The Plasmid Database (PLSDB) (https://ccb-microbe.
                                                                 To delete the stpA and hfp open reading frames in S. En-
cs.uni-saarland.de/plsdb/) (42) version 2019 10 07, con-
                                                                 teritidis EN1660, the cat and neoR resistance genes were
taining 18 457 complete plasmid sequences, was queried
                                                                 PCR amplified from plasmids pKD3 and pKD4 followed
using Salmonella enterica Typhimurium SL1344 H-NS
                                                                 by ␭-red mediated recombination to create stpA and hfp
(CBW17777.1). The resulting list of 465 homologs was cu-
                                                                 deletion mutants (49). Mutations were transduced to a fresh
rated as follows: removal of non-Enterobacterales species
                                                                 EN1660 background by generalized P22 transduction and
(39 homologs), removal of rare Enterobacterales genera not
                                                                 confirmed by DNA sequencing. The hns mutation was
included in this study (Cedecia (one homolog), Leclercia

                                                                                                                                 Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
                                                                 generated previously in S. Typhimurium LT2a (50) and was
(seven homologs), Phytobacter (one homolog), Plesiomonas
                                                                 introduced to S. Enteritidis EN1660 by P22 transduction.
(two homologs) and Raoultella (four homologs)), and re-
                                                                 The hns gene sequence and promoter region are identical
moval of two plasmids determined to be mis-assembled
                                                                 between LT2a and EN1660; further, transduction from S.
contigs that contain large segments of chromosomal genes
                                                                 Typhimurium to S. Enteritidis is not expected to result in
including hns and stpA (Supplementary Table S3). The re-
                                                                 significant genetic changes because the 10 kbp flanking hns
maining 409 plasmid-encoded H-NS homologs (Supple-
                                                                 has 99.3% identity between the two strains, with only 72 sin-
mentary Table S3) contained hundreds of identical se-
                                                                 gle nucleotide variants across the region. Plasmid pSfR27
quences, which were collapsed to yield 54 unique H-NS ho-
                                                                 was conjugated into S. Enteritidis EN1660 and derivative as
molog sequences for inclusion in the phylogenetic analysis.
                                                                 described previously (39). Briefly, donor (S. Typhimurium
                                                                 SL1344 pSfR27) and recipient strains were grown sepa-
Phylogenetic analysis                                            rately overnight at permissive temperature (25◦ C) without
                                                                 shaking. Donor and recipient strains were mixed 1:5 and
Multiple sequence alignment of all H-NS and H-NS ho-
                                                                 20 volumes of LB broth added. Mixed cultures were in-
molog sequences was performed using MAFFT with de-
                                                                 cubated statically at 25◦ C for 24 h to allow conjugation.
fault settings as part of the Max-Planck institute Bioin-
                                                                 S. Typhimurium SL1344 has a mutation in hisG so cannot
formatics toolkit (43) after identical protein sequences
                                                                 grow on M9 minimal medium without histidine supplemen-
were collapsed into single representative sequences. Pro-
                                                                 tation, allowing selection of pSfR27 positive S. Enteritidis
tein alignments were imported into MEGA6 (44) and
                                                                 EN1660 colonies on M9 minimal medium with gentamicin.
maximum-likelihood (ML) trees were constructed using
the LG+G amino acid substitution model and 500 boot-
                                                                 Genomic DNA extraction and purification, genome sequenc-
strap replicates. Tree analysis and visualisation was car-
                                                                 ing, and sequence alignments
ried out using iTOL (45) and FigTree v1.4.2 (A. Rambaut,
FigTree [http://beast.bio.ed.ac.uk/FigTree]). The reference      S. Enteritidis EN1660 hns was inoculated from frozen
genome phylogeny for Enterobacterales was provided by            glycerol stock onto Lennox agar (10 g l−1 tryptone, 5 g l−1
the Pathosystems Resource Integration Center (PATRIC)            yeast extract, 10 g l−1 NaCl, 1.5% agar; pH 7.0) and grown
(www.patricbrc.org) (46).                                        overnight at 37◦ C. Colonies were used to inoculate 5 ml
                                                                 Lennox broth and the culture was incubated for 22 h at
                                                                 37◦ C with agitation at 200 RPM. 200 ␮l of the broth cul-
Sequence motif analysis
                                                                 ture (optical density of 1.30 at 600 nm) was collected and
The multiple alignment of all H-NS homologs was sub-             used as input for genomic DNA extraction. The extraction
divided into the subtypes identified as clades in the phy-       was performed using the Rapid Bacterial Genomic DNA
logenetic analysis. WebLogo3 (http://weblogo.threeplusone.       Isolation Kit (Bio Basic; BS8225), and DNA eluted with
com) (47) was used to count amino acids at each position,        Tris-EDTA solution (10 mM Tris–HCl, 1 mM EDTA; pH
and these counts were subsequently divided by the amino          8.0). A genomic DNA library was prepared for sequencing
acid composition in the H-NS consensus to determine per-         using the NEB Ultra II FS DNA Library Prep Kit for Illu-
cent conservation and plotting as heatmaps. Sequence lo-         mina (New England BioLabs; E7805S), following the pro-
gos were generated by WebLogo3 with the units field set to       cedure for samples with ≤100 ng of input DNA. As part
‘probability’.                                                   of library preparation, five PCR cycles were used to enrich
                                                                 for adaptor-ligated DNA fragments and the sample was in-
                                                                 dexed using the NEBNext Multiplex Oligos for Illumina,
Bacterial strains and plasmids
                                                                 Index Primers Set 1 (New England BioLabs; E7735S). The
Bacterial strains used for experiments in this study, their      library was sequenced using the Illumina MiSeq platform
genotypes and source, where applicable, are listed in Supple-    and a Reagent Nano Kit V2 (paired end, 500 cycles).
mentary Table S1; all are derivatives of S. Enteritidis strain      Demultiplexed sequences were filtered for PhiX se-
EN1660 (48). Unless otherwise stated bacterial strains were      quences using bbduk (51) and the Enterobacteria
cultured with aeration (200 rpm) in Luria broth (LB), also       phage phiX174 sensu lato reference sequence (RefSeq:
referred to as Lysogeny broth (LB), (10 g l−1 tryptone,          NC 001422.1). Filtered sequencing reads were then qual-
5 g l−1 yeast extract, 10 g l−1 NaCl; pH 7.0) at 37◦ C.          ity trimmed using Trimmomatic version 0.39 (52) and
Nucleic Acids Research, 2020, Vol. 48, No. 18 10187

the following parameters: ILLUMINACLIP:TruSeq3-                    RESULTS
SE.fa:2:30:10:8:TRUE, CROP 250, HEADCROP 10,
                                                                   H-NS homologs are differentiated into deep-branching clades
LEADING 3, TRAILING 3, SLIDINGWINDOW 4:30,
MINLEN 36. Trimmed sequences were imported into                    To establish a robust phylogeny of H-NS homologs that
Geneious Prime 2019. The whole genome sequence of the              co-occur in Enterobacterales, we considered only bacterial
wildtype strain was previously published and is available at       strains encoding plasmid-borne H-NS homologs identified
GenBank (LUUA00000000.1) (48).                                     in the NCBI Microbial Plasmid database, and belonging
                                                                   to the families: Enterobacteriaceae, Erwiniaceae, Pectobac-
Quantitative reverse transcription PCR                             teriaceae and Yersiniaceae. These selection criteria yielded
                                                                   29 plasmid-encoded H-NS homologs in 15 genera (Supple-
RNA was isolated from bacterial cultures during exponen-
                                                                   mentary Table S1). For each of these strains, BLAST was
tial growth (OD600 nm = 0.12–0.15) and stationary phase
                                                                   used to identify all other H-NS homologs in the genome.
(OD600 nm = 2.0). 0.2OD600 nm units of bacteria were har-
                                                                   In addition to the well-characterized homologs StpA and

                                                                                                                                   Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
vested, held on ice and 2/5 volume of stop solution (95%
                                                                   Sfh, this workflow identified proteins identical or highly
ethanol/5% phenol) was added. Total RNA was extracted
                                                                   similar (>90% amino acid identity) to the recently discov-
using an EZ-10 Spin Column Total RNA Miniprep kit (Bio-
                                                                   ered protein Hfp. Hfp was previously only identified in
Basic, Canada) and RNA concentration was quantified us-
                                                                   uropathogenic E. coli (UPEC) species (26), prompting us
ing a Nanodrop ND-1000. 2 ␮g of RNA from each sam-
                                                                   to perform a second search of complete Enterobacterales
ple was DNase treated using Turbo™ DNase (Invitrogen™ )
                                                                   genomes for proteins with >90% amino acid identity to
in a 50 ␮l reaction and 5 ␮l of DNase treated RNA was
                                                                   Hfp. In cases where identical protein sequences were en-
then reverse transcribed using a Verso cDNA synthesis kit
                                                                   coded by two or more genomes of the same species, a sin-
(Thermo Scientific™ ) to generate cDNA pools. The relative
                                                                   gle representative genome was retained for analysis. This
abundance of target mRNA molecules was determined by
                                                                   yielded 38 additional Hfp sequences in 30 strains. Other
quantitative reverse transcription PCR (qPCR) using gene
                                                                   genera, Hafnia, Edwardsiella, Morganella, Photorhabdus,
specific primer pairs (Supplementary Table S6) and iTaq™
                                                                   Xenorhabdus and Providencia, did not encode Hfp, StpA,
Universal SYBR® Green Supermix (Bio-Rad). Quantifi-
                                                                   or any plasmid-borne H-NS proteins, so were not further
cation of mRNA was achieved using an internal calibra-
                                                                   considered in this study. Altogether, the dataset contained
tion curve generated from serially diluted genomic DNA of
                                                                   193 protein sequences from 15 genera of Enterobacterales
known quantity.
                                                                   (Tables S1 and S2), of which 116 were unique sequences (88
                                                                   chromosomal and 28 plasmid).
Mass spectrometry
                                                                      Phylogenetic relationships between H-HS homologs were
Cells from exponentially growing wildtype and mutant S.            inferred using maximum likelihood. Initial consideration of
Enteritidis EN1660 cultures (100 ml) were harvested at             chromosomally-encoded proteins revealed five evolutionar-
OD600 nm 0.12–0.15, then lysed with bead beating. Pro-             ily distinct protein clades, each with bootstrap support over
teins were extracted using acetone precipitation overnight         70% (Figure 1A and Supplementary Figure S2A). The phy-
at −20◦ C. After resolubilization in 50 mM ammonium bi-            logeny in Figure 1A is unrooted to facilitate visualization
carbonate buffer, 50 ␮g of protein sample was digested             of each high-confidence homolog type, or ‘clan’ (53), us-
with trypsin (Sigma-Aldrich). After drying, the resulting          ing a coloured branch, while simultaneously deemphasiz-
peptides were spiked with 40 fmol/␮l rabbit phosphory-             ing the tree trunk where branching order cannot be deter-
lase B in running buffer (0.1% formic acid, 3% acetonitrile)       mined due to low bootstrap support. Each clan in the un-
before analysis with a Synapt G2 HDMS (Waters) cou-                rooted phylogeny forms a clade in a tree rooted with Vibrio
pled to a Nanoacquity (Waters) nano-LC with an Acquity             cholerae H-NS (Supplementary Figure S2A). Three clades
UPLC T3HSS column (75 mm × 200 mm). The separation                 correspond to each of the previously described chromoso-
was conducted with a gradient from 3% acetonitrile/0.1%            mal H-NS homologs: H-NS (green), StpA (red) and Hfp
formic acid to 45% acetonitrile/0.1% formic acid at a flow         (blue). All genomes in this analysis encode H-NS, whereas
rate of 0.3 ␮l/min. A total of 1 ␮g digest was injected in each    StpA and Hfp have restricted distributions that are exam-
run and eluting peptides were analysed in positive data-           ined in detail below. A newly resolved clade was detected
independent-acquisition mode (MSE) with 1s scan time. In           in the Enterobacteriaceae genera Escherichia, Enterobacter,
low-energy MS mode, data were collected at a collision en-         Klebsiella and Salmonella, which we named ‘HlpC’ for H-
ergy of 4eV. High-energy collision energy was ramped be-           NS-like protein on chromosomes. Another newly resolved
tween 18 and 42 V. Leucine enkephaline was measured as             clade is comprised of proteins present only in the genus
lock mass every 30s to maintain mass accuracy throughout           Pantoea, which we named ‘HlpP’ for H-NS-like protein in
the run. The resulting spectra were analysed with the Pro-         Pantoea.
teinLynx Global Server (PLGS) v.3.02 and searched against
the UniProt S. enterica reference proteome (UP000001014,
                                                                   Plasmid-borne H-NS homologs have diverse origins
accessed January 2016). Protein abundance was measured
together with the identification using phosphorylase B as          Plasmid-borne H-NS homologs have been identified on a
calibrant. The false discovery rate of protein identifications     diversity of plasmids, but the evolutionary origins and bi-
was set to 4% per run. i.e. proteins identified in all three bi-   ological functions of these homologs remain largely unex-
ological replicates had an expected false discovery rate of        plored. To test the simplest hypothesis that plasmid-borne
10188 Nucleic Acids Research, 2020, Vol. 48, No. 18

     A                                                                                                                                                                   B

                                                                                                                                       l Tor
                                                                                                                                e O1 E
                                                                                                                                                                                                                                HppF
                                                                                                                                                                                       HppR

                                                                                                                        cholera

                                                                                                                                                                                                                                       Klebsiella
                                                                                                                                                                                                                                                      HppE

                                                                                                                                                                                                                           Kleb
                             Esch

                                                                        cter (2)

                                                                                                                                                                                                   Ra
                               Shigella (1)

                                                                             1)

                                                                                                                                                                                                       hn
                                K l e r w i n 2)

                                                                                                                                                                                                             Erw (3)

                                                                                                                                                                                                                            siell
                                                                                                                                                                                                                                                                         HlpP

                                                                        la (

                                                                          4)
                                  erich

                                                                                                                                                                                                          e
                                                                                                                 Vibrio

                                                                                                                                                                                                               lla
                                                                       r(
                           Ye

                                     bsi ia (

                                                            Citroba
                                                                   nel

                                                                                                                                                                                                                                           (12)
                                                                                                                                                                                                                  i nia

                                                                                                                                                                                                                                                               Erwinia (5)
                                      E a(

                                                                                                                                                                                                                                a (1
                                                                   cte
                     Se
                              r        sin

                                                                                                                                                                                                                                                                                                                       )
                                        e l l a 2)
                                        ia (2)

                                                        En a l m o
                       rra                                                                                                                                                                                                                                                                                           (8

                                                                                                                                                                                                                                                                                             (2)
                                                                ba

                                                                                                            r (3)
                Ra

                                                                                                                                                                                                                          (1)
                                              i

                                                                                                                                                                                                                                  )
                           tia                                                                                                                                                                                                                                                                                   a

                                                                                                                                                                   (1)

                                                                                                                                                                                                                                                                             (3)
                                                             ro
                     hn

                                                                                                                                                                     )
     Pec                                                                                                                                                                                                                                                                                                   oe

                                                                                                                                                                                                                                                                                        ea
                                                (2)

                                                                                                                                                                  (2
                       ell (1)

                                                          te
           tob                                                                                                                                                                                                                                                                                           nt
                                                          S

                                                                                                         te
                          a(

                                                                                                                                                              ria

                                                                                                                                                                                                                                                                         ea

                                                                                                                                                                                                                                                                                      n to
               acte

                                                                                                                                                              ter
                                                                                                                                                                                                                                                                                                      Pa

                                                                                                       c
                    rium 1)

                                                                                                      2)

                                                                                                                                                 E n en n e
                                                                                                roba

                                                                                                                                                           ac

                                                                                                                                                                                                                                                                     nto

                                                                                                                                                                                                                                                                                  Pa
                          ( 2)                                                                                                                                                                                                                                                                                  HppP1

                                                                                                 ia (
            Brenn                                                                                                                                                                                   H-NS

                                                                                                                                                        ob
                                                                                                                                                                           )                                                                                                                                )

                                                                                                                                                                                                                                                           % a
                   eria (1                                                                                                                                            (2

                                                                                                                                                                                                                                                    100%
                                                                                                                                                                                     (7)                                                                                                              a (9

                                                                                                                                                    ter

                                                                                                                                                                                                                                                         98 P
                                                                                             ich
                                                                                          Ente
                             )

                                                                                                                                                    Br
                                                                                                                                                                y a                                                                                                                              oe
             Dickeya (2)                                                                                                                                     ke                ium                                                                                                           ant                          ia (1)

                                                                                                                                                                                                                                                         95%
                                                                                                                                                                                                                                                                                 9%

                                                                                         her
                                                                                                                                                           c          er                                                                                                     9           P                  Erwin

                                                                                                                                                                                                                                    77
                                                                                                                                                         Di bact

                                                                                                                                                                                                                                                       97%
                                                                                                                                                                                                                                                                                                                                       )
                                                                                                                                                                                                                                                                                                                            Pantoea (9

                                                                                                                                                                                                                                       %
                        Sodalis (1)
                                                            H-NS

                                                                                                      Esc
                                                                                                                                                           c to                                                                        98                                96%           94%            100%
                               Erwinia (1)                                                                                                              Pe                                                                                  %                                                                                   HppP2
                                                                                                                                                                                                                                                                                         HppP

                                                                                                                                                                                                                                                                                                                                                        Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
                                      (3)                                                                                                                                 r (1)
                             Pantoea
                                                                99                                                                                                bacte
                                                                     %
                                                                                                                                                        Entero ia (1)
                                                                                                                                                        Escherich                                                                      84
                                                                                                                                                                                                                                            %
                                                               99%                 71%                                                                 Klebsiella (3)
                                                                                                                                                     Kleb                                      Hfp                                                                                                 Vibrio cholerae O1 El Tor

                                                                                                                                                                                                                                          95%
                                                                                                                                                                                                                                                                     99%

                                                                                                                                                                                                                                                     65%
                                                                                                                                                    Ci
                                        )                                                                                                              tro siella (
                                a (3                                                                                                                                   1)
                                                                                                                                               Sa sch
                                                                                                                                                          ba
                     P    antoe                                                                       Hfp                                       E            ct e                                                                                                                                           StpA
                                                                                                                                                 lm eric
                                                                             100

                                                                                                                                                                                                                                                                             98
                                            HlpP                                                                                                                  r(
                                                                                                                                                    on hia
                                                                         %

                                                                                                                                                                                                                                                                              %
                                                                                                                                                                     1)
                                                                              %

                                                                                                                                                      el
                                                                     98

                                                                                                                                                         la
                                                                                                                                                            (3

                                                                                                                                                                                                                                                                  En
                                                                                                                                                               )
                                                                                                                                                                                                                      )
                                                                                                                                                                 (3
                                                                                                                                                                                                                 (2

                                                                                                                                                                                                                                                                      ter
                                                                                         Tree Scale: 0.1                                                                                                                                                                                                        Tree Scale: 0.1
                                                                                                                                                                    )
                                                                                                                                                                                                            ia

                                                                                                                                                                                                                    la ( )
                                                                                                                                                                                                             i ge r ( 5
                                                                                                                                                                                                        h

                                                                                                                                                                                                                                                                             ob
                                                                                                                                                                                                                 la ( 1)
                                                                                                                                                                                                   r ic

                                                                                                                                                                                                                       2)
                      StpA

                                                                                                                                                                                                                                                                              ac
                                                                                                                                                                                                                      te
                                                                                                                                                                                              he

                                                                                                                                                                                                                       )
                                                                                                                                                                                                     Klebsiella (1)
                                                                                                                                                                                                                                            Salmonella
                                                                                                                                                                                                                                            Escherichia (1)
                                                                                                                                                                                                             lla (1
                                                                                                                                                                                                                  ac
                                                                                                                                                                                          c
                                                                                                                                                                                                                                                                                                                           HppX

                                                                                                                                                                                                                                                                                  ter
                                                                                                                                                                                     Es

                                                                                                                                                                                                                  l
                                                                                                                                                                                               ob

                                                                                                                                                                                                            n el

                                                                                                                                                                                                                                                                                               So
                                                                                                      HlpC

                                                                                                                                                                                                                                                                                      (1)
                                                                                                                                                                                              r
                                                                                                                                                                                           te

                                                                                                                                                                                                       Shige
                                                                                                                                                                                                          Sh
                                                                                                                                                                                                         mo

                                                                                                                                                                                                                                                                                                    da
                            )                                                                                                                                                         En
              acter (2 )
       Citrob                                                                                                                                                                                                                                                                                                                     Ye

                                                                                                                                                                                                                                                                                                     l is
                                                                                                                                                                                                                 Sal
                       a (1 )                                                                                                                                                                                                                                                                                                        rsin
                                                 (6)

                 e  ll

                                                                                                                                                                                                                                                                                                         (1)
           Shig              (1                                                                                                                                                                                                                                                                                                             ia (

                                                                                                                                                                                                                                                       (2)
                                                                                                                                                                                                                                                                                                                                 Yer              1)
                                               bacter

                        lla
                                  )
                                (3

                    ne                                                                                                                                                                                                                                                                                                          Cit s i n i a
                                                                                                                                                                                               HppH
                                      )

                  o
                            ia

                                                                                                                       K                                                                                                                                                                                                      E s ro
                                   (2

               lm                                                                                                     Sa lebsie                                                                                                                            HlpC                                                                       b       (2)
                          ich

                                                                                                                                                                                                                                                                                                   Erw lebs ichia (3
            Sa                                                                                                                                                                                                                                                                                                                   ch a c
                                lla

                                                                                                                                                                                                                                                                                                    Salebsi
                                                                                                                        lm        lla (

                                                                                                                                                                                                                                                                                                     K
                                                                                                                                                                                                                                                                                                                                   er t e r
                                             Entero
                      er

                                                                                           Es terobac
                                                                                           Citrobacteri (1)

                                                                                                                                        3)
                             sie

                                                                                           En

                                                                                                                           on

                                                                                                                                                                                                                                                                                                       l m e l la
                                                                                                                                                                                                                                                                                                                                     ich

                                                                                                                                                                                                                                                                                                       K
                                                                                                                                                                                                                                                                                                                                                  (1)

                                                                                                                                                                                                                                                                                                        Salmonel
                                                                                                                                                                                                                                                                                                        Esch
                   ch

                                                                                                                                                                                                                                                                                                         inia iella
                                                                                                                              ell
                                                                                             ch

                                                                                                                                                                                                                                                                                                           on
                                                                                                                                                                                                                                                                                                                                         ia
                           eb
                 Es

                                                                                                                                  a
                                                                                                er

                                                                                                                                                                                                                                                                                                                                             (1

                                                                                                                                                                                                                                                                                                              ell
                                                                                                                                    (1
                          Kl

                                                                                                                                                                                                                                                                                                              (1) (2)
                                                                                                                                                                                                                                                                                                                                                )
                                                                                                   ich

                                                                                                                                       )

                                                                                                                                                                                                                                                                                                                er

                                                                                                                                                                                                                                                                                                                  a
                                                                                                                                                                                                                                                                                                                    (1
                                                                                                       ia

                                                                                                                                                                                                                                                                                                                     (1)

                                                                                                                                                                                                                                                                                                                       )
                                                                                                          (4
                                                                                                           ter (

                                                                                                                                                                                                                                                                                                                         la (2)
                                                                                                             )
                                                                                                                 3)

                                                                                                                                                                                                                                                                                                                              )
                                                                  Family                                             Genus
                                                                  Enterobacteriaceae                                Citrobacter, Enterobacter, Escherichia, Klebsiella, Salmonella, Shigella
                                                                  Erwiniaceae                                        Erwinia, Pantoea
                                                                  Pectobacteriaceae                                 Brenneria, Dickeya, Pectobacterium, Sodalis
                                                                     Yersiniaceae                                   Rahnella, Serratia, Yersinia

Figure 1. Phylogenetic classification of H-NS family proteins in Enterobacterales. (A) Chromosomal-encoded H-NS homologs. (B) Chromosomal and
plasmid-encoded H-NS homologs. The protein clades are categorized as: H-NS (green), StpA (red), Hfp (blue), HlpC (teal), HlpP (purple), HppP (ma-
genta), HppX (orange), HppH (pink), HppE (grey), HppR (grey) and HppF (gold). The bacterial genera that encode the homologs are indicated at the
branch tips, and the number of representative protein sequences are in brackets. Clusters of branches that are too small to resolve in the figure are coloured
with grey background to improve connections between branches and names. Rooted phylogenies with each branch and species name are provided in Sup-
plementary Figure S2. Phylogenies are derived from maximum likelihood inferences of evolutionary relationships, and the scale bar indicates the number
of substitutions per site. Bootstrap scores are shown for ancestral branches that define the clades and sub-clades described in the text.

hns, a second phylogenetic analysis was conducted that in-                                                                                                                       and Salmonella on the plasmids pSf-R27 and pR27, respec-
cluded: all chromosomal H-NS homologs described above,                                                                                                                           tively, that belong to plasmid incompatibility (Inc) group H
the 29 plasmids sequences from the NCBI Microbial Plas-                                                                                                                          (25,54,55). The other proteins in this clade are also encoded
mid database, and 54 additional H-NS homolog sequences                                                                                                                           by IncH plasmids, thus we refer to members of this clade
identified in the Plasmid Database (PLSDB). Figure 1B                                                                                                                            as HppH. Two additional protein clades correspond to Inc
shows the expanded phylogeny of H-NS homologs with                                                                                                                               groups X and F, for which we have named the clades HppX
plasmid-encoded proteins included. Importantly, the addi-                                                                                                                        and HppF, respectively (Figure 1B). HppX contains the
tion of a large number of new sequences did not affect the                                                                                                                       greatest sequence diversity of any H-NS homolog clade; be-
cohesion of the chromosomal clades in Figure 1A. More-                                                                                                                           sides H-NS, it is the only homolog found in all four bacterial
over, none of the plasmid forms occur in the H-NS clade, in-                                                                                                                     families being considered here. In contrast, HppF is found
dicating that none of the plasmid forms are descended from                                                                                                                       only on plasmids in Klebsiella in this analysis. The HppF,
chromosomal H-NS.                                                                                                                                                                HppH, and HppX clades each branch from the trunk of
   Six new clades are composed exclusively of plasmid-                                                                                                                           the unrooted phylogenetic tree (black in Figure 1B); this
borne sequences, and we have dubbed proteins in each of                                                                                                                          distinction from the chromosomal homolog clades suggests
these clades as H-NS plasmid proteins (Hpp) (Supplemen-                                                                                                                          these plasmid forms are unlikely to have originated from
tary Table S4). Among these, only the protein Sfh has been                                                                                                                       duplication and divergence of a chromosomal gene in the
characterized (25,37). Sfh has been identified in Shigella                                                                                                                       Enterobacterales.
Nucleic Acids Research, 2020, Vol. 48, No. 18 10189

  Three additional clades of Hpp are present in the phy-        mentary Table S1). Within these genera only some strains
logeny, but these could not be classified according to plas-    contain Hfp, consistent with the original discovery of hfp
mid replicons due to the absence of recognizable lnc ele-       in a small number of E. coli strains (26,57). A notable fea-
ments. Instead, we named these clades according to the pre-     ture of Hfp is that it is the only protein we found encoded
dominant bacterial hosts: Erwinia (HppE), Pantoea (HppP)        multiple times on some chromosomes, as observed in some
and Rahnella (HppR) (Figure 1B). A single H-NS homolog          strains of Escherichia, Dickeya and Pectobacterium (Sup-
sequence encoded on two plasmids in PLSDB was con-              plementary Table S1). The incongruence of the Hfp phy-
nected by a deep branch (65% bootstrap support) to HlpC,        logeny compared to those of H-NS and StpA is most sim-
which was too distant to include in the HlpC clade (Fig-        ply explained by horizontal transfer between genera and
ure 1B); this branch is represented by pASM1 in Supple-         even between families Pectobacteriaceae and Enterobacte-
mentary Figure S2B. A single H-NS homolog on plasmid            riaceae (Supplementary Figure S3B). In E. coli, hfp was con-
pKOX-ea2b branches closest to HppF, but the long branch         sistently found within a genomic island inserted upstream of
and very low bootstrap support preclude its inclusion in the    asnW tRNA, as noted in uropathogenic E. coli (26,57), or

                                                                                                                               Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
HppF clade (Figure 1B).                                         serU tRNA. In Salmonella, hfp was also consistently located
                                                                within a genomic island. A recent study investigating the
                                                                region of difference 21 (ROD21) of Salmonella Enteritidis
Evolutionary origins of chromosomal H-NS homologs
                                                                led to the classification of a group of Enterobacteriaceae-
H-NS is present in all genomes analysed (Supplementary          associated ROD21-like (EARL) genomic islands (58); 33
Table S1). Low bootstrap support of Pantoea and Erwinia         of 54 EARL islands encode H-NS homologs, and we con-
H-NS branches indicates that branching order within the         firmed by sequence alignment that all cases are Hfp.
H-NS clade is not well resolved (Supplementary Figure
S2A). Nevertheless, when accounting for the low confidence
                                                                The genetic contexts of H-NS plasmid proteins
branching order, the phylogeny of core H-NS is congruent
with the phylogeny of Enterobacterales genomes (Supple-         A rapidly growing number of complete plasmid sequences
mentary Figure S3A). The simplest interpretation of the         enables expanded surveys for H-NS homologs and their ge-
phylogenies is that H-NS was present in the last common         netic context. It has been observed that plasmids contain-
ancestor of all Enterobacterales and was never lost nor re-     ing homologs of H-NS and other nucleoid associated pro-
placed over evolutionary time, consistent with its central      teins are large (usually over 100 kb) and AT-rich (average
role in cellular homeostasis. Further, there is no evidence     54% AT) (59). In late-2019, the PLSDB contained 464 plas-
of horizontal transfer of H-NS between genera.                  mids sequences with an H-NS homolog, of which 424 oc-
   The chromosomally-encoded homolog StpA is present            cur in bacteria in the order Enterobacterales. The major-
only in the Enterobacteriaceae, indicating a singular origin    ity, 362, are from Enterobacteriaceae, with a median size
or acquisition event in the common ancestor of Escherichia,     of 245 kb and median AT of 53% (Figure 2A and B). The
Salmonella, Klebsiella, Enterobacter and Citrobacter. To        much smaller number of representatives from the Erwini-
improve the resolution of the stpA evolutionary history in      aceae and Yersiniaceae were also large (median 180 and 542
the Enterobacteriaceae, maximum likelihood phylogenies          kb, respectively), but much lower average AT (median 48%
for StpA and H-NS were compared to whole-genome phy-            in both families).
logenies (Supplementary Figure S3B). Strains from each             Eleven plasmids in PLSDB encode two copies of HppP
genus containing StpA, Escherichia, Salmonella, Klebsiella,     and one plasmid encodes two copies of HppE (Supplemen-
Enterobacter and Citrobacter, formed strongly supported         tary Table S5), making this the first observation of two H-
monophyletic clusters (Bootstrap value > 95%). Although         NS homologs encoded on the same plasmid. In one case,
the StpA cladogram has lower resolution than the genome         the identical HppP occurs twice at distant locations in dif-
phylogeny, the StpA, H-NS and whole genome phylogenies          ferent genetic contexts on a large plasmid (NZ CP014208.1)
are congruent (Supplementary Figure S3B). It was previ-         in Pantoea ananatis. In the other case of a duplicate pro-
ously inferred that stpA originated from hns duplication        tein, HppE is part of a ∼10 kb region that is duplicated
and divergence in a recent ancestor of Salmonella and E.        on the ∼48 kb plasmid pEP48 (NZ CP023568.1) in Er-
coli (24,56). However, if stpA arose by duplication of hns      winia pyrifoliae. Figure 2C examines the nine plasmids that
within the Enterobacteriaceae, the StpA clade would be          encode two distinct HppP sequences. The host Pantoea
found as a distinct branch within the H-NS clade, which         species were isolated from plants, human infections, or lab-
it is not (Figure 1A). Instead, phylogenetic evidence sug-      oratory stocks in diverse locales around the world (Fig-
gests that stpA was acquired by the common ancestor of Es-      ure 2C). The two hppP genes are consistently located close
cherichia, Salmonella, Citrobacter, Enterobacter and Kleb-      to one another on plasmids that range in size from 131
siella through horizontal gene transfer from an unidentified    to 211 kb. Each of the paired hppP genes represents dis-
donor outside of Enterobacteriaceae. Conservation of the        tinct sub-clades, HppP1 and HppP2 in Figure 1B, that have
protein in all extant members of these genera suggests it       96% and 100% bootstrap support, respectively. This phy-
acquired (or already bestowed) a core genetic function at       logenetic differentiation and the diversity of plasmids and
its inception, preventing its loss from Enterobacteriaceae      bacterial hosts harbouring these paired HppP suggests a
as the bacterial family diversified into a very wide range of   long evolutionary history of linkage. The resilience of the
habitats over many tens of millions of years.                   HppP1 and HppP2 pairing is confirmed by examining their
   The evolutionary history of Hfp is more obscure. Of the      genetic loci. In all nine plasmids, the two genes are linked
15 genera included in our study, eight contain Hfp (Supple-     to feoB and an unnamed gene encoding an alcohol dehy-
10190 Nucleic Acids Research, 2020, Vol. 48, No. 18

    A                         n: 362   51     11          C
                                                                    1                      2                         3                     4                        5
                        106                                     130,944 bp             166,637 bp               166,983 bp             173,592 bp              179,590 bp
      Base pairs

                                                              NZ_CP022519.1          NZ_CP028351.1             NC_014561.1           NZ_CP016892.1         NZ_CP034150.1
                        105                                    Nanjing, China
                                                                 Pine plant
                                                                                      Lishui, China
                                                                                     Human infection
                                                                                                                    —
                                                                                                              Biocontrol agent
                                                                                                                                      Anhui, China
                                                                                                                                        Lettuce
                                                                                                                                                           Warsaw, Poland
                                                                                                                                                           Laboratory strain

                        104                                                  6                         7                         8                         9
                                Ent    Erw    Yer                       180,464 bp               182,169 bp                  197,914 bp                211,326 bp

                                                                                                                                                                                       Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
    B                   70
                                                                    NZ_CP038855.1              NZ_CP031651.1               NZ_CP014128.1             NZ_CP034476.1
                                                                       Uganda                   Alaska, USA                   DC, USA                New York, USA
      A+T content (%)

                                                                      Eucalyptus                    Soil                    Human wound                  Apple
                        60
                                                                                                                                                                            unnamed
                                                     D1              feoB                                                    hppP1 hppP2                                      gene

                                                      2
                        50                            3
                                                      4
                                                      5
                                                      6
                        40                            7
                                Ent    Erw    Yer     8
           average                                    9
                   53.1                47.8   48.0              Fe(2+) transporter                                                                                 NAD(P)-dependent
             % AT                                               permease subunit                                                                               alcohol dehydrogenase

Figure 2. Plasmids containing H-NS homologs. (A) Size distribution of plasmids sequences in PLSDB that encode H-NS homologs. Bacterial families
are indicated as Ent, Enterobacteriaceae; Erw, Erwiniaceae and Yer, Yersiniaceae. (B) Average AT content of plasmids sequences in PLSDB that encode
H-NS homologs. (C) Pantoea plasmids encoding HppP1 and HppP2. In the absence of annotated origins of replication, plasmids are oriented according
to nucleotide numbering in GenBank records. (D) Gene content flanking hppP1 and hppP2. Genes feoB and alcohol dehydrogenase (white) are conserved
on all nine plasmids and were used as boundaries for comparing the genetic loci.

drogenase. Yet, the variable number of genes flanking and                                        tions with experimentally determined functions are almost
between hppP1 and hppP2 indicates that the pairing has per-                                      100% conserved in the H-NS clade. The two exceptions are
sisted despite a significant amount of genetic change (Fig-                                      the Lon protease target site Y21 in 5% of H-NS sequences
ure 2D). A closely related protein is found alone in a plasmid                                   (Figure 3C) and DNA binding hinge K83 in 5% of H-NS
in Erwinia (NZ LN907828.1) (Figure 1B).                                                          sequences (Figure 3C and D). A broader consideration of
                                                                                                 residue conservation across all H-NS homolog sequences
                                                                                                 shows that the strong conservation of key residues in H-NS
Amino acid divergence at functionally important positions
                                                                                                 is not shared across the whole protein family (Supplemen-
We hypothesized that H-NS clades will possess distinct                                           tary Figure S4).
amino acids at positions involved in DNA binding and                                                Amino acid substitutions at 10 sites in the N-terminal do-
protein-protein interactions to (i) perform distinct func-                                       main of H-NS significantly disrupt dimer and/or oligomer
tions and (ii) prevent deleterious interference with H-NS.                                       formation (5,15,31,60–64). Substitutions at five of these lo-
Hence, we predicted sequence divergence to be concen-                                            cations predominantly disrupt dimer formation, whereas
trated at locations where functional differentiation can                                         substitutions at the other five mutations disrupt formation
arise, while mutations will be resisted at positions required                                    of higher-order multimers (Figure 3A). The dimerization
for core functions and inter-clade interactions. H-NS can be                                     residues (R12, R15, L26, L30 and L33) are 100% conserved
functionally subdivided into four functional domains: the                                        in all members of H-NS clade (Figure 3C). These same five
N-terminal dimerization and dimer-dimer interaction do-                                          residues are highly conserved across the other clades, with
mains (Figure 3A), and the C-terminal DNA binding hinge                                          all HppH proteins being identical to H-NS. Conversely, in
and DNA binding domains (Figure 3B). The specific amino                                          StpA (position 33), HlpC (position 33) and HppX (posi-
acid positions in H-NS for which biological functions have                                       tion 30) clades, all proteins are divergent from H-NS. In the
been experimentally determined are indicated by numbered                                         dimer-dimer interaction domain, HppP is the most similar
arrows (Figure 3A & B).                                                                          to H-NS whereas the proteins in the other clades are highly
   Previous studies found that H-NS, StpA, and Sfh have                                          divergent from H-NS in at least two positions. Therefore,
similar DNA biding properties, and both StpA and Sfh can                                         HppP is predicted to form the most stable heteromers with
heterodimerize with H-NS (29). However, the biochemical                                          H-NS whereas all other homolog clades are predicted to
properties and interactions of the new homologs detected                                         have a reduced ability to oligomerize with H-NS. Positions
in the present study remain unexplored. Amino acid posi-                                         where H-NS homologs consistently differ from H-NS, such
Nucleic Acids Research, 2020, Vol. 48, No. 18 10191

     A                            H-NS MSEALKILNNIRTLRAQARECTLETLEEMLEKLEVVVNERR
                                                                                  dimerization
                                                                                                                                        B            DNA binding hinge                DNA binding
                                                                                                                                                 AAVKSGTKAKRAQRPAKYSYVDENGETKTW
                                  S. enterica
                                                1                  12    15             21         26       30   33                                 83         87 89 90    93
                                                                                        Lon

                                                                                       dimer-dimer interaction                                                        DNA binding
                                                EEESAAAAEVEERTRKLQQYREMLIADGIDPNELLNSL                                                           TGQGRTPAVIKKAMDEQGKSLDDFLIKQ
                                                                             57        61     64    68      71                              110                117                                  137

     C                                                       dimerization                               dimer-dimer interaction         D        DNA binding hinge                    DNA binding
                                        position: 12    15   21         26        30        33     57      61     64     68       71        83     87      89 90          93    110                       117
                                         H-NS R R C L L L K YY                                                   M D D                      K K KR
                                                                                                                                            R                           R       TGQGRTPA H-NS
      Percent amino acid conservation

                                                             L
                                         HppP R R   L L L K YM                                                   L E D                      A
                                                                                                                                            E
                                                                                                                                              K KR                      R       TGQGRTPS HppP

                                                                                                                                                                                                                Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
                                                        H    I                    R
                                                                                  H       F                      M D N                      S
                                                                                                                                            T
                                                                                                                                            L      R      I
                                                                                                                                                          R                                               A

                                         Hfp R  R T L L L K F                                                    K D              D         K   T
                                                                                                                                              K KR                      R       TGQGRTPK Hfp
            in H-NS homologs

                                                                                                                                            Q
                                                             L
                                                             I
                                                             V                            F        N
                                                                                                          L
                                                                                                          I      Q      E
                                                                                                                                            T
                                                                                                                                            P
                                                                                                                                            R
                                                                                                                                            S
                                                                                                                                            V
                                                                                                                                                   G
                                                                                                                                                   A
                                                                                                                                                   V       V                                              R

                                         HppH R R L L L L K L                                                    L D N            D         S K SR        F
                                                                                                                                                          A
                                                                                                                                                          I
                                                                                                                                                           V
                                                                                                                                                                        R       TGQGRTPK HppH
                                         StpA R R F L L F K L
                                                            WL                           L
                                                                                            V             F
                                                                                                                 L D S
                                                                                                                 R
                                                                                                                 M
                                                                                                                 Q
                                                                                                                                  N
                                                                                                                                  T
                                                                                                                                            A
                                                                                                                                            T
                                                                                                                                            P
                                                                                                                                            S   KR G
                                                                                                                                                   A
                                                                                                                                                   T
                                                                                                                                                   S
                                                                                                                                                   V                    R       TGQGRTPK StpA   M

                                         HlpC R R M L L F K L                                                    M E D                      P K TR
                                                                                                                                            L
                                                                                                                                            S
                                                                                                                                                                H
                                                                                                                                                                C       R       SGQGRMPK HlpC
                                                             I                                                   L                          A                   T         A                  A
                                         HppX R R   L Q L K L
                                                    T
                                                             T
                                                             V
                                                             K
                                                             M
                                                                                  H
                                                                                  A       F        R      I
                                                                                                           A     Y
                                                                                                                 F
                                                                                                                 M
                                                                                                                        E
                                                                                                                        L
                                                                                                                        A
                                                                                                                                  S
                                                                                                                                  D
                                                                                                                                  T
                                                                                                                                            V
                                                                                                                                            E
                                                                                                                                            I
                                                                                                                                            T
                                                                                                                                              R K  G
                                                                                                                                                           SS
                                                                                                                                                           T V
                                                                                                                                                                G
                                                                                                                                                                K
                                                                                                                                                                R
                                                                                                                                                                P       R T
                                                                                                                                                                          V
                                                                                                                                                                                SGRGR PK HppX
                                                                                                                                                                                T
                                                                                                                                                                                      N
                                                                                                                                                                                      I
                                                                                                                                                                                      K
                                                                                                                                                                                      V    K
                                                                                                                                                                                            R
                                                                                                                                                                                            K
                                                                                                                                                                                                T
                                                                                                                                                                                                M
                                                                                                                                                                                                QK        L

                                                             Lon

                                                                         100       80        60     40      20     0                                     100     80       60     40   20    0
                                                                         Percent identity to H-NS proteins                                               Percent identity to H-NS proteins

Figure 3. Motif analysis of amino acid conservation and divergence between clades of H-NS homologs. The S. enterica H-NS amino acid sequences of
the N-terminus (A) and C-terminus (B) are presented with functional domains highlighted. Amino acid positions where substitutions compromise specific
protein functions are numbered in grey; the studies that identified these amino acids positions are cited in Supplementary Table S7 along with additional
experimental evidence based on deletion mutations. (C, D) Frequency logos and heatmap colouring to compare the percent identity of amino acids to the
H-NS consensus sequence. The clades with small numbers of protein sequences (HppE, HppF and HppP) were not considered due to the low complexity
of sequence data.

as 61, 64, 68 and 71, may cause these homologs to pref-                                                                              StpA, Hfp and Sfh (HppH) bind to many of the same
erentially homo-oligomerize and even hetero-oligomerize                                                                           chromosomal loci as H-NS in E. coli and Salmonella
with similar homologs while exhibiting reduced affinity for                                                                       (3,27,40). The DNA binding domain (T110–A117) is 100%
protein-protein interactions with H-NS.                                                                                           conserved within the H-NS clade and largely conserved
   Environmental sensing by H-NS involves conformational                                                                          across the homolog clades (Figure 3D). Conservation of the
changes that reorient ionic interactions between the N-                                                                           key amino acids within the DNA binding domain suggests
terminal and central dimerization domains (65). These in-                                                                         that each homolog has the capacity to compete with H-NS
teractions are supported by K57 (65), which is one of the                                                                         for DNA binding sites, or can reinforce DNA binding ac-
most highly conserved residues across the H-NS family of                                                                          tivity when forming heteromers.
proteins (Figure 3C; Supplementary Figure S4B). Nega-                                                                                The Lon protease degrades StpA whereas H-NS is pro-
tively charged patches within the N-terminal domain and                                                                           tected by the cysteine at position C21 (68). Only members
positively charged linker region form ionic bonds that as-                                                                        of the H-NS clade have a cysteine at this position (Figure
semble H-NS dimers (16). These charge distributions are                                                                           3B). All other family members, including StpA, have a hy-
conserved across H-NS family proteins (Supplementary                                                                              drophobic residue (I, L, V, T or F). This finding suggests
Figure S4B).                                                                                                                      that proteins in each clade (HppP, Hfp, HppH, StpA, HlpC
   Recent studies identified a DNA binding hinge in H-NS                                                                          and HppX) may be susceptible to Lon proteolysis.
where positively charged residues K83, K87, K89, R90 and
R93 are critical for DNA binding (66,67) (Figure 3B). Re-
                                                                                                                                  Crosstalk between H-NS and its homologs
placing positively charged amino acids at these positions
reduces H-NS-DNA binding specificity and reduces H-NS                                                                             Genomes encoding H-NS and one or more H-NS homologs
repression of gene expression (66,67). All H-NS proteins                                                                          present ideal systems in which to study transcriptional regu-
possess positively charged amino acids (K or R) at these                                                                          latory network integration and crosstalk between regulons.
five positions in the hinge (Figure 3D); the only substitu-                                                                       H-NS is a master regulator that represses expression of its
tion within the clade is 5% of H-NS have R83. The H-                                                                              homologs stpA (26,69) and hfp (26), and H-NS regulates
NS homolog clades are strikingly divergent in this sub-                                                                           13% of the Salmonella Typhimurium genome compared to
domain, except at R93, where only HppX is divergent from                                                                          the 5% regulated by StpA (28,70). Negative autoregulation
H-NS.                                                                                                                             is a common theme in bacterial transcription networks as
10192 Nucleic Acids Research, 2020, Vol. 48, No. 18

a feedback circuit limits mRNA levels and transcription         Hierarchical control of virulence gene expression
factor concentration (71,72). Negative autoregulation also
                                                                H-NS homologs can be gained and lost over short evo-
promotes network stability and reduces cell-to-cell varia-
                                                                lutionary times in the Enterobacterales, prompting us to
tion of abundant transcription factors like H-NS (73). The
                                                                examine the degrees to which H-NS homologs compete,
observed amino acid similarities and predicted functional
                                                                complement, or compensate for regulatory functions. H-
properties of H-NS homologs suggests that they may neg-
                                                                NS is a strong repressor of the genes that encode the type
atively repress each other’s promoters as a facet of their
                                                                three secretion system and effector proteins in Salmonella
autoregulation. Several studies have measured the relative
                                                                pathogenicity island 1 (SPI-1) that Salmonella uses to initi-
abundance of H-NS and StpA (74,75) but quantification of
                                                                ate invasive infection in a host intestine. HilD is the master
additional homologs and an assessment of H-NS and ho-
                                                                activator of SPI-1 gene expression, which is repressed by H-
molog pools when genes are deleted from the system was
                                                                NS binding to the promoter of hilD (50,78–80). The hilD
lacking.
                                                                promoter is also bound by Sfh (3), though whether Sfh re-
   We selected S. Enteritidis strain EN1660 for proteomic
                                                                presses hilD expression has yet to be explored.

                                                                                                                                  Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
and transcriptomic studies because its chromosome en-
                                                                   To test for cooperative or competitive regulation of hilD
codes hns, stpA and hfp (48). Mass spectrometry was used to
                                                                expression by combinations of H-NS and H-NS homologs,
quantify absolute amounts of proteins per 1 ␮g of injected
                                                                hilD expression was quantified in both exponential growth
protein mass, allowing us to compare the relative abundance
                                                                and stationary phases in wildtype cells and deletion mu-
of each H-NS homolog. In wildtype cells, H-NS (47 fmol)
                                                                tants. In wildtype S. Enteritidis, hilD expression was low in
was the most abundant, followed by StpA (17 fmol) and
                                                                exponential growth, consistent with the well-characterized
Hfp (2 fmol) (Figure 4A), a ratio of approximately 30:10:1.
                                                                repression of SPI-1 expression in S. Typhimurium (81,82).
This ratio of H-NS to StpA is consistent with western blot
                                                                Deletion of hns resulted in a 30-fold increase in expression in
quantification by previous studies (74–76). We constructed
                                                                this growth phase, whereas deletion of either stpA or hfp had
a hns mutant and used whole genome sequencing to con-
                                                                no detectable effect on hilD expression, indicating that H-
firm the precise deletion of hns and the absence of com-
                                                                NS is the dominant repressor of hilD in exponential growth
pensatory mutations, at least at the single nucleotide poly-
                                                                (Figure 5A).
morphism level. Consistent with the expected de-repression
                                                                   Sfh has been previously shown to occupy the hilD pro-
of stpA and hfp expression in the hns mutant, StpA (27
                                                                moter (3), but production of Sfh by introducing pSf-R27 did
fmol) and Hfp (13 fmol) protein levels increased (Figure
                                                                not restore repression (Figure 5B). Instead, Sfh in S. Enter-
4A). Deleting stpA resulted in significantly elevated levels
                                                                itidis wildtype and mutant cells caused a modest three- to
of H-NS, revealing that StpA is a direct or indirect repres-
                                                                four-fold increase in hilD expression in all genotypes (wild-
sor of H-NS. Deletion of hfp had no detectable effect on
                                                                type, hns, stpA, hfp) in exponential growth phase (Fig-
H-NS or StpA protein levels (Figure 4A).
                                                                ure 5B); this pattern and degree of transcriptional response
   Next we tested whether addition of an exogenous H-NS
                                                                is similar to that noted in Figure 4 above when pSf-R27 was
homolog, Sfh (HppH), would decrease the levels of chro-
                                                                introduced to cells for analysis of hns, stpA and hfp expres-
mosomally encoded H-NS proteins to maintain a constant
                                                                sion. Plasmid pSf-R27 also encodes the H-NS N-terminal
total concentration of H-NS family proteins. In nature, Sfh
                                                                homolog Hha, which has been shown to interact with H-
is encoded on the IncHI plasmids R27 in Salmonella en-
                                                                NS and influence virulence gene expression in Salmonella
terica serovar Typhi and pSf-R27 of Shigella flexneri 2a
                                                                (37,83). To determine if Sfh demonstrates a different reg-
strain 2457T (25,77), where it provides a stealth function by
                                                                ulatory outcome when expressed alone, we introduced a
complementing multiple hns mutant phenotypes in E. coli
                                                                clone of sfh under control of its native promoter in plas-
(37,39). Introducing pSf-R27 to cells resulted in high Sfh
                                                                mid pPD101sfh (39) into S. Enteritidis EN1660 wildtype
levels (36 fmol) approaching concentrations close to H-NS,
                                                                and hns, stpA and hfp mutant strains. Acquisition of
but this had no detectable effect on the levels of H-NS, StpA
                                                                Sfh on this alternate plasmid vector resulted in near com-
or Hfp in wildtype cells (Figure 4A).
                                                                plete complementation of the loss of H-NS, as observed in
   Transcript abundance was quantified to identify the tran-
                                                                the return of hilD expression to near wild type (Figure 5C).
scriptional regulatory connections between H-NS and the
                                                                Thus, in this experimental situation, Sfh is a strong molec-
multiple H-NS homologs. In exponential phase growth, hns
                                                                ular replacement for H-NS, but in its native context, Sfh
expression was not affected by the absence of stpA or hfp,
                                                                activity at the hilD promoter is somehow tempered by the
while addition of sfh on pSf-R27 resulted in a 3.8-fold up-
                                                                presence of the many genetic elements on pSf-R27.
regulation of hns expression (Figure 4B). stpA expression
                                                                   The same experiments were conducted in early stationary
increased 4.5-fold and hfp expression increased 16-fold in
                                                                phase cells, when hilD is expected to be naturally induced
the hns mutant compared to wildtype. Deletion of hfp had
                                                                in laboratory-cultured Salmonella (81,82). Consistent with
no detectable effect on hns or stpA transcription. The pres-
                                                                expectations, hilD expression in wildtype cells was elevated
ence of sfh on pSf-R27 caused a 4-fold increase in both
                                                                >20-fold relative to exponential growth (Figure 5D). How-
stpA and hfp expression. Altogether, H-NS demonstrated
                                                                ever, we were particularly surprised to observe that hilD
the expected role of repressing stpA and hfp expression. In-
                                                                was not expressed in hns cells in stationary phase (Fig-
creased H-NS protein levels observed in the stpA mutant
                                                                ure 5D). This highly unexpected result suggests that H-NS
were not reflected in a detectable change in transcription of
                                                                functions directly or indirectly as a transcriptional activator
hns, hinting at a post-transcriptional regulatory mechanism
                                                                of hilD in specific environments and genotypes. H-NS is an
that increased translation of hns transcripts to supplement
                                                                archetypal repressor of transcription, yet indirect transcrip-
the pool of H-NS-like proteins in the absence of StpA.
Nucleic Acids Research, 2020, Vol. 48, No. 18 10193

                        A                                                               H-NS                                        StpA                                                Hfp                                         Sfh
                                                                     100
                                                                                            *

                               Protein abundance (fmol)
                                                                      80

                                                                      60

                                                                      40

                                                                      20
                                                                                                                                                                                  **
                                                                                      NP                                                NP                                                     NP                  NP NP NP NP
                                                                       0
                                                                           Wildtype
                                                                                      hns
                                                                                            stpA
                                                                                                   hfp
                                                                                                         WT + pSfR27

                                                                                                                       Wildtype
                                                                                                                                  hns
                                                                                                                                        stpA
                                                                                                                                               hfp
                                                                                                                                                     WT + pSfR27

                                                                                                                                                                       Wildtype
                                                                                                                                                                                  hns
                                                                                                                                                                                        stpA
                                                                                                                                                                                               hfp
                                                                                                                                                                                                     WT + pSfR27

                                                                                                                                                                                                                   Wildtype
                                                                                                                                                                                                                              hns
                                                                                                                                                                                                                                    stpA
                                                                                                                                                                                                                                           hfp
                                                                                                                                                                                                                                                 WT + pSfR27

                                                                                                                                                                                                                                                               Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
                             Strains

                        B                                                                   hns                                     stpA                                                hfp
                                                                      25
                                                                                                                                                                                  ***
                                            (Relative to wildtype)

                                                                      20
                        mRNA levels

                                                                      15

                                                                      10

                                                                       5
                                                                                                         **                       **                 **
                                                                                      NP                                                NP                                                     NP
                                                                       0
                                                                           Wildtype
                                                                                      hns
                                                                                            stpA
                                                                                                   hfp
                                                                                                         WT + pSfR27

                                                                                                                       Wildtype
                                                                                                                                  hns
                                                                                                                                        stpA
                                                                                                                                               hfp
                                                                                                                                                     WT + pSfR27

                                                                                                                                                                       Wildtype
                                                                                                                                                                                  hns
                                                                                                                                                                                        stpA
                                                                                                                                                                                               hfp
                                                                                                                                                                                                     WT + pSfR27
                             Strains

Figure 4. Protein abundance and expression of H-NS family proteins in S. Enteritidis EN1660. Cellular protein concentration was quantified by mass
spectrometry (A) and relative mRNA level was quantified by quantitative PCR (B) for H-NS (green), StpA (red) and Hfp (blue) in wildtype strain EN1660
and hns, stpA, hfp mutants during exponential phase. The cellular protein concentration of Sfh was also quantified in wildtype strain EN1660 along
with quantification of the impacts of pSf-R27 on hns, stpA and hfp gene expression. The mean and standard deviation of three to four biological replicates
is shown. NP: not present (and not detected). Statistical differences were calculated by one-way ANOVA using Dunnett’s multiple comparison test of
mutants versus wild type. P values
10194 Nucleic Acids Research, 2020, Vol. 48, No. 18

    Exponential phase                                                         examination of homolog functions and regulons, expanding
                       200                                                    beyond the pervasive characterization of ‘paralogs’ that are
                             A         B               C                      accessory modifiers or mere molecular ‘backups’ to H-NS.
                                                                                 We hypothesize that the H-NS homolog clades demon-
                       150                                                    strate evolutionary trajectories that reflect the degree to
     hilD expression

                                                                              which the homologs integrate into host genomic functions.
                                                                              The apparent conservation of StpA in all extant Enterobac-
                       100                 ****                               teriaceae and HlpP in all extant Pantoea suggests that these
                                                                              two proteins acquired essential functions, ensuring their
                        50       ***                                          preservation over evolutionary time. In contrast, the mem-
                                                                              bers of most H-NS homolog clades are mobile, resulting in
                                                                              sporadic distributions that can be limited to specific fami-
                         0                                                    lies (HppP, HppE, HppF, HppH, HppR, HlpC) or distribu-
                              WT
                              hns
                             stpA
                              hfp

                                        WT
                                        hns
                                       stpA
                                        hfp

                                                        WT
                                                        hns
                                                       stpA
                                                        hfp

                                                                                                                                                Downloaded from https://academic.oup.com/nar/article/48/18/10184/5902437 by guest on 11 October 2020
                                                                              tions across multiple bacterial families (HppX, Hfp). Bacte-
                                                                              rial host ranges may be constrained by the identifiable plas-
                                           + pSf-R27   + pPD101sfh            mid replicons (incF, incH, and incX). Similarly, the plas-
                                                                              mids without recognizable incompatibility groups may be
    Early stationary phase                                                    restricted to specific bacterial species by their replication
                       200                                                    mechanisms, explaining the phylogenetically constrained
                             D         E               F
                                                                              distributions of HppP, HppE, HppR and HlpC (Figure 1B).
                       150                                                       The diversity and evolutionary histories of plasmid-
                                                           **
     hilD expression

                                                                              borne H-NS homologs have not been previously addressed.
                                                                              Our study reveals a rich diversity and broad distribution.
                       100                                                    The ancient evolutionary association with specific plasmid
                                                                              incompatibility groups may reflect integration with or even
                                                                              control of replicon function. In this model, H-NS homologs
                        50
                                 **         **                                may either directly regulate plasmid replication, and/or
                                                                              may contribute structural modification of plasmid DNA
                         0                                                    topology that is required for the initiation, progression, or
                              WT
                              hns
                             stpA
                              hfp

                                        WT
                                        hns
                                       stpA
                                        hfp

                                                        WT
                                                        hns
                                                       stpA
                                                        hfp

                                                                              resolution of plasmid replication. These hypothetical func-
                                                                              tions differ from the predominant model of stealth function.
                                       + pSf-R27       + pPD101sfh
                                                                              First, a stealth function would benefit any foreign DNA si-
                                                                              lenced by H-NS binding; this would predict H-NS homolog
                                                                              distributions to be more sporadic, not allied to specific plas-
Figure 5. Regulation of hilD expression by H-NS family proteins. hilD         mid replicon types as observed in our study. Second, the
mRNA levels were quantified in wildtype strain EN1660 and hns, stpA,        plasmids that encode H-NS homologs in Erwiniaceae and
hfp mutants in exponential phase (A) and stationary phase (D) by quanti-     Yersiniaceae display average AT content far lower than in
tative PCR. The influences of plasmid pSf-R27 encoding Sfh (HppH) in its
natural context or plasmid pPDsfh encoding Sfh alone on hilD expression
                                                                              Enterobacteriaceae (Figure 2B), suggesting that AT content
were also assessed in wildtype EN1660 and each mutant derivative in expo-     is a reflection of plasmid types in these families, not the co-
nential phase (B, C) and stationary phase (E, F). All values are expressed    evolution of AT content and H-NS homolog stealth func-
relative to wild type hilD expression in exponential phase. Data from three   tions.
to six biological replicates is shown with the mean represented by a hori-       The chromosomal homologs Hfp and HlpC also demon-
zontal bar. Statistical differences were calculated by one-way ANOVA of
each growth phase and sfh background grouping using Dunnett’s multiple        strate a propensity for horizontal transfer and loss over
comparison test of mutants versus wild type. P values
You can also read