Survey Research Methods (2021)
Vol. 15, No. 1, pp. 1-25
doi:10.18148/srm/2021.v15i1.7633
European Survey Research Association
© 2021 Author(s)
CC BY-NC 4.0

The Role of Time, Weather and Google Trends in Understanding and Predicting Web Survey Response

Qixiang Fang (Utrecht University)
Joep Burger (Statistics Netherlands)
Ralph Meijers (Statistics Netherlands)
Kees van Berkel (Statistics Netherlands)

In the literature about web survey methodology, significant efforts have been made to understand the role of time-invariant factors (e.g. gender, education and marital status) in (non-)response mechanisms. Time-invariant factors alone, however, cannot account for most variations in (non-)responses, especially fluctuations of response rates over time. This observation inspires us to investigate the counterpart of time-invariant factors, namely time-varying factors, and the potential role they play in web survey (non-)response. Specifically, we study the effects of time, weather and societal trends (derived from Google Trends data) on the daily (non-)response patterns of the 2016 and 2017 Dutch Health Surveys. Using discrete-time survival analysis, we find, among other things, that weekends, holidays, pleasant weather, disease outbreaks and terrorism salience are associated with fewer responses. Furthermore, we show that using these variables alone achieves satisfactory prediction accuracy of both daily and cumulative response rates when the trained model is applied to future unseen data. This approach has the further benefit of requiring only non-personal contextual information and thus involving no privacy issues. We discuss the implications of the study for survey research and data collection.

Keywords: online survey; response rates; weather; Google Trends; survival analysis

Contact information: Qixiang Fang, Department of Methodology and Statistics, Faculty of Social and Behavioural Sciences, Sjoerd Groenmangebouw, Padualaan 14, 3584 CH Utrecht, The Netherlands (E-Mail: q.fang@uu.nl)

1   Introduction

Web surveys have become increasingly popular over the past decades. This growing popularity of web surveys, in comparison with their traditional counterparts (e.g. telephone and face-to-face interviews), can be attributed to unique advantages of web surveys such as shorter transmission time, lower costs and more flexible designs (Bethlehem, 2009). However, web surveys also suffer from various issues, the most prominent of which are lower response rates, which lead to compromised quality of the resulting survey data (i.e. sometimes more bias and always less precision) (e.g. Fan & Yan, 2010; Fowler, Cosenza, Cripps, Edgman-Levitan, & Cleary, 2019; Manfreda, Berzelak, Vehovar, Bosnjak, & Haas, 2008; Taylor & Scott, 2019). A potential solution is mixed-mode designs, which follow up on web surveys with other modes like telephone and face-to-face interviews (Wolf, Joye, Smith, & Fu, 2016). In this way, the total response rate becomes higher, the collected responses more representative and the resulting estimates less biased. Nevertheless, low response rates of the web mode in mixed-design surveys are still undesirable because of the associated increase in total expenses and planning effort. Therefore, in many ways, low response rates are a challenge for web surveys, necessitating research to understand the likely mechanism underlying unit (non-)response decisions in web surveys (Fang & Wen, 2012).

The past decades have seen a plethora of research on influencing factors of web survey (non-)response. Most of these factors belong to the following three categories: respondent-related (e.g. age, income, education), region-related (e.g. degree of urbanisation, population density) and design-related (e.g. survey length, contact mode). For a quick (non-exhaustive) overview of existing findings in the web survey literature, see Table A1 in Appendix A. These discoveries have greatly improved our understanding of the underlying mechanisms of response decisions in web surveys. However, they do not fully account for the variation in observed response behaviour. For instance, Erdman and Bates (2017) used the best twenty-five out of over three hundred such predictors in their study to predict block-level response rates, and yet the resulting model only explains about 56% of the variation in the response rates. It remains, therefore, necessary to investigate additional influencing factors.

One prominent characteristic of the three categories of factors described above is that they are time-invariant, meaning that they tend to stay constant during a survey's data collection period. Even when they do vary in values, the change is unlikely to be substantial. The time-invariance of these well-studied factors (partially) explains why using them alone is insufficient to substantially explain variations in (non-)response patterns, simply because they cannot properly account for temporal fluctuations of survey response rates over time, such as across different days of the week (e.g. Faught, Whitten, & Green, 2004; Sauermann & Roach, 2013), months of the year (e.g. Losch et al., 2002; Svensson, Svensson, Hansen, & Lagerros, 2012) and different years (Sheehan, 2006). This leads us to look into the counterpart of time-invariant factors, namely time-varying factors such as day of a week, weather, presence of public holidays, disease outbreaks and salience of societal issues like privacy concerns, to name a few.

The lack of research on the roles of such time-varying factors not only limits our understanding of the underlying process of response decisions in web surveys, but also hinders survey design efforts aimed at increasing response rates. For instance, people may be unlikely to respond to surveys in holiday periods because they are not at home and/or do not want to spend time on surveys. Therefore, survey response rates might be higher or lower depending on time-related factors. This is also relevant to surveys that accept responses over a longer period of time. In such surveys, daily response rates usually peak on the first day(s) and quickly subside (Minato, 2015). This means that the number of responses during the first day(s) significantly influences the final response rates of the surveys, suggesting that effects of time-varying factors (if any) during the first few days might be crucial for a proper understanding of the final response rates.

Therefore, in this paper, we investigate whether and how various time-varying factors (including time, weather and societal trends) influence the daily (non-)response patterns of the web mode of the 2016 and 2017 Dutch Health Surveys, and whether these factors can be useful in predicting survey response rates.

We structure the remainder of the article as follows. We begin with a detailed account of time-varying (contextual) factors and our research proposal, followed by the research aims. Then, we describe the data, research methods and results, respectively. We conclude with a discussion on the theoretical and practical implications of the findings and recommendations for future research.

2   Time-Varying Contextual Factors

2.1   Definitions

In contrast to time-invariant factors, time-varying factors are variables whose values may differ over time (J. D. Singer & Willett, 2003). They record an observation's potentially differing status on each associated measurement occasion. Some have values that change naturally; others have values that change by design. For survey research, which has a typical fieldwork period ranging from a few days to several months, it is likely that some underlying factors of (non-)response behaviour vary substantially in their values during the fieldwork period. Examples are: personal availability, individual emotional status, day of a week, weather, holidays, and societal sentiments about a topic. Of course, respondent- and region-related factors can also vary (e.g. gender, age, household composition, geographical features of the city of residence), but normally to a much lesser extent (especially during a typical survey project period), and such changes are unlikely to apply to the majority of the sample units. Therefore, we argue that time-varying factors can help to understand and predict survey responses, especially with regard to fluctuations in response rates over time, which time-invariant factors by definition cannot properly account for.

In this paper, we focus on time-varying factors that are also contextual factors. We define contextual factors as variables that usually cannot be influenced by study participants because they are determined by stochastic processes largely or totally external to them. Typical contextual factors assess potentially changing characteristics of the physical or social environment in which study participants live. Some of the aforementioned examples of time-varying factors like weather, time and societal sentiments fit this definition. The term, like others, is defined and used differently across disciplines. In survival analysis research, such variables are often called "ancillary factors" (Kalbfleisch & Prentice, 2002). In this paper, we prefer to use the more intuitive term "contextual factors".

The focus on only contextual factors has two practical purposes. The data for contextual factors are usually non-personal, meaning that they do not involve privacy issues. This is important because recent increases in general privacy concerns among the public and the introduction of more stringent privacy regulations such as the General Data Protection Regulation in the EU have led to greater difficulties in obtaining, accessing and using personal data. Our attempt to model web survey response with only non-personal predictors, if proven successful, can be a useful alternative for the survey research community. Furthermore, because contextual factors are (largely) free from the influence of study subjects, there is no or little concern for reverse causality and hence more internal validity for the study.

To sum up, in this paper, we focus on predicting and understanding web survey responses from time-varying contextual factors.

2.2   Literature Review

A significant amount of research in fields like cross-cultural psychology, epidemiology and family sociology has investigated the effects of time-varying contextual factors on individual outcomes. In contrast, far fewer studies in the field of survey research have attempted to do so, and even fewer focus on web surveys. Therefore, for a more thorough understanding of the likely effects of time-varying contextual factors, we conduct a literature review on this topic including not only web survey studies but also non-web survey ones.

The studies on time-varying contextual factors that we identified can be categorised into two types. The first type focuses on the effect of time, such as year, season, month and day of a week. For instance, Sheehan (2006) analysed 31 web surveys and concluded that the year in which a survey was published was the most important predictor of response rates. Losch et al. (2002) found that completing a survey interview during summer in Iowa (US) required more contact attempts than in other seasons. Similarly, Göritz (2014) documented for a German online panel that the panel members were more likely to start and finish studies in winter than during any other season. Contrasting these two findings, Svensson et al. (2012) in a Swedish longitudinal online survey found that the highest response rate was in September. Faught et al. (2004) noted in their experimental study on US manufacturers that survey response rates were the highest when the email invitation was sent on either Wednesday morning or Tuesday afternoon. Contrary to this, Sauermann and Roach (2013), in their experiment conducted among US researchers, did not find the timing of the e-mail invitation (in terms of the day of a week) to result in significantly different response rates in a web survey. However, they did find that people were less likely to respond in the weekend and would postpone the response until the next week.

The second type of studies concerns the influence of weather on survey participation. Potoski, Urbatsch, and Yu (2015) analysed eight surveys from 2001 to 2007 and showed that on unusually cold and warm days, wealthier people are more likely to participate in surveys than the less wealthy. Cunningham (1979) found more pleasant weather (e.g. more sunshine, higher temperature, lower humidity) to significantly improve a person's willingness to assist an interviewer.

The effect of weather on survey participation, however, likely goes beyond these two findings. Simonsohn (2010) showed that on cloudier days people are more likely to engage in academic activities, which share some common characteristics with survey participation (e.g. high cognitive load and low immediate returns). Therefore, it is likely that people on cloudy days may become more inclined to participate in surveys. Keller et al. (2005) showed that higher air pressure has a positive influence on mood. In turn, positive mood states may lead to increased helping behaviour (e.g. fulfilling a survey request) (Weyant, 1978). Therefore, air pressure may also impact survey response decisions.

2.3   Research Proposal

The studies above do confirm, to some degree, the influence of time-varying contextual factors on response decisions in surveys. However, some contradictory findings suggest a need for further research, for instance, on the influence of day of a week. In addition, it is also likely that the results of the existing studies are confined to local applications and that findings based on non-web surveys may not apply to web surveys. New time-varying contextual factors should also be researched. We propose the following time-varying contextual factors for our study.

Time. The first category of time-varying contextual factors is related to time, including day of a week and public holidays. The former is chosen because, according to the literature, its effect on survey response rates is still unclear, seemingly varying across surveys and thereby needing further research. The latter factor is chosen because we hypothesise that during holidays, people may travel around, spend more time with family or want to rest, and are consequently less likely to participate in survey research.

Weather. The effect of weather on survey participation also requires more research. In this study, we include different types of daily weather measures (e.g. maximum, minimum and average) of temperature, sunshine, precipitation, wind, cloud, visibility, humidity and air pressure.

Societal Trends. In addition to time and weather, factors which relate to real-time societal trends, such as disease outbreaks, privacy concerns, public outdoor engagement (e.g. in festivals and on the road) and terrorism salience, may also influence survey participation decisions. These we term "societal trends" in this paper. We explain each of these societal trends factors next.

Disease Outbreaks. It is common knowledge that being sick (physically or psychologically) can alter individual behaviour. For instance, individuals who are sick may stay at home more and reduce outdoor or professional activities. They may lack the cognitive resources to engage in cognitively demanding activities (like survey participation). They may develop more negative emotions, which in turn may influence their pro-social behaviour. In particular, medical conditions such as the common cold, the flu, hay fever and depression are likely to affect a large number of individuals, especially during certain times of a year (e.g. cold, flu and depression in the winter; hay fever in the spring). Therefore, we hypothesise that disease outbreaks can have physical, behavioural and psychological consequences that in turn impact survey participation to a varying degree depending on the type, severity and prevalence of the illness. We did not find any previous research on this topic, leading us to believe that our hypothesis is novel and worth studying.

Privacy Concerns. Research has shown that privacy concerns can deter individuals from survey participation. For instance, two studies on the 1990 and 2000 U.S. census find that an increase in concern about privacy and confidentiality issues is consistently associated with a decrease in the probability of census participation, especially among certain ethnic groups (e.g. E. Singer, Mathiowetz, & Couper, 1993; E. Singer, Van Hoewyk, & Neugebauer, 2003). Using paradata, two other studies report that greater privacy concerns are linked to higher unit or item non-response (Bates, Dahlhamer, & Singer, 2008; Dahlhamer, Simile, & Taylor, 2008). Given these findings, we hypothesise that the level of general societal concerns about data privacy issues likely predicts survey response rates. Specifically, the higher the level of privacy concerns, the lower the response rates.

Public Outdoor Engagement. Considering the nature of some types of surveys (e.g. mailed surveys, digital surveys that require a desktop and are not smartphone-friendly), it is conceivable that people are unlikely (if not unable) to participate in surveys when they are engaged in outdoor activities (e.g. public events, holidays, travel), even if they have received the survey requests. This is evidenced by the previous finding that completing a survey interview during summer required more contact attempts than in other seasons because people are more likely to be travelling (Losch et al., 2002). Therefore, we hypothesise that the level of outdoor engagement during certain time periods may also affect survey response rates during that time. This can be especially true for the surveys that we study in this research, where survey invitations are mailed to the sample units' home addresses and the surveys are not suitable for smartphones (i.e. they require the use of laptops or tablets).

Terrorism Salience. A large body of literature has shown that terrorist events have impactful individual and societal consequences. For instance, higher levels of terrorism salience or fears are linked to more negative emotions for non-religious people (Fischer, Greitemeyer, Kastenmüller, Jonas, & Frey, 2006), worse mental health (Fischer & Ai, 2008), more media consumption (Boyle et al., 2004; Lachlan, Spence, & Seeger, 2009), increased contact with family and friends (Goodwin, Willson, & Stanley, 2005), (irrational) travel behaviour (e.g. Baumert, de Obesso, & Valbuena, 2019; Gigerenzer, 2006), temporarily lower social trust (Geys & Qari, 2017) but more institutional trust (Dinesen & Jäger, 2013; S. J. Sinclair & LoCicero, 2010), fewer occupational networking activities (Kastenmüller et al., 2011), and cancellation of sport events and a higher number of no-shows at sport events (Frevel & Schreyer, 2020). Furthermore, mortality salience (which can be induced by reports of deaths in terrorist events) has been shown to increase pro-social attitudes and behaviour (Jonas, Schimel, Greenberg, & Pyszczynski, 2002). These studies were conducted across various countries, related to different terrorist events and with study participants either directly or indirectly impacted by terrorist events. These consistent findings lead us to reason that terrorism salience or fears can have a substantial impact on most members of a society, including potential survey respondents. In the specific context of survey research, the findings that terrorism salience or fears can induce emotional and behavioural changes in, for example, health status, media use, travel behaviour and interpersonal relationships suggest that survey participation (and consequently, survey response rates) can also be indirectly affected. Given all the potential consequences of higher terrorism salience (which likely take greater priority in one's life than survey participation), we tentatively hypothesise a negative effect of terrorism salience on survey response rates. Furthermore, note that the Netherlands (where the current study is based) and its nearby countries (such as Germany and France) have suffered from terrorist threats, attacks or related events and issued terrorism warnings during the past years (e.g., see United States Department of State, 2017, 2018). This fact makes our inclusion of terrorism salience as a potential factor of survey response all the more relevant.

Summary. Some of the existing studies investigated effects of time-varying factors on a monthly or yearly scale. These findings are certainly interesting and informative; nevertheless, studying the effects of time-varying factors on a finer scale (e.g. weekly, daily) might provide survey researchers with even more helpful insights. In this project, we focus on the effects of daily time-varying factors on daily survey response. We do not consider the effects of months or seasons here, partly because we expect the daily time-varying variables we use to be able to capture any monthly and seasonal trends, and partly because there is not sufficient variation in our data to allow for reliable estimation of the relevant month and season effects that would generalise well to future unseen data.

To sum up, we propose the following daily time-varying factors for investigation in the current study: day of a week, public holidays, weather (i.e. temperature, sunshine, precipitation, wind, cloud, visibility, humidity and air pressure) and societal trends (i.e. disease outbreaks, data privacy concerns, public outdoor engagement and terrorism salience).

3   Research Aims

The current study is mainly concerned with whether and how daily time-varying contextual factors such as day of a week, holidays, weather and societal trends influence web survey response behaviour on a daily basis. To approach this question, we use discrete-time survival analysis to model the effects of predictors on the daily conditional odds of a person responding to a web survey (given that he/she has not responded yet). In this way, we obtain insight into how a specific factor influences the daily response decision of a person.

Furthermore, believing that a model (or a predictor) is generally more useful when it not only explains the current data but also generalises to future observations, we evaluate the trained models and the related predictors with regard to their predictive performance on an independent data set. This approach helps to answer, for instance, whether temperature is a better predictor than day of a week.

4   Data

4.1   The Dutch Health Surveys

In this study, we analyse the response decisions of individuals who were invited to participate in either the 2016 or the 2017 Dutch Health Survey. The Dutch Health Survey is a yearly survey administered by Statistics Netherlands. It aims to provide an overview of the developments in health, medical contacts, lifestyle and preventive behaviour of the Dutch population. The sampling frame comprises persons of all ages residing in private households. It utilises a mixed-mode design, consisting of an initial web mode and follow-up telephone or face-to-face interviews in case of non-response in the web mode. Only the design and the data of the web mode are relevant to this study.

A yearly sample is divided into 12 cohorts, each corresponding to a data collection period (DCP). Each DCP starts with a web-mode survey, which lasts about a month. A web-mode DCP begins with the by-post delivery of an invitation letter that requests the sample unit to respond to the survey online using a desktop, laptop or tablet (but not a smartphone). In case of no response from the individual after about one week, up to two mailed reminder letters follow (at an interval of one week). The invitation and reminder letters contain a web link to the survey and a unique personalised password required for login to the survey. Each web-mode DCP ends roughly one month following the invitation letters.

Figure 1 illustrates the data collection process with the cumulative response rates of the first three web-mode DCPs of the 2016 Dutch Health Survey. The starting point of a new curve indicates receiving a corresponding invitation letter. Each web-mode DCP ends at the end of the curve. We can see that response rates grow the fastest in the first few days after an invitation or reminder letter and flatten quickly later, suggesting that the first few days of data collection are crucial for ensuring a high response rate.

[Figure 1. Cumulative Response Rates of Three Web DCPs in the 2016 Dutch Health Survey. Cumulative response rate plotted against date (February-April 2016), with separate curves per survey phase: Invitation, Reminder 1, Reminder 2.]

The Dutch Health Survey has a relatively consistent survey design over the years (up to and including 2017), thereby making comparison and integration of data from different years valid and simple. Table 1 summarises information about expected delivery days of the letters, sample sizes and response rates of the 2016 and 2017 surveys (web mode). Note that in the first half of 2016, the letters were scheduled to arrive on Friday, while in the latter half on Saturday. In contrast, the invitation and reminder letters in 2017 were expected to arrive on Thursday and Saturday, respectively. This variation in the expected arrival days of the letters (i.e. variation in the designs of the surveys) can increase the robustness of the study results, especially with regard to the effects of the day of a week predictor.

Table 1
Comparison of the 2016 and 2017 Dutch Health Survey (Web Mode)

                                        2016                                  2017
Expected Delivery Day of Invitation     Fri. (Jan.-Jun.), Sat. (Jul.-Dec.)    Thur.
Expected Delivery Day of Reminder       Fri. (Jan.-Jun.), Sat. (Jul.-Dec.)    Sat.
Sample Size                             15,007                                16,972
Response Rate                           34.8%                                 34.2%

While we acknowledge that there are different categories of non-response behaviour such as non-contact and refusal (e.g. Lynn and Clarke, 2002) and that there is research value in differentiating the sub-types of non-response behaviour, we treat all non-response sub-categories as one single "non-response" category in this study. There are two reasons. First, the focus of our study is on response versus non-response. In this sense, we are not interested in the sub-types of non-response behaviour. Second, the number of non-contacts and refusals in our data is too low (non-contact rates < 0.4% and refusal rates < 2.2%) for the use of discrete-time survival models in this study, because the denominator of hazard rates becomes so low that the models would have trouble with reliable estimation.

4.2   Weather Data

The Royal Netherlands Meteorological Institute (KNMI) records daily weather information about temperature, sunshine, precipitation, wind, cloud, visibility, humidity and air pressure across 47 stations in the Netherlands. We retrieved the 2016 and 2017 daily weather information (i.e. 20 variables in total) from the KNMI website (KNMI, 2019). The exact variables and the associated measures are summarised in Table A2 in Appendix A.

We averaged the obtained weather records across all stations, instead of assigning every sample case to the nearest weather station, for two reasons. First, considering the small size and geographical homogeneity of the Netherlands and the small variations of the weather data across the weather stations, we did not see a strong benefit of assigning the closest weather station records to the sample cases over simply assigning the average scores, especially when we also factor in the additional effort and potential matching mistakes associated with the matching approach. Second, averaging the weather measures across stations means that all sample cases, on a given day, are associated with the exact same scores for any weather variable. This has the advantage that we can collapse the data set from the "Person-Period" data format into the "Period-Level" data format (see Section 5.2 for more information), which downsizes the data matrix and thus reduces model computation time (while obtaining the exact same model estimates). Therefore, given both considerations, we decided to average the weather records across stations.
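As an illustration, a minimal sketch of this averaging step is given below. It is not the original analysis code, and the file and column names are hypothetical; it simply shows how daily station records could be collapsed into one national average per date.

```python
# A minimal sketch of the station-averaging step described above (not the
# authors' code). It assumes a KNMI-style daily file with hypothetical column
# names: one row per station per date plus the 20 weather measure columns.
import pandas as pd

def average_weather_across_stations(path: str) -> pd.DataFrame:
    """Return one row per date, with every weather measure averaged over stations."""
    weather = pd.read_csv(path, parse_dates=["date"])
    measure_cols = [c for c in weather.columns if c not in ("date", "station_id")]
    # All sample cases observed on the same day then share the same national
    # averages, which is what later allows collapsing to the Period-Level format.
    return weather.groupby("date", as_index=False)[measure_cols].mean()

# Usage with a hypothetical file:
# daily_weather = average_weather_across_stations("knmi_daily_2016_2017.csv")
```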
4.3   Societal/Google Trends

The four societal trends of interest are disease outbreaks, privacy concerns, public outdoor engagement and terrorism salience. To our knowledge, there are currently no publicly available administrative or survey data on any of these four societal trends on a daily basis in the Netherlands. Measuring these trends, therefore, requires innovative solutions. Our solution of choice is to use Google Trends (GT) data to capture signs of these societal trends.

GT offers periodical summaries of user search data from 2004 onwards for many regions and for any possible search term. These summaries, available as indices, represent the number of Google searches that include a given search term in a specified period (e.g. day, week or month). The data are scaled for the requested period between 0 and 100, with 0 indicating no search at all and 100 the highest search volume in that period. These indices, which represent the popularity of specific search terms, may offer relevant insights into various human activities in (almost) real time. Indeed, GT indices have been used for various purposes, such as real-time surveillance of disease outbreaks (Carneiro & Mylonakis, 2009), economic indicators (Choi & Varian, 2012) and salience of immigration and terrorism (Mellon, 2014).

These successful applications of GT indices suggest the possibility of using GT to capture our four societal trends of interest. Table A2 in Appendix A lists the search terms we used to measure each trend. For disease outbreaks, we used the relevant commonly used Dutch terms concerning diseases like "flu", "cold" and "depression". For privacy concerns, we used terms such as "data leaks" and "hacking" as proxies. For outdoor engagement, we used terms indicating whether people are in a traffic jam or participating in festivals. Lastly, for terrorism salience, we used the term "terrorist". Note that we hypothesised these search terms prior to any analysis, rather than cherry-picked from a long list sorted by correspondence with response rates, to reduce the problem of spurious correlations.

Like any other data, GT data also need to be checked with regard to their validity and reliability before use. A potential issue of validity in GT concerns the fact that the search volume of a specific search term does not necessarily measure the intended phenomenon. A key reason is that a search term can bear multiple meanings. For instance, the Dutch term "AVG" can be short for both "Algemene verordening gegevensbescherming (General Data Protection Regulation)" and "AVG Technologies" (a security software company). According to Google Trends, the second meaning was more often used than the first meaning in both 2016 and 2017 in the Netherlands. Therefore, GT indices may capture noise rather than the intended trends of interest. To mitigate this validity issue, we followed the advice by Zhu, Wu, Wang, and Qin (2012). The authors noted that GT offers a function to check the most correlated queries (i.e. search terms) and topics for any specific term you enter. The validity of the terms can thus be manually assessed by checking whether the most correlated queries and topics correspond to the intended meaning of the search term. For instance, GT shows that the most correlated topic with the Dutch term "files" (meaning "traffic jam") was "traffic congestion" in both 2016 and 2017, therefore indicating the relatively high construct validity of the term "files". Table A4 in Appendix A summarises all the correlated queries and topics of the used GT search terms in this study.

In addition, the resulting index scores from GT can vary substantially across different requested periods and inquiry attempts. This points to measurement reliability issues, which likely stem from two reasons. First, Google Trends uses a simple random sample from the total search volume to calculate the index scores. Therefore, the resulting index scores are subject to high variability if the (unknown) sample size is small. Second, Google does not publish details about the underlying algorithms that calculate the scores, nor does Google make publicly available any changes in its algorithms. Therefore, the obtained index scores may change over time because of updates in the algorithms, even when the exact same search strategy is used. To overcome this issue, we used repeated sampling to enhance the measurement reliability of GT data. By taking as many samples as possible per date in the period of interest and averaging all the scores for each date over all repeated samples, one can obtain much more precise estimates. However, in this procedure a complication arises due to the fact that a different sample can only be taken when a different period is requested. For instance, to obtain two different repeated samples for the date "2017-01-05", one needs to request two different periods such as "2017-01-04 to 2017-01-05" and "2017-01-05 to 2017-01-06", both covering the date of interest ("2017-01-05"). However, as GT scales the scores within the specified period to be between 0 and 100, when two different periods are requested, the reference values in the two periods can be different, leading to differently scaled scores and invalid comparisons of scores between the two different periods. We overcame this issue by maximising the number of overlapping dates between every two consecutive periods and then calibrating the latter period to the previous one. In this way, the comparability between different samples is maximised. See Appendix E for a detailed description of the algorithm we created and used to calibrate GT data.
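The following sketch illustrates the calibration idea in simplified form. It is not the Appendix E algorithm itself, and the function and variable names are assumptions: consecutive, overlapping request windows are rescaled onto a common scale via their shared dates and then averaged per date.

```python
# A simplified illustration of the calibration idea (the full algorithm is in
# Appendix E). Each Google Trends request returns daily scores scaled to 0-100
# within its own window, so consecutive, overlapping windows are first put on a
# common scale via their shared dates and then averaged per date.
import pandas as pd

def calibrate_gt_windows(windows: list) -> pd.Series:
    """windows: list of date-indexed pandas Series from overlapping GT requests."""
    calibrated = [windows[0].astype(float)]
    for current in windows[1:]:
        previous = calibrated[-1]
        overlap = previous.index.intersection(current.index)
        # The ratio of means over the shared dates re-expresses the new window
        # on the scale of the already-calibrated series.
        factor = previous.loc[overlap].mean() / max(current.loc[overlap].mean(), 1e-9)
        calibrated.append(current.astype(float) * factor)
    # Averaging all calibrated samples per date reduces the sampling noise.
    return pd.concat(calibrated).groupby(level=0).mean()
```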
4.4   Overview of Variables

Appendix B summarises all the variables used in the study. Unless specified as categorical or dummy-coded, the variables are treated as continuous.

5   Methods

In this section we detail the analytic approaches used in the study. First, we introduce discrete-time survival analysis and, under this analytical framework, demonstrate how we used logistic regression to model the effects of time-varying factors on survey response in the web mode of the 2016 and 2017 Dutch Health Surveys. We then explain how we applied (adaptive) Lasso regularisation with logistic regression, with the goal of enhancing model interpretability and predictive performance.

Following the modelling approaches mentioned above, we trained three models with the time-varying contextual predictors based on the training data set, which consists of the complete 2016 Dutch Health Survey data and the first half of the 2017 data (i.e. 18 web-mode DCPs in total). The data are split in this way because it is important to have a good trade-off between enough data variation in the training data and sufficient independence of the test data from the training data.

Then, we applied the trained models to the test data set, which is the remaining 2017 data (i.e. the last six web-mode DCPs), and evaluated and compared their predictive performances. These three models are: the "baseline model", which includes only the baseline predictors (the number of "days" since the previous invitation or reminder letter and "survey phase") that are necessary for the specification of the intercept term in a discrete-time survival model (see Section 5.3); the "full model", which includes all the time-varying contextual predictors; and, lastly, the "interaction model", where we allow the effects of the predictors to vary with time.

Appendix B provides descriptive statistics about all the variables used in the training and the test data sets, separately.

5.1   Discrete-Time Survival Analysis

The research questions and the features of the current data require a modelling framework capable of handling the following issues: first, the method should model the transition from non-response to response; second, it should incorporate both time-varying and time-fixed predictors; third, it should take care of the right-censoring issue in the data, with right censoring meaning that for some individuals the time when a transition (i.e. response) takes place is not observed during the survey's web mode.

These specific issues call for survival analysis. Survival analysis is a body of methods commonly used to analyse time-to-event data (J. D. Singer & Willett, 2008). The focus is on the modelling of transitions and the time it takes for a specific event to occur. The current research interests lie in the modelling of the transition from non-response to response over a period of time, thereby making survival analysis the right analysis tool. Many survival analysis techniques (e.g. Cox regression) assume continuous measurement of time. However, in practice, data are often collected in discrete intervals, for instance, days, weeks and months. In this case, a sub-type of survival analysis is needed, namely, discrete-time survival analysis. Given that our data are measured in daily intervals, it is only appropriate to use discrete-time survival analysis.

There are further advantages to using discrete-time analysis, in comparison to its continuous-time counterpart (Tutz & Schmid, 2016). For example, discrete-time analysis has no problem with ties (i.e. multiple events occurring at the same time point) and it can be embedded into the generalised linear model framework, as is shown next.

5.2   The General Modelling Approach

The fundamental quantity used to assess the risk of event occurrence in a discrete-time period is the hazard. Denoted by h_is, the discrete-time hazard is the conditional probability that individual i will experience the target event in time period s, given that he or she did not experience it prior to time period s. In the context of this paper, this translates into the probability of person i responding to the survey during day s, given that the individual did not respond earlier. The value of the discrete-time hazard in time period s can be estimated as the ratio of the number of individuals who experience the target event (i.e. answering the survey) in time s to the number of individuals at risk of the event in time s.
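Written out, this estimator takes the following form; the symbols e_s and r_s are introduced here purely for illustration and do not appear elsewhere in the paper.

```latex
% e_s : number of sample units responding on day s
% r_s : number of sample units still at risk on day s (invited, not yet responded, not censored)
\hat{h}_s = \frac{e_s}{r_s}
```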
A general representation of the hazard function that connects the hazard h_is to a linear predictor η is

η = g(h_is) = γ_0s + x_is γ        (1)

where g(.) is a link function. It links the hazard and the linear predictor η = γ_0s + x_is γ, which contains the effects of predictors for individual i in time period s. The intercept γ_0s is assumed to vary over time whereas the parameter γ is fixed. Since hazards are probabilities restricted to the interval [0, 1], a natural, popular candidate for the response function g(.) is, among others, the logit link. The corresponding hazard function becomes

h_is = exp(η) / (1 + exp(η))        (2)

Under this logistic model, the exponential term of a parameter estimate quantifies the difference in the value of the conditional odds (instead of hazards) per unit difference in the predictor. The total negative log-likelihood of the model, assuming random (right) censoring, is given by

−l ∝ Σ_{i=1}^{n} Σ_{s=1}^{t_i} [ y_is log(h_is) + (1 − y_is) log(1 − h_is) ]        (3)

where y_is = 1 if the target event occurs for individual i during time period s, and y_is = 0 otherwise; n refers to the total number of individuals; t_i the observed censored time for individual i. This negative log-likelihood is equivalent to that of a binary response model. This analogue allows us to use software designed for binary response models (e.g. binary logistic regression) for model estimation, with only one modification, namely that the number of binary observations in the discrete survival model depends on the observed censoring and lifetimes. Thus, the number of binary observations is Σ_{i=1}^{n} t_i. This requires the so-called Person-Period data format, where there is a separate row for each individual i for each period s ("day" in our case) when the person is observed. In each row a variable indicates whether an event occurs. The event occurs in the last observed period unless the observation has been censored. Table 2 shows an exemplar Person-Period data set.

Table 2
Example of Person-Period Data

Person i    Event y_is    Time s    Covariate x_is
1           0             1         x_{1,1}
1           1             2         x_{1,2}
2           0             1         x_{2,1}
2           0             2         x_{2,2}
...         ...           ...       ...
2           0             20        x_{2,20}
3           0             1         x_{3,1}
...         ...           ...       ...
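As an illustration of this estimation strategy, a minimal sketch of the Person-Period expansion and the binary logistic fit could look as follows. This is not the authors' code, and all column names are hypothetical assumptions.

```python
# A minimal sketch (not the authors' code) of the estimation strategy described
# above: expand each sample unit into one row per day at risk (Person-Period
# format) and fit an ordinary binary logistic regression to it.
import pandas as pd
import statsmodels.formula.api as smf

def to_person_period(units: pd.DataFrame, daily_covariates: pd.DataFrame) -> pd.DataFrame:
    """units: one row per person with 'person', 'start_date', 'obs_days',
    'response_day' (day of response, NaN if right-censored).
    daily_covariates: one row per calendar 'date' with the contextual predictors."""
    rows = []
    for u in units.itertuples(index=False):
        for day in range(1, int(u.obs_days) + 1):
            rows.append({
                "person": u.person,
                "day": day,
                "date": u.start_date + pd.Timedelta(days=day - 1),
                # The event indicator is 1 only on the day the person responds.
                "event": int(pd.notna(u.response_day) and day == u.response_day),
            })
    return pd.DataFrame(rows).merge(daily_covariates, on="date", how="left")

# pp = to_person_period(units, daily_covariates)
# fit = smf.logit("event ~ day + C(survey_phase) + temp_avg + holiday", data=pp).fit()
# print(fit.summary())
```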
One may wonder whether the analysis of the multiple records in a Person-Period data set yields appropriate parameter estimates, standard errors and goodness-of-fit statistics when the multiple records for each person in the data set do not appear to be independent from each other. This, fortunately, is not an issue in discrete-time survival analysis, because the hazard function describes the conditional probability of event occurrence, where the conditioning depends on the individual surviving until each specific time period s and his or her values for the substantive predictors in each time period (J. D. Singer & Willett, 2008). Therefore, records in the Person-Period data need only satisfy conditional independence.

Note that when a data set contains only time-varying predictors and these predictors only vary with time but not with individuals, we can collapse the data set from the "Person-Period" format into the "Period-Level" format, where each row represents a given time point, the scores of the time-varying predictors associated with that time point, the number of individuals experiencing the target event and the number of individuals at risk of the event at that time. This approach allows us to significantly downsize the data matrix (from 775,890 rows to only 808 rows, in this study), and thus reduce model computation time (from tens of hours to only minutes, in this study), while obtaining the exact same model estimates as we would with the original Person-Period data format. The only difference is that, instead of a binary logistic regression, we need to use a binomial logistic regression, which models count and proportion outcomes. Our data qualify for such a transformation and thus we adopt this transformation strategy.
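A sketch of this collapsed, Period-Level formulation is given below, building on the hypothetical Person-Period frame `pp` from the previous sketch (again, not the authors' code, and the column names are assumptions). The binomial GLM on the aggregated (events, non-events) counts reproduces the binary logistic estimates.

```python
# Collapse the Person-Period rows to one record per day (Period-Level format)
# and fit a binomial GLM on the aggregated counts.
import pandas as pd
import statsmodels.api as sm

def to_period_level(pp: pd.DataFrame) -> pd.DataFrame:
    period = pp.groupby(["date", "day", "survey_phase"], as_index=False).agg(
        events=("event", "sum"),        # responses on that day
        at_risk=("event", "size"),      # units still at risk on that day
        temp_avg=("temp_avg", "first"), # day-level covariates are constant within a day
        holiday=("holiday", "first"),
    )
    period["non_events"] = period["at_risk"] - period["events"]
    return period

# period = to_period_level(pp)
# X = pd.get_dummies(period[["day", "survey_phase", "temp_avg", "holiday"]],
#                    columns=["survey_phase"], drop_first=True).astype(float)
# fit = sm.GLM(period[["events", "non_events"]], sm.add_constant(X),
#              family=sm.families.Binomial()).fit()
# print(fit.summary())
```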
of the number of days since the invitation letter, because this removes the dependency between the “days” and “survey phase” variables. For instance, “Day 2” together with “survey phase: Invitation” refers to the second day since the expected arrival day of the invitation letter, while “Day 5” in combination with “survey phase: Reminder 1” indicates that this is day 5 since receiving the first reminder letter. Together, these two variables specify the baseline hazard rates for all sample cases on a given day.
   With “Invitation” treated as the reference level of Survey Phase, the specification of γ_{0s} becomes

   γ_{0s} = γ_{00} + γ_{01} Days + γ_{02} Reminder1 + γ_{03} Reminder2     (4)

where Reminder1 = 1 if time period s is in the “Reminder 1” phase and Reminder1 = 0 otherwise; likewise, Reminder2 = 1 if s is in the “Reminder 2” phase and Reminder2 = 0 otherwise. Days remains untransformed, because common transformations of this variable (e.g. log, clog-log, square, cube, square root) do not lead to better model fit.
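For concreteness, the dummy coding behind Equation 4 can be generated in R with model.matrix(); the values below are made up purely for illustration and are not from the survey data.

    # with "Invitation" as the reference level, model.matrix() creates the
    # Reminder1 and Reminder2 indicators of Equation 4 automatically
    survey_phase <- factor(c("Invitation", "Reminder 1", "Reminder 2"),
                           levels = c("Invitation", "Reminder 1", "Reminder 2"))
    days <- c(2, 5, 3)
    model.matrix(~ days + survey_phase)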
5.4   Lasso Regularisation

   Logistic regression, however, has one shortcoming: it cannot handle the relatively large number of highly correlated predictors in the current data. For instance, there are in the current data 20 weather variables and 10 GT variables, many of which are highly correlated with each other (e.g. “average temperature”, “maximum temperature”, “disease outbreaks: cold”, “disease outbreaks: influenza”). Therefore, the inclusion of all the predictors would result in a lack of model parsimony. Both the determination of relevant predictors and the interpretation of parameter estimates become much more difficult. Furthermore, having many predictors may result in an overfit model, because some of the predictors may be capturing noise rather than actual signal. Lastly, multicollinearity can lead to inflated parameter variances and model dependency on the relationship among the highly correlated predictors.
   One solution is a popular machine learning technique called Lasso regularisation. Initially proposed by Tibshirani (1996), this technique is capable of performing variable selection (while achieving good prediction) and is also compatible with the generalised linear modelling framework in discrete-time survival analysis (Tutz & Schmid, 2016). Generally speaking, Lasso regularisation works by adding a penalty term λ (λ ≥ 0) to the negative log-likelihood function −l (Equation 3), which has the effect of shrinking parameter estimates towards zero. By doing so, Lasso retains only a small number of important variables (i.e. the ones that have non-zero parameter estimates) in a model and thus results in a more parsimonious and interpretable model. Because this variable selection procedure is automatic, we can also conveniently avoid the use of traditional p-values and confidence intervals to judge the relevance of a variable. In addition, by introducing a small bias to the model, Lasso significantly reduces model variance and thereby improves a model’s out-of-sample predictive performance.
   The value of λ needs to be carefully selected, because up until a certain point an increase in λ is beneficial: it only reduces the variance (and hence avoids overfitting), without losing any important properties in the data. After a certain threshold, however, the model starts losing important properties, giving rise to bias in the model and thus underfitting.
   To find the optimal λ, we followed the advice of Hastie, Tibshirani, and Friedman (2009), which involves the use of k-fold cross-validation (CV). k-fold CV entails randomly dividing the entire set of observations into k groups (folds) of approximately equal size. The first fold is treated as a validation set, and the model is fit on the remaining k − 1 folds. The error measure (e.g. root mean squared error) is then computed on the observations in the held-out fold. This procedure is repeated k times: each time, a different fold of observations is treated as the validation set. This process results in k estimates of the validation error. Averaging all of these estimates gives the k-fold CV error estimate. A typical choice of k is 5 or 10, which gives accurate estimates of the validation error while keeping the computation feasible. In this study, we used 10-fold CV.
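The following sketch illustrates this k-fold CV procedure in generic form; dat, fit_model() and validation_error() are illustrative placeholders, not objects from the study's code.

    # a minimal sketch of k-fold cross-validation, assuming a data frame `dat`
    # and placeholder functions `fit_model()` and `validation_error()`
    k <- 10
    folds <- sample(rep(seq_len(k), length.out = nrow(dat)))  # random fold assignment

    cv_errors <- sapply(seq_len(k), function(fold) {
      train    <- dat[folds != fold, ]      # fit on the other k - 1 folds
      held_out <- dat[folds == fold, ]      # evaluate on the held-out fold
      model <- fit_model(train)
      validation_error(model, held_out)
    })
    cv_error <- mean(cv_errors)             # the k-fold CV error estimate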
   Next, we chose a range of λ values and computed the 10-fold CV error (i.e. deviance) for each value of λ. Then, we selected the λ value for which the CV error was the lowest. Finally, the model was refitted using all of the available observations and the selected λ value.
   There are two further considerations regarding the use of Lasso regularisation. First, the original Lasso algorithm has the disadvantage that its selection of variables can be inconsistent. To solve this problem, Zou (2006) proposed the adaptive Lasso, whose penalty term has the form λ Σ_{j=1}^{p} w_j |γ_j|, where the w_j are weights. He showed that, for appropriately chosen data-dependent weights, the adaptive Lasso provides consistent variable selection. Following the author’s advice, we used Ridge regularisation estimates as weights. Note that Ridge regularisation, similar to Lasso, shrinks parameter estimates. However, unlike Lasso, Ridge regularisation does not reduce any estimate to exactly zero and therefore does not perform variable selection. Second, when using Lasso, one usually applies a less-than-full-rank dummy coding to a categorical variable, such that all levels of the variable enter the model as separate variables. This allows Lasso to select what it considers to be appropriate reference levels (i.e. the ones with a zero coefficient). Nevertheless, sometimes Lasso retains all levels of a categorical variable in the model. Without a reference category, the interpretation of the parameter estimates of categorical variables becomes impossible. To avoid this problem, we pre-assigned a reference category to all the categorical variables (see Table A2 in Appendix A) before they entered the
model.
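Putting the pieces of Section 5.4 together, an adaptive Lasso fit with the glmnet package (see Section 5.7) could look roughly as sketched below. Here x (the design matrix) and y (a two-column matrix of non-response and response counts per row of the grouped data) are assumed to exist, and the 1/|coefficient| weighting is one common choice; the authors' exact weighting scheme and settings may differ.

    # a minimal sketch of adaptive Lasso with Ridge-derived weights in glmnet;
    # `x` and `y` are assumed to exist (see the lead-in above)
    library(glmnet)

    # step 1: 10-fold CV Ridge fit (alpha = 0) for preliminary coefficient estimates
    ridge_cv   <- cv.glmnet(x, y, family = "binomial", alpha = 0, nfolds = 10)
    ridge_coef <- as.vector(coef(ridge_cv, s = "lambda.min"))[-1]  # drop the intercept

    # step 2: adaptive weights: variables with small Ridge estimates are penalised more
    w <- 1 / abs(ridge_coef)

    # step 3: 10-fold CV Lasso fit (alpha = 1) with per-variable penalty factors,
    # then refit on all observations at the selected lambda
    ada_cv  <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10,
                         penalty.factor = w, type.measure = "deviance")
    ada_fit <- glmnet(x, y, family = "binomial", alpha = 1,
                      lambda = ada_cv$lambda.min, penalty.factor = w)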
5.5   Model Evaluation

   As the focus of the study is on the influence of predictors on daily response hazards, it is necessary to evaluate the predictive performance of the models with regard to their prediction of the hazard rates when the models are applied to the test data set. For this purpose, we used the root mean squared error (RMSE) as the evaluation criterion, which quantifies the distance between the observed and the predicted daily hazards.
   Using RMSE, we compared the predictive performance of what we call the “full model” (which includes all the time-varying predictors) to that of the “baseline model” (which includes only the baseline intercept predictors: “days” and “survey phase”). In the full model, where we enter all of the predictors into the model without any interaction effects, we assume that the effects of the predictors do not vary with time. This modelling approach has the advantage that the model is more parsimonious and easier to interpret. However, in reality, the effect of a predictor may depend on time. To account for this possibility, we also built an “interaction model”, where we include interaction terms between the baseline predictors (“days” and “survey phase”) and the time-varying contextual predictors, thereby allowing the effects of the model predictors to vary over time. Note that we do not interpret this model in terms of parameter estimates, because the resulting model contains non-zero interaction terms whose corresponding main effects are shrunk to zero. Because of this, the model’s parameter estimates become difficult to interpret. However, we can still compare the predictive performance of this interaction model with that of the other two models.
   Furthermore, we plotted the predicted cumulative response rates of the three models against the observed ones.
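Both evaluation quantities can be computed directly from vectors of daily hazards, as in the generic sketch below; observed and predicted are placeholder vectors, and this is not necessarily the authors' exact implementation.

    # a minimal sketch; `observed` and `predicted` are daily hazard vectors
    rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

    # cumulative response rate implied by daily hazards h_1, ..., h_T: the
    # probability of having responded by day t equals one minus the probability
    # of not responding on any of the first t days
    cumulative_response <- function(hazard) 1 - cumprod(1 - hazard)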
5.6   Variable Importance

   In addition to knowing whether the model as a whole predicts well, it is also helpful to know whether a specific predictor predicts well (i.e. so-called “variable importance”). Specifically, one can evaluate the importance of a variable by calculating the increase in the model’s prediction error after permuting that variable (Molnar, 2018). Permuting the variable breaks the relationship between the variable and the true outcome. Thus, a variable is “important” if shuffling its values increases the model’s prediction error, because in this case the model relies on the variable for better prediction. A variable is “unimportant” if permuting its values leaves the model error unchanged (or even lowers it).
   The permutation algorithm for assessing variable importance that we used is based on the work of Fisher, Rudin, and Dominici (2018). For an accurate estimate, we used 20 permutations for each variable and averaged the resulting variable importance scores. We used RMSE as the error measure in calculating variable importance.
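A compact version of this permutation scheme might look as follows; fit, newdata, observed and predict_hazard() are placeholders for the fitted model, the test data, the observed hazards and a prediction helper, none of which are named in the paper.

    # a minimal sketch of permutation variable importance with RMSE as the
    # error measure; all object names are illustrative placeholders
    permutation_importance <- function(fit, newdata, observed, predict_hazard,
                                       variable, n_perm = 20) {
      rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
      base_error <- rmse(observed, predict_hazard(fit, newdata))
      permuted_errors <- replicate(n_perm, {
        shuffled <- newdata
        shuffled[[variable]] <- sample(shuffled[[variable]])  # break the variable-outcome link
        rmse(observed, predict_hazard(fit, shuffled))
      })
      mean(permuted_errors) - base_error  # average increase in prediction error
    }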
5.7   Software

   We conducted all the analyses in R (version 3.5.0) and RStudio (version 1.1.383). We used the package “glmnet” (Friedman, Hastie, & Tibshirani, 2010) for the implementation of the adaptive Lasso logistic regression.

6   Results

6.1   Model Estimates and Interpretation

   Figure 2 shows the exponentiated standardised parameter estimates of the predictors that are retained by the full model. That is to say, these predictors are considered by the model to have non-zero coefficients and are thus important. The size of the estimates quantifies how much a one-standard-deviation change in a predictor impacts the conditional odds of survey response on a given day, under the condition that the person has not responded earlier.
   As the figure suggests, both of the baseline predictors that define the model intercept are strong predictors of survey response. Specifically, the number of days since the previous invitation or reminder letter seems to have the largest effect on survey response. An exponentiated standardised coefficient of about 0.28 suggests that, assuming everything else stays constant, for a one-standard-deviation increase in the number of days (about 5.34 days), the odds of a person responding to the survey at that given time point are reduced by about 72%. Survey phase also turns out to be an important predictor: the first reminder letter increases the conditional odds of response by about 22% relative to the invitation phase, while the second reminder letter lowers the conditional odds by about 17%, assuming everything else stays constant. To avoid repetition, in the rest of the paper we interpret model estimates without repeating the assumption that everything else holds constant.
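As a small worked illustration of how an exponentiated standardised estimate translates into the percentages quoted above (0.28 and 5.34 are values reported in the text; the per-day conversion is an illustrative derived quantity, not a figure from the paper):

    # a minimal sketch using the values quoted in the text
    or_per_sd <- 0.28           # exponentiated standardised coefficient for Days
    1 - or_per_sd               # 0.72: odds reduced by about 72% per SD increase
    sd_days <- 5.34             # one standard deviation of Days, in days
    or_per_sd^(1 / sd_days)     # about 0.79: implied odds ratio per single additional day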
   Turning to the time-varying contextual predictors of interest: Day of a week appears as a very relevant predictor. In comparison to Monday, all non-Mondays lower the conditional odds of response. That is to say, Monday has the most positive effect on the conditional response odds, compared to all other days. Saturday shows the strongest negative effect: with an estimate of approximately 0.66, the conditional odds of a survey response on Saturday are about 34% lower than on Monday. The effect of Sunday on the response odds is also negative, with an exponentiated estimate of 0.81. Therefore, we can safely conclude that weekends have a negative influence on survey response, while Monday has a positive one.
   Holiday also appears to have a negative effect on response. With an exponentiated coefficient of 0.82, holidays reduce the conditional response odds by about 18% compared to non-holidays.
   [Figure 2: a dot plot of the exponentiated standardised coefficient estimates of the predictors retained by the full model. The plotted values are:]

   Variable                                   Exponentiated Standardised Coefficient Estimate
   Days                                       0.28
   Survey Phase: Reminder 1                   1.22
   Survey Phase: Reminder 2                   0.826
   Day of a Week: Tuesday                     0.911
   Day of a Week: Wednesday                   0.763
   Day of a Week: Thursday                    0.837
   Day of a Week: Friday                      0.828
   Day of a Week: Saturday                    0.661
   Day of a Week: Sunday                      0.813
   Holiday                                    0.819
   Temperature (max.)                         0.953
   Sunshine Duration                          0.979
   Precipitation Volume                       1.006
   Precipitation Volume (max. hr.)            1.01
   Air Pressure (avg.)                        0.998
   Visibility (max.)                          1.045
   Cloudiness (avg.)                          1.015
   Disease Outbreaks: Depression              0.961
   Disease Outbreaks: Cold                    0.964
   Public Outdoor Engagement: Traffic Jam     0.974
   Terrorist Attacks                          0.979

   Figure 2. Exponentiated Standardised Estimates of Predictors in the Full Model

   The weather variables show smaller effects on survey response than the previous variables. Nevertheless, those with non-zero coefficients show a clear pattern. When the weather is nicer (e.g. higher temperature, longer sunshine duration, less rain, higher air pressure and less cloudiness), the conditional response odds are lower. Specifically, for a one-SD change in these weather variables, the conditional response odds change by at most about 5%. The only exception to this observed rule is the variable maximum visibility, which shows a clear positive effect on response.
   Similar to the weather variables, the GT variables tend to have, if not zero, small effects. Among the variables intended to measure signs of disease outbreaks, “depression” and “cold” show negative effects on survey response, while the other indicators of disease outbreaks (“flu”, “hay fever” and “influenza”) are not retained by the model. The two terms concerning data privacy, namely “data leak” and “hacking”, have also been left out by the model. Between the two variables related to public outdoor engagement, “traffic jam” negatively predicts survey response, while “festival” has a zero coefficient. Finally, “terrorist” also has a small negative influence on survey response. Note that the interpretation of the GT variables in terms of the sizes of their effects is difficult and can be misleading, because these variables are measured on somewhat arbitrary scales.

6.2   Model Performance

   The RMSE scores of the three models (“baseline model”, “full model” and “interaction model”) are 0.005528, 0.005274 and 0.004738, respectively. This suggests that the inclusion of the time-varying contextual predictors in the full model improves the baseline model’s predictive performance by about 4.6%. In addition, allowing time-varying effects in the interaction model further reduces the prediction error by about 10%, compared with the full model.
   As some may argue that the weather variables and/or the GT variables largely capture monthly or seasonal trends and can therefore be substituted by indicator variables representing month or season, we conducted additional analyses to test this argument. These show that replacing the weather and GT variables with either month or season indicators leads to poorer RMSE performance than the models that include the weather and GT variables. Specifically, using month as a replacement variable results in an RMSE score of 0.006553, while using season leads to a score of 0.005830. Both are much higher than any of the previous three models we tested.
   Figure 3 presents the predicted cumulative response rates of the three tested models across all survey phases in the test data set, against the observed cumulative response rates (indicated by unconnected asterisks).
   In the invitation survey phase, the interaction model achieves the best prediction of cumulative response rates among the three models. Especially during the later stage of the invitation phase, the interaction model predicts cumulative response rates almost perfectly. Both the full model and