The Doughboys Network: Social Interactions and the Employment of World War I Veterans


                                               Ron Laschever
                            University of Illinois at Urbana-Champaign

                                                  October 2009

                                                       Abstract

           This paper examines how involuntarily-formed social networks affect individual labor mar-
       ket outcomes. Using a new dataset of WWI draftees linked to the 1930 census, I identify the
       effect of a military company’s postwar employment on a veteran’s employment. All else equal,
       an additional peer gaining employment increases a veteran’s likelihood of employment by
       0.8 percentage points. I develop a new framework which allows for
       decomposing the social effect into its two components, the endogenous (“the effect of others’
       outcomes”), and the contextual (“the effect of others’ characteristics”). In this setting, I find
       the endogenous effect to be much stronger.

     I thank Ran Abramitzky, Evelyn Asch, Yann Bramoullé, Kristine Brown, Greg Duncan, Joseph Ferrie, Peter
Hinrichs, Charles Manski, Rosa Matzkin, Dale Mortensen, Eugene Orlov, Enrichetta Ravina, Christopher Taber, Elie
Tamer, Burt Weisbrod, and seminar participants at Bar-Ilan, Ben-Gurion, Carnegie Mellon, Chicago GSB, Haifa, Har-
vard Business School, Hebrew U., Laval, Northwestern, Pittsburgh, RAND, Tel-Aviv, UCLA, UIUC, and Virginia
Tech for discussions and comments. Jim Buel, Lindsay Larsen, Joon Yeol Lew, Karen Muth, and Micah Yergler
provided excellent research assistance. Financial Support through Northwestern University’s Graduate School Re-
search Grant and the Center for Population Economics at the University of Chicago (subgrant NIH P01AG10120) is
gratefully acknowledged. The content is solely the responsibility of the author and does not necessarily represent the
official views of the National Institute on Aging or the National Institutes of Health. All errors are mine. Contact info:
504 E. Armory Ave., Champaign, IL 61820, USA. Email: ronL@illinois.edu
“Your face is familiar,” he said, politely. “Weren’t you in the Third Division during the war?”
        “Why, yes. I was in the ninth machine-gun battalion.”
        “I was in the Seventh Infantry until June nineteen-eighteen. I knew I’d seen you somewhere before.”
                                                      -F. Scott Fitzgerald, The Great Gatsby, 1925

1         Introduction

In recent years, economists have examined the effect of social interactions in a wide range of

areas.1 In the labor market, various surveys have documented the importance of the “informal”

channel, that is finding jobs through friends and relatives.2 The goal of this paper is to better

understand how social networks affect labor market outcomes by examining groups which were

formed involuntarily due to a quasi-random event. In addition, it introduces and illustrates an

application of a new methodology for decomposing the social effect into its two components, the

endogenous and the contextual.

        I construct a new dataset of American men who were drafted and served together during World

War I (1917-1919), and use it to examine the effect of networks formed during the war on post-

war (1930) likelihood of employment. The setting I consider allows me to address some of the

critical issues faced by many empirical studies of social influence and peer effects. There are three

primary advantages to the setting I consider. First, groups were formed due to an exogenous shock,

America’s decision to enter the war and its need to quickly raise a large army. Second, I observe
    1
      These range from theoretical work on network formation and network games (see Jackson (2004) for a survey of
networks games) to the measurement of peer effects in such settings as welfare take-up (Bertrand et al., 2000), drug
use among college students (Duncan et al., 2003), recidivism (Bayer et al., 2009), etc.
    2
      Ioannides and Loury (2004) summarize a number of surveys that find that 30-60 percent of jobs (in various
industries and of various statuses) are found through the “informal” channel. Bewley (1999, p. 368) lists 24 studies
that were published between the years 1932-1990. In these studies, the percent of jobs or job offers obtained through
friends and relatives ranges from 18 to 78 percent. As early as 1923, De Schweinitz (1932) finds that 45% of workers
in the hosiery industry in Philadelphia obtained their job through friends and relatives.

all members of the groups and these groups are well defined.3 Finally, I observe labor market

outcomes of interest, such as employment.

       In most instances, groups, or social networks, are formed endogenously. This in turn can lead

to many problems in inference. As emphasized by Moffitt (2001), in the case of group interactions,

correcting for this selection is even more challenging than it is in the individual case. Realizing the

importance of having randomly assigned groups, researchers in recent years have examined social

interactions in various settings in which groups were randomly assigned. However, these studies

focus on populations or outcomes that are somewhat specialized. The sample I examine represents

an important segment of the labor market, namely, white males who were in their 30’s and 40’s in

1930.4

       There are several possible explanations as to why one’s likelihood of employment might be

affected by one’s peers.5 The various explanations do not necessarily contradict one another, and

at times the distinctions are somewhat arbitrary. This paper is motivated by the role networks

play in information transmission.6 The recent work of Bayer et al. (2008), and Hellerstein et al.

(2008), contributes to a better understanding of the referral aspect of networks. In both cases, using
   3
      One approach taken to overcome this problem when using the standard datasets is the use of some proxy of the
relevant group. For instance, Bertrand et al. (2000) look at those who speak the same foreign language in the census,
and Bayer et al. (2008) consider those who reside in the same census block to be the reference group.
    4
      Group formation in my data is most closely related to the literature on random assignment to groups. One common
feature of many of those settings is that the “treated” population is very homogenous in its nature. For instance,
Sacerdote (2001), Zimmerman (2003), and Duncan et al. (2003) consider random roommate assignments for various
colleges. Bayer et al. (2009) consider cell-mate assignment for prisoners, and Kling et al. (2007), among others, have
examined the Moving to Opportunity program which focuses on the assignment of families of low socioeconomic
status.
    5
      These include referrals (See Montgomery (1991) for example), social norms or “stigma” effects (see Stutzer and
Lalive (2004) for example), information transmission, etc. Networks can be of value even if all agents are identical
and their utility does not depend on that of others, since networks can reduce the search frictions in the labor market
via transmission of information.
    6
      This has been the mechanism the theoretical model in Calvó-Armengol and Jackson (2004) has focused on. The
striking feature of that model is that very small differences in initial conditions can lead, over time, to large differences
in unemployment across groups. Cingano and Rosolia (2008) examine the effect of peers, defined as former co-
workers, on unemployment duration and discuss the role of information transmission.

micro-level data, they find that peers (defined as census block neighbors) are more likely to work

together. My data are better suited to examine cases in which agents pass on information about

other job openings, not necessarily in their own firm.

       In the 1930 census, I find that a group’s unemployment rate has an economically and statistically

significant effect on a veteran’s own likelihood of being employed. Each additional peer in

one’s reference group who gains employment increases a veteran’s own likelihood of employment

by 0.8 percentage points.7 For an individual, all else equal, being in a group at the 75th percentile as

opposed to the 25th percentile of unemployment among all groups is equivalent to roughly half of

the effect of having a blue-collar occupation relative to a white-collar occupation.

       I provide robustness checks to address various concerns. For example, in most of the paper I

use one’s military company as the appropriate reference group. I examine which is the “correct”

reference group, and find that larger groups, such as battalions, have no statistically significant

effect beyond that of a company. I also find the employment outcomes of other military companies

within the same regiment to have no statistically significant effect. I show that the company’s

group effect persists controlling for group members’ prewar place of residence by exploiting the

variation in the groups’ composition of prewar locations.

       Finally, I further decompose the social effect by introducing a new framework, termed multi-

ple reference groups (MRG), which allows me to separately identify the two components of the

social effect. The components are commonly referred to in the literature as the endogenous and

contextual (or exogenous) effects. The endogenous effect measures the effect of a group’s out-

comes (say, the average unemployment rate of a group). The contextual effect is the effect of the
   7
    Over 93% of the sample members are employed in the 1930 census and the unemployment rate among the entire
male population during that period was less than 10%.

group characteristics (say, the race and average age of group members). The inability to separately

identify these two types of effects is referred to as the “reflection problem” (Manski, 1993).8 The

problem implies that one cannot identify whether some group characteristics have a direct effect

on an individual (the contextual effect) or are only affecting the group outcomes, and mistakenly

attributed to the effect of the group outcomes (the endogenous effect).

       My methodological contribution can be used to separately identify the two effects if some peo-

ple are influenced by more than one reference group. The intuition behind the result is as follows.

Consider two subgroups of people that do not know each other, but do share a common intermedi-

ate group of contacts (those who belong to both groups). One would not expect the characteristics

of one subgroup to directly affect the outcomes of another subgroup to which they are not directly

tied. If, in fact, we do find an effect, it is evidence for the existence of an endogenous effect that the

intermediaries had propagated. However, the extent to which the endogenous effect is propagated

also depends on the contextual effect. I show that the existence of multiple reference groups can aid

in identification even in cases where group characteristics are perfectly correlated with the average

of members’ characteristics. I further show how to estimate the two effects by explicitly solving for

them. This allows for a comparison of the relative importance of the two effects. The magnitudes

of the two types of social effects determine the extent to which a change in one’s outcome affects

others in the group. This has important policy implications for determining the benefit of virtually

any program, be it welfare, job training, or the bussing of school children.9

       This paper is related to a growing interest in the identification of social effects. Concurrent

with my work, Cohen-Cole (2006) showed identification for a setting in which agents are affected
   8
     Manski (1993) actually discusses two forms of the reflection problem. I present both in Section 2.
   9
     A strong endogenous effect suggests that any program that targets outcomes, such as employment, will have a
“spillover” effect: increasing the employment likelihood of someone else in the network.

by more than one group in a linear-in-means model, but did not consider estimation. Bramoullé et

al. (2009) show identification for various types of network structures in a linear-in-means setting,

and characterize necessary and sufficient conditions for structures to be identified. Others have

examined students’ performance and the two types of social effects. Lin (2010) uses a spatial

autoregressive model framework, and Cooley (2007) exploits an exogenous shift in North Carolina’s

grade advancement requirements. The recent work of De Giorgi et al. (2010) examines choice of

college major by taking advantage of the fact that different randomly assigned groups of students

take classes together. Their identification strategy, using peers-of-peers as an exclusion, highlights

that, similar to the approach considered in this paper, the source of identification is the exclusion

generated by the network structure. My paper examines a case in which people belong to multiple

well-defined groups. Since individuals are often affected by multiple circles of influence (such

as neighborhood, family, friends from high school, friends from college, etc.), the framework can

be applied in many settings. I also further relax the correlation structure, and allow correlated

unobservables among one type of group.

   The rest of this paper is organized as follows. The econometric specification is discussed in

Section 2. It serves as an introduction to the notation and issues involving the estimation of peer

effects and introduces the MRG framework. Section 3 then presents the data used in this paper,

and provides some of the institutional background on the draft, as well as the choice of military

company as reference group. The results are in Section 4. Section 5 illustrates an application

of the MRG method. Section 6 examines alternative group specifications and additional robustness

checks. Section 7 concludes. The appendices, which provide further detail on the sample and data

collection methods, and the proposition proofs, are followed by the tables.

2         Econometric Framework

I first introduce some of the basic notation, as well as the three most commonly raised issues related

to the identification of group or social effects.10

         Assume there are $g = 1, \dots, G$ groups, each with $n_g$ members $i = 1, 2, \dots, n_g$. To enhance readability,

variables will be denoted in upper-case. The basic linear-in-means model, used in many empirical

specifications, can be written as:

\[
Y_{i,g} = \alpha + X_{i,g}'\beta + Z_g'\gamma + M_g^e\,\delta + \varepsilon_{i,g} \tag{2.1}
\]

Each individual, indexed by $(i, g)$, has an outcome of interest $Y$, say the binary outcome of being employed or unemployed; a vector of covariates $X$ which affect the likelihood of employment, such as age, occupation, and local labor market conditions; and an error term $\varepsilon_{i,g}$, a scalar capturing the individual’s unobservable characteristics and shocks to his or her employment prospects. In addition, each individual’s job prospects might depend on the group characteristics summarized by the vector $Z_g$, and the outcomes of all other members in the group, $\vec{Y}_{-i,g}$, summarized by $M_g^e$.

Some mechanisms through which networks may affect employment outcomes were outlined in the

previous section. The existence of these effects corresponds to a finding that either $\gamma$ or $\delta$ (or both) is different from zero.

$\gamma$, the coefficient on $Z_g$, is often referred to in the literature as the contextual (or exogenous) effect, and $\delta$, the coefficient on $M_g^e$, as the endogenous effect. It is possible, and in most instances quite likely, that $Z_g$ depends on the characteristics of others. For instance, $Z_g$ might just be the expected value of group characteristics,
    10
     The interested reader is referred to Brock and Durlauf (2001) and Moffitt (2001) for a more thorough discussion
of some of these issues.

$Z_g = E[X \mid g]$. When estimating the model, own outcomes will be excluded from the group statistic. For example, in a linear-in-means specification, $M_g^e$ will be replaced with $\frac{1}{n_g - 1} \sum_{j \in g,\, j \neq i} Y_j$, the average outcome among group members, self excluded.11
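
As a minimal illustration of the self-excluded group average described above, the following Python sketch computes, for each individual, the mean outcome of the other members of his group. The function name `leave_one_out_mean` and the toy data are hypothetical and are not taken from the paper's dataset or code.

```python
from collections import defaultdict


def leave_one_out_mean(outcomes, groups):
    """For each person i, return the mean outcome of the OTHER members of i's group."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for y, g in zip(outcomes, groups):
        totals[g] += y
        sizes[g] += 1

    result = []
    for y, g in zip(outcomes, groups):
        if sizes[g] < 2:
            # The self-excluded average is undefined for a group of one.
            raise ValueError(f"group {g!r} has no other members")
        result.append((totals[g] - y) / (sizes[g] - 1))
    return result


# Toy data (hypothetical): 1 = employed, 0 = unemployed; two companies "a" and "b".
employed = [1, 0, 1, 1, 0]
company = ["a", "a", "a", "b", "b"]
peer_rate = leave_one_out_mean(employed, company)
```

Subtracting the individual's own outcome from the group total avoids recomputing a separate sum for every member, which matters little here but keeps the calculation linear in the sample size.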

    Two types of problems regarding identification were first formalized by Manski (1993) and are

known as the “reflection problem.” Assume that the endogenous effect depends on the expected

outcome among group members, $M_g^e = E[Y \mid g]$. Consider again equation (2.1). Next, assume that $E[\varepsilon_i \mid X_i, Z_g, i \in g] = 0$, and that $Z_g = E[X \mid g]$. Taking the expectation $E[\,\cdot \mid g]$ of both sides of equation (2.1) and rearranging terms back into equation (2.1) one obtains a reduced form:

\[
E[Y \mid X, Z, g] = \frac{\alpha}{1-\delta} + X'\beta + E[X \mid g]'\,\frac{\delta\beta}{1-\delta} + Z_g'\,\frac{\gamma}{1-\delta}.
\]

Since group attributes depend on the characteristics of its members ($Z_g = E[X \mid g]$), $\delta$ and $\gamma$, the endogenous and contextual effects, cannot be separately identified. Note that this result was derived even though it was assumed that the error terms are independent of group and own characteristics. One could, however, identify the existence of a social effect, namely: $\frac{\delta\beta + \gamma}{1-\delta} \neq 0$. In Section 4.1 I present several reduced-form specifications that are consistent with the existence of a social effect. The MRG framework described

below further allows for a decomposition of this effect.
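
The reflection problem can be made concrete with a small numerical sketch. It assumes a scalar version of the linear-in-means model in which the group attribute equals the group mean of the characteristic, and it labels the intercept `alpha`, the own-characteristic coefficient `beta`, the contextual effect `gamma`, and the endogenous effect `delta`; these labels, the function name, and the parameter values are chosen here for illustration only. The sketch shows two different structural parameter sets that generate the same reduced form.

```python
def reduced_form(alpha, beta, gamma, delta):
    """Reduced-form coefficients of a scalar linear-in-means model with Z_g = E[X|g].

    Returns (intercept, coefficient on own X, total coefficient on the group mean of X).
    """
    if abs(delta) >= 1:
        raise ValueError("a social equilibrium requires |delta| < 1")
    intercept = alpha / (1 - delta)
    own = beta
    # The endogenous and contextual channels load on the same regressor, E[X|g].
    group_mean = (delta * beta + gamma) / (1 - delta)
    return intercept, own, group_mean


# Two hypothetical parameter sets: a purely endogenous and a purely contextual world.
endogenous_only = reduced_form(alpha=1.0, beta=1.0, gamma=0.0, delta=0.5)
contextual_only = reduced_form(alpha=2.0, beta=1.0, gamma=1.0, delta=0.0)
```

Both calls return the same triple, so data generated under either parameter set would be indistinguishable in a single-group reduced-form regression, even though only the first implies spillovers from outcomes.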

    The above derivation assumed the existence of social equilibrium (e.g., Manski, 1993). For

example, an additional assumption in the propositions will be that the endogenous effect satisfies $|\delta| < 1$, ensuring a unique

equilibrium. In the empirical section, estimates will be derived from a reduced-form specification.

These can be estimated without the need to assume a social equilibrium or that individuals have the

correct expectations about group behavior and are consistent with the existence of a social effect.
   11
      In the case of small group sizes, the use of either measure ($\bar{Y}_g$ or $\bar{Y}_{g,-i}$) introduces problems in the coherency
of the model, depending on the functional form used. (See Heckman, 1978 for example.) The issue of coherency
is related to whether or not individuals are affected by a latent group statistic. The groups I consider in my data are
large enough so that one could plausibly assume that the sample analog is a measure of the expected value and that
the members do not make their choices by calculating or anticipating the employment outcomes of each and every
member of the group.

Further, to avoid any contemporaneous measures, group measures can be lagged (e.g., using pre-

assignment measures only). However, to compute the endogenous effect and the full model, the

existence of a social equilibrium is assumed.

       The second type of “reflection” discussed in Manski (1993) is one in which the unobserved

errors are correlated across group members and depend on the group attributes. Similar to the

above case, one cannot separately identify whether the observed effect of the group outcomes is

due to the “endogenous” effect or whether it is just a reflection of the group’s unobservables. I

show below that identification is possible even in the case that the unobserved errors are correlated

with group characteristics in some of the groups.

       The third concern, which is common to many group settings, is that often group formation is

endogenous.12 The military groups considered in this paper were formed involuntarily, and are

consistent with random assignment.

2.1      Identification Using Multiple Reference Groups

I now introduce a framework in which the endogenous and contextual effects can be separately

identified, even if group attributes are perfectly correlated with the characteristics of the group

members. I prove identification for the linear-in-means case. Also, I show that even if the error

terms in one of the groups are correlated with the group’s characteristics, identification is still

possible.

       The main requirement for identification is that (some) individuals belong to more than one

reference group. It is not required that all individuals belong to more than one group, nor that the
  12
    While the selection problem is likely to result in the second type of reflection problem discussed above (correlated
unobservables), even if group membership is exogenous, correlated unobservables might still exist if the researcher
cannot control for all of the group characteristics or if there are measurement errors (see Moffitt, 2001).

econometrician observes all group memberships for all individuals. To illustrate, I focus on the

linear-in-means case, and focus on the most basic structure, one in which at least some individuals

have more than one reference group. This is the structure depicted in Figure 1. There are two

groups, Group A and Group B, and some individuals (Subgroup 2) belong to both groups. It is

possible that those in Subgroups 1 and 3 belong to additional groups which are unobserved by the

econometrician. As discussed below, in general, identification of the existence of a social effect

does not depend on observing those additional group memberships. In the empirical application,

Group A consists of all members who had served in a certain military unit, and Group B consists

of all those residing in a certain neighborhood block.

       In this paper I do not use an Instrumental Variables (IV) type of estimate, because in the case

of group effects, this approach is problematic except for very special cases (see the discussion in

Brock and Durlauf, 2001, and Krauth, 2005). The MRG approach allows for the unobservable

component to be correlated with a group’s characteristics. Hence, the primary exclusion restriction

and source of identification is that some “peers of peers” do not affect an individual directly but

only through his or her own peers. In the empirical setting, this corresponds to assuming that a

veteran is not directly affected by another veteran’s neighbors, but only by other veterans or own

neighbors.

       Throughout, with no loss of generality, assume that all variables are of dimension one. Allow-

ing the contextual effect to vary across group types,13 the linear-in-means model for this structure
  13
       This assumption embeds the special case in which the contextual effect is assumed to be the same for all groups.

Figure 1: Group Subdivision Notation. Subgroups 1, 2, and 3, with Group A ≡ Subgroups 1 & 2 and Group B ≡ Subgroups 2 & 3.

can be written as:

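\[
\begin{aligned}
Y_{i,g_2} &= \alpha + \beta X_{i,g_2} + \gamma_A Z_{\text{group}A(1\&2)} + \gamma_B Z_{\text{group}B(2\&3)} + \delta M_{\text{groups}\,1\&2\&3} + \varepsilon_{i,g_2} \\
Y_{i,g_1} &= \alpha + \beta X_{i,g_1} + \gamma_0 Z_{\text{group}A(1\&2)} + \delta M_{\text{groups}\,1\&2} + \varepsilon_{i,g_1} \\
Y_{i,g_3} &= \alpha + \beta X_{i,g_3} + \gamma_0 Z_{\text{group}B(2\&3)} + \delta M_{\text{groups}\,2\&3} + \varepsilon_{i,g_3}
\end{aligned}
\tag{2.2}
\]

In the above, $\alpha$ and $\beta$ are the individual effects, $\gamma_0$ is the contextual effect for those who belong to only one group, $\gamma_A$ and $\gamma_B$ are the contextual effects for those who belong to two groups, and $\delta$ is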

the endogenous effect which is affected by the average outcome of all those in the groups to which

the individual belongs. Later, I allow the endogenous effect to vary across group types. Allowing

for a different contextual effect for those who only belong to one group (or for those for which

information on only one group is available to the econometrician) allows for a more general case.

   As can be seen from equation (2.2), each type of individual has a different reference group,

denoted by the index of M and Z. It is possible that those in Subgroup 1 or 3 are influenced by

additional groups that the econometrician does not observe. Identification is still possible as long

as those additional unobserved groups are not correlated in a systematic way. This point can be seen

more clearly by examining the assumptions and proof of Proposition 1.

   Though I take into account different group sizes below, here, for notational convenience and

with no loss of generality, assume that subgroups 1-3 are of the same size and define $M_t = E[Y \mid \text{Subgroup } t]$, the expected value of the outcome $Y$ for subgroup $t$, for the three subgroups (e.g., $M_1 = E[Y_{i,g_1}]$).

Proposition 1 In the above model (equation 2.2), under the assumptions:

(i) $E[\varepsilon_{t,g} \mid X_t, Z_g, t \in g] = 0$

(ii) $(1, E[X \mid g_1], E[X \mid g_2], E[X \mid g_3])$ are linearly independent for some groups.

(iii) The existence of a social equilibrium ($|\delta| < 1$).

In the case of perfectly correlated contextual effects, $Z_g = E[X_t \mid t \in g]\ \forall g, t$, the parameters $\alpha, \beta, \gamma_0, \gamma_A, \gamma_B$, and $\delta$ are globally identified.

   The intuition for Proposition 1 is as follows. Consider those in Subgroup 3. Even though

they are not directly linked to those in Subgroup 1, both their characteristics and their outcomes

(contextual and endogenous effects) affect those in Subgroup 1. These effects are propagated to

those in Subgroup 1 via an intermediate, those in Subgroup 2 (since those in Subgroups 1 and 3

are not directly linked). This in turn implies that any effect of those in Subgroup 3 on those in

Subgroup 1 is evidence of a social effect. Next, consider the effect of $\bar{X}_3$. Conditional on $\bar{X}_2$, any effect on those in Subgroup 1 is evidence of an endogenous effect. The magnitude of the effect of $\bar{X}_3$ depends on both the endogenous and contextual effects. The effect of $\bar{X}_3$ is different for those in Subgroups 1 and 2 (as captured by the coefficients $\pi_3$ and $\pi_6$). It is the ratio of the two ($\pi_6 / \pi_3$) which leads to a calculation of the size of the endogenous effect $\delta$. Note that once the endogenous

effect is identified, the contextual effect is identified, since the sum of the effects is identified even

in the single group case.
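
The propagation argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's estimator: it takes a scalar version of the three-subgroup model with equal subgroup sizes, sets each group attribute equal to the group mean of the characteristic, and solves for the subgroup mean outcomes by iterating the social-equilibrium mapping (which converges when the endogenous effect is below one in absolute value). The function names, the parameter labels (`g0`, `gA`, `gB` for the contextual effects, `delta` for the endogenous effect), and all numerical values are hypothetical.

```python
def equilibrium_means(x, alpha, beta, g0, gA, gB, delta, iters=500):
    """Mean outcomes (M1, M2, M3) of the three subgroups in social equilibrium.

    x = (x1, x2, x3) are the subgroup mean characteristics; subgroups are equal-sized.
    Group A = Subgroups 1 & 2, Group B = Subgroups 2 & 3.
    """
    x1, x2, x3 = x
    z_a = (x1 + x2) / 2  # Group A attribute = mean characteristic of Subgroups 1 & 2
    z_b = (x2 + x3) / 2  # Group B attribute = mean characteristic of Subgroups 2 & 3
    # Parts of each equation that do not involve outcomes.
    c = [
        alpha + beta * x1 + g0 * z_a,
        alpha + beta * x2 + gA * z_a + gB * z_b,
        alpha + beta * x3 + g0 * z_b,
    ]
    m = list(c)
    for _ in range(iters):  # fixed-point iteration; contracts at rate |delta|
        m1, m2, m3 = m
        m = [
            c[0] + delta * (m1 + m2) / 2,       # Subgroup 1: peers in Group A only
            c[1] + delta * (m1 + m2 + m3) / 3,  # Subgroup 2: peers in both groups
            c[2] + delta * (m2 + m3) / 2,       # Subgroup 3: peers in Group B only
        ]
    return m


def spillover_from_subgroup3(delta, alpha=0.0, beta=1.0, g0=0.5, gA=0.5, gB=0.5):
    """Change in Subgroup 1's mean outcome when Subgroup 3's mean characteristic rises by 1."""
    base = equilibrium_means((0.0, 0.0, 0.0), alpha, beta, g0, gA, gB, delta)
    bumped = equilibrium_means((0.0, 0.0, 1.0), alpha, beta, g0, gA, gB, delta)
    return bumped[0] - base[0]


no_endogenous_effect = spillover_from_subgroup3(delta=0.0)
with_endogenous_effect = spillover_from_subgroup3(delta=0.5)
```

With `delta=0.0` the characteristics of Subgroup 3 leave Subgroup 1 untouched even though the contextual effects are nonzero, whereas with `delta=0.5` they reach Subgroup 1 through Subgroup 2. This mirrors the exclusion restriction in the text: peers of peers matter only through the endogenous channel.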

   One implication of the proof (Appendix III) is that if $\delta = 0$ then the coefficient $\pi_3$ (for the

variable $E[X]_3$) would be zero. The reduced-form specification allows one to test for the existence of

an endogenous social effect. This can be done with far less restrictive assumptions than those used

in the full specification. The reduced-form specification has two main advantages. First, it could be

estimated without invoking a group expectation assumption or a contemporaneous specification.

For example, one could use only lag measures as covariates, which in the empirical case corre-

spond to prewar and wartime measures. Second, finding that peers-of-peers characteristics have

an effect on own outcomes is consistent with a social effect even if the network is not accurately

characterized. For example, even if there is a systematic omitted intermediary subgroup (similar

to group 2 above) through which the effect operates, then one can still test for the existence of a

social effect, i.e., whether the peers-of-peers characteristics have an effect on own outcome. The

crucial exclusion restriction is that they are not directly connected. Getting an unbiased estimate

of the endogenous effect, however, does depend on there being no systematically omitted links

among group members.

   Correlated unobservables are likely one of the biggest challenges faced by any empirical study

of social interactions. Such correlations could arise when group membership is endogenous. For

example, people with higher ability might choose their group based on the education level of their

peers. Even if the error term is uncorrelated with the group characteristics Zg ; if people choose

based on the individual characteristics of others, then the results will generally be biased.

   Following Manski (1993), I consider correlated unobservables to have a specific interpretation

that the individual unobservable term is correlated with the group average characteristics:

\[
E[\varepsilon_i \mid i \in g] = s\,E[X]_g \quad \text{for some } s \neq 0.
\]

Consider again the basic MRG linear-in-means model. Even if $E[\varepsilon_t \mid t \in g] = s\,E[X]_g$ for some types of subgroups, the model is identified.

Proposition 2 In the above model, under the assumptions:

(i) $E[\varepsilon_{i,g} \mid X_i, X_g, Z_g, g_1] = E[\varepsilon_{i,g} \mid X_i, X_g, Z_g, g_2] = s\,E[X]_{g_1 \& g_2}$

and $E[\varepsilon_{i,g} \mid X_i, X_g, Z_g, g_3] = 0$ (correlated unobservables in one of the groups)

(ii) $(1, E[X \mid g_1], E[X \mid g_2], E[X \mid g_3])$ are linearly independent for some groups.

(iii) The existence of a social equilibrium ($|\delta| < 1$).

In the case of perfectly correlated contextual effects, $Z_g = E[X_t \mid t \in g]$,

the parameters $\alpha, \beta, \gamma_0, \gamma_A, \gamma_B, \delta$, and $s$ are globally identified.

   The underlying source of identification is not the functional form, such as the linear-in-means

model considered in the previous section, but rather the group structure (i.e., the availability of mul-

tiple reference groups for at least some individuals). The MRG framework is identified regardless

of the group sizes. However, estimation does depend on the size of the groups, and more impor-

tantly, on the relative size of the various subgroups. For instance, if the intersection of two groups

consists of only one member (Subgroup 2 is of size 1), then the effect of Subgroup 3 propagated

through Subgroup 2 is likely to be small, and therefore empirically difficult to detect. Therefore, I

present a modification that addresses this issue.

   Group A are those in the military unit, and group B are those in one’s neighborhood block. In

this context, the size of the intersection between each Group A and Group B is very small, say of

size one. Groups of type A and B could be of different sizes, as will be shown below. As discussed

above, it is important to note that those in Group B may be influenced by additional reference

groups that are not observed by the econometrician, as long as those additional unobserved groups

Figure 2: Veterans and Neighbors Multiple Reference Group Diagram. Individual i belongs to a military company (Group Type A); the neighbors of each veteran form neighborhood groups NBH 1, NBH 2, ..., NBH t (Group Type B).

are not systematically correlated across groups of type B. The group structure is illustrated in

Figure 2.

I allow for both the contextual and the endogenous effects to be different for the two types of reference groups (the army unit and the neighborhoods). One can write the model of interest as a system of equations for groups of Type A and B:

$$Y_{i,g_A} = \alpha + \beta X_{i,A} + \gamma_A E[X]_{-i,A} + \gamma_B E[X]_{-i,B} + \delta_A M_{-i,A} + \delta_B M_{-i,B} + \varepsilon_{i,g_A} \tag{2.3}$$

$$Y_{j,g_{[B \setminus A]}} = \alpha + \beta X_{j,B \setminus A} + \gamma_0 E[X]_{-j,B \setminus A} + \delta_0 M_{-j,B \setminus A} + \varepsilon_{j,g_{[B \setminus A]}}$$

where $B \setminus A$ are those in Group B that do not belong to Group A. Note that this specification allows for a different exogenous effect for different groups. It nests the more restrictive case $\gamma_A = \gamma_B = \gamma_0$; $\delta_A = \delta_B = \delta_0$. In addition to the structure, the main assumption is that $E[\varepsilon_t \mid X] = 0$. For the empirical part, I use

$$\bar{X}_{-i,T} = \frac{1}{n_T - 1} \sum_{j \in \text{Group } T,\; j \neq i} X_j \quad (T = A, B) \qquad \text{and} \qquad \bar{X}^{A}_{-i,B} = \frac{1}{n_A - 1} \sum_{s \neq i,\; s \in A} \bar{X}_{-s,B}$$

where $n_T$ is the size of Group T. $\bar{X}^{A}_{-i,B}$ is Group A’s average of the average neighborhood characteristic (self-excluded) of all the neighborhoods in which the members of Group A reside.
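The self-excluded averages defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the identifiers (`x`, `company`, `block`), and the function names are hypothetical and do not come from the paper's dataset or code.

```python
def leave_one_out_mean(values, members, i):
    """Mean of values[j] over j in members, excluding person i."""
    others = [values[j] for j in members if j != i]
    return sum(others) / len(others)

# Hypothetical toy data: x[k] is one characteristic (say, age) of person k.
x = {"v1": 30.0, "v2": 34.0, "v3": 38.0,   # veterans of one company (Group A)
     "n1": 40.0, "n2": 50.0,               # neighbors of v1
     "n3": 20.0, "n4": 30.0,               # neighbors of v2
     "n5": 60.0, "n6": 80.0}               # neighbors of v3
company = ["v1", "v2", "v3"]
# Each veteran's neighborhood block (Group B) contains the veteran himself.
block = {"v1": ["v1", "n1", "n2"],
         "v2": ["v2", "n3", "n4"],
         "v3": ["v3", "n5", "n6"]}

def xbar_A(i):
    """Company average of x, excluding veteran i."""
    return leave_one_out_mean(x, company, i)

def xbar_B(i):
    """Neighborhood average of x for veteran i's block, excluding i."""
    return leave_one_out_mean(x, block[i], i)

def xbar_A_of_B(i):
    """Average, over i's company peers s, of each peer's own neighborhood average."""
    peers = [s for s in company if s != i]
    return sum(xbar_B(s) for s in peers) / len(peers)

print(xbar_A("v1"), xbar_B("v1"), xbar_A_of_B("v1"))
```

For veteran `v1` the three quantities differ because each one averages over a different set of people: his company peers, his own neighbors, and the neighbors of his company peers.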

Consider the following reduced form:

$$Y_{i,g} = \alpha + \beta X_{i,g} + \pi_1 \bar{X}_{-i,A} + \pi_2 \bar{X}_{-i,B} + \pi_3 \bar{X}^{A}_{-i,B} + \varepsilon_{i,g} \quad (i = 1, \dots, n_g;\; g = 1, \dots, G) \tag{2.4}$$

where $\pi_2$ is the coefficient in the reduced-form specification of the effect of the average neighborhood characteristic and $\pi_3$ is the effect of the average of the average neighborhood for that same characteristic.

First, a finding that $\pi_3 \neq 0$ suggests that there exists an endogenous effect. In the model above, the characteristics of a peer’s neighbors can reach individual i only through that peer’s own outcome. The existence of the effect can be detected using the reduced-form estimates with fewer assumptions. Second, one can explicitly solve for the endogenous and contextual effects using Result 2 in Appendix III. I provide an empirical demonstration of the decomposition in Section 5.
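As a rough illustration of estimating a reduced form such as (2.4), the sketch below fits ordinary least squares by solving the normal equations in plain Python. Everything here is an assumption made for illustration: the regressors are simulated random draws standing in for the own characteristic and the three group-level averages, the coefficient values are hypothetical, and the outcome is generated without noise so that the estimates recover the inputs. It does not reproduce the paper's estimates, and inference on the third group-level coefficient would additionally require standard errors.

```python
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)

random.seed(0)
# Hypothetical values for: intercept, own effect, and the three group-level coefficients.
true_coefs = [0.5, 0.02, 0.10, 0.05, 0.20]
# Columns: constant, own X, company average, neighborhood average, compounded average.
rows = [[1.0] + [random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [sum(c * v for c, v in zip(true_coefs, r)) for r in rows]  # noise-free outcome

est = ols(rows, y)
pi_3_hat = est[4]  # coefficient on the compounded (average-of-averages) regressor
print([round(e, 6) for e in est])
```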

3     Data Collection and Description

This paper uses a new dataset constructed from four sources. It consists of United States infantrymen (nicknamed “Doughboys”) who had served together during WWI in the 313th Infantry Regiment, Seventy-ninth Division. The sample used for estimation focuses on the 1930 United States Census of Population labor market outcomes of about one thousand veterans and their closest neighbors (n=35,181).14 The men in the veterans sample were all drafted and had fought overseas. I focus on the military company as the individual’s reference group and examine all those in his unit. Men in these groups are likely to have a sense of affiliation, or unit pride, and all of the units I examine consist of men who trained together, shipped to Europe, and fought together overseas. Ties forged during battle are likely to be meaningful.

   14 I focus on the 1930 census, as it contains a question on veteran status, and has more detailed labor market information than the 1920 census. In addition, the 1920 census might be too close to the end of the veteran’s tour of duty.

In addition to their military service records, the men were linked to two additional data sources: the 1930 census and their prewar draft registration card. Finally, for each of the men linked to the Census of 1930, information about up to 60 of their nearest neighbors was collected. Table 1 provides some summary statistics for the various samples. As a comparison for my sample, I also provide the summary statistics for a similar sample from the public use 1920 and 1930 census samples (Ruggles et al., 2008). First, note that the unemployment rate for both my veterans sample and the comparison group of all white males from the census in 1930 is less than 7%, as the effect of the Great Depression had yet to reach its peak.15 Second, the average age of the enlisted men in my sample is 25 at the time of enlistment. This is due to the fact that the initial draft was for those 21 and older. Third, there is much variance in the types of occupations prior to enlistment.

The linked dataset allows one to observe those who had served together in the same military company (companies consisted of over 100 men) and to observe their and their neighbors’ postwar outcomes in the 1930 census, while controlling for their prewar characteristics such as place of residence and occupation. The 1930 Census includes information on labor market outcomes (e.g., employment, occupation, and industry), housing market information (e.g., ownership and housing values), and various demographics (e.g., age, race, parents’ place of birth, and immigration information). The military service records provide information on place of residence prior to enlistment, place and date of birth, ranks and promotions, citations and courts-martial, whether wounded, and the (military) company affiliation within the regiment. The draft registration records were used to obtain information on the men’s occupation prior to enlistment. The data collection procedure is detailed in Appendix I.

   15 Throughout the paper I use the employment variable as recorded (zero/one) in the main census schedule, following the use and definition of the IPUMS 1930 (Ruggles et al., 2008). The original supplemental schedules were destroyed right after the census was taken. The 1930 classification is not entirely consistent with the modern definitions used. Because such discrepancies are more likely to arise in certain less steady occupations, I address much of the potential discrepancy by controlling for occupation (and group-level occupation).

In addition to the methodological benefit of observing two different reference groups for each veteran, the data structure also provides a large source of variation that is useful for estimation. The combinations of veterans and neighbors groups create much larger variation than would be available if only the military group or only the neighborhood block were considered. As can be seen in the last column of Table 1 Panel B, there are 938 different reference groups (neighbors and veterans combinations). As is to be expected, the standard deviation of group averages is smaller than that of the individuals. However, there is ample variation for the estimations in Section 4.
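The point that group averages vary less than individual outcomes can be checked on toy data. The sketch below uses hypothetical 0/1 employment indicators for three made-up groups; the numbers are not taken from Table 1.

```python
import statistics

# Hypothetical 0/1 employment indicators for three reference groups.
groups = {
    "g1": [0, 1, 1, 1],
    "g2": [1, 1, 0, 1],
    "g3": [1, 0, 0, 1],
}

individuals = [v for members in groups.values() for v in members]
group_means = [statistics.mean(members) for members in groups.values()]

sd_individuals = statistics.pstdev(individuals)   # dispersion across people
sd_group_means = statistics.pstdev(group_means)   # dispersion across group averages

print(round(sd_individuals, 3), round(sd_group_means, 3))
```

Averaging within groups removes the within-group part of the variance, so the dispersion of group means cannot exceed that of the individuals when groups are of equal size, as they are in this example.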

The military group used as the reference group in most of this paper is the military company in which each veteran served during WWI.16 Figure 3 illustrates the organizational structure of a typical infantry division in WWI. The full-strength size of a company was over 100 men. Four companies made up a battalion. There were 12 infantry companies (3 battalions) in each infantry regiment, in addition to other supporting companies such as the Headquarters and Supply company. A division consisted of many regiments, of which four were infantry, and had more than 20,000 men in it. During WWI, for the 313th Infantry Regiment I examine, the company level was almost always the largest autonomous unit.17 While it was possible for different companies in a battalion to be assigned to different locations or tasks, the companies themselves were almost never divided and were always assigned as a group to a task. In addition, the company viewed itself as a cohesive unit of reference. For instance, after the war many companies published their history. Similarly, the most detailed level in the army’s records is the company level. For instance, the monthly unit rosters and daily reports are all at the company level, and the administrative records contain no reference to a more detailed level, such as the platoon.18 Companies are small enough to allow their members to know each other. Yet, they are large enough to allow for a substantial number of potentially useful ties. In Section 6, I look at a larger-scale group level, the battalion, and find that it does not have a statistically significant effect after controlling for own company. This further suggests that the military company is an appropriate reference group to examine.

   16 In exploiting the strength of camaraderie in the military, my work is closest in spirit to that of Costa and Kahn (2003, 2007) who examine how group characteristics of units in the Civil War affect such measures as desertion and survival in a POW camp.

   17 This is based on the detailed account of Thorn (1920).

Figure 3: Infantry Division Organizational Chart
Notes: Number of men in parentheses. Sizes are approximate and are for illustration of relative magnitude.

3.1     Sample Design and Nature of Assignment

The sample design (i.e., the choice of units to examine) focused on minimizing selection bias. Focusing on veterans of WWI presents several advantages compared to other wars in which the United States took part. WWI was the first time a “modern draft” (Chambers, 1987) was instituted in the United States,19 and most of the American army during WWI consisted of drafted men. The American experience during WWI was a relatively easy one; there was very little, if any, prewar testing, and therefore units exhibited large heterogeneity among members; and micro-level data are available for both the military experience and the prewar and postwar outcomes of the men. I describe the draft in more detail in Appendix II.

   18 I was able to view the actual monthly rosters and daily reports through a Freedom of Information Act request (see Appendix I).

   19 On April 1st 1917, prior to the draft, the Army’s size totaled 281,880 men. By November 11, 1918, it had grown almost fifteen-fold to 4,185,220 (Maryland War Records Commission, 1933). Most of this increase, 2,810,296 men, came from those drafted.

The main advantage of looking at those drafted is that men were involuntarily assigned to groups. This provides a rare opportunity to examine a case in which networks were formed due to an exogenous shock, President Wilson’s decision to enter the war. To focus on the case in which groups are involuntarily formed, I have chosen to look only at those units formed by the draft (the majority of divisions were formed by draft).

The second advantage of looking at WWI is that while WWI’s casualty rate for the European armies (and civilian population) was of great magnitude, the U.S.’s casualties were far fewer, as it joined the war very late. For example, 50,300 US soldiers died in battle during WWI, compared to 900,000 from Britain, and 1,385,000 from France (Maryland War Records Commission, 1933). WWI was also a relatively short war (less than 2 years for the American troops). For example, the men in my sample were in the front lines for less than 60 days. Their experience is typical for an American combat unit during WWI. The sample I have selected belonged to the Infantry Branch. One might be concerned that men were selected into different military occupations based on their ability or skills. It is important to note that the military during WWI was far less specialized in its occupations, training, or needed skills than it is today or even than it was during the Second World War. There was little, if any, screening or testing on which assignment was based.20 The bulk of the fighting men were infantrymen.21

   20 The famous alpha and beta tests were developed during this era, but too late to be used by the army for the enlisted men in this sample. Even for those units for which the test was used, the goal was to balance the distribution of test results across units (Kevles, 1968). Similarly, psychopathology tests were pioneered during this era, but were not used (DuBois, 1970).

Although the draft was random and inclusive, the regiment examined here is typical in that during WWI, most drafted men ended up in units with a large regional component (usually several states). For example, many of the men of the 313th Infantry, the sample used in this paper, were from Maryland.22 Using the service records of all Maryland veterans (n=62,724), I examined the proportion of men from each city who ended up in my sample. A total of 253 Maryland cities are represented in the regiment I examine, and the (weighted) proportion of men that ended up in the regiment I consider is 2.9%. Further, among the 120 cities and towns that had more than 50 WWI veterans, in only one case did a town (Relay, MD) supply more than 10% of its veterans to this regiment (it had sent 6 men to two different companies in my sample).
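The two checks in this paragraph (a pooled share and a per-town threshold) amount to simple arithmetic over town-level counts. The sketch below uses made-up counts and town names; it also assumes that "weighted" means weighting each town's share by its number of veterans, which the text does not spell out.

```python
# Hypothetical counts: town -> (all WWI veterans from the town, of whom in the regiment).
towns = {
    "Town A": (1200, 30),
    "Town B": (300, 12),
    "Town C": (55, 6),
    "Town D": (40, 5),   # 50 or fewer veterans, so excluded from the threshold check
}

# Pooled share: regiment members over all veterans. This equals the average of
# the town-level shares weighted by each town's number of veterans.
total_veterans = sum(v for v, _ in towns.values())
total_in_regiment = sum(r for _, r in towns.values())
weighted_share = total_in_regiment / total_veterans

# Towns with more than 50 veterans that sent more than 10% of them to the regiment.
flagged = [name for name, (v, r) in towns.items() if v > 50 and r / v > 0.10]

print(round(weighted_share, 4), flagged)
```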

I chose to focus on a regiment that was largely drawn from one region as this was the common way in which regiments were formed and there is a higher likelihood that the bonds formed during the war would be of use postwar. However, as can be seen in Figure 4, there is ample variation in prewar location both within Maryland and across other states.23 This is in contrast to other armies (such as the British Army during WWI) or other eras (e.g., the American Civil War) where companies were largely formed from one village or a part of a city. Each company in my sample has representatives from at least 16 counties across the US (and as many as 31 counties). Within large cities, no company draws all of its members from only one part of the city (as defined by 30,000-sized partitions of each area).24 To address any concerns regarding the geographical nature of the assignment, I control for the prewar location of the veteran (including city partitions) in all specifications. In Section 6, as a robustness check, I show my findings hold for those who came from areas in which few if any other soldiers ended up in the same unit.

   21 The infantry was by far the largest branch with over 1,000,000 men in 1918 (Maryland War Records Commission, 1933). In addition, the training of the infantry did not vary much from that of, say, the artillery. The skills acquired during the war, especially those transferable to the civilian labor force, were fairly homogeneous across military professions, and quite limited. In my sample, men had trained in the United States for only a few months before shipping to France.

   22 About 40% of them were from Baltimore, though they represent only 2% of all Baltimore veterans who served in WWI.

   23 To illustrate, consider all the men who had lived in the last 5 blocks (2500-3000) of Woodbrook Avenue, Baltimore, Maryland. Twenty-five men from those 5 blocks served during World War I. Only five of them ended up in the 313th Infantry Regiment (the others were not even in the same division). Of the five, one served in the Headquarters Company, one served in Company G, and two served in Company I (the fifth was transferred to a different unit before shipping). Within Company I, only eight others were from the same draft board that covered an area of 30,000 people. The rest of the company consisted of 50 men from Baltimore, and another 52 from one of 14 towns in Maryland, and yet another five were from Ohio, etc. Those who had come from small towns and villages were even less likely to have prior connections with others. For instance, in the same Company I, there were only 3 men from Hagerstown, the third largest city in Maryland.

Figure 4: Distribution of county of residence prior to assignment across companies

Because the nature of the assignment is crucial, I further investigate its properties. Panel A of Table 2 reports the results of a chi-square test for the null hypothesis that people with a given characteristic were randomly assigned among the companies. I examine all prewar variables for which there are on average more than 5 outcomes for each company. These variables include age, place of birth, parents’ place of birth, and prewar marital status. For none of the 14 variables is the null rejected, suggesting that these characteristics are consistent with random assignment. Panel B of Table 2 reports the results for two variables which have multiple possible values, the gain in occupational income score (as a proxy for ability), and year of birth. The occupational income score ranges from 3 to 80, and is a measure of a job’s median wage.25 A Kruskal-Wallis rank-sum test does not reject the null hypothesis that the distributions are consistent with random assignment for those two measures (p-values of 0.28 and 0.52). Finally, variation in prewar place of residence could be driving the distribution of observables. Panel C reports the results of a multinomial logit specification predicting assignment to units. The table reports the p-values associated with testing whether each coefficient is different from zero. Even after controlling for prewar location, the remaining observables are consistent with random assignment.

   24 Two main factors affect the final assignment. First, men were sent from their draft boards to the units in groups (for example, multiples of ten men). As such, a small town that should have only had 5 men, if assignment were truly proportional, might end up with 10 in a unit. Second, there was a great deal of arbitrary movement between units and even divisions. For example, in my sample, 40,000 men had originally been assigned in the first few months to a division with a final size of 20,000.
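One way to picture the Panel A check is as a Pearson goodness-of-fit statistic comparing, for one characteristic, the number of men with that characteristic in each company against the number expected if they were spread in proportion to company size. The sketch below is an assumption about the form of the test, not the paper's exact procedure, and the company sizes and counts are invented. The 5% critical value for 3 degrees of freedom (7.815) is the standard tabulated value.

```python
def chi_square_stat(counts, sizes):
    """Pearson statistic for the null that men with a trait are spread across
    companies in proportion to company size. Returns (statistic, degrees of freedom)."""
    total_trait = sum(counts)
    total_men = sum(sizes)
    stat = 0.0
    for observed, size in zip(counts, sizes):
        expected = total_trait * size / total_men
        stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1

# Hypothetical example: four companies of 100 men, counts of prewar-married men.
sizes = [100, 100, 100, 100]
counts = [22, 18, 25, 15]

stat, df = chi_square_stat(counts, sizes)
CRITICAL_5PCT_DF3 = 7.815          # tabulated chi-square critical value, 3 d.f., 5% level
reject = stat > CRITICAL_5PCT_DF3  # False here: consistent with random assignment

print(stat, df, reject)
```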

In Panel A of Table 1, I examine the representativeness of the sample by comparing the distribution of occupations and the occupational income score of my sample prior to enlistment to a sample based on the 1920 Census. I consider white males ages 20-30. My sample of veterans and the 1920 Census sample population are similar. This is not all that surprising considering that the draft drew on a large part of the United States male population in that age bracket.

   25 This measure is based on the median wage of an occupation in 1950. While the measure is not a perfect proxy, the purpose here is to examine the distribution. The results would still hold even if the scores of some occupations had changed over time.

4         Results

I present two approaches for estimating the social effect. First, in Section 4.1, I consider a reduced-form specification that uses fewer assumptions. I show that the group effects for both the military reference group and the neighborhood block group are statistically significant. As discussed above, these results are potentially due to both endogenous and contextual effects. Second, in Section 5, I make use of the MRG framework, which allows for estimation of both the endogenous and contextual effects.

In the sample I consider, men had direct contact for a period of about one year. Though this does not affect the validity of the analyses and results, it is important to consider whether postwar social interactions were plausible. First, there are several pieces of evidence regarding social interactions during the postwar years (e.g., unit-specific veteran associations). Many units held reunions and published yearly newsletters and, more importantly, detailed contact directories.26 For the units I examine, I am aware of several formal organizations and associations of this kind.27 Second, the geographical nature of the sample suggests that interactions were feasible. Postwar residence, like any migration decision, is endogenous. Therefore, one must be cautious about using geographical weights for the networks measure, and throughout this section the outcomes of all group members are equally weighted. Further, I show that the network effect persists even for those who did not come from the same area and hence cannot be attributed to any prewar locational correlation. I find that a large proportion of the sample had moved postwar and that the migration patterns are consistent with peers affecting the migration decisions.28

   26 The Maryland Historical Society has in its collection a photograph taken in 1923 at the Reunion Banquet of the 313th Infantry Regiment, the regiment used in this paper.

   27 The Seventy-ninth Division, to which all the men in my sample belonged, had established an association and held its first convention in 1921. The stated purpose of the association was: “...to promote fellowship among comrades who gave their all for their country.” The president of the association, H. Harrison Smith, also acknowledged that: “Many of the Regiments, Battalions, and even Companies within the Division have formed their separate organizations or associations for just this very purpose.” Smith had set forth a plan with ten goals, of which the third was: “Creating an efficiency in the matter of bringing up to date and keeping current the rosters of the different individual units.” In the opening paragraph of the published division history, Major General Kuhn wrote: “This history has been prepared primarily for you in order to preserve the ties of comradeship formed during strenuous days of training at home and stirring incidents of campaign abroad.” (Barber, 1922)
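The migration pattern described here rests on two shares: the fraction of veterans who changed county, and, among movers, the fraction whose destination county also housed another member of the same company. A minimal sketch of that tabulation follows. The records, county names, and field layout are hypothetical, and the sketch assumes "resided" refers to the peer's 1930 county, which is one reading of the text.

```python
# Hypothetical records: (veteran id, company, 1917 county, 1930 county).
records = [
    ("a", "G", "Baltimore", "Baltimore"),
    ("b", "G", "Baltimore", "Howard"),
    ("c", "G", "Howard", "Howard"),
    ("d", "G", "Carroll", "Kent"),
    ("e", "I", "Kent", "Kent"),
]

# A mover is anyone whose 1930 county differs from his 1917 county.
movers = [r for r in records if r[2] != r[3]]
share_moved = len(movers) / len(records)

def joins_peer(rec):
    """True if another member of the same company lived in the mover's 1930 county."""
    vid, company, _, dest = rec
    return any(o[0] != vid and o[1] == company and o[3] == dest for o in records)

share_joining_peer = sum(joins_peer(m) for m in movers) / len(movers)

print(share_moved, share_joining_peer)
```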

Although a full analysis is beyond the scope of this paper, the migration patterns provide a plausible intermediate channel through which networks operate; information could also have flowed through other channels.29 Two types of specifications allow me to address the endogeneity of the migration decision. The first are those specifications in which only prewar variables are used as explanatory variables. These specifications show that a group’s pre-assignment measures have a statistically significant effect on an individual’s employment. Second, for those specifications that use postwar group characteristics, the results focus on the entire group, equally weighted, regardless of pre- or postwar location and include controls for prewar place of residence.30 In addition, recall that the MRG specification allows for identification even if unobservables are correlated with neighborhood characteristics (Proposition 2). Such correlation could arise if veterans were migrating and selecting based on neighborhood characteristics. The formal veteran organizations and the postwar geographic proximity of the veterans suggest that social networks formed during the war could have been utilized during the postwar years.

4.1       Baseline and Reduced Form Estimates

I first consider a baseline case assuming there are no peer effects. Table 3 and columns 1-3 in Table 4 report estimates of one’s own likelihood of being employed in the 1930 Census as a function of various control variables. The functional form used in these tables is the probit.31

   28 Examining the migration decision across counties between 1917 and 1930, I find that 38.8% of the sample moved across county lines. Of those who moved, more than 80% moved into a county in which other company members resided.

   29 For instance, in 1930, the ratio of phone lines to households was two-thirds (US Historical Statistics, 1960).

   30 In analyses not reported in this paper, I find that the social effects may be somewhat stronger when geographically weighting the network though the entire effect is not solely due to this geographical weighting. See also Section 6.

The results in Table 3 illustrate some of the factors that affect the likelihood of employment for my veterans sample. Column 1 includes one’s age, whether married, and military rank controls. The results for age and age-squared are statistically significant and have the expected signs. Those married are 4.8 percentage points more likely to be employed. Almost all of the men in the veterans sample examined were enlisted men, and I include controls for their rank (the ranks are private, private first class, corporal, sergeant, supply or mess sergeant, and sergeant first class).32 Those who had reached a higher rank were more likely to be employed. In all specifications, the largest, and sometimes only, statistically significant effect is for corporals. Those who had reached the rank of corporal during the war are about 5 percentage points (depending on the specification) more likely to be employed in the 1930 Census. This is consistent with promotions not being random, but rather being correlated with ability which could be transferred to the civilian labor market. Similar results hold when number of promotions during service is used as a control instead.
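Because the baseline tables use a probit, an effect stated in percentage points for an indicator such as marital status corresponds to a change in predicted probability rather than to the raw coefficient. The sketch below shows one common convention, the discrete change in the standard normal CDF when the indicator switches from 0 to 1 with the other covariates held fixed. The text does not say which convention underlies the reported figures, and the index and coefficient values here are hypothetical, not estimates from Table 3.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dummy_marginal_effect(index_without, coef):
    """Change in the probit probability when a 0/1 regressor switches from 0 to 1,
    holding the rest of the linear index fixed at index_without."""
    return std_normal_cdf(index_without + coef) - std_normal_cdf(index_without)

# Hypothetical values: linear index for an unmarried veteran and a probit
# coefficient on the married indicator.
baseline_index = 1.5
married_coef = 0.3

effect = dummy_marginal_effect(baseline_index, married_coef)
print(f"{100 * effect:.1f} percentage points")
```

The same coefficient implies a smaller change in probability when the baseline employment probability is already high, which is why probit effects are usually reported at a stated evaluation point or averaged over the sample.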

Column 2 includes only variables which were determined prior to enlistment, such as occupation prior to draft, and controls for pre-assignment (1917) county of residence and birth state. Those for whom both parents were immigrants are 5.3 percentage points less likely to be employed. Column 2 also includes an indicator of whether or not the veteran reported being unemployed during 1917, when filling out his draft registration card. The measure is statistically significant and has a large effect; however, it may be biased as there was no actual question regarding employment. Since all men in the sample participated in battle, I examine in column 3 whether being wounded has an important impact on the employment likelihood in 1930. In all specifications, being wounded does not have a statistically significant effect on employment (though the sign is negative, as expected, the effect is quite small).33 Column 3 also includes whether individuals received a citation or were awarded a medal. Column 4 includes, as proxies for the company’s experience during the war, the wounded rate (self-excluded) and the fatality rate. Neither measure is found to be statistically significant, though the standard errors are quite large. Therefore, the specifications in other tables will include the unit war experience as an additional control. Columns 5 and 6 add variables from the 1930 census, such as controls for the labor market prospects a veteran might face, including occupation categories, unemployment rate in the county,34 and whether the house value or rent payment is in the top quartile, as a proxy for wealth. The county unemployment rate is included in the specifications below as a control for local labor market conditions, though it is not found to be statistically significant in many of the specifications. Column 6 includes county of residence fixed effects.

   31 Similar results (not reported) hold for the logit functional form as well as the linear probability model.

   32 The results do not change when commissioned officers are included, though I have omitted them from most specifications since both the nature of their interaction and their assignment are likely to be very different from that of the enlisted men.

To examine whether the sample of veterans is unique in some manner, and to determine the important neighborhood-level measures, I present a similar specification for a more general sample. Table 4 examines the neighborhood sample, which is a more heterogeneous sample, and I include controls for race, sex, veteran status, etc. The results are similar across the two samples. For the neighborhood sample I also include specifications in which only males are included (columns 2 through 5). This circumvents some of the issues related to women’s labor supply. The results remain the same, as in 1930, most women were not part of the formal labor force.

I next consider the reduced-form specification adding measures at the group level corresponding to equation (2.4) discussed in Section 2:

$$Y_{i,g} = \alpha + \beta X_{i,g} + \pi_1 \bar{X}_{-i,A} + \pi_2 \bar{X}_{-i,B} + \pi_3 \bar{X}^{A}_{-i,B} + \varepsilon_{i,g} \quad (i = 1, \dots, n_g;\; g = 1, \dots, G) \tag{4.1}$$

where $X_{i,g}$ are own characteristics, $\bar{X}_{-i,A}$ are company-level measures, $\bar{X}_{-i,B}$ are the neighborhood-level measures, and $\bar{X}^{A}_{-i,B}$ are the compounded-level measures (neighborhood averages, averaged among all unit members). I find that the characteristics of other military unit members, as well as the characteristics of other neighbors, have a statistically significant effect on employment. In addition, I find that many of the compounded (average of average) group characteristics, such as the average of the average neighborhood-level, have a statistically significant effect on one’s likelihood of employment (corresponding to $\pi_3 \neq 0$). Recall that these variables are constructed as an average of the characteristics of hundreds or thousands of others, and are likely to be uncorrelated with one’s individual unobserved characteristics. In Section 5 I demonstrate how to derive and compute the size of the economic effect of these group effects.

   33 Severe wounds do not even seem to affect one’s likelihood of being in the labor force (or even of survival until 1930). Note that there are several possible reasons for this finding. First, the percentage of those in the sample who were severely wounded is less than 3%. Those slightly wounded might not be affected by the wound. Second, there might be a selection bias involved: we only observe those who had survived, and therefore recovered, whereas those who didn’t survive (or recover) would not be in the sample (or the labor force).

   34 Calculated from the 1930 Integrated Public Use Microdata Series (Ruggles et al., 2008).

The reduced-form results, for a linear probability specification of one’s likelihood of employment, are reported in Table 5.35 Besides controls such as age, marital status, occupation, and various state and county dummy variables, there are age, age-squared, and marriage rates for various group levels (own, neighborhood, military unit, and average of neighborhoods across military unit). The variables corresponding to the compounded average $\bar{X}^{A}_{-i,B}$, the average across the unit of the average neighborhood level, are those labeled ix through xii. As discussed in the next section, the statistical significance of those coefficients suggests the existence of an endogenous social

   35 In the case of the reduced-form specifications I obtain similar results using the probit functional form. For example, see Table 7 column 3. The linear probability specification is not needed for the reduced-form specification but is useful for deriving and computing the different social effects, as derived in Appendix III.