Within Firm Supply Chains: Evidence from India∗
                                  Shresth Garg† Brandon Tan‡ Pulak Ghosh

                                                       January 2021

                                                          Abstract

          Vertical integration is central to understanding patterns of economic activity, but there has been limited
      empirical work measuring the extent to which firms own and utilize direct upstream and downstream production
      links for sourcing physical inputs. We use administrative data from Karnataka, India on the universe of good
      shipments between any two establishments to answer this question. We can identify if two establishments are
      under joint ownership allowing us to map the flow of goods both within and across firms. We calculate that 11%
      of input value can be potentially sourced from vertically integrated upstream establishments. We find that 38%
      of products are sourced by establishments exclusively from within the firm when a vertically integrated supplier
      exists, while the rest are almost entirely sourced exclusively from outside the firm. This suggests that the supply
      of physical goods along the production chain is an important rationale for vertical integration in India. Our data
      allows us to improve upon the methodology employed in the literature so far to measure within-firm trade. We
      highlight two sources of measurement and aggregation bias in previous studies, and show that there is a large
      impact on the results. Next, we quantify the extent to which firm boundaries serve as barriers to trade in our
      context by estimating a gravity specification. Finally, we look at factors associated with the decision to source
      a given product from within and find that firm size, physical distance to outside and within firm suppliers,
      frequency of input requirements, product relationship specificity, volume, R&D requirements and competition
      both upstream and downstream are important factors.

  ∗ We thank Enghin Atalay, Pol Antras, Dave Donaldson, Edward Glaeser, Nathan Hendren, Michael Kremer, Marc Melitz and Chad

Syverson for helpful discussions and comments. We thank the State Government of Karnataka for providing the data to Pulak Ghosh.
   † Department of Economics, Harvard University; garg@g.harvard.edu
   ‡ Department of Economics, Harvard University; btan@g.harvard.edu

1     Introduction

Vertically integrated firms play a large and important role in the economy. There exists a rich literature focused on

integration decisions and their consequences, but we know very little empirically about the nature of these vertical

relationships. Recent work has found that upstream units ship very small shares of their output to their firms’

downstream establishments. Atalay, Hortaçsu, and Syverson (2014) (henceforth AHS) look at the question within

the US, while Ramondo, Rappoport, and Ruhl (2016) (henceforth RRR) study within-firm trade in multinational

corporations. In this paper, we revisit the question of within firm trade, using detailed administrative data on the

movement of physical goods both within and outside the firm from Karnataka, India.

We make five contributions. First, we document the extent to which direct production links exist within a firm

and find that downstream establishments can potentially source around 11% of their input value from an integrated

upstream establishment.1 Second, we measure trade along these production links and find that 38% of products

are sourced by establishments exclusively from within the firm when a vertically integrated supplier exists. The

rest are almost entirely sourced exclusively from outside the firm, with only 4% being sourced from both within and

outside the firm. Third, our data allows us to improve on the methodology employed by the existing literature.

We aggregate our data to the level available in previous papers and replicate their measurement procedures. We

decompose and quantify the sources of biases from aggregation and show that the extent of mis-measurement is

large. Fourth, we quantify the extent to which firm boundaries serve as a barrier to trade, relative to distance,

using a gravity specification. Finally, we explore the factors that are associated with higher within firm transactions.

The data used in this study comes from administrative tax records from Karnataka, one of the largest states in

India with a population of over 61 million and a GDP of over 220 billion USD. In India, every registered business

is required to submit an electronic document (known as an e-way bill) to the government prior to any movement

of goods valued above a threshold of Rs 50,000 (∼ $700). Karnataka was the first state to roll out this system at

the intra-state shipment level, starting April 1, 2018. Our data covers the universe of bills from April 1, 2018 to

August 29, 2019. For every shipment within or through the state, we see the identity and location of the sending

and receiving party, the products in the shipment, and the value of every product in the shipment.

Our first set of results pertain to the existence of within firm production links. A within firm production link is

defined to exist if a location in the firm sources a product which is sold by some other location operated by the same

firm. We find that 11% of total input value can be potentially sourced from within the firm. Note that the existence

of a within firm link does not necessarily imply that the firm will use the link to source the product. We estimate
   1 An establishment is an economic unit owned by a firm and is distinct from other establishments based on location of operation.

We define a vertical link between two establishments of the same firm, if one establishment outputs a product which is an input to the
other establishment. For establishments which output multiple products, we treat vertical links for each product separately, and our
main analysis is at the establishment - product level.

that 38% of products are sourced by establishments exclusively from within the firm when a vertically integrated

supplier exists, while the rest are almost entirely sourced exclusively from outside the firm. Very few establishments

source a given input from both within and outside the firm. In line with existing work, we also measure the share

of products which an establishment ships exclusively to integrated downstream buyers, conditional on having such

a buyer within the firm. AHS find that 1.2% of upstream units ship all their output to their firms’ downstream

establishments. RRR find that 10% of MNC affiliates are exclusively dedicated to supplying other parties

in the corporation. In our setting, the share is much larger at 40%.

We run a series of robustness checks to verify our main results. The first measurement challenge is identifying

establishments which are producers of the good or use the good as an input in our data.2 We define an establishment

as a buyer of an input if its total inward shipment value of that product exceeds its total outward shipment value

multiplied by some threshold. Similarly, we identify an establishment as a seller of an input if its total outward

shipment value of that product exceeds its total inward shipment value multiplied by some threshold. Our results

are robust to the thresholds that are picked at this stage. Next, we identify a production link by the existence of

a seller and buyer of the same product within the firm. Our results hold when we require the upstream seller to

be of a sufficient scale, relative to the downstream demand. The results are also robust to excluding products that

are not an establishment’s ‘primary input’, suppliers that do not ship frequently, and suppliers that are not within

a buyer’s district.

For our main analysis, we use four-digit product codes. We provide additional results using different levels of

aggregation for products. First, we aggregate all the products in 21 different broad categories and provide results

on the share of within-firm sourcing for each category. Second, we ensure that our 4 digit product categories are

sufficiently narrow. We repeat our analysis for observations which report an 8 digit product code. As larger firms

are more likely to report 8 digit product codes, this sample selection is endogenous. Reassuringly, the proportion

of within-firm sourcing remains approximately the same.

Our results differ from earlier work in at least two respects.

First, AHS observe only the outflows of products and RRR only observe the reported share of intra-MNC trade

for an establishment, while we observe all inflows and outflows at the product level identifying both the sending

and receiving party. This improves the analysis in two ways. First, for every downstream establishment we

observe all the physical inputs it sources, allowing us to construct establishment level input requirements instead of

relying on industry wide input output tables. This allows us to conduct the analysis at the establishment-product

level instead of the establishment level and take into account within industry variation of input use and output.

Second, as we observe the sending and receiving party for each transaction we can classify each transaction as
   2 For example, we want to exclude intermediaries.

either internal or external; AHS instead classify a transaction as internal if it is shipped to a ZIP code where the

firm has a downstream establishment. To better illustrate the difference in methodologies, we re-run our analysis

using aggregate input-output tables3 and classify shipments as internal based on destination ZIP code. We find

substantial bias in the measurement of within firm trade in both directions. In the replication, we find close to

114,000 establishments being classified as upstream of which only 44,000 would be classified as such when input

use downstream is taken into account, downward biasing our estimates by more than two-fold. Additionally, only

80% of shipments to a ZIP code with a downstream establishment are within the firm, upward biasing our estimates.

Second, our data comes from a developing country with potentially higher contracting frictions leading towards

more integration and within firm sourcing. Indeed, Boehm and Oberfield (2018) document that firms in Indian

states with weaker contract enforcement are more likely to source intermediate inputs from within. To quantify

these frictions we adapt the methodology developed by Atalay, Hortaçsu, et al. (2019) to our setting. In our

model, a downstream establishment chooses to source an input from a set of potential upstream establishments

that we observe in the data. The sourcing decision is affected by distance from the upstream supplier, whether the

upstream supplier is integrated and whether the upstream supplier is in the same state. On average, a downstream

establishment has over 3,000 potential suppliers and the probability of sourcing from any one is small at 0.02%.

For the average firm in our data, a one standard deviation reduction in distance from the supplying establishment

increases the probability of sourcing to 0.026%, removing state border barriers increases the probability to 0.07%,

and vertical integration of the supplying establishment increases the probability to 2.03%. This illustrates that

firm boundaries are an important barrier for sourcing decisions in our setting. Additionally, we directly replicate

Atalay, Hortaçsu, et al. (2019) and find that a vertically integrated establishment in a given destination has the

same effect on shipment volumes as a 91% reduction in distance – compared to the 60% reduction found in the

United States.

Finally, we explore the decision of utilizing existing within-firm production links. This is an important margin as

many firms face this decision. 60% of economic activity takes place in firms that are vertically integrated in at least

one product. To our knowledge, we are the first paper to explore this firm decision empirically. First, we find that

while it is attractive to source from an integrated supplier, the advantage decreases the further away the integrated

supplier is and conversely the closer an outside firm supplier is. Second, firms are more likely to source products

that are relationship-specific from within. Third, we find higher sourcing from within-firm establishments for

products where R&D investment is important. Fourth, both the volume and frequency of shipment are important in

explaining the decision to source from within. Fifth, larger firms in terms of total value, number of establishments,

and number of products are far more likely to take advantage of their integrated supply networks. Last, more

competition upstream increases within-firm sourcing, while competition downstream reduces it.
   3 We construct input-output tables at the product level.

We also look at what is associated with the existence of an integrated seller for a given product, or the extensive

margin decision over vertical ownership. We find that larger firms are more likely to have an integrated seller.

More competitive product categories are less likely to have an integrated buyer. Finally, if a product is more R&D

intensive or more relationship-specific it is more likely to have an integrated seller.

The remainder of the paper presents our empirical results in more detail. It is organized as follows. In Section 2

we describe the data. In Section 3 we describe variable definitions and construction. In Section 4 we present our

results on the ownership and utilization of vertically integrated links. We present the replication results in Section

5. In Section 6, we quantify the relative importance of distance, state borders, and vertical integration in affecting

the volume of trade. Section 7 presents results on factors associated with the decision to source from within. We

conclude in Section 8.

2     Data

In India, every registered business is required to submit an electronic document (known as an e-way bill)4 to the

government prior to any movement of goods valued above the threshold of Rs. 50,000 (∼ $700 USD). This includes

any good transported by road, air, railways, or water vessel. If the consigner is a registered taxpayer, they are

responsible for generating an e-way bill. If they are not registered, then generating the e-way bill becomes the

responsibility of the consignee or the person transporting goods. Notably, the bill is generated even if goods are

shipped to a different establishment within the firm. The law was introduced to increase tax compliance and reduce

shipping times. Government officials have the authority to intercept any conveyance to verify the e-way bill or the

e-way bill number for all inter and intra-state shipments. The penalty for non-compliance is Rs 10,000 (∼ $ 141

USD) or the value of tax-evaded, whichever is greater. In its first phase, the law covered only interstate shipments

and in later phases was expanded to include intra-state shipments as well.5

We use administrative data on e-way bills from the state of Karnataka. Karnataka was the first state to roll out

this bill at the intra-state shipment level, starting on April 1, 2018. Our dataset covers the universe of bills from

April 1, 2018 to August 29, 2019.

For each e-way bill, we observe the date of shipment, the tax ID (GSTIN) and ZIP code (PIN code) of the sender

and the receiver, distance, and the total value of the shipment.6 A given shipment can contain multiple goods. For

each good within a shipment, we observe its HS product code, its total value, and quantity. Firms report either 2,
   4 The filing of E-way bills was mandated with the rollout of Goods and Services Tax in India starting in 2017. There is a small but

growing literature studying the impact of the tax regime. See for example Agarwal et al. (2019) and Leemput (2020).
   5 For more information, see https://cleartax.in/s/eway-bill-gst-rules-compliance.
   6 We top-code all values at the 99th percentile.

4 or 8 digit HS product codes. For most of our analysis, we work with 4 digit codes. We also repeat our analysis by

subsetting to observations for which we see 8 digit codes for robustness.

Our data includes all formal firms that ship goods in the State of Karnataka.

We provide some descriptive statistics in Tables 1 and 2. We observe over 86 million e-way bills (associated with

over 196 million product shipments) from around 1.2 million firms. An average firm operates in 1.76 locations, is

associated with 5 products, and makes 186 shipments in the period we observe. The average value of a shipment

is around Rs. 200,000 (∼ $ 2,820 USD). The average total value of outward shipments for a given firm is Rs.

15 million (∼ $ 210,000 USD). For each product, we observe on average around 5,500 establishments and 80,000

inward shipments.

3     Variable Construction

This section explains the variables we construct for our analysis.

3.1    Identifying Firm Establishments

For every shipment, for each party on both ends, we see the firm level tax ID (GSTIN) and the location at the

PIN code level.7 We define an establishment as a GSTIN-PIN code pair. For example, if we see a firm shipping

from 5 different PIN codes, we say that the firm has 5 establishments. Note that while we are not able to

separate establishments that operate in the same PIN code, these are geographic areas that are very small and

may effectively operate as a single entity. We only consider establishments located within the state, as we see their

complete transaction details.
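
As an illustration, the establishment definition above can be sketched in a few lines of Python. This is a minimal sketch and not the code used for the analysis; the field names (sender_gstin, sender_pin, receiver_gstin, receiver_pin) and the toy records are hypothetical, and the restriction to establishments located within the state is not applied here.

```python
from collections import defaultdict


def count_establishments(shipments):
    """Count distinct (GSTIN, PIN code) pairs for each GSTIN.

    `shipments` is an iterable of dicts with the hypothetical keys
    sender_gstin, sender_pin, receiver_gstin and receiver_pin.
    """
    pins_by_firm = defaultdict(set)
    for s in shipments:
        # Each end of a shipment reveals one establishment of that firm.
        pins_by_firm[s["sender_gstin"]].add(s["sender_pin"])
        pins_by_firm[s["receiver_gstin"]].add(s["receiver_pin"])
    return {firm: len(pins) for firm, pins in pins_by_firm.items()}


# Toy records: firm A appears at two PIN codes, firm B at one.
example = [
    {"sender_gstin": "A", "sender_pin": "560001",
     "receiver_gstin": "B", "receiver_pin": "560002"},
    {"sender_gstin": "A", "sender_pin": "560003",
     "receiver_gstin": "B", "receiver_pin": "560002"},
    {"sender_gstin": "A", "sender_pin": "560001",
     "receiver_gstin": "A", "receiver_pin": "560003"},
]
establishments = count_establishments(example)
```

In this toy example firm A is assigned two establishments and firm B one.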

3.2    Identifying Buyers and Suppliers For Each Product

Each shipment made by a firm is associated with a unique e-way bill. The firm may include multiple products

within a given shipment, but has to report the product code (HS code) and value of each product. Firms may report

2, 4 or 8 digit product codes depending on firm size thresholds. For our main analysis we remove observations at 2

digit level (which account for less than 0.4% of total value) and aggregate 8 digit codes to 4 digits. For robustness

we also report results for the subsample where we observe 8 digit codes. In the data, there are around 1,300 four

digit codes and 10,600 8 digit codes.
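
The product code harmonization described above can be sketched as follows. This is only an illustration under the assumption that codes are stored as digit strings (so that leading zeros are preserved); the function name to_four_digit is hypothetical.

```python
def to_four_digit(hs_code):
    """Map a reported HS code to the 4 digit level used in the main analysis.

    Returns None for 2 digit codes, which are dropped, and the first
    four digits for 4 or 8 digit codes. Codes are assumed to be strings.
    """
    code = str(hs_code).strip()
    if len(code) == 2:
        return None  # too coarse: excluded from the main analysis
    if len(code) in (4, 8):
        return code[:4]
    raise ValueError(f"unexpected HS code length: {code!r}")
```

For the 8 digit robustness sample one would instead keep only the observations whose reported code has eight digits.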
    7 Each PIN code is mapped to exactly one delivery post office. There are over 150 thousand distinct PIN codes in India, and each

corresponds on average to an area of 21.22 square kilometers and a population of roughly 8,000 people.

It is possible that an establishment both ships in and ships out a given product. We want to identify establishments

which are producers of the good or use the good as an input. We classify an establishment as a producer or ‘net-seller’

if the total outward shipment value of that product observed in our data exceeds the total inward shipment

value multiplied by some threshold. In our preferred specification, we use a threshold of 1.2, so a given product is

produced by an establishment if its total outward shipment value is greater than 1.2 times its total inward shipment

value. Similarly, an establishment uses a product as an input or is a ‘net-buyer’ of a product if the total inward

shipment value of that product observed in our data exceeds 1.2 times its total outward shipment value. These

definitions also rule out wholesalers and intermediaries for whom inward and outward values should be similar.

We provide results for three such thresholds for robustness (1, 1.2 and 1.5) and find that our results are not sensitive

to the choice of threshold.
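
The classification rule can be written compactly. The following is a minimal sketch under the definitions above, not the code used for the paper; the function name classify_role and the label 'neither' for establishments that meet neither condition are our own illustrative choices.

```python
def classify_role(inward_value, outward_value, threshold=1.2):
    """Classify one establishment-product pair.

    Returns 'net-seller' if outward value exceeds threshold times inward
    value, 'net-buyer' if inward value exceeds threshold times outward
    value, and 'neither' otherwise (e.g. likely intermediaries).
    """
    if outward_value > threshold * inward_value:
        return "net-seller"
    if inward_value > threshold * outward_value:
        return "net-buyer"
    return "neither"
```

For non-negative values and a threshold of at least 1 the two conditions cannot hold at the same time, so the order of the checks does not matter. An establishment with inward value 100 and outward value 110 is classified as 'neither' at the preferred threshold of 1.2, which is how similar inward and outward values screen out intermediaries.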

In Table A1, we present some descriptive statistics on buyers and sellers. For a given product, about 80 percent of

the observed establishments are buyers and about 20 percent are suppliers. Each firm on average sells 1.5 products

and buys 7 products.

3.3    Measuring Vertical Integration

For our main analysis, we measure whether a downstream establishment has an upstream supplier within the firm

for an input. As we observe the entire input purchases of an establishment, we can accurately determine if an

establishment has an integrated net-seller for an input. An analogous variable construction follows for whether an

upstream establishment has a potential downstream buyer.

Mere existence of an integrated upstream seller may not allow the firm to source from within, if the upstream

seller is too small relative to the downstream buyer. We consider three alternate definitions of vertical integration.

In our most liberal definition (‘Upstream Integrated: Exists’), we determine that a net-buyer establishment has

an integrated upstream source for that product if there exists another establishment within the firm that is a

net-seller. A second measure (‘Upstream Integrated: Large’) requires that the largest of these integrated suppliers

be at a sufficient scale relative to the demand of the downstream establishment. As it is not clear what should be

the ‘sufficient scale’ required, we provide results for different threshold values, 0.5 and 1. A threshold value of 0.5

would mean that the largest integrated supplier’s total sales are at least 50% of what the downstream establishment

requires. Our preferred measure (‘Upstream Integrated: Total’) falls somewhere in between these two measures.

The scale is important, but it could be that multiple integrated establishments together can meet the requirements

of the downstream buyer. We sum over the out-value of all the upstream suppliers within the firm and ask if this

total supply is large enough relative to the demand of downstream establishment. Again, we provide results for

different threshold values, 0.5 and 1. Our preferred specifications use 0.5 as the threshold, so we determine that

a net-buyer has an integrated upstream source for a given input if the total sales of that input over all integrated

suppliers is at least 50% of what the downstream establishment buys.
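
The three upstream definitions can be compared side by side. The sketch below is illustrative only; the function name upstream_integrated and its arguments are hypothetical, and supplier_sales is assumed to list the total outward value of the product for every other net-seller establishment in the same firm.

```python
def upstream_integrated(buyer_demand, supplier_sales, scale=0.5):
    """Evaluate the three integration measures for one net-buyer and product.

    buyer_demand   -- total inward value of the product for the buyer
    supplier_sales -- total outward values of integrated net-sellers
    scale          -- required supply relative to demand (0.5 or 1)
    """
    exists = len(supplier_sales) > 0
    # 'Large': the single largest integrated supplier meets the scale.
    large = exists and max(supplier_sales) >= scale * buyer_demand
    # 'Total': integrated suppliers jointly meet the scale.
    total = exists and sum(supplier_sales) >= scale * buyer_demand
    return {"exists": exists, "large": large, "total": total}
```

With a demand of 100 and two integrated suppliers selling 30 each, the 'Exists' and 'Total' measures are satisfied at a scale of 0.5 (60 is at least 50), while 'Large' is not (30 is below 50). This shows why the 'Total' measure lies between the other two.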

Similarly, we determine whether an upstream establishment has a potential downstream buyer by checking for the

existence of a downstream net-buyer (‘Downstream Integrated: Exists’), and conditioning on establishment-level

scale (‘Downstream Integrated: Large’) or total scale (‘Downstream Integrated: Total’).

3.4    Measuring Utilization of Within-Firm Production Chains

Next we define variables to measure the extent to which vertically integrated firms utilize their within-firm direct

production links. We define a shipment i to be internal if both the sender and receiver on the e-way bill have the

same unique tax ID (GSTIN).

Consider an establishment j, owned by firm f , purchasing a product p for which there is an upstream integrated

supplier. We compute the WithinSharejp as the value weighted share of internal shipments in all inward shipments.

\[
\text{WithinShare}_{jp} = \frac{\sum_{i \in \chi(j)} \text{Value}_{ip} \cdot \mathbf{1}\{\text{Sender}_i \in f\}}{\sum_{i \in \chi(j)} \text{Value}_{ip}}
\]

where Valueip is the value of product p in shipment i, Senderi is the sending establishment and χ(j) is the set of

shipments to establishment j.
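
A minimal sketch of this computation follows. It assumes that the inward shipments of product p to establishment j are given as (value, sender firm) pairs; the function name within_share and the firm labels are hypothetical.

```python
def within_share(shipments, buyer_firm):
    """Value-weighted share of inward shipments that come from the same firm.

    shipments  -- list of (value, sender_firm) pairs for one
                  establishment-product pair
    buyer_firm -- identifier of the firm that owns the buyer
    Returns None when there is no inward value.
    """
    total = sum(value for value, _ in shipments)
    if total == 0:
        return None
    internal = sum(value for value, sender in shipments if sender == buyer_firm)
    return internal / total
```

For example, an establishment of firm F1 that receives 300 from F1 and 100 from another firm has a within share of 0.75.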

We can also aggregate to the firm level and measure the extent to which a given firm f utilizes its vertically

integrated production links by taking the weighted average of all WithinSharejp , over establishments and products

with an upstream integrated supplier.

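\[
\text{FirmUtilization}_f = \frac{\sum_{jp \in \Delta(f)} \left[ \text{WithinShare}_{jp} \cdot \mathbf{1}\{\text{UpstreamIntegrated}_{jp}\} \cdot \sum_{i \in \chi(j)} \text{Value}_{ip} \right]}{\sum_{jp \in \Delta(f)} \left[ \mathbf{1}\{\text{UpstreamIntegrated}_{jp}\} \cdot \sum_{i \in \chi(j)} \text{Value}_{ip} \right]}
\]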

where FirmUtilizationf is the share of potential vertically integrated sourcing that is realized, ∆(f ) is the set of

all establishment-product pairs associated with firm f , and 1{UpstreamIntegratedjp } is an indicator for whether

an integrated upstream supplier exists.
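
The firm-level aggregation can be sketched in the same way. The record layout below (keys within_share, total_value and upstream_integrated) is a hypothetical representation of the establishment-product pairs of one firm, chosen only for illustration.

```python
def firm_utilization(pairs):
    """Value-weighted average within share over integrated pairs of one firm.

    pairs -- list of dicts with hypothetical keys:
             within_share (float), total_value (float, total inward value)
             and upstream_integrated (bool)
    Returns None if the firm has no integrated pair with positive value.
    """
    numerator = 0.0
    denominator = 0.0
    for pair in pairs:
        if pair["upstream_integrated"]:  # the indicator in the formula
            numerator += pair["within_share"] * pair["total_value"]
            denominator += pair["total_value"]
    return numerator / denominator if denominator > 0 else None


example_pairs = [
    {"within_share": 1.0, "total_value": 100.0, "upstream_integrated": True},
    {"within_share": 0.0, "total_value": 300.0, "upstream_integrated": True},
    {"within_share": 0.5, "total_value": 1000.0, "upstream_integrated": False},
]
utilization = firm_utilization(example_pairs)
```

In this toy example the non-integrated pair is ignored, and the firm realizes 100 of a potential 400 in within-firm sourcing, a utilization of 0.25.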

4       Results: Ownership and Utilisation

In this section, we present our empirical results related to the ownership of vertically integrated production chains

and the utilisation of these networks.

4.1     Ownership of Vertically Integrated Production Chains

We first explore the extent to which firms own vertically integrated production links in Table A2. Using our

preferred measure we find that downstream establishments can potentially source up to 11% of the total input

value from an integrated upstream establishment (Table 3 Row 1). If we only consider firms that operate in more

than one location, or multi-establishment firms, 13% of the total input value can be sourced from an integrated

upstream supplier. Conversely, upstream establishments can potentially sell 10% of their total output value to an

integrated downstream establishment. This measure increases to 11% when we only consider multi-establishment

firms. Our results are robust to alternate definitions of vertical integration as described in Section 3.

We also show that a large share of firms in the economy are vertically integrated. Firms that can source at least

one product from within make up 61% of economic activity (Table 3 Row 2). In Table A3, we show that the results

are similar for various definitions of vertical integration and suppliers as described in Section 3.

4.2     Utilization of Vertically Integrated Production Chains

A large share of trade can potentially take place within vertically integrated firms. Next, we measure the extent

to which this trade materializes.

4.2.1    Baseline

Our main result is reported in Figure 1. The figure plots the distribution of WithinSharejp over all establishments

in our data, i.e. share of input-value sourced from within the firm conditional on the existence of a potential

upstream supplier. We find that most establishments choose to source a given product either entirely from within

or from outside, with few firms doing both. 38% of products are sourced by establishments exclusively from within

the firm when a vertically integrated supplier exists (Table 3 Row 3), 58% are sourced exclusively from outside the

firm, and the remaining 4% are sourced from both within and outside the firm.
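
The three-way split reported above can be computed from the within shares as follows. This is a sketch with hypothetical function names and category labels, shown on toy shares rather than on our data.

```python
from collections import Counter


def sourcing_pattern(share):
    """Label one establishment-product pair by its within share."""
    if share == 1:
        return "within only"
    if share == 0:
        return "outside only"
    return "both"


def pattern_shares(shares):
    """Fraction of establishment-product pairs in each sourcing pattern."""
    if not shares:
        raise ValueError("need at least one within share")
    counts = Counter(sourcing_pattern(s) for s in shares)
    labels = ("within only", "outside only", "both")
    return {label: counts[label] / len(shares) for label in labels}
```

Applied to the four toy shares 1.0, 0.0, 0.0 and 0.4, the function returns 0.25 for 'within only', 0.5 for 'outside only' and 0.25 for 'both'.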

We also report the weighted average of WithinSharejp over all establishments in our data, where we weigh by

each establishment’s total purchase value of product p. Thus, each number represents the share of total trade that

takes place within firms, out of the total potential trade that can take place within the firm. In our preferred

specification, the average is 30% (Table 3 Row 4). We also report the unweighted average at 40% (Table 3 Row 5).
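
The difference between the two averages can be seen in a small example. The sketch below assumes each establishment-product pair is summarized as a (within share, purchase value) tuple; the function name and the toy numbers are hypothetical.

```python
def average_within_share(pairs, weighted=True):
    """Average within share across establishment-product pairs.

    pairs    -- list of (within_share, purchase_value) tuples
    weighted -- weight by purchase value if True, equal weights otherwise
    """
    if not pairs:
        raise ValueError("need at least one pair")
    if weighted:
        total_value = sum(value for _, value in pairs)
        if total_value == 0:
            raise ValueError("total purchase value is zero")
        return sum(share * value for share, value in pairs) / total_value
    return sum(share for share, _ in pairs) / len(pairs)


toy = [(1.0, 100.0), (0.0, 300.0)]
weighted_avg = average_within_share(toy, weighted=True)
unweighted_avg = average_within_share(toy, weighted=False)
```

Here the weighted average is 0.25 and the unweighted average is 0.5. A weighted average below the unweighted one, as in our results, arises when pairs with larger purchase values tend to have lower within shares.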

We also report robustness to various definitions of vertical integration and suppliers. As explained in Section 3, we

consider an establishment to be a net-buyer (net-seller) of a product if the value of inward (outward) shipments

exceeds the value of outward (inward) shipments by some multiplicative threshold. Figures A1 and A2 report

results for 1, 1.2 and 1.5 for different definitions of having an integrated supplier. Figure A1 considers the

case when there is at least one large upstream integrated supplier, while Figure A2 considers the total capacity of

integrated sellers to define the existence of a link. We find that our results are similar across specifications (see

Table A4). Finally, Table A5 reports averages across establishments for various definitions of vertical integration

and suppliers. Columns 1 to 3 report the weighted average, ranging between 30% and 34%. Columns 4 to 6 report

the unweighted average, ranging from 39% to 40% for different specifications.

The following subsections further test the robustness of our results. Similar to AHS and RRR, we focus on the

proportion of products which are exclusively sourced from within the firm when an integrated upstream supplier

exists.8

4.2.2     Sample Selection Robustness

In addition to checking our measures with various threshold levels, we run a series of further robustness checks on

sample selection.

First, we remove from the sample the observations where the establishment sources the product less than three

times. This may represent a one-off transaction and it may not be worth it to source from within. The results

are reported in Figure A3a. Second, we consider potential suppliers only within a downstream establishment’s

district. It may be that distance prohibits firms from utilizing their within-firm production links. The results are

reported in Figure A3b. Third, we consider only the primary inputs for each establishment. It may be that minor

inputs are not important enough to be sourced within the firm. We define an establishment’s primary input to

be that with the largest total inward shipment value. The results are reported in Figure A3c. In all the cases we

find results similar to the baseline, with the proportion of products sourced exclusively from within-firm suppliers

ranging between 32% and 42%.
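
Two of these sample restrictions are simple enough to sketch directly. The code below is illustrative only; the function names and the dictionary layouts are hypothetical, and ties for the largest inward value are broken by insertion order.

```python
def frequent_pairs(shipment_counts, min_shipments=3):
    """Keep establishment-product pairs sourced at least min_shipments times.

    shipment_counts -- dict mapping (establishment, product) to the
                       number of inward shipments observed
    """
    return {pair for pair, n in shipment_counts.items() if n >= min_shipments}


def primary_input(inward_values):
    """Product with the largest total inward shipment value, or None.

    inward_values -- dict mapping product code to total inward value
                     for one establishment
    """
    if not inward_values:
        return None
    return max(inward_values, key=inward_values.get)
```

For instance, an establishment with inward values of 500 for product 7208 and 120 for product 3901 has 7208 as its primary input.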
   8 We report robustness of the weighted and unweighted average within-firm sourcing shares across establishments in Table A5.

4.2.3    Robustness to Scale Requirement

Up to this point we have considered cases where the integrated suppliers must be of sufficient scale for the

downstream establishment to consider sourcing from within. In Figure A3d we check the robustness of our results

to adopting a very liberal definition of having an integrated supplier. We consider an establishment to have an

integrated supplier if there is any net-seller of the input within the firm, irrespective of the scale of the net-seller.

The results remain similar to baseline with around 35% of products being sourced exclusively from within the firm

when a vertically integrated supplier exists.

4.2.4    Within Firm Downstream Selling

So far we have presented results looking at the sourcing decision of the downstream establishment. Conversely,

one can look at the utilization of vertical production network by an integrated upstream supplier, i.e., the share

of output value that is sold within the firm by establishments for which a vertically integrated downstream buyer

exists. These results would diverge if the size distributions of upstream and downstream establishments are different.

We report our results in Figure 2f. Similar to baseline, around 40% of products are sourced from within the firm

conditional on having an integrated upstream supplier. This measure is directly comparable to that of AHS and

RRR which find much lower shares of within-firm shipping. AHS find that 1.2% of upstream units ship all their

output to their firms’ downstream establishments. RRR find that 10% of MNC affiliates are exclusively

dedicated to supplying other parties in the corporation. We investigate the differences in our results in Section 5.

4.2.5    Firm Level Utilization

Up to this point, we have treated each establishment within a firm independently. One can also consider sourcing

at the firm level, dividing vertically integrated firms into firms that source at least one product from within and

those which do not source any products from within. We present the results in Table A7. Vertically integrated firms which

source at least one product from within account for 72 to 76% of economic activity amongst all vertically integrated

firms.

4.2.6    Results by Product Category

HS product codes can be aggregated into 21 product sections, which are broad in scope. We report results for 21

aggregated product categories in Table A9. The proportion of products sourced from within the firm when an integrated

upstream supplier exists ranges from 18% to 60%. The number is lowest for stone, ceramic, and glass products and

highest for fats and oils.

4.2.7    Robustness to Product Code Level

For most of our specifications, we use 4 digit HS codes to define our product categories. However, there may be

a concern that defining the product at 4 digit level is too broad. For example, we may wrongly classify a firm as

having an integrated supplier by looking at the 4 digit level, if products within a given 4 digit category are not

substitutable. This would downward bias our results. To see how much of a concern this is, we repeat our baseline

analysis using the observations for which we have 8 digit HS codes. As larger firms are more likely to report 8 digit

HS codes, the sample selection is endogenous.

We report our results in Figure A3e. The results remain similar to baseline with around 30% of products being

sourced exclusively from within the firm when a vertically integrated supplier exists. Panel D of Table A6 reports

that the mean share of within-firm sourcing, when weighted by value, is 41% while the unweighted mean is 32%.

These numbers are similar to our baseline results suggesting that product definition is unlikely to be a big concern.

5       Discussion

Our results indicate that within-firm sourcing is quantitatively important for vertically integrated firms. This

may not seem surprising given the vast theoretical literature on vertical integration to solve contracting problems

associated with the sourcing decision.9 However, empirical evidence for this claim is more mixed. In seminal work,

AHS show that vertically integrated US firms largely do not engage in sourcing of physical inputs. RRR find similar

results using data reported by MNCs. The difference between results can be attributed to at least two factors.

First, our data comes from a developing country with potentially higher contracting frictions leading to more

integration and within firm sourcing. Indeed, Boehm and Oberfield (2018) document that firms in Indian states

with weaker contract enforcement are more likely to source intermediate input from within.

Second, there are two possible sources of bias in the measurement methodology from AHS - the use of industry

level input-output tables to proxy for vertically integrated links (IO Proxy) and using shipments to ZIP code to

proxy for within-firm shipments (Location Proxy). RRR overcome the second by using reported intra-MNC sales

shares, but still rely on industry wide IO tables to measure vertical links. Our data allows us to improve

upon their analysis by correcting for these biases.
   9 Theory puts forward many possible rationales for the existence of these relationships, such as mitigating contracting frictions (Coase (1937); Williamson (1971)), scale and scope economies (Stigler (1951); Novak and Stern (2009)), or strategic motives related to consolidating or extending market power (Perry (1989); Rey and Tirole (2007); Bresnahan and Levin (2012)).

For every establishment in the data, we see the universe of product inflows and outflows during the sample period.

This allows us to conduct our main analysis at the establishment product level and construct accurate establishment

level input requirements. If the input requirement of an establishment differs from the average industry level input

requirement, then relying on industry level input output tables may lead to mis-classification of vertical links. For

example, consider an establishment that never sources a given input used by other establishments in the industry,

i.e., it is an input according to industry level input-output table. Also assume that there is a supplier of that input

within the firm. Relying on industry level input-output table method will mis-classify the establishment as having

a vertical link which it is not using. Instead, taking establishment level input requirement into account will avoid

this issue.

Further, we observe both the sending and the receiving party for every product shipment, allowing us to classify

each shipment as internal or external accurately. AHS classify a shipment as internal if it ships to a ZIP code where

the firm has a downstream establishment.

To understand if these differences drive the differences in results we replicate the measurement exercise in AHS and

RRR. Below we outline the replication procedure and the results. More details can be found in Appendix Section

A.

In our data, we do not observe the reported industry of each establishment, preventing us from using industry level

input-output (I-O) tables. Instead, we construct an input-output table at the product level. To construct the

set of inputs used in production of product p, we consider all the establishments which output product p and take a

weighted average of the inputs used by these establishments. Vertical links are defined as I-J product

(industry) pairs, where I is upstream to J if I accounts for at least 1% of J’s input value.
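
As a rough illustration of the product-level input-output construction, the following Python sketch computes weighted input shares for each output product and flags vertical links at the 1% threshold. The function name, the toy records, and the choice to weight each establishment by its output value of the product are assumptions made for illustration; the text does not specify the weights.

```python
from collections import defaultdict

def product_io_links(outputs, inputs, threshold=0.01):
    """Return the set of (input product, output product) vertical links.

    outputs: {establishment: {product: output value}}
    inputs:  {establishment: {product: input value}}
    Assumed weights: each establishment's output value of the product.
    """
    links = set()
    products = {p for est_out in outputs.values() for p in est_out}
    for p in products:
        weighted_share = defaultdict(float)
        total_weight = 0.0
        for est, est_out in outputs.items():
            weight = est_out.get(p, 0.0)
            est_in = inputs.get(est, {})
            input_total = sum(est_in.values())
            if weight <= 0 or input_total <= 0:
                continue
            total_weight += weight
            for i, value in est_in.items():
                # establishment-level input share, weighted by output value
                weighted_share[i] += weight * value / input_total
        for i, s in weighted_share.items():
            if total_weight > 0 and s / total_weight >= threshold:
                links.add((i, p))
    return links

# Hypothetical toy data: two steel producers with different input mixes.
outputs = {"e1": {"steel": 100.0}, "e2": {"steel": 300.0}}
inputs = {"e1": {"ore": 60.0, "coal": 40.0}, "e2": {"ore": 199.0, "paint": 1.0}}
links = product_io_links(outputs, inputs)
```

In this toy case paint accounts for well under 1% of the weighted input value of steel, so only ore and coal are recorded as upstream of steel. Note that e2 is linked to coal through the table even though it never buys coal, which is the kind of mis-classification discussed above.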

AHS define a shipment as internal if the shipping establishment’s firm also owns an establishment that is both in

the destination ZIP code and in a downstream industry according to the input output table. We use the same

definition for defining internal shipments when replicating AHS.
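
The difference between the Location Proxy and a classification based on the observed receiver can be sketched as below. All names and the toy establishments are hypothetical, and excluding the sender itself from the ZIP code match is our reading of the AHS definition rather than something stated in the text.

```python
def internal_by_location_proxy(shipment, establishments, downstream_of):
    """Location proxy: a shipment counts as internal if the sender's firm owns
    another establishment in the destination ZIP code whose industry is
    downstream of the sender's industry in the input-output table."""
    sender = establishments[shipment["sender"]]
    downstream = downstream_of.get(sender["industry"], set())
    return any(
        name != shipment["sender"]
        and est["firm"] == sender["firm"]
        and est["zip"] == shipment["dest_zip"]
        and est["industry"] in downstream
        for name, est in establishments.items()
    )

def internal_by_observed_receiver(shipment, establishments):
    """Direct classification, possible when the receiving party is observed."""
    sender = establishments[shipment["sender"]]
    receiver = establishments[shipment["receiver"]]
    return sender["firm"] == receiver["firm"]

# Hypothetical toy data: firm F1 owns a steel plant and an auto plant; the auto
# plant shares a ZIP code with an unrelated auto plant owned by firm F2.
establishments = {
    "steel_F1": {"firm": "F1", "zip": "560001", "industry": "steel"},
    "autos_F1": {"firm": "F1", "zip": "560002", "industry": "autos"},
    "autos_F2": {"firm": "F2", "zip": "560002", "industry": "autos"},
}
downstream_of = {"steel": {"autos"}}
shipment = {"sender": "steel_F1", "receiver": "autos_F2", "dest_zip": "560002"}

proxy_says_internal = internal_by_location_proxy(shipment, establishments, downstream_of)
truly_internal = internal_by_observed_receiver(shipment, establishments)
```

Here the shipment goes to an outside buyer, yet the proxy labels it internal because the sender's firm happens to own a downstream establishment in the same ZIP code.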

5.1    Results

Figure 2 summarizes the proportion of trade that happens within the firm across methodologies and data. Figure 2a

presents the results from AHS on US data and Figure 2b presents the results from RRR on US MNC data. Figure

2f presents the results from our methodology and data, plotting the distribution of within-firm downstream selling,

i.e., within-firm sales when a downstream integrated buyer exists.10 Both AHS and RRR find very small shares

of within-firm shipping. AHS finds that 1.2% of upstream units ship all their output to their firms’ downstream
  10 Note that this is different from our baseline, where we look at within-firm sourcing when an integrated upstream seller exists.

establishments, while RRR finds that 10% of MNC affiliates are exclusively dedicated to supplying other parties in

the corporation.

Figure 2e presents the results using the methodology from AHS on our data from India (IO and Location Proxy).

The proportion of upstream establishments shipping exclusively to integrated downstream establishments goes

down from 40% using this paper’s methodology to a little less than 25%. Figure 2c presents the results using the

methodology from RRR on our data from India (IO Proxy). The proportion of upstream establishments shipping

exclusively to integrated downstream establishments goes down from 40% using this paper’s methodology to around

15%.

5.2     Sources of bias

There are two different sources of bias that we have identified - the use of industry level input-output tables (IO

Proxy) and using shipments to ZIP code to proxy for within-firm shipments (Location Proxy). In this subsection

we explore whether these biases are quantitatively large.

The first bias is associated with using industry level IO tables to construct vertical links instead of taking into

account actual input use by downstream establishments. We find that IO tables vastly overestimate the number

of vertical links, which will reduce the share of within firm shipments for the firms at every quantile. Using the

product level IO tables, 114,128 establishments are classified as upstream as opposed to 74,408 when input use

downstream is taken into account. 41,244 establishments are common in both classifications.
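
A quick back-of-the-envelope calculation on the three counts above shows how limited the overlap between the two classifications is. The variable names are ours; the counts are the ones reported in the text.

```python
io_upstream = 114_128      # upstream according to product-level IO tables
actual_upstream = 74_408   # upstream when downstream input use is taken into account
common = 41_244            # classified as upstream under both approaches

# Share of IO-classified upstream establishments confirmed by actual input use
share_of_io_confirmed = common / io_upstream
# Share of actual upstream establishments that the IO tables also pick up
share_of_actual_captured = common / actual_upstream

print(f"{share_of_io_confirmed:.1%} of IO-classified establishments are confirmed")
print(f"{share_of_actual_captured:.1%} of actual upstream establishments are captured")
```

Roughly 36% of the establishments classified as upstream by the IO tables are also upstream by actual input use, and roughly 55% of the establishments that are upstream by actual input use are picked up by the IO tables.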

The second bias stems from proxying for within firm shipments by shipments to ZIP codes with a downstream

establishment. 80% of shipments to a ZIP code with a downstream establishment are within the firm. This number

reduces to 51% if we weight by the value of shipments.
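
The gap between the count-based and value-weighted shares arises when shipments to outside buyers in those ZIP codes are larger on average. A minimal sketch, using made-up shipment values that are not taken from our data:

```python
def internal_shares(shipments):
    """shipments: list of (value, is_within_firm) pairs for shipments sent to
    ZIP codes that contain a downstream establishment of the sender's firm.
    Returns (unweighted share, value-weighted share) of within-firm shipments."""
    count = len(shipments)
    total_value = sum(value for value, _ in shipments)
    unweighted = sum(1 for _, internal in shipments if internal) / count
    weighted = sum(value for value, internal in shipments if internal) / total_value
    return unweighted, weighted

# Hypothetical toy data: four small internal shipments, one large external one.
toy = [(10.0, True), (10.0, True), (10.0, True), (10.0, True), (40.0, False)]
unweighted, weighted = internal_shares(toy)
```

In this toy case 80% of shipments are internal by count but only 50% by value, the same qualitative pattern as the 80% and 51% reported above.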

To assess the effect of these two biases, we present our main results while allowing for one bias at a time. Figure 2c

is the distribution when using IO tables and Figure 2d is the distribution when using ZIP codes to infer within-firm

shipment.11 As we can see, these biases act in opposite directions. The proportion of upstream establishments

shipping exclusively to integrated downstream establishments goes down from 40% to around 15% when using

industry level IO tables to construct vertical links. The proportion of upstream establishments goes up to 46%

when proxying for within firm shipments by shipments to ZIP codes with a downstream establishment.
   11 Industry level IO tables preclude conducting the analysis at the establishment product level. Thus, Figure 2c is plotted at the establishment level.

6         How Valuable are Vertical Links?

In this section, we develop a model of firm sourcing and estimate a gravity specification to quantify the extent to

which firm boundaries are barriers to trade relative to distance and state borders.

Our model is based on a revealed preference argument, where establishments trade-off sourcing from integrated

suppliers with distance and being in the same state. Specifically, from observing bilateral trade flows, we measure

how much more likely a given establishment is to source from a vertically integrated supplier, relative to a

geographically close supplier in terms of distance or a within state supplier.

As contractual frictions may be higher in the developing world, we also test the extent to which firm boundaries are

larger barriers to trade in our context. Atalay, Hortaçsu, et al. (2019) estimate a similar gravity specification using

data on seller-destination trade flows and find that the elasticity of bilateral trade flows with respect to the addition

of a same-firm establishment in a destination is 0.89.12 Having an additional vertically integrated establishment in

a given destination ZIP code has the same effect on shipment volumes as a 60% reduction in distance. In addition

to our main specification, we also replicate their analysis with our data for comparable results.

6.1       Model of Sourcing

Establishment j decides to source a unit value input from K potential suppliers. The suppliers are differentiated

by their distance to j, whether they are in the same state as j and if they are part of the same firm. All these

variables affect the cost of sourcing from the supplier. The cost minimization problem of establishment j is,

$$\min_{k} \Big\{ \alpha_0 + \alpha_1 \log(\text{Distance}_{jk}) + \alpha_2 \mathbb{1}_{jk}(\text{WithinFirm}) + \alpha_3 \mathbb{1}_{jk}(\text{WithinState}) + \alpha_4 \mathbb{1}_{jk}(\text{WithinFirm}) \times \text{Distance}_{jk} + \alpha_5 \mathbb{1}_{jk}(\text{WithinState}) \times \text{Distance}_{jk} + \epsilon_{jk} \Big\} \tag{1}$$

where $\text{Distance}_{jk}$ is the distance in meters from establishment j to k, $\mathbb{1}_{jk}(\text{WithinFirm})$ is an indicator for whether the supplying establishment is vertically integrated, $\mathbb{1}_{jk}(\text{WithinState})$ is an indicator for whether the supplying establishment is located within the same state, and $\epsilon_{jk}$ is an establishment-supplier specific idiosyncratic shock.

  12 Calculated by multiplying the WithinFirm coefficient (2.828) from a Poisson regression and $\frac{1}{1+r}$, where r is the average number of potential recipients in a destination (0.315).

We assume that $\epsilon_{jk}$ follows an EV1 distribution, yielding the following expression for the probability that establishment j sources from k.

$$\Pr(\text{Source}_j = k) = \frac{\exp(\text{Cost}_k)}{\sum_{k} \exp(\text{Cost}_k)}, \tag{2}$$

where Costk is the deterministic part of the cost specification. The final estimating equation is,

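$$\Pr(\text{Source}_j = k) = \exp\Big( \alpha_0 + \alpha_1 \log(\text{Distance}_{jk}) + \alpha_2 \mathbb{1}_{jk}(\text{WithinFirm}) + \alpha_3 \mathbb{1}_{jk}(\text{WithinState}) + \alpha_4 \mathbb{1}_{jk}(\text{WithinFirm}) \times \text{Distance}_{jk} + \alpha_5 \mathbb{1}_{jk}(\text{WithinState}) \times \text{Distance}_{jk} + \gamma_j \Big) \tag{3}$$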

where γj is a fixed effect for buying establishment which absorbs the denominator in Equation 2.
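
To make the role of the fixed effect explicit, the step from Equation 2 to Equation 3 can be sketched as follows. We write $\text{Cost}_{jk}$ for the deterministic part of the cost and index the sum by $k'$; both are notational conveniences added here for clarity and are not part of the original notation.

```latex
\Pr(\text{Source}_j = k)
  = \frac{\exp(\text{Cost}_{jk})}{\sum_{k'} \exp(\text{Cost}_{jk'})}
  = \exp\left(\text{Cost}_{jk} + \gamma_j\right),
\qquad
\gamma_j \equiv -\log \sum_{k'} \exp(\text{Cost}_{jk'}).
```

Because the denominator depends only on the buying establishment j and not on the candidate supplier k, it is constant within buyer and can be captured by a buyer fixed effect.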

6.2       Estimation

We estimate Equation 3 via a Poisson regression.13 For each downstream establishment purchasing a product p,

we define the set of potential suppliers to be all net-sellers of p that operate at a sufficient scale relative to the

demand of the downstream establishment. As before, we define net-sellers of an input to be establishments with

total outward shipment value exceeding 1.2 times its total inward shipment value, and the sufficient-scale condition

requires that total sales of the input be at least 50% of what the downstream establishment buys.
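
The construction of the potential supplier set can be sketched as below. The function names and toy values are hypothetical, and treating "total sales" as total outward shipment value of the input is an assumption of this sketch.

```python
def is_net_seller(outward_value, inward_value, ratio=1.2):
    """Net-seller: outward shipment value exceeds 1.2 times inward shipment value."""
    return outward_value > ratio * inward_value

def potential_suppliers(candidates, downstream_purchases, scale_share=0.5):
    """candidates: {establishment: (outward value, inward value)} for product p.
    Keeps net-sellers whose sales are at least 50% of what the buyer purchases."""
    return sorted(
        est
        for est, (out_value, in_value) in candidates.items()
        if is_net_seller(out_value, in_value)
        and out_value >= scale_share * downstream_purchases
    )

# Hypothetical toy data for one product and a buyer purchasing 80 units of value.
candidates = {
    "s1": (100.0, 10.0),  # net-seller with sufficient scale
    "s2": (100.0, 90.0),  # not a net-seller: 100 is below 1.2 * 90
    "s3": (30.0, 0.0),    # net-seller but below 50% of the buyer's purchases
}
suppliers = potential_suppliers(candidates, downstream_purchases=80.0)
```

Only s1 passes both conditions in this example.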

In our baseline specification, we fix α4 and α5, the coefficients on the interaction between Distance and WithinFirm

and the interaction between WithinState and WithinFirm, to be equal to 0. We present our estimates in Table

4. The elasticities of bilateral trade flows with respect to Distance and WithinState are -0.12 and 1.3 respectively.14

The elasticity with respect to within-firm ownership is 4.6. To interpret the coefficients, we consider the average

establishment in our sample which has many potential suppliers for each of its inputs and a baseline probability of

sourcing from any given supplying establishment at 0.02%. A one standard deviation reduction in distance from

the supplying establishment increases the probability to 0.026%. Moving the supplying establishment within state

increases the probability of sourcing to 0.07%. Vertical ownership increases the probability of sourcing to 2.03%.

Vertical integration of a supplying establishment increases the probability of sourcing much more than either a

reduction in distance or being within the same state indicating that firm boundaries are particularly important

barriers to trade in our setting.
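
The probabilities quoted above can be approximately reproduced from the reported coefficients. The sketch below assumes that, in the exponential specification of Equation 3, switching an indicator from 0 to 1 multiplies the sourcing probability by the exponential of its coefficient, holding everything else fixed. It uses the rounded coefficients from the text, so small differences from the reported figures are to be expected.

```python
import math

baseline = 0.02           # baseline sourcing probability, in percent
coef_within_state = 1.3   # rounded coefficient quoted in the text
coef_within_firm = 4.6    # rounded coefficient quoted in the text

# Switching an indicator on scales the probability by exp(coefficient).
p_within_state = baseline * math.exp(coef_within_state)
p_within_firm = baseline * math.exp(coef_within_firm)

print(f"within state: {p_within_state:.3f}%")
print(f"within firm:  {p_within_firm:.2f}%")
```

This gives roughly 0.07% for a within-state supplier and roughly 2% for a within-firm supplier, in line with the 0.07% and 2.03% reported above.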
  13 We code distance between two establishments in the same pincode as 1km.
  14 Our coefficient on distance is smaller than similar estimates from papers on international trade (see Disdier and Head (2008)).

In Table 5, we estimate α4 and α5, the coefficients on the Distance-WithinFirm and WithinState-WithinFirm

interaction terms. We find that α4 has a positive coefficient. This indicates that the relationship between vertical

integration and the trade volume is stronger for more distant locations. Similarly, we find that α5 has a positive

coefficient suggesting that the relationship between vertical integration and the trade volume is stronger for within

state sourcing.

We also consider the analogous decision to sell to potential buyers from the perspective of an upstream supplier.

We present our estimates in Tables A17 and A18 and find similar results.15

We replicate the results in Atalay, Hortaçsu, et al. (2019) with our data in Table A19. In our setting, the elasticity

of bilateral trade flows with respect to operating a same-firm establishment in the destination is 0.97.16 This implies

that having a vertically integrated establishment in a given destination has the same effect on shipment volumes as

a 91% reduction in distance. This is substantially larger than the 60% reduction found by Atalay, Hortaçsu, et al.

(2019) in the US context.
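
The 0.97 elasticity and the 91% distance equivalent follow from the three numbers reported in footnote 16. The arithmetic is reproduced below; the variable names are ours.

```python
import math

same_firm_coef = 2.411       # coefficient on same-firm ownership share
avg_same_firm_share = 0.402  # average share when an integrated firm exists in the destination
log_distance_coef = -0.401   # coefficient on log(distance)

# Elasticity with respect to operating a same-firm establishment
elasticity = same_firm_coef * avg_same_firm_share

# Ratio of distances that produces the same change in shipment volume
distance_ratio = math.exp(elasticity / log_distance_coef)
distance_reduction = 1 - distance_ratio

print(f"elasticity: {elasticity:.2f}")
print(f"equivalent reduction in distance: {distance_reduction:.0%}")
```

The product of the first two numbers is about 0.97, and the implied distance ratio of about 0.09 corresponds to a 91% reduction in distance.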

7     Factors Associated with Within-Firm Sourcing

In this section, we look at factors associated with both the intensive margin decision of sourcing from existing firm

network and the extensive margin decision of forming vertical links. Details on data construction for this section

can be found in Appendix Section B.

7.1     Product - Establishment Level

We present the results on correlations at product-establishment level in Table 6. In all specifications, we include

district fixed effects. We include product fixed effects for correlates that vary at the firm level, and product group

fixed effects when the correlates vary at the product level. We include robustness checks for each specification in

the Appendix and find consistent results.

In Columns 1 to 4 we explore how distance affects the firm’s decision to source from within.17 We find that doubling

the distance from where the input is sourced decreases within-firm shipments by around 5 percentage points. Thus,
   15 Atalay, Hortaçsu, et al. (2019) estimate a similar gravity specification for US firms. They find integration to be an important driver of within firm sourcing as well.
   16 Calculated by multiplying the same-firm ownership share coefficient (2.411) from a Poisson regression and the average same-firm ownership share given that a vertically integrated firm exists in the destination (0.402). Following Atalay, Hortaçsu, et al. (2019), we carry out a simple calculation to compute the magnitude relative to distance: $\exp\left(\frac{0.402 \times 2.411}{-0.401}\right)$, where -0.401 is the coefficient on log(distance).
   17 To measure distance between establishments which have a supply relationship, we compute the distance as an average over all the reported distances. For establishment pairs without supply relationships, we measure distance as the distance between their PIN code centroids.

input suppliers outside the firm are on average located further than the input suppliers within the firm. This is

consistent with different establishments within a firm locating close to each other. We also find that firms that are

more geographically dispersed are more likely to source from outside. Last, we look at the impact of distance to

net-sellers within and outside the firm. We find that doubling the average distance to integrated sellers reduces

within-firm sourcing by around 5 p.p., while doubling the average distance to outside firm sellers increases the

within-firm sourcing by 7 - 12 p.p. The attenuation in sourcing with increasing distance is higher for outside firm

suppliers, i.e., firms are less elastic to distance for within-firm suppliers.

In Column 4, we look at how product relationship specificity drives a firm’s decision to source from within. Using

Rauch (1999)’s classification, we find that if a product is listed on an exchange, it is 7 - 9 p.p. less likely to

be sourced from within. Thus, in line with theoretical predictions, firms are more likely to source more specific

products from within the firm.

In Column 5, we look at whether firms are more likely to source products that are R&D intensive from within. We

use measures of R&D intensity at the product level from Nunn and Trefler (2013). High R&D intensive products

may require upfront investment by the supplier. However, a non-integrated supplier does not internalize all of

the benefits of this investment. This makes it more likely that such products will be sourced from within. Using

the measure of R&D intensity from Nunn and Trefler (2013), we find that products which are more intensive in

R&D requirement are more likely to be sourced from within. Doubling the R&D requirement increases the share

of within-firm sourcing by about 8 percentage points.

In Column 6, we find that products which are sourced more frequently tend to be sourced from within-firm

suppliers. Our measure of the time-frequency of input requirement is the ratio of the number of months in which an

inward shipment that carries the product is made to the total number of months for which the establishment exists in our data,

i.e., the share of months with a shipment of the product. Going from sourcing the product once per quarter to

monthly increases the within-firm sourcing by 7 p.p. This can be due to multiple reasons. If there are contracting

costs associated with every shipment, then it may make sense to source from within. The product can also be a

crucial input into the production process and may lead to bottlenecks in production in case of delays. We also

explore the impact of the number of shipments in Column 7, and find a positive effect. Doubling the number of

shipments increases within-firm sourcing by 2.6 - 5 p.p.
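
The time-frequency measure can be illustrated with a short sketch. The function name and the month labels are hypothetical; the comparison mirrors the move from quarterly to monthly sourcing discussed above.

```python
def shipment_frequency(months_with_shipment, months_observed):
    """Share of months in which an inward shipment of the product is recorded.

    months_with_shipment: iterable of month labels with at least one shipment
    months_observed: number of months the establishment exists in the data
    """
    if months_observed <= 0:
        raise ValueError("months_observed must be positive")
    # Several shipments in the same month count once.
    return len(set(months_with_shipment)) / months_observed

# Hypothetical establishment observed for 12 months.
quarterly = shipment_frequency(["2019-01", "2019-04", "2019-07", "2019-10"], 12)
monthly = shipment_frequency([f"2019-{m:02d}" for m in range(1, 13)], 12)
```

Sourcing once per quarter gives a frequency of one third, while sourcing every month gives a frequency of one.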

In Columns 8 and 9, we explore how competition affects within-firm sourcing. We use the Herfindahl-Hirschman

Index (HHI) at the product level as a measure of competition. Upstream competition increases the options available

to the downstream establishment, making it less likely that the product will be sourced from within. On the other

hand, in the face of higher competition, the downstream establishment is a captured buyer for the upstream

establishment which may increase the amount of within-firm sourcing. In Column 8, we find that an increase in

upstream competition increases within-firm sourcing. A 0.1 increase in the HHI increases within-firm sourcing

by 4 p.p. In Column 9, we find that if the downstream firm faces a more competitive environment,18 it reduces

within-firm sourcing. One potential mechanism is that competition forces the firm to look for better inputs in

terms of quality which are more likely available outside. However, we do not test for the potential mechanisms that

can be at play in generating the result.
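
For reference, a product-level HHI can be computed as the sum of squared market shares. The sketch below assumes shares are based on seller sales values and uses the 0 to 1 scale implied by the "0.1 increase" above; the seller names and values are made up.

```python
def hhi(sales_by_seller):
    """Herfindahl-Hirschman Index on a 0-1 scale: sum of squared market shares.
    A higher value indicates a more concentrated market."""
    total = sum(sales_by_seller.values())
    if total <= 0:
        raise ValueError("total sales must be positive")
    return sum((value / total) ** 2 for value in sales_by_seller.values())

# Hypothetical product markets.
concentrated = hhi({"a": 90.0, "b": 10.0})                       # one dominant seller
dispersed = hhi({"a": 25.0, "b": 25.0, "c": 25.0, "d": 25.0})    # four equal sellers
```

The market with one dominant seller has an HHI of about 0.82, while four equally sized sellers give 0.25.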

We present robustness checks for the above results in the Appendix.

7.2       Firm Level

Finally, we look at the impact of firm scale on within-firm sourcing in Table 7. Larger firms have more opportunities

to source from within and, as they operate at a larger scale, stand to benefit more from sourcing from within. It

is also likely that larger firms have greater managerial capacity, and therefore, are more likely to exploit the

opportunity to source from within. On the other hand, larger firms are likely to have lower costs of contracting, as

they may have easier access to the legal system in the case of a dispute.

Empirically, we find that larger firms, measured in terms of value of shipments, number of products or number of

locations, tend to increase within-firm sourcing. Doubling the value of shipments increases the within-firm shipment

by 0.7 p.p., doubling the number of products that the firm deals in increases it by 1.6 p.p. and doubling the number

of locations that the firm operates in increases it by 5 percentage points.

7.3       Firm Integration Decision

Our results in the previous section focused on the decision to source conditional on the firm ownership structure. In

this section, we explore what is associated with the existence of a within-firm supplier, i.e., what firm and product

characteristics are associated with the decision to own a vertically integrated upstream firm. We report results in

Tables 8 and 9.

Table 8 reports the establishment-product level regression results with an indicator for the existence of a vertically

integrated seller as the outcome variable. The mean value of the outcome variable is 0.10. We find that doubling

the total value of trade for a product increases the probability of existence of an integrated supplier by 0.7 percentage

points. Doubling the number of shipments made increases the probability by 2.5 p.p. If a product is not

relationship-specific, i.e., if it is listed on an exchange, the probability that there will be an integrated seller goes

down by 1.1 p.p. Market competition is associated with a lower probability of having an integrated supplier, with
  18 We use the weighted average HHI across all the downstream firm’s products.

an increase in upstream and weighted downstream HHI of about 0.1 increasing the probability of existence by 1.9

and 3.4 p.p. respectively. We do not find a significant coefficient for distance to outside sellers. Finally, doubling

R&D intensity is associated with a 10 p.p. increase in the probability of an integrated supplier existing.

Above we reported that higher trade in a product is associated with a higher probability of having an integrated

seller. We also verify this at the firm level. At the firm level, we aggregate the existence of integrated buyers

across different product categories that the firm operates in by taking a weighted average. In Table 9, we show that

larger firms, in terms of total inward shipping value, number of products, and number of locations operated in, are

more likely to have an integrated seller. Doubling these measures increases the probability of owning a vertically

integrated supplier by 0.3, 0.7 and 4.5 p.p. respectively.

8    Conclusion

This paper uses detailed administrative data on the movement of physical goods both within and outside the firm

from Karnataka, India to measure the extent to which firms own and utilize direct upstream and downstream

production links for sourcing physical inputs. We find that firms can potentially source around 11% of their

input value from an integrated upstream establishment. Around 38% of products are sourced by establishments

exclusively from within the firm when a vertically integrated supplier exists. The rest are almost entirely sourced

exclusively from outside the firm, with only 4% being sourced from both within and outside the firm.

This result is in contrast to the existing literature which finds lower usage of within firm links. To better compare

our results to the literature, we aggregate our data to the level used in previous studies and rerun the analysis.

We find substantial bias in the measurement of within firm shipping when using aggregate data. Additionally,

we estimate a firm level gravity specification and find that firm boundaries serve as significantly larger barriers to

trade in our developing country context compared to the US.

We look at factors associated with the decision to source a given product from within and find that firm size,

distance to outside and within firm suppliers, frequency of input requirement, product relationship specificity,

volume, R&D requirements and competition both upstream and downstream are important factors. We also look

at factors associated with the ownership of a vertically integrated establishment and find that firm size, product

specificity, R&D requirements and competition matter.

References

Agarwal, Nikhil et al. (2019). “Market Failure in Kidney Exchange”. In: NBER Working Paper Series No. 24775,

   pp. 1–52.

Atalay, Enghin, Ali Hortaçsu, and Chad Syverson (2014). “Vertical Integration and Input Flows”. In: The American Economic Rev

   104.4, pp. 1120–1148.

Atalay, Enghin, Ali Hortaçsu, et al. (Nov. 2019). “How Wide Is the Firm Border?” In: The Quarterly Journal of Economics

   134.4, pp. 1845–1882.

Boehm, Johannes and Ezra Oberfield (2018). “Misallocation in the Market for Inputs: Enforcement and the Organization of Production”. In: National Bureau of Economic Research Working Paper Series No. 24937.

Bresnahan, Timothy and Jonathan Levin (2012). “Vertical Integration and Market Structure”. In: NBER Working Papers.

Coase, R. H. (Nov. 1937). “The Nature of the Firm”. In: Economica 4.16, pp. 386–405.

Disdier, Anne Célia and Keith Head (Feb. 2008). “The puzzling persistence of the distance effect on bilateral trade”.

   In: Review of Economics and Statistics 90.1, pp. 37–48.

Feenstra, Robert C and Gordon H Hanson (1996). “Globalization, Outsourcing, and Wage Inequality”. In: The American Economi

   86.2, pp. 240–245.

Leemput, Eva Van (2020). “A Passage to India: Quantifying Internal and External Barriers to Trade”.

Novak, Sharon and Scott Stern (2009). “Complementarity Among Vertical Integration Decisions: Evidence from

   Automobile Product Development”. In: Management Science 55.2, pp. 311–332.

Nunn, Nathan and Daniel Trefler (Oct. 2013). “Incomplete contracts and the boundaries of the multinational firm”.

   In: Journal of Economic Behavior and Organization 94, pp. 330–344.

Perry, Martin (1989). Chapter 4 Vertical integration: Determinants and effects.

Ramondo, Natalia, Veronica Rappoport, and Kim J. Ruhl (Jan. 2016). “Intrafirm trade and vertical fragmentation

   in U.S. multinational corporations”. In: Journal of International Economics 98, pp. 51–59.

Rauch, James E. (June 1999). “Networks versus markets in international trade”. In: Journal of International Economics

   48.1, pp. 7–35.

Rey, Patrick and Jean Tirole (2007). Chapter 33 A Primer on Foreclosure.

Stigler, George J (1951). The Division of Labor is Limited by the Extent of the Market. Tech. rep. 3, pp. 185–193.

Williamson, Oliver (1971). “The Vertical Integration of Production: Market Failure Considerations”. In: American Economic Revi

   61.2, pp. 112–23.
