Exact, Heuristic and Machine Learning Approaches to The Probabilistic Travelling Salesman Problem with Crowdsourcing

 Alberto Santini1 , Ana Viana2 , Xenia Klimentova3 , and João Pedro Pedroso4
 1 Universitat Pompeu Fabra, Barcelona, Spain — alberto.santini@upf.edu
 2 INESC TEC and Polytechnic of Porto, Porto, Portugal — ana.viana@inesctec.pt
 3 INESC TEC, Porto, Portugal — xenia.klimentova@inesctec.pt
 4 INESC TEC and Universidade do Porto, Portugal — jpp@fc.up.pt

 March 5, 2020

 Abstract
 We study a variant of the Probabilistic Travelling Salesman Problem arising when retailers crowd-
 source last-mile deliveries to their own customers, who can refuse or accept in exchange for a reward. A
 planner has to identify which deliveries to offer, knowing that all deliveries need fulfilment, either via
 crowdsourcing or using an own vehicle. We formalise the problem and position it in both the literature
 about crowdsourcing and among routing problems in which not all customers need a visit. As a binary
minimisation problem, we show that its objective function is highly non-linear and, to evaluate it, one
 needs to solve an exponential number of Travelling Salesman Problems. To address this complexity, we
 propose Machine Learning and Monte Carlo simulation methods to approximate the objective function,
 and heuristics to reduce the number of evaluations. We show that these approaches work well on small
 size instances and derive managerial insights on the benefits of crowdsourcing to customers.

1 Introduction
With the rapid growth of e-commerce worldwide, business models for last-mile delivery (LMD) of parcels
need to innovate and consider fast, cheap and reliable transportation to end customers. One possibility is
to crowdsource some deliveries, through digital platforms, to non-professional couriers who use their own
transportation means. While reducing costs and providing a source of income for people who might otherwise
be off the labour market [22], crowdsourcing also raises concerns about job quality, environmental impact [32]
and trust [26].
 One main motivation for this work is to investigate the practice of crowdsourcing delivery to end customers
as a system that takes advantage of the benefits of crowdsourcing while mitigating the major associated
drawbacks. We address the case of retail companies with physical shops that also sell online.
 For example, consider a supermarket chain that offers home delivery of groceries. In traditional LMD, a professional fleet, either owned by the chain or outsourced, would deliver the groceries. In current crowdsourced LMD models, the chain would have a platform where potential couriers enrol, get offers for some
deliveries, and accept or refuse the offers. This is the model of, e.g., Amazon Flex. Because there is no consol-
idation of parcels and couriers make journeys that they otherwise wouldn’t, this system generates both extra
traffic and emissions.
 Now imagine that the chain has a loyalty programme in which enrolled clients provide their home address. We advocate a slightly different model in which the crowd is composed of clients who are already in the store and whose home address is close to one of the delivery points. Because the clients who accept were already heading home, we reduce the number of cars associated with the supermarket-delivery ecosystem and reduce the number of miles travelled. Under this model, the planner offers participating clients the option to deliver someone else's groceries in exchange for a discount. Since clients can accept or refuse the offer, at some point
in the day a supermarket vehicle will serve all deliveries which the planner did not crowdsource. A similar
scheme was first discussed in the work of Archetti, Savelsbergh, and Speranza [7], in which the authors
consider that either a professional fleet or occasional drivers can carry out deliveries, but assume that the
occasional drivers will always accept offers.
 In this work we assume that each crowdsourcing fee (i.e., the discount offered to the customer) is fixed
and that the probability that some customer accepts a delivery during the day, provided the decision-maker
offers it, is also fixed and known. We also assume that the supermarket’s fleet comprises a single vehicle,
although this assumption is easy to relax.
 Under the assumptions above, our problem becomes a stochastic generalisation of the Travelling Salesman
Problem which we name the Probabilistic Travelling Salesman Problem with Crowdsourcing (PTSPC).
Other generalisations of the TSP which do not require visiting all customers have been studied extensively.
In particular, the literature considers the extreme cases in which the tour planner has either no power or total
power in deciding which clients should not be visited. In Section 2 we position the PTSPC in this spectrum,
as an intermediate problem in which the planner has partial power. We also provide a formal model and
discuss the intrinsic difficulty of the problem and the need to resort to advanced optimisation techniques to
scale the problem size. Section 3 describes the state of the art in LMD with crowdsourcing and details some
literature on related problems, including the Probabilistic Travelling Salesman Problem and the Probabilistic
Travelling Salesman Problem with Profits. Section 4 describes exact and heuristic approaches for the PTSPC
and Section 5 reports computational results on small and medium-size instances. Finally, in Section 6 we
draw conclusions and identify future lines of research.

2 Problem description
In this section we place the PTSPC in a spectrum of TSP generalisations which do not require visiting all
customers, along a gradient of “decision power” attributed to the tour planner. We also formalise the problem
and introduce the notation used throughout the rest of the paper.

2.1 Contextualisation
As mentioned in the introduction, the Probabilistic Travelling Salesman Problem with Crowdsourcing is an
intermediate problem between two extreme variants of the TSP which do not require to visit all customers.
At one extreme lies the case in which the tour planner has no choice over which customer will require a
delivery (we assume we visit customers to deliver some goods) and which will not, because a random variable
determines customer presence. This problem is the Probabilistic Travelling Salesman Problem (PTSP) [38]. In
the PTSP a decision-maker has to devise a tour to visit a set of delivery points, some of which might be later
revealed not to be available. Because the decision-maker doesn’t know in advance which customers will drop
out, they face two options. The first option, called the reoptimisation approach, is to wait until the status of
all customers is revealed and, at that point, plan a TSP tour visiting the customers requiring delivery. This
strategy can be impractical if there is not enough time to compute a solution after the outcome is revealed.
It can also be inconvenient for operational reasons, for example if it produces every day a radically different
tour while still visiting a similar set of delivery points. The second option, called the a priori approach, is
to first plan a tour visiting all the customers. When the stochastic outcome is revealed, the decision-maker
amends the solution skipping the deliveries that are not required, and performing the remaining ones in the
same order as they appear in the a priori tour. This is the approach most commonly used, introduced together
with the definition of PTSP in [35].
 At the opposite extreme there is the case in which the tour planner has total control over which deliveries
to perform, giving rise to the Profitable Tour Problem (PTP) [25, 27]. In this case, visiting a delivery point earns
a profit, while traversing an edge incurs a cost. The objective is to select the deliveries and plan the tour
that maximise the difference between collected profits and travel costs.
 Intermediate cases arise when the visit requirements are stochastic, but the decision-maker has some
leverage on their outcome. There are many ways in which the decision-maker can affect the outcome of the
random variables associated with customer acceptance. In the PTSPC the planner can force a delivery to be
performed with the own vehicle by simply never offering it. One could imagine more complicated in-
teractions in which, e.g., the decision-maker can increase (decrease) the discount offered to raise (lower) the
probability of customers accepting a delivery [11].
As a motivation for such an approach, in Section 1 we described the case of a supermarket chain crowdsourcing grocery deliveries. Another real-life scenario is that of an on-line retailer which operates lockers where customers can pick up their orders. When customers visit the locker to receive a parcel, they see a
message asking if they want to deliver another parcel in exchange for a gift card or coupon. As in the super-
market example, parcels not accepted for crowdsourcing are then delivered by the standard last-mile segment
of the retailer supply chain. Finally, we remark how this problem can have applications beyond LMD, e.g., in
social-care problems where pharmacies ask their customers to deliver medicines to their elderly neighbours.

2.2 Formalisation of the PTSPC
Consider a complete undirected graph $G = (V', E)$ with vertex set $V' = \{0, 1, \dots, n\}$. Vertex 0 represents the depot, while $V = \{1, \dots, n\}$ is the set of delivery locations. Let $c_{ij} \in \mathbb{R}^+$ be the cost of traversing an edge $\{i, j\} \in E$ and assume that, if the planner offers delivery $i \in V$ for crowdsourcing, there is a probability $p_i \in [0, 1]$ that some provider will accept the offer. In this case, the decision-maker pays a fee $f_i \in \mathbb{R}^+$ to the courier. Otherwise, if the offer is not accepted, the planner needs to visit $i$ with the own vehicle.

 We assume that the probabilities $p_i$ are independent which, under our motivating example, is a reasonable approximation: we expect the number of potential couriers to be larger than the number of deliveries, and to offer at most one delivery to each customer. Different assumptions might hold if the planner outsourced deliveries to a logistics provider. For example, the provider could accept or refuse, as a block, all deliveries in the same area.
 Denote with $O \subseteq V$ the subset of deliveries offered for crowdsourcing and with $A \subseteq O$ the set of accepted offers, which is only revealed at the end of the day. The decision-maker has to decide which deliveries to offer for crowdsourcing (i.e., the elements of $O$), assuming that they will reoptimise the end-of-day tour after set $A$ is revealed. In other words, they look for the set $O$ with the lowest expected cost with respect to the random variable $A$.
 This approach can appear similar to the reoptimisation approach for the PTSP, but the two differ in an
important aspect. In the PTSP the decision-maker cannot affect the outcome of the random variables: they
will wait for their realisation and then solve one single TSP over the customers requiring a visit. In short, the
reoptimisation PTSP becomes equivalent to a TSP on the operational level. On the tactical level, the PTSP
decision-maker can calculate the expected cost of the reoptimised tour, but still (different from our problem)
cannot act to decrease it.
 Let TSP$(V \setminus A)$ be the reoptimised tour of the PTSPC: the shortest simple tour starting and ending at the depot, visiting all delivery points which were either not offered, or whose offer for crowdsourcing was not accepted. Let $T_{V \setminus A}$ be the total cost of such a tour, obtained by summing the edge costs. Under the reoptimisation approach, the cost associated with offering deliveries $O$ and having deliveries $A$ accepted, $C^{\mathrm{r}}(O, A)$, is the sum of the crowdsourcing fees for the accepted deliveries plus the cost of the tour visiting all other delivery points:

$$C^{\mathrm{r}}(O, A) = \sum_{i \in A} f_i + T_{V \setminus A}$$

The probability that a particular set $A \subseteq O$ is the set of accepted deliveries is $\prod_{i \in A} p_i \prod_{i \in O \setminus A} (1 - p_i)$. We can then calculate, for a fixed set $O$ of deliveries offered for crowdsourcing, the expected cost $\mathbb{E}[C(O)]$ over all possible realisations of $A$:

$$\mathbb{E}[C(O)] = \sum_{A \subseteq O} \underbrace{\left( \prod_{i \in A} p_i \right) \left( \prod_{i \in O \setminus A} (1 - p_i) \right)}_{\text{Prob. that } A \text{ is the accepted set}} \cdot \underbrace{\left( \sum_{i \in A} f_i + T_{V \setminus A} \right)}_{\text{Cost when } A \text{ is the accepted set}} \qquad (1)$$

The objective of the problem is to find the set $O^{\mathrm{opt}}$ which gives the lowest expected cost for the reoptimised tour:

$$O^{\mathrm{opt}} = \arg\min_{O \subseteq V} \mathbb{E}[C(O)] \qquad (2)$$

Finding a solution to this problem by complete enumeration is hard because we have to iterate over all sets $O \subseteq V$ and, to compute each term $\mathbb{E}[C(O)]$, we have to iterate again over all $A \subseteq O$. Furthermore, at each
inner iteration we have to solve a TSP, which is NP-complete. To avoid full enumeration (e.g., with a branch-
and-bound algorithm) we would have to produce a lower bound on the objective function (2), which seems
hard if not outright impossible.
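To make the cost structure concrete, the evaluation of eq. (1) for a fixed offered set can be sketched in a few lines of Python. This is our own illustration, not the authors' implementation: it enumerates every accepted subset $A \subseteq O$ and solves each TSP by brute force, which is only viable for very small instances.

```python
from itertools import combinations, permutations

def tsp_cost(points, dist):
    """Cost of the shortest tour depot -> points -> depot, by brute force."""
    pts = list(points)
    if not pts:
        return 0.0
    return min(
        dist[0][perm[0]] + dist[perm[-1]][0]
        + sum(dist[perm[k]][perm[k + 1]] for k in range(len(perm) - 1))
        for perm in permutations(pts)
    )

def expected_cost(offered, V, dist, p, f):
    """Eq. (1): expected reoptimised cost for a fixed offered set."""
    total = 0.0
    for r in range(len(offered) + 1):
        for accepted in combinations(sorted(offered), r):
            prob = 1.0                       # probability that exactly this set accepts
            for i in offered:
                prob *= p[i] if i in accepted else 1.0 - p[i]
            fees = sum(f[i] for i in accepted)
            remaining = [i for i in V if i not in accepted]
            total += prob * (fees + tsp_cost(remaining, dist))
    return total
```

For example, with all $p_i = 0$ the function returns the plain TSP cost over $V$ for any offered set, matching the reduction discussed in Section 2.2.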
 Note that in our motivating examples the set of delivery points changes every day. In this case there are
no operational considerations hindering the use of the reoptimisation strategy, which would only be limited
by its high computational cost. If the set of delivery points tends to be stable, operational considerations make
an a priori tour approach more attractive (see Appendix A).

Relationship with classical TSP problems The PTSPC is a generalisation of three well-studied routing problems. When the crowdsourcing costs are large, $f_i = \infty\ \forall i \in V$, there is no incentive to offer any delivery for crowdsourcing and the planner will decide to perform all deliveries with the own vehicle. In this case, the PTSPC becomes the classical Travelling Salesman Problem. The reduction to the TSP also happens when all probabilities of providers accepting a crowdsourcing offer are zero, $p_i = 0\ \forall i \in V$. When there are no crowdsourcing fees, $f_i = 0\ \forall i \in V$, the planner will try to crowdsource all deliveries. At that point, the sole factor determining which deliveries to make with the own vehicle, and which not, is the stochasticity related to providers' acceptance probabilities, and we are back to the Probabilistic Travelling Salesman Problem. When the customers always accept crowdsourcing offers, $p_i = 1\ \forall i \in V$, the problem reduces to the Profitable Tour Problem with prizes for performing deliveries with the own vehicle equal to the savings of the corresponding crowdsourcing fees.

3 Literature review
In this section we position the PTSPC in a broader context, highlighting the characteristics it shares with other
non-deterministic routing problems. First, in Section 3.1, we mention recent studies proposing optimisation
models to integrate crowdsourcing into last-mile delivery, as this is one of the main applications for the PTSPC.
In Section 3.2 we analyse the Probabilistic TSP, the best-known non-deterministic routing problem. Then, in
Section 3.3, we compare the PTSPC with a class of problems introduced in 2017 by Zhang, Wang, and Liu [57],
known as Probabilistic TSP with Profits and Stochastic Customers.

3.1 Crowdsourcing in last-mile delivery
In a recent survey, Alnaggar, Gzara, and Bookbinder [2] review operational research literature on crowd-
sourced LMD and propose a classification of problems arising in the field. Under their classification scheme,
the PTSPC focuses on e-retailers (i.e., the company offering the delivery is the same that sells the product)
as opposed to couriers (i.e., a company that handles deliveries for third parties), it offers a per-delivery rate
determined by the company (other possibilities are per-hour and negotiated rates), and uses pre-planned trips
(drivers were already heading in the direction of the delivery points). From the point of view of the target
market, the PTSPC focuses on self-scheduling individuals (the customers enter the supermarket at their own
convenience, not at specified times) and short-haul deliveries (deliveries within the same city). These char-
acteristics set the PTSPC apart from most other problems considered in the literature, which focus more on
crowdsourcing to entities performing the delivery professionally, rather than truly occasional drivers.
 Archetti, Savelsbergh, and Speranza [7] were among the first to address a problem arising in outsourcing
in LMD: the Vehicle Routing Problem (VRP) with Occasional Drivers (ODs). In this problem, the company can
decide whether to serve a delivery with a vehicle of its own fleet, or to outsource it for a fixed fee. The planner
assigns deliveries to ODs who are already heading towards a destination. For the assignment to take place,
the delivery point needs to be close to the driver’s destination. The model works under the assumption that
ODs always accept requests from the company, provided they fulfil this “closeness” condition and there is no
stochasticity on acceptance. This assumption is important, as optimal solutions tend to use a high percentage
of available drivers. The authors propose a Mixed-Integer Programming (MIP) formulation with arc-indexed
variables and a multi-start heuristic to tackle instances with more than 25 deliveries. They identify three
characteristics affecting the profitability of such a schema: the number of available drivers, their flexibility
(how far the delivery point can be from the OD’s original destination) and the compensation amount.
 Other authors extended the VRP with ODs model to incorporate real-life features such as time windows,
multiple and split deliveries [42], transshipment nodes [41], and coordinating ODs on bikes or on foot with
a delivery truck from which they relay [36, 33]. The MIP model by Huang and Ardiansyah [33] illustrates
well how even deterministic problems can be hard to solve when they require the interaction of traditional
and crowdsourced supply chain segments. The authors could solve instances with up to 15 delivery points
using the Gurobi solver. Although they do not report computing times, the authors state that by using the
MIP model they could not solve instances with 20 and 30 customers and sometimes, after a 4-hour time limit,
the solver did not even provide an integer feasible solution.
 Recent literature also addressed dynamic versions of the problem, in which delivery requests and driver
availability become known during the day. Arslan et al. [8] consider a real-time system in which drivers make
trip announcements and the company can then assign them pickups and deliveries which are compatible with
their announced travel (i.e., involving only a small detour) and the recipients’ time windows. They tackle
the dynamic nature of the problem with a rolling horizon approach, corresponding to a planner who takes
decisions at different moments during the day. The decisions consist in matching deliveries to drivers and
routing the own fleet for deliveries which were not crowdsourced. With a simulation study the authors show
how this approach can produce savings, depending on the time flexibility of the drivers and their willingness
to take larger detours. They also conclude that this system is most suitable when all parcels share the same
origin, such as a central depot, as this greatly reduces the cost of operating the own fleet.
 Dayarian and Savelsbergh [24] study a problem which shares some characteristics with the PTSPC. Their
model also addresses the case of using in-store customers as occasional drivers and an own fleet in charge
of completing the distribution of parcels. Different from the PTSPC, they simulate each customer individu-
ally; if a customer has a destination compatible with a delivery point, they assume that the customer will
accept the delivery. The authors consider two cases: a static case, where all customer visits and deliveries
are known in advance, and a dynamic case, where information is only known up to a certain time and the
planner reoptimises following a rolling horizon approach. Data on the presence and destination of customers
is collected while the customers shop inside the store. For the dynamic case, the authors also propose to base
the decisions at each epoch on forecasts of future demand and in-store visits, which they obtain averaging
sample scenarios (e.g., generated using historical data). Through a simulation study, the authors determine
a trade-off between the savings obtained by reducing the own fleet and the risk introduced when relying on
customer availability.
 Dahle, Andersson, and Christiansen [23] model occasional drivers as stochastic vehicles possibly appear-
ing at random times during the day (according to a known distribution) in a VRP with time windows. The
authors use a two-stage stochastic model in which they plan the routes of the own fleet in the first stage and,
after OD appearance times are revealed, they amend them and assign deliveries to ODs in the second stage.
They also allow the possibility of not fulfilling some deliveries and pay a corresponding penalty. The authors
solve instances with 5, 10, 15 or 20 deliveries, 2 own vehicles, and 2 or 3 ODs. To do so, they use a MIP model and enumerate all possible $2^{|\mathcal{K}|}$ scenarios, where $\mathcal{K}$ is the set of occasional drivers. The problem proves hard to solve: for example, they cannot find the optimal solution within 2 hours of computation for instances with 10 delivery points and 3 ODs.
 Finally, Gdowska, Viana, and Pedroso [30] introduced a multi-vehicle problem for delivery with crowd-
shipping based on the VRP with ODs [7], which is most similar to our problem. In their work, the authors
consider a multi-vehicle fleet of own vehicles and propose an agent-oriented bi-level stochastic model and a
local search algorithm for its solution. With computational experiments on instances with 15 deliveries, they
show that solutions using crowdsourcing can produce savings, but they cannot assess the accuracy of their
heuristic, because they lack exact solutions.

3.2 Probabilistic Travelling Salesman Problem
When Jaillet [35] introduced the PTSP, the main focus of his work was on the case with homogeneous probabilities ($p_i = p$ for all $i \in V$), while heterogeneous probabilities were first described by Berman and Simchi-Levi
[13]. Jaillet [35] also introduced the a priori tour approach, which is the one most frequently considered in
the literature. Reoptimisation is used to provide a bound on the quality of a priori tours, e.g., in [34, 19]. Early
works on the PTSP examine theoretical properties of the a priori tour, including bounds and asymptotic rela-
tions [15], and put the PTSP in the broader context of probabilistic routing problems, where the parameters
affected by uncertainty can also be travel times, demands, and time windows [31].
 Solving the PTSP with exact methods has proven hard. More than twenty years ago, Laporte, Louveaux,
and Mercure [38] proposed a branch-and-cut algorithm which added L-shaped cuts to a stochastic formulation
of the PTSP. They could solve instances with up to 50 customers, but only 5 of them had a non-zero probability of not requiring a visit ($p_i > 0$). Even more recent branch-and-bound algorithms were tested on instances of up to 10 customers [43], 18 customers [4], and 30 customers [3]. Furthermore, the difficulty of the PTSP increases (a) from homogeneous to heterogeneous probabilities [13], (b) when the probabilities $1 - p_i$ of requiring a service are low [12], (c) when customers are not geographically uniformly distributed [20]. Recently, Lagos,
Klapp, and Toriello [37] proposed a branch-and-price algorithm for a multi-vehicle version of the PTSP, which
they tested on instances of up to 40 customers. To the best of our knowledge, these results are only published
in draft form [37].
 While exact methods are scarce, the literature abounds with heuristics. Devising effective heuristics, how-
ever, is still a hard task. For example, one of the main building blocks of heuristic and meta-heuristic algo-
rithms is an efficient local search procedure. But since the evaluation of the PTSP objective function is $O(n^3)$,
neighbourhood exploration becomes hard. In a survey, Campbell and Thomas [21] discuss two approxima-
tion approaches: truncating the number of terms [54], and aggregating customers [20]. Computation of the
objective value can also be accelerated via sample average approximation (SAA). The method, introduced by
Verweij et al. [55], estimates the expected value of a solution by averaging over a finite number of possible
realisations of the Bernoulli random variables associated with the customers (an application of the Monte
Carlo simulation technique). Birattari et al. [18] use SAA to estimate not the whole cost of a solution, but
only the cost variation when moving from one solution to the next via local search. This strategy, called delta
evaluation, was already used by Bianchi, Knowles, and Bowler [17] and by Bianchi and Campbell [16] for
two neighbourhoods (“2-p-opt” and “1-shift”) to provide a closed-formula expression for the expected gain of
applying the local search move. The formulas were actually published, with mistakes, in [14, 15] and then
corrected in [17, 16]. Another approximation approach, proposed by Meier and Clausen [47], consists in
transforming the high-degree objective function into a quadratic function and then linearising the resulting
problem. The authors show that the method produces nearly-optimal solutions for instances with up to 80 nodes, but using probabilities $p_i = 0.02, 0.05, 0.1$ (i.e., most customers require a service).
 Authors have also applied meta-heuristic paradigms such as scatter search [40], Iterated Local Search [10],
multi-started Local Search [39], Greedy Randomised Adaptive Search Procedure [46], Variable Neighbourhood
Search [56], swarm optimisation [45], memetic algorithms [10], and ant colony optimisation [9].

3.3 Probabilistic TSP with profits
Zhang et al. [58] introduced the class of Travelling Salesman Problems with Profits and Stochastic Customers
(TSPPSC). This class contains three problems, which mirror the three classical problems in the area of TSP
with profits: the Orienteering Problem (OP) that maximises the collected profit under an upper bound on tour
duration, the Prize-Collecting TSP (PCTSP) that minimises tour duration under a lower bound on collected
profit, and the Profitable Tour Problem (PTP) described in Section 2.1.
 For each of these problems, the authors describe the corresponding version with stochastic customers. Of
the three new problems, the one more related to the PTSPC is the PTP with Stochastic Customers (PTPSC). In
this problem the decision-maker has to select a subset of customers and plan an a priori tour only visiting the
selected customers. When the random outcomes are revealed, the planner amends the tour skipping the
customers not requiring service. If a delivery is not even included in the a priori tour, the customer will not
be visited and the corresponding profit will not be collected. This is analogous to outsourcing the delivery,
paying a fee equal to the lost profit to a provider who is always available to accept requests. This highlights
a sort of duality between the proposed problem and our PTSPC. In our problem we have certain customers but uncertain outsourcing; in the PTPSC there are uncertain customers, but certain outsourcing. Looking at probabilities, the decision-maker of the PTPSC has the possibility of setting $p_i = 1$ (exclude from the tour) for those customers they do not select in the a priori tour; in contrast, in our problem the decision-maker can set $p_i = 0$ (force in the tour) for those customers they do not offer for crowdsourcing.
 Zhang, Wang, and Liu [57] propose a genetic algorithm to solve the PTPSC with homogeneous proba-
bilities. Their analysis is limited to 5 instances with 8, 13, 20, 28 and 50 customers. It shows that, for the
two largest instances, the genetic algorithm produces in a few seconds better results than solving the highly
non-linear formulation with a commercial solver running for three hours.
 Finally, we remark that, of the three problems with profits mentioned above, there are no exact algorithms
for their stochastic-customer variants in the literature, while heuristics only exist for the PTPSC [57] and the
OP with stochastic customers [58, 5, 48].

4 Algorithms
The most straightforward algorithm to solve optimisation problem (2) is complete enumeration. This requires $2^n$ outer iterations, one for each possible set $O \subseteq V$ of deliveries to offer, and $2^{|O|}$ inner iterations, one for each possible set $A \subseteq O$ of accepted deliveries. At each inner iteration we have to solve a Travelling Salesman Problem with deliveries $V \setminus A$. We get a speed-up for this last step by noticing that the sets $V \setminus A$ repeat over the iterations, so we can solve the TSPs once and cache the results. A first question arises: is this approach as untenable as it looks? Up to which instance size can we solve (2) by enumeration on modern hardware? To answer this question, we generated a test bed of instances (as explained in Section 5.1) and ran a series of computational experiments whose results we report in Section 5.2.
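A minimal sketch of this enumeration scheme, with the TSP cache implemented by memoising on the set of remaining delivery points, may look as follows. This is our own illustration, using a brute-force TSP solver that is only viable for tiny instances.

```python
from functools import lru_cache
from itertools import combinations, permutations

def plan_by_enumeration(V, dist, p, f):
    """Enumerate every offered set O and every accepted set A, returning the
    offered set with the lowest expected cost. TSP tours over V minus A repeat
    across iterations, so they are memoised on the set of remaining points."""

    @lru_cache(maxsize=None)
    def tsp(remaining):
        pts = sorted(remaining)
        if not pts:
            return 0.0
        return min(
            dist[0][perm[0]] + dist[perm[-1]][0]
            + sum(dist[perm[k]][perm[k + 1]] for k in range(len(perm) - 1))
            for perm in permutations(pts)
        )

    best_set, best_cost = None, float("inf")
    for r in range(len(V) + 1):
        for offered in combinations(V, r):
            exp_cost = 0.0
            for s in range(len(offered) + 1):
                for accepted in combinations(offered, s):
                    prob = 1.0
                    for i in offered:
                        prob *= p[i] if i in accepted else 1.0 - p[i]
                    exp_cost += prob * (sum(f[i] for i in accepted)
                                        + tsp(frozenset(V) - frozenset(accepted)))
            if exp_cost < best_cost:
                best_set, best_cost = set(offered), exp_cost
    return best_set, best_cost
```

On a toy instance, making acceptance certain and fees zero crowdsources everything, while prohibitive fees recover the pure TSP solution with an empty offered set, consistent with the reductions of Section 2.2.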
 A second question is how much we can scale the results by speeding up the enumeration process at the cost of giving up optimality guarantees. We propose two approaches:

 1. reduce the time spent evaluating each term $\mathbb{E}[C(O)]$;

 2. reduce the number of terms $\mathbb{E}[C(O)]$ to evaluate.

To achieve the two objectives above, we devised heuristic algorithms whose results we report in Sections 5.3 and 5.4.
  
4.1 Speeding up the computation of $\mathbb{E}[C(O)]$

Computing $\mathbb{E}[C(O)]$ is expensive for large sets $O$: the number of sets $A \subseteq O$ to consider grows exponentially with the size of $O$ and, for each such $A$, we have to solve a TSP. We directed our efforts towards reducing the number of sets $A$ to consider in the sum in eq. (1). To do so, we used two techniques: Monte Carlo simulation and Machine Learning.

4.1.1 Monte Carlo simulation
  
With this approach, at each computation of $\mathbb{E}[C(O)]$, instead of enumerating all sets $A \subseteq O$, we fix a much smaller collection $\mathcal{A} \subset \mathcal{P}(O)$ ($\mathcal{P}$ indicates the power set) of fixed size. We compute the sum in eq. (1) over the sets of this smaller collection, and then rescale the result to reflect the large number of terms left out from the sum. This results in an approximation of $\mathbb{E}[C(O)]$, defined as follows:

$$\tilde{\mathbb{E}}^{\text{MC}}[C(O)] = \frac{1}{P_{\mathcal{A}}} \sum_{A \in \mathcal{A}} \left[ \left( \prod_{i \in A} p_i \right) \left( \prod_{i \in O \setminus A} (1 - p_i) \right) \cdot \left( \sum_{i \in A} f_i + T_{V \setminus A} \right) \right] \qquad (3)$$

where $P_{\mathcal{A}}$ is the probability that a set of $\mathcal{A}$ is realised, i.e., $P_{\mathcal{A}} = \sum_{A \in \mathcal{A}} \prod_{i \in A} p_i \prod_{i \in O \setminus A} (1 - p_i)$. The size of $\mathcal{A}$, denoted as $n_{\text{MC}}$, is a parameter which allows one to trade off estimation accuracy and computing time. We used a fixed value of $n_{\text{MC}} = 20$, although more complex approaches are possible: for example, Montemanni et al. [48] use a neural network to learn good values for $n_{\text{MC}}$ when using Monte Carlo simulation for the Probabilistic Orienteering Problem.
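A sketch of this estimator follows. It is our own illustration: the way the collection $\mathcal{A}$ is built here, by drawing scenarios from the Bernoulli acceptance distribution and deduplicating them, is one possible sampling scheme and is not prescribed by eq. (3) itself.

```python
import random
from itertools import permutations

def tsp_cost(points, dist):
    """Brute-force shortest depot tour over the given points."""
    pts = list(points)
    if not pts:
        return 0.0
    return min(
        dist[0][perm[0]] + dist[perm[-1]][0]
        + sum(dist[perm[k]][perm[k + 1]] for k in range(len(perm) - 1))
        for perm in permutations(pts)
    )

def mc_expected_cost(offered, V, dist, p, f, n_mc=20, seed=0):
    """Eq. (3): sum eq. (1) over a small collection of accepted sets,
    then rescale by the collection's total probability mass."""
    rng = random.Random(seed)
    offered = sorted(offered)
    collection = set()
    for _ in range(50 * n_mc):  # cap draws so degenerate p's cannot stall us
        collection.add(frozenset(i for i in offered if rng.random() < p[i]))
        if len(collection) == n_mc:
            break
    weighted, mass = 0.0, 0.0
    for accepted in collection:
        prob = 1.0
        for i in offered:
            prob *= p[i] if i in accepted else 1.0 - p[i]
        cost = (sum(f[i] for i in accepted)
                + tsp_cost([i for i in V if i not in accepted], dist))
        weighted += prob * cost
        mass += prob
    return weighted / mass
```

When $n_{\text{MC}} \geq 2^{|O|}$ the collection can cover the whole power set of $O$, in which case the mass is 1 and the estimate coincides with the exact expectation.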

4.1.2 Machine Learning approaches
Predicting the value of a complex objective function using machine learning techniques is a recent approach which is gaining popularity (see, e.g., Fischetti and Fraccaro [28]). In our case, because the computation of $\mathbb{E}[C(O)]$ is harder for large sets $O$, we can compute the exact value of the expected cost for small sets and then use this information to predict the expected cost of larger $O$'s. This means training a regression model over data collected on small sets and then using the model to extrapolate an estimate of $\mathbb{E}[C(O)]$ for large sets. Note how the training and the prediction happen on-line: we train a specific model for each instance, while we solve that instance.
In the following we define which features (i.e., independent variables) we use for regression and which
models we tried. To list the features used, we first have to introduce some notation. Let d̄(Q), d̂(Q), ḏ(Q)
be, respectively, the largest, the average, and the smallest distance of a delivery point in set Q ⊆ N from the
depot:

    d̄(Q) = max_{i ∈ Q} d_{0i},    d̂(Q) = (1/|Q|) · Σ_{i ∈ Q} d_{0i},    ḏ(Q) = min_{i ∈ Q} d_{0i}

Let f(Q) be the sum of crowdsourcing fees for deliveries in Q, f(Q) = Σ_{i ∈ Q} f_i. Finally, let δ(Q) be the
diameter of Q, δ(Q) = max_{i,j ∈ Q} d_{ij}. We define, then, the following features:
 • One binary feature for each delivery point i ∈ N, with value 1 iff i ∈ S.

 • One feature representing the fraction of crowdsourcing fees of offered deliveries, f(S)/f(N).

 • Three features, representing the ratios between largest, average, and smallest distances from the depot,
   between delivery points in S and all delivery points: d̄(S)/d̄(N), d̂(S)/d̂(N), ḏ(S)/ḏ(N).

 • Three features, similar to those above, but referring to N\S: d̄(N\S)/d̄(N), d̂(N\S)/d̂(N), ḏ(N\S)/ḏ(N).

 • Two features, representing the ratio between the diameters of, respectively, S and N\S, and the diam-
   eter of N: δ(S)/δ(N), δ(N\S)/δ(N).
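Assembling these feature vectors can be sketched as follows. The function and argument names are ours: `dist0` holds the depot distances, `dist` the pairwise distances, and empty sets are given ratio 0 to keep the sketch total.

```python
def regression_features(S, N, dist0, dist, fee):
    """Build the feature vector for a candidate offered set S ⊆ N.

    dist0[i]: distance of delivery point i from the depot;
    dist[i][j]: distance between delivery points; fee[i]: fee.
    """
    def d_max(Q): return max((dist0[i] for i in Q), default=0.0)
    def d_avg(Q): return sum(dist0[i] for i in Q) / len(Q) if Q else 0.0
    def d_min(Q): return min((dist0[i] for i in Q), default=0.0)
    def diam(Q):  return max((dist[i][j] for i in Q for j in Q), default=0.0)

    comp = N - S  # the complement N \ S
    feats = [1.0 if i in S else 0.0 for i in sorted(N)]    # indicators
    feats.append(sum(fee[i] for i in S) / sum(fee[i] for i in N))
    for g in (d_max, d_avg, d_min):                        # depot distances, S
        feats.append(g(S) / g(N))
    for g in (d_max, d_avg, d_min):                        # same, for N \ S
        feats.append(g(comp) / g(N))
    feats.append(diam(S) / diam(N))                        # diameters
    feats.append(diam(comp) / diam(N))
    return feats
```

The resulting vector has |N| + 9 entries: |N| indicators, one fee ratio, six distance ratios, and two diameter ratios.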

In a preliminary analysis (see Section 5.3) we compare the following regression models:
 • A linear regression model LR.

 • A quadratic regression model with a LASSO regularisation term, QRL, leaving out the trivial quadratic
 terms relative to the binary features (since zero squared is zero, and one squared is one). We tuned the
 regularisation hyperparameter via 5-fold cross-validation using a grid search over 100 values.

 • A multi-layer perceptron with ReLU activators, MLP. We chose the number of layers and size of each
 layer via 5-fold cross-validation using three options: one layer with 30 neurons, two layers with 15
 neurons each, three layers with 10 neurons each.

 • A single regression tree RT, and a regression forest RF trained with the Gradient Boosting algorithm [29].
 We used the Mean Squared Error as the splitting criterion and, for the Gradient Boosting algorithm, we
 used 100 estimators of maximum depth 3.

 • Because a regression tree is not able to extrapolate predicted values outside the range of the labels in the
   training set, we experimented with training linear regression models on the leaf nodes of RT, thus obtaining
   a new model RT+LR. This model is analogous to the “M5” model introduced by Quinlan [51].
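The extrapolation idea behind RT+LR can be illustrated with a toy version: a single tree split whose two leaves each carry a least-squares line. Unlike a constant-leaf tree, it can predict values outside the range of the training labels. All names here are ours; the paper's model uses a full regression tree rather than a single split.

```python
def fit_line(points):
    """Closed-form least squares for y = a + b*x on one leaf."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    return my - b * mx, b  # intercept, slope

class StumpWithLeafLR:
    """One split, one linear model per leaf (a toy RT+LR sketch)."""

    def fit(self, xs, ys):
        pairs = sorted(zip(xs, ys))
        self.threshold = pairs[len(pairs) // 2][0]  # split at the median
        self.left = fit_line([p for p in pairs if p[0] < self.threshold])
        self.right = fit_line([p for p in pairs if p[0] >= self.threshold])
        return self

    def predict(self, x):
        a, b = self.left if x < self.threshold else self.right
        return a + b * x  # extrapolates beyond the training labels
```

Trained on y = 2x + 1 for x in 0..9, the model predicts 41 at x = 20, a value no constant-leaf regression tree could return.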
  
The dependent variable (label) used for training was the exact expected cost E[C(S)]. Then, we used the trained
model to yield predictions Ẽ_ML[C(S)] relative to new sets S not seen during training.
  
4.2 Reducing the number of terms E[C(S)] to compute

To reduce the number of terms E[C(S)] to evaluate, we propose four heuristics which decrease the number
of sets S considered.
 
 • The forward stepwise heuristic starts with S = ∅ and iteratively adds the customer i for which E[C(S ∪
   {i})] is smallest, until no improvement is possible.

 • The backward stepwise heuristic is analogous, but it starts with S = N and removes the customer i for
   which E[C(S \ {i})] is smallest.

 • The alternating forward-backward heuristic starts with S = ∅ and alternates one forward and one
   backward step. This method uses the same basic idea as the heuristic presented by Gdowska, Viana,
   and Pedroso [30]. Analogously, the alternating backward-forward heuristic starts with S = N and
   alternates one backward and one forward step.
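The forward stepwise heuristic, for instance, replaces the up-to-2^|N| evaluations of full enumeration with O(|N|²) evaluations. A minimal sketch, where `expected_cost` is any (exact or approximate) evaluator of E[C(S)]:

```python
def forward_stepwise(N, expected_cost):
    """Greedy forward selection of the offered set S.

    Starts from S = empty and repeatedly adds the delivery whose
    inclusion yields the lowest expected cost, stopping as soon
    as no addition improves on the incumbent.
    """
    S = frozenset()
    best = expected_cost(S)
    while S != N:
        # Evaluate every one-customer extension of the incumbent.
        cost, i = min((expected_cost(S | {i}), i) for i in N - S)
        if cost >= best:
            break  # no extension improves: stop
        S, best = S | frozenset({i}), cost
    return S, best
```

Being greedy, the heuristic can stop at a local minimum; the backward and alternating variants trade additional evaluations for a chance to escape some of these.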

4.3 Bringing it all together
Note that some of the techniques described in Sections 4.1 and 4.2 are orthogonal, in the sense that we can
combine them independently. For example, we might choose to adopt a full enumeration of sets but estimate
each term E[C(S)] with a Monte Carlo simulation, or to use a forward stepwise heuristic and calculate the
expected costs exactly. We expect Machine Learning algorithms to be more effective in combination with full
enumeration, as it allows us to include more data in the training dataset (because the algorithm “sees” all sets
S of small size that we use to train the model).
 Note that when using an approximation of E[C(S)] to determine a set Ŝ of deliveries to offer for crowd-
sourcing, differently from classical heuristics, we do not know the true value of the objective function associ-
ated with Ŝ. Recomputing the exact value of the expected cost, even for a single set Ŝ, can be expensive if
the set is large. A decision support system, thus, might return the set Ŝ and the approximate expected cost
already computed. In our case, because our interest is in comparing different approaches, we added a further
step at the end of the algorithms using approximate values: we compute the exact value of E[C(Ŝ)], to be
able to gauge the accuracy of the approximation.

5 Computational results
In this section we report on the computational difficulty of this problem and on the quality of the heuristic
techniques outlined in Section 4. We ran all experiments on a cluster with Xeon 2.4GHz processors. We
wrote the code in C++. To solve the TSP problems, we used Concorde [6] via the Discorde API [44] and cached
the results in a Patricia Trie [49]. We wrote the code dealing with the machine learning models in Python
using the scikit-learn library [50], and we embedded it using the Boost Python library [1]. Both code and
instances are available on-line [53].

5.1 Instances
We generated a test set selecting four base instances from the TSPLIB [52], truncating them to the first n + 1
vertices (one depot and n delivery points) for different values of n, and assigning values to p_i and f_i according
to different criteria. Because we used six criteria, six values for n, and four base instances, our data set comprises
144 instances. The numbers of delivery points used are n ∈ {15, 16, 17, 18, 19, 20}.
 We used the following methods to assign values p_i and f_i, guided by two principles: using realistic criteria,
and avoiding that all instances in a set have an “extreme” optimal solution (S_opt = N or S_opt = ∅).

Instance Set A. Probability p_i is linearly proportional to the vertex's distance from the depot, with p_i = 0
 for the depot itself and p_i = 0.95 for the farthest delivery point; the crowdsourcing fee f_i is set proportional
 to the distance d_{0i} between the depot and delivery point i.

Instance Set B. As in set A, but we increase fees by 25%. Increasing the fees is a straightforward way of
 obtaining instances in which the optimal set S_opt is smaller.

Instance Set C. As in set A, but we assigned probabilities p_i in inverse proportion to the distance from
 the depot. The rationale is that, in real applications, far delivery points might be inaccessible and harder
 to crowdsource.

Instance Set D. As in set A, but again probabilities are not assigned according to a direct proportion. Let D
 be the distance between the depot and the farthest delivery point. Delivery points whose distance
 from the depot is 0.25 · D or smaller got assigned p_i = 0.5; otherwise, if the distance is 0.75 · D or smaller,
 they got p_i = 0.75; otherwise they got p_i = 0.95. This is a step-wise version of the criterion used in set
 A, motivated by the fact that we can group delivery points into “close”, “intermediate”, and “distant”
 groups.

Instance Set E. Combines elements of sets C and D. We assigned probabilities as in set D, but in reverse (e.g.,
 close delivery points got p_i = 0.95). This also reflects the fact that central deliveries are easier to crowdsource,
 while those in the suburbs can be harder.

Set A Set B Set C Set D Set E Set F
 n FO Sav Time FO Sav Time FO Sav Time FO Sav Time FO Sav Time FO Sav Time
 15 1.00 23.27 132.13 0.87 20.79 131.86 0.60 11.85 131.64 0.87 37.51 130.95 0.78 23.10 130.69 0.67 18.31 133.71
 16 0.83 23.46 299.31 0.80 21.15 307.46 0.59 11.71 304.16 0.88 37.79 302.90 0.80 22.69 304.79 0.66 16.42 305.02
 17 0.84 22.33 721.50 0.79 20.11 727.20 0.63 12.01 888.71 0.90 36.82 719.54 0.97 21.46 718.64 0.65 13.48 721.30
 18 0.85 21.84 1896.97 0.81 19.81 1881.88 0.65 12.12 1646.20 0.90 33.99 1752.09 0.97 23.91 1655.59 0.74 14.32 1913.27
 19 0.86 22.09 4421.65 0.78 20.00 4447.92 0.68 12.07 4374.25 0.92 32.88 4292.36 0.96 23.31 4384.22 0.59 11.88 4316.65
 20 0.86 21.30 7338.38 0.76 19.18 7508.06 0.68 12.03 7562.05 0.91 33.25 7495.46 0.74 22.16 7655.88 0.70 14.76 7587.90

Table 1: Full enumeration results. Columns FO report the fraction of vertices belonging to the optimal set
S_opt; columns Sav list the expected savings (in percentage) when using crowdsourcing, compared to serving
all deliveries with the own vehicle; columns Time list the runtime in seconds. Each row refers to the average
over 4 base instances.

Instance Set F. In this case, we selected probabilities and fees uniformly at random. For probabilities the
 lower and upper bounds were 0 and 1, while for fees they were the shortest and largest travel cost
 between the depot and a delivery point.

We also checked that no delivery can be trivially fixed to always (or never) being offered. We used the follow-
ing method which, in real-life instances, can be used to reduce the solution space. Consider a vertex i ∈ N
and let i*, j* ∈ N′ be the two vertices between which inserting i has the least cost:

    (i*, j*) = arg min_{j,k ∈ N′} { d_{ji} + d_{ik} }

If the crowdsourcing fee is smaller than the smallest insertion cost, f_i < d_{i*i} + d_{ij*}, then it will always be
convenient to offer i for crowdsourcing. Analogously, if the crowdsourcing fee is larger than the largest
insertion cost (obtained swapping arg min with arg max), it will always be convenient to visit i with the own
vehicle.
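A sketch of this reduction test. Names are ours: `d` is the travel-cost matrix, `Nprime` the other vertices (depot included), and `fee_i` the crowdsourcing fee of the vertex under test.

```python
def trivial_fixing(i, Nprime, d, fee_i):
    """Try to fix delivery i a priori by comparing its fee with its
    cheapest and most expensive insertion cost between any two
    distinct vertices j, k of Nprime.

    Returns 'always offer', 'always visit', or None if i cannot be
    fixed and must stay in the search space.
    """
    costs = [d[j][i] + d[i][k]
             for j in Nprime for k in Nprime if j != k]
    if fee_i < min(costs):
        return "always offer"   # fee beats even the cheapest insertion
    if fee_i > max(costs):
        return "always visit"   # fee exceeds even the costliest insertion
    return None
```

Running this check once per vertex costs O(|N|³) in total, negligible next to the enumeration it may shrink.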

5.2 Enumeration results
We ran the full enumeration algorithm for three reasons. First, to check that the runtime grows exponentially
with the instance size, as we assumed in Section 2. Second, to compute exact solutions which we can use
to assess the quality of the heuristic algorithms. Third, to see how the optimal sets S_opt vary depending on
the instance sets.
 Table 1 gives a summary of the results. Columns “FO” report the fraction of vertices in the optimal offered
set (|S_opt|/|N|). Columns “Sav” list the percentage savings obtained when using crowdsourcing, i.e. the gap
between the expected cost of the optimal solution and the cost of a TSP tour visiting all customers:

    Sav = ( TSP(N) − E[C(S_opt)] ) / E[C(S_opt)] · 100

Columns “Time” list the runtime in seconds.
 The savings reported in Table 1 for all instance sets show that, no matter the method used to generate
the instances, the adoption of crowdsourcing can lead to significant savings in the final expected cost. The
results show that instances with step-wise probability of acceptance produce larger savings (compare sets
A with D, and C with E). The important factor is that, in step-wise instances, the probability of acceptance
for distant delivery points increases, showing that it is important for companies to incentivise crowdsourcing
of the farthest deliveries with appropriate pricing strategies. Comparing sets A and B shows that such pricing
strategies are feasible, as increasing the rewards by 25% gives only minor decreases in overall expected
savings.
 We provide a visualisation of the algorithm runtime in Figure 1, where each point corresponds to one of the
144 instances. The x axis reports the instance size, with points of different instance sets slightly shifted from
each other. The y axis represents the runtime, and is in logarithmic scale. Figure 1 shows that the runtime
grows exponentially in the size of N, as an upward straight line in the figure corresponds to exponential
growth along the y axis.

Figure 1: Runtime for the full enumeration algorithm. The y-axis is in logarithmic scale.

Figure 2: R² score of the Machine Learning models used off-line. The boxes denote instances within the two
central quartiles and whiskers extend to the rest of the distribution, except for outliers denoted by diamond
fliers. The solid line inside the boxes and the numbers at the top refer to the median score.

5.3 Evaluation of Machine Learning and Monte Carlo estimators
We ran two sets of experiments to assess the quality of the Monte Carlo and Machine Learning estimators
of E[C(S)]. In the first set, we measure the accuracy of the estimation off-line, i.e., without bundling the
estimators into an enumeration algorithm. The aim of these experiments is to understand how much the esti-
mates Ẽ_ML[C(S)] (Machine Learning) and Ẽ_MC[C(S)] (Monte Carlo) differ from the exact value E[C(S)].
In the second set, we use the estimators on-line within the enumeration algorithm. In this case, we are not
only interested in the accuracy of the estimation, but also in the trade-off between speed and accuracy.

5.3.1 Estimations using Machine Learning
For the first set of experiments, we trained the machine learning models on an off-line dataset. We created
this dataset by running the complete enumeration algorithm over all instances and, each time we evaluated
a term E[C(S)], we also recorded the corresponding independent variables mentioned in Section 4.1.2.
 Next, to reflect how we would use the models when bundling them into the enumeration algorithm, we split
the dataset in two parts. The training dataset consists of data relative to small sets S, while the test dataset
contains data points relative to larger sets S. We experimented with two different splits: in the first case we
put in the training dataset information for sets such that |S| ≤ (1/3)|N|, and in the second case for sets such
that |S| ≤ (2/3)|N|. This means that, e.g., in the first case we would run the exact enumeration algorithm for
sets containing up to one third of the delivery points and then use the machine learning models to predict
the expected cost for larger sets. These cases reflect two different trade-offs between the size of the training
datasets and the time required to generate them (i.e., the time spent calculating the exact expected costs).
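The size-based split can be expressed compactly; here each record pairs an evaluated set with its feature vector and exact cost (the record layout and names are our assumptions):

```python
def split_by_set_size(records, n_points, frac=1 / 3):
    """Split (S, features, cost) records into train and test parts.

    Sets with |S| <= frac * |N| go to the training data, the larger
    sets to the test data, mirroring on-line use where small sets
    are enumerated exactly first.
    """
    cutoff = frac * n_points
    train = [r for r in records if len(r[0]) <= cutoff]
    test = [r for r in records if len(r[0]) > cutoff]
    return train, test
```

With frac = 2/3 the same function produces the second, larger training split.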
 Figures 2 to 4 report the findings of this first set of experiments. In each figure the left chart refers to
the first case (|S| ≤ (1/3)|N|) and the right chart to the second case (|S| ≤ (2/3)|N|). Figure 2 reports the R² scores
obtained by each of the models presented in Section 4.1.2. The boxes represent the two central quartiles of
scores across all the instances; the whiskers extend to the rest of the scores distribution, except for outliers
which are denoted by diamond fliers. The solid line inside each box demarcates the median score, which is
also reported as a number on the top part of the chart. As we can expect, scores improve with the larger
training dataset size. The median R² values are high, considering we are trying to predict the complicated
function E[C(S)] but, for the smaller training set, they are sometimes negative (i.e., using the model is worse

Figure 3: Percentage error on the estimate of the expected cost of the lowest-cost set in the test dataset, when
using the Machine Learning models off-line. The height of the bar and the numbers at the top report the
median error. The whiskers show a 95% confidence interval obtained by bootstrapping.

Figure 4: Number of instances (in percentage) for which each Machine Learning model, used off-line, identified
the lowest-cost set in the test dataset.

Instance size Avg Time (s) Avg Gap (%) Opt found (%)
 15 62.34 4.61 62.50
 16 68.71 8.72 8.33
 17 311.55 6.05 33.30
 18 332.22 7.90 8.33
 19 373.39 7.84 12.50
 20 479.44 8.56 12.50

Table 2: Results for the Machine Learning model RT+LR used on-line within the full enumeration algorithm.
Each row presents results averaged over all instances of the given size. Column Avg Gap (%) lists the percent-
age gaps between the real expected cost E[C(Ŝ)] of the best set Ŝ found by the algorithm and the optimum.
Column Opt found (%) reports the number of instances (in percentage) for which the algorithm identified the
optimal set.

than predicting the mean). We attribute the high median scores to a careful selection of features and to the
fact that we heavily fit to the distribution associated with the instance we train the model on. Fitting to
a particular instance, unlike overfitting to a particular training set, is not a problem because we train each
model from scratch for each instance and do not plan to reuse the same trained model on new instances.
 The figure shows that, for the smaller training dataset, Gradient Boosted trees (model RF) perform the
best, followed by Linear Regression (model LR). For the larger training dataset, the Regression Tree with
Linear Regression models on its leaves (model RT+LR) achieves an almost perfect R² score.
 Figure 3 reports the percentage error made by each model on the lowest-cost set S* in the test dataset
T ⊂ P(N). Note that we know this set because we have computed all expected costs with the complete
enumeration algorithm; also, if the optimal set happens to be in the test set, then S* = S_opt. We define the
percentage error on S* as

    ( E[C(S*)] − Ẽ_ML[C(S*)] ) / E[C(S*)]

This metric is important because identifying the lowest-cost set and estimating its cost is the main job of an
estimator. The figure reports the median percentage error as the height of the bars; these numbers are also
present at the top of each chart. The whiskers denote 95% confidence intervals around the median, obtained
via bootstrapping. Models LR and RT+LR give the best performances, respectively, on the smaller and larger
training datasets (although the confidence interval for LR on the smaller datasets overlaps with other intervals,
including the one of RT+LR).
 Because we want to find the set S* ∈ T which minimises E[C(S)], another important metric that
defines a good model is to count how many times it managed to identify this set in the test dataset. A model
is successful in doing so if

    Ŝ := arg min_{S ∈ T} Ẽ_ML[C(S)]  =  arg min_{S ∈ T} E[C(S)] =: S*    (4)

Note that finding the correct set S* does not necessarily say anything about the quality of the other metrics. For
example, a model consistently giving an estimate Ẽ_ML[C(S)] = (1/1000) · E[C(S)] would perhaps have a low
R² score and a high percentage error, but would always be able to identify S*. Figure 4 reports the number
of instances (in percentage) for which each model succeeded at satisfying Equation (4). In this case, model
RT+LR is, by far, the best model in both cases and thus it is a good candidate to use in combination with the
enumeration algorithm.
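Checking whether an estimator satisfies Equation (4) over a test collection reduces to comparing two arg-minima. A sketch with names of our choosing (a full evaluation would also guard against ties):

```python
def identifies_optimum(test_sets, est_cost, exact_cost):
    """True iff the estimator's minimiser over the test collection
    coincides with the exact minimiser, as in Equation (4)."""
    s_hat = min(test_sets, key=est_cost)     # arg min of the estimate
    s_star = min(test_sets, key=exact_cost)  # arg min of E[C(S)]
    return s_hat == s_star
```

As the text notes, even a badly biased estimator passes this check as long as it preserves the ordering of the sets: scaling every exact cost by 1/1000 still returns the same minimiser.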
 In the second set of experiments, we focused on the RT+LR model and we used it within the full enumer-
ation of sets S ⊆ N. Because in a decision support system the user will require a solution within a short
timeframe, we used a hard time limit as the stopping criterion to halt the construction of the training set.
For instances of sizes 15 and 16 we used a 1-minute time limit, while for other instances we used a 5-minute
limit. After this time limit we trained the model (the RT+LR model trains in less than 5 seconds) and then we
started using it to approximate E[C(S)]. Note that this strategy is analogous to the one used in the off-line
experiments, because in Equation (2) we visit sets by ascending size.

Figure 5: Percentage gaps obtained using the on-line Machine Learning estimator RT+LR. Each box refers to
a different instance size. The box spans the two central quartiles of the gap distributions, with the straight
horizontal line demarcating the median. The whiskers extend to the rest of the distribution, except for outliers
drawn as diamond fliers.

 Table 2 reports the results of applying the Machine Learning RT+LR estimator on-line during the full
enumeration outer loop. Each row presents results averaged over all instances of the given size. Column
“Avg Time (s)” reports the solution time including: (1) running the enumeration algorithm and computing
the exact expected costs to build the training set, (2) training the Machine Learning model, and (3) completing
the enumeration using the model's predictions. Column “Avg Gap (%)” lists the percentage gap between
the expected cost E[C(Ŝ)] of the set Ŝ returned by the algorithm, and the expected cost E[C(S_opt)] of
the optimal set S_opt. Note that the algorithm returns Ẽ_ML[C(Ŝ)], so we computed E[C(Ŝ)] in a further
step, after the experiment was over; the time used to compute E[C(Ŝ)] is not included in the time reported
under “Avg Time (s)”. Column “Opt found (%)” reports the number of instances (in percentage) for which the
algorithm identified the optimal set (i.e., Ŝ = S_opt).
 Note that most of the time is spent building the training set, and the total runtime slightly exceeds the
training time (60 seconds for sizes 15 and 16, 300 seconds for larger instances). All average gaps are under
10% and there is no clear trend as a function of the instance size. Even when the algorithm is not successful at
identifying the optimal set, it still gives a solution of high quality, as testified by the low gaps. Figure 5 shows
the full distribution of gaps as a function of the instance size. All median values are under 10% and the third
quartile is always well under 15%. The almost flat distribution of gaps as instance size increases suggests that
this method can scale well for larger instances.

5.3.2 Estimations using Monte Carlo simulation
Analogously to what we have done for the Machine Learning estimator, we report results on two sets of exper-
iments. With the first set, we analyse the accuracy of Monte Carlo estimates Ẽ_MC[C(S)] when the estimator
is not yet bundled into the enumeration algorithm (off-line). The second set shows the performance and
the quality of the Monte Carlo estimator when used inside a decision support system (on-line).
 Table 3 reports the results obtained in the first set of experiments. For every instance and every set S ⊆ N,
we compare the true expected cost E[C(S)] with its approximation Ẽ_MC[C(S)]. Note that we left out sets
S with |S| ≤ 4 because, in that case, 2^{|S|} ≤ n_MC = 20 and so the number of subsets Q ⊆ S is smaller than
the number of Monte Carlo iterations. Each row of the table displays results averaged over all instances with
the given size (column “Instance size”). Column “Mean Err (%)” reports the percentage error (or gap) over all
sets and all instances, whereas column “Opt Err (%)” refers to the percentage error on the optimal sets S_opt.

Instance size Mean Err (%) Opt Err (%) Opt found (%)
 15 1.58 7.33 12.5
 16 1.61 7.79 0.0
 17 1.58 11.04 16.7
 18 1.50 7.20 4.2
 19 1.47 7.81 8.3
 20 1.46 8.67 12.5

Table 3: Statistics on the off-line Monte Carlo approximation with n_MC = 20 samples. Column Mean Err (%)
reports the percentage error averaged over all sets in all instances with the given size. Column Opt Err (%)
reports the percentage error for the optimal set, averaged over all instances with the given size. Column Opt
found (%) lists the number of instances, in percentage, for which the set with the lowest approximate cost
coincides with the optimal set.

Figure 6: Percentage errors by size of the set whose cost was approximated, for the off-line Monte Carlo
estimator. Points with a round marker report the mean percentage error over all sets of the given size and
over all instances. Points with a diamond marker display the percentage error over the lowest-cost set of the
given size for each instance, averaged over all instances.

More formally, for each instance the mean percentage error is

    (1/|P4(N)|) · Σ_{S ∈ P4(N)} ( E[C(S)] − Ẽ_MC[C(S)] ) / E[C(S)],    where P4(N) = {S ⊆ N : |S| > 4}

and the percentage error on the optimal set is

    ( E[C(S*)] − Ẽ_MC[C(S*)] ) / E[C(S*)],    where S* = arg min_{S ∈ P4(N)} E[C(S)]

Column “Opt Found (%)” reports the number of instances (in percentage) in which the set Ŝ with the lowest
approximate cost coincides with the optimal set:

    Ŝ := arg min_{S ∈ P4(N)} Ẽ_MC[C(S)]  =  arg min_{S ∈ P4(N)} E[C(S)] =: S*    (5)

 Comparing the values in column “Opt Found (%)” of Table 3 with those reported in Figure 4 we can see
how the Machine Learning estimator is considerably better at finding the lowest-cost set, compared with
Monte Carlo simulation.

Instance size Avg Time (s) Avg Gap (%) Opt found (%)
 15 107.48 4.41 0
 16 326.11 4.66 0
 17 418.01 5.14 0
 18 514.99 3.66 0
 19 1024.80 3.01 4.17
 20 2742.19 3.39 0

Table 4: Results for the on-line Monte Carlo simulation method. Each row presents results averaged over all
instances of the given size. Column Avg Gap (%) lists the percentage gaps between the real expected cost
E[C(S̃)] of the best set S̃ found by the algorithm and the optimum. Column Opt found (%) reports the
number of instances (in percentage) for which the algorithm identified the optimal set.

 We also notice a sharp increase in the percentage error when moving from its average over all sets (column
“Mean Err (%)”) to its average over the optimal sets (column “Opt Err (%)”). Because the accuracy of the Monte
Carlo method decreases with |S| (for larger sets, n_MC becomes smaller compared to 2^{|S|}), we were unsure whether
the lower precision was due to the optimal sets S_opt being generally large. To answer this question, in Figure 6 we
plot the percentage error as a function of the size of S. Points with a round marker report the mean percentage
error over all sets of the given size, averaged over all instances. Points with a diamond marker display the
percentage error over the lowest-cost set of the given size for each instance, averaged over all instances. The
diamond points always correspond to larger errors than the corresponding round points, except for size 20:
there is only one set of that size, i.e., S = N in instances where |N| = 20, and so the two markers correspond
to the same results. This means that, independently of the size of the sets (and so of the size of S_opt),
the estimate is worse for good sets than it is on average. This consideration suggests that, in the on-line
experiments, we could see fewer instances in which Monte Carlo simulation finds the optimum, compared
to using the Machine Learning estimator.
 The results of the on-line experiments, indeed, confirm this hypothesis. Table 4 reports these results using
the same column headings as Table 2. The data shows three trends. First, the number of times the heuristic
produced the optimal solution is larger using the RT+LR model than using Monte Carlo simulation. Second,
the Monte Carlo approach gives lower gaps than the Machine Learning model. The two estimation methods
show, in fact, complementary strengths: one is better at identifying the optimum but, when it fails to do so,
its error can be large; the other rarely identifies the optimal set but returns solutions which, on average, are
better. The third observation is that the Monte Carlo approach does not scale well and its runtime is high.
Indeed, using the Machine Learning model the user has more control over the runtime, because the training
phase can be stopped at any time. We expect that, if we trained the RT+LR model for the same amount of time
required by the solver using Monte Carlo simulation, the solution quality would increase considerably.
 Given the above considerations, we think that the Machine Learning approach is the better alternative; the
main advantage of Monte Carlo simulation lies in its simplicity and minimal implementation effort.

5.4 Heuristic enumeration results
In this section we present the results obtained when reducing the number of sets S ⊆ N considered in the
outer loop described in Section 2.2 to solve the minimisation problem of Equation (2). We applied the four
heuristics described in Section 4.2: forward stepwise (Fwd), backward stepwise (Bck), forward-backward
stepwise (Fwd-bck) and backward-forward stepwise (Bck-fwd). In all cases we computed the exact expected
cost E[C(S)]. We compare these results with those of the other three methods introduced in this section,
which use the full enumeration approach with, respectively, the exact value and the Machine Learning and
Monte Carlo cost estimators.
 Table 5 gives a summary of the results. Column “Alg” denotes the algorithm used for the outer loop.
Column “Est” indicates the estimator of the expected cost, with ML denoting Machine Learning and MC
denoting Monte Carlo. For each instance size, we report three metrics: the optimality gap in percent (column
“Gap”); the runtime in seconds (column “Time”); and the number of instances solved to optimality, in percent
(column “Opt”). The numbers in bold mark the best result for each metric and instance size.
