# DYNAMIC PROBABILITY OF REINFORCEMENT FOR COOPERATION: RANDOM GAME TERMINATION IN THE CENTIPEDE GAME - University of ...


JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 2018, 109, 349–364 NUMBER 2 (MARCH)

DYNAMIC PROBABILITY OF REINFORCEMENT FOR COOPERATION: RANDOM GAME TERMINATION IN THE CENTIPEDE GAME

EVA M. KROCKOW, ANDREW M. COLMAN, AND BRIONY D. PULFORD

DEPARTMENT OF NEUROSCIENCE, PSYCHOLOGY AND BEHAVIOUR, UNIVERSITY OF LEICESTER, U.K.

Experimental games have previously been used to study principles of human interaction. Many such games are characterized by iterated or repeated designs that model dynamic relationships, including reciprocal cooperation. To enable the study of infinite game repetitions and to avoid endgame effects of lower cooperation toward the final game round, investigators have introduced random termination rules. This study extends previous research that has focused narrowly on repeated Prisoner’s Dilemma games by conducting a controlled experiment of two-player, random termination Centipede games involving probabilistic reinforcement and characterized by the longest decision sequences reported in the empirical literature to date (24 decision nodes). Specifically, we assessed mean exit points and cooperation rates, and compared the effects of four different termination rules: no random game termination, random game termination with constant termination probability, random game termination with increasing termination probability, and random game termination with decreasing termination probability. We found that although mean exit points were lower for games with shorter expected game lengths, the subjects’ cooperativeness was significantly reduced only in the most extreme condition with decreasing computer termination probability and an expected game length of two decision nodes.
Key words: Centipede game, random game termination, backward induction, endgame effects, cooperation, reciprocity

Cooperative human interactions have frequently been modeled by repeated or sequential games, including the Repeated Prisoner’s Dilemma game and the Centipede game. These games provide abstract decision contexts where two or more players can choose repeatedly between cooperation and defection, either cooperatively sharing a pot of money with the other player or selfishly choosing a larger share for themselves (Krockow, Colman, & Pulford, 2016a). These abstract decision tasks enable the study of fundamental principles underlying human relationships. However, an aspect that has often been neglected in experimental designs is the possible effect of external factors such as random interventions, modeled in game theory by a player called Nature. In real-life decision contexts, circumstances outside of the players’ control can prematurely end a sequence of cooperative turns, for example if one player is forced to leave, or dies.

To provide a methodological implementation of indefinite game repetitions with random stopping through external forces, random game termination was introduced as an alternative to finitely repeated games (Roth & Murnighan, 1978). Unlike more traditional games with explicit, finite horizons (e.g., Selten & Stoecker, 1986), this termination rule involves players being informed of the probability that a further game round will be played but not which particular round will be the last. Random game termination was claimed to avoid an endgame effect. The term ‘endgame’ refers to the final stage in a game of chess, but has also been applied to the analogous stage in an experimental game, that is, the final decisions in the game. The endgame effect, in turn, denotes the behavioral phenomenon that cooperation, even if stable throughout most of the game, suddenly drops when the players can predict that they are approaching the end of the interaction (Andreoni, 1988). Random game termination was introduced to allow for the study of infinitely extended games (Dal Bó & Fréchette, in press; Fréchette & Yuksel, 2017; Normann & Wallace, 2012). Based on these methodological advantages, random game termination rules may increase real-life

The research reported in this article was supported by awards from Friedrich Naumann Stiftung für die Freiheit to the first author and from the Leicester Judgment and Decision Making Endowment Fund (Grant RM43G0176) to the second and third authors. The authors are grateful to Eike Buabang, who helped with data collection, and Kevin McCracken and Jodil Davis, who helped with software development.

Address correspondence to: Eva M. Krockow, Department of Neuroscience, Psychology and Behaviour, University of Leicester, Leicester LE1 7RH, United Kingdom. E-mail: emk12@le.ac.uk, Telephone: +44 (0)116 229 7084, Fax: +44 (0)116 229 7196

doi: 10.1002/jeab.320

© 2018 Society for the Experimental Analysis of Behavior

applicability of repeated games, because human social interactions are rarely characterized by complete-information contexts with finite horizons (Dal Bó, 2005; Jiborn & Rabinowicz, 2003).

Most experimental research using random termination designs has been conducted on the repeated Prisoner’s Dilemma game (RPDG), the iterated version of the dyadic, one-shot Prisoner’s Dilemma (PD), frequently referred to as the most fundamental example of all social dilemmas (Colman & Pulford, 2015; Rapoport, Seale, & Colman, 2015; Roth, 1995). The PD, originally named by Tucker (1950/2001), describes a strategic decision context in which two suspects have been arrested for a joint crime. Both individuals have to choose (separately and simultaneously) between selling out the other person (defection) and staying quiet (cooperation). Their sentences will be long if both of them decide to sell out, and shorter if both remain silent, but if one person chooses betrayal while the other stays silent, then the defector will go free and the cooperator will suffer the maximum sentence. Despite describing a specific decision scenario, the PD can be abstracted to model a general strategic dilemma that crops up in many economic, political, and interpersonal interactions. The RPDG refers to decision contexts in which two individuals complete multiple PDs in a sequence. Roth and Murnighan’s (1978) first investigation of random termination rules in RPDGs suggested that lower termination probabilities increased cooperation relative to higher probabilities. More recently, Dal Bó (2005) conducted a comprehensive experiment on RPDGs using three different random termination rules with expected lengths of one, two, and four game rounds respectively. Additionally, they compared these conditions to finite-horizon games with matching numbers of expected game rounds. The results confirmed the earlier findings of Roth and Murnighan (1978), suggesting that decreasing the likelihood of game termination increased cooperation levels. Furthermore, the results showed that subjects were likely to cooperate more in the infinite-horizon RPDGs than in those of a finite length, even if matched for expected game length. If we interpret the probability of game continuation in such random-termination games as a proxy for the players’ (anticipated) relative frequency of rewarding payoffs, then the results are also arguably consistent with the matching law, according to which the relative frequency of responses (in this case, cooperative decisions) closely approximates the relative frequency of reinforcements in concurrent reinforcement schedules (Herrnstein, 1961).

However, only a few studies (e.g., Engle-Warnick & Slonim, 2004) have investigated random termination rules in games other than the RPDG, and no empirical research has studied random termination rules in Rosenthal’s (1981) Centipede game (CG) (see Fig. 1). In this sequential game with complete and perfect information, two players A and B take turns in deciding between two alternatives: a cooperative GO move that leads the game to continue horizontally across the game tree, and a noncooperative STOP move that terminates the game through an immediate, downward exit move, leaving the defector with a relatively favorable payoff compared to the other player. In the example CG, a GO choice always decreases a player’s payoff by three units and increases the co-player’s payoff by seven units. In this case, the joint payoffs of the player pair increase linearly from one exit point to another, but exponentially increasing versions are also frequently studied. The subgame perfect Nash equilibrium of the CG, as derived through backward induction (BI) reasoning, is the unconditional STOP move by Player A at the first decision node, even though both players would receive higher individual payoffs following just one cooperative move each (for a discussion of BI in the context of the CG see Aumann, 1995, 1998; Colman, Krockow, Frosch, & Pulford, 2016). This surprising conclusion of backward induction reasoning is also consistent with probability discounting, the finding that individuals generally prefer smaller certain rewards to larger lower-probability rewards, the effect being most

Fig. 1. Centipede game with a linearly increasing payoff function. The game proceeds from left to right. Two players (A and B) alternate in choosing between cooperative GO moves that continue the interaction by moving horizontally to the right and noncooperative STOP moves that terminate the game by moving down. The numbers at the bottom and right are payoffs to both players, with those of Player A displayed above those of Player B.
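The backward induction argument for the linear Centipede game can be traced mechanically. The following is a minimal sketch, not the authors' procedure: it assumes a hypothetical payoff pair of (3, 1) at the first exit point (the text only fixes the GO rule of minus three for the mover and plus seven for the co-player), and the function names `exit_payoffs` and `backward_induction` are illustrative.

```python
from typing import List, Tuple

Payoff = Tuple[int, int]  # (payoff to Player A, payoff to Player B)


def exit_payoffs(n_nodes: int = 24, first_exit: Payoff = (3, 1),
                 cost: int = 3, gain: int = 7) -> List[Payoff]:
    """Payoffs at every exit point; the last entry is the natural end.

    first_exit is a hypothetical starting pair, not taken from the paper.
    Each GO move costs the mover `cost` and gives the co-player `gain`.
    """
    a, b = first_exit
    payoffs = [(a, b)]
    for node in range(1, n_nodes + 1):
        if node % 2 == 1:          # Player A moves at odd nodes
            a, b = a - cost, b + gain
        else:                      # Player B moves at even nodes
            a, b = a + gain, b - cost
        payoffs.append((a, b))
    return payoffs


def backward_induction(payoffs: List[Payoff]) -> Tuple[List[str], Payoff]:
    """Solve the game from the last decision node back to the first."""
    n_nodes = len(payoffs) - 1
    outcome = payoffs[n_nodes]     # what happens if everyone plays GO
    plan = [""] * n_nodes
    for node in range(n_nodes, 0, -1):
        mover = 0 if node % 2 == 1 else 1
        stop_now = payoffs[node - 1]
        # The mover compares stopping now with the outcome that rational
        # play produces from the next node onward.
        if stop_now[mover] >= outcome[mover]:
            plan[node - 1] = "STOP"
            outcome = stop_now
        else:
            plan[node - 1] = "GO"
    return plan, outcome


PAYOFFS = exit_payoffs()
PLAN, OUTCOME = backward_induction(PAYOFFS)
```

Under these assumptions every entry of `PLAN` is "STOP", so the induced outcome is the first exit point, whereas `PAYOFFS[2]` (one GO move by each player, then an exit) gives both players more than `PAYOFFS[0]`.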

accurately described by a hyperbolic probability discounting function (Green & Myerson, 2004; Myerson, Green & Morris, 2011).

While sharing many features of the RPDG, the CG provides a different decision context and deserves investigation in its own right. In the CG, the decision to defect terminates the entire interaction, and its consequences are irrevocable. Retaliation through strategies such as Tit for Tat is therefore not possible. Furthermore, it is characterized by a sequential, reciprocal move structure that may offer a closer model of many real decision situations than the simultaneous decision context of the RPDG (Krockow et al., 2016a). Finally, the payoffs of the standard RPDG remain constant throughout the decision sequence and therefore cannot model the same variety of dynamic incentive structures, including exponentially or linearly increasing payoffs, as the CG.

An example of the game’s application to real-life interactive decisions could include two neighboring couples who alternate helping each other with the baby-sitting. Neither of the couples particularly enjoys looking after the other family’s badly behaved children, and there is always the possibility that one couple could decide to end the relationship without further reciprocation. Nevertheless, in the long run, both couples benefit from the arrangement, because the cost of performing the chore is less than the benefit to the other couple. In addition to this social decision-making context, the CG has biological applications, for example modeling certain animal mating behaviors. Hermaphrodite organisms (i.e., organisms with both female and male reproductive organs) such as the hermaphrodite sea bass have been found to distribute costly egg production by taking turns with their mates in laying small batches of eggs. This repeated exchange of small batches of eggs for fertilization (as opposed to the production of a large batch by one individual at a time) helps to prevent mutant sea bass with male reproductive organs only from fertilizing all eggs and swimming off without making a similarly large contribution to reproduction (Binmore, 1998). The CG thus provides an interesting experimental paradigm to study mutual trust and related topics of reciprocation, altruism, individual versus group benefits, and long-term versus short-term payoff maximization (e.g., Krockow et al., 2016a; Palacios-Huerta & Volij, 2009).

Whereas the CG has received increasing attention in the literature, with most empirical studies demonstrating high levels of cooperation and reliable deviations from equilibrium play (e.g., Bornstein, Kugler, & Ziegelmeyer, 2004; Krockow, Colman, & Pulford, 2016b; Krockow, Pulford, & Colman, 2015; McKelvey & Palfrey, 1992), only short CGs with finite horizons have been investigated so far. The longest CG used in a published, peer-reviewed experiment was Nagel and Tang’s (1998) 12-node game. That game was presented in reduced normal form, which had the additional advantage of assessing all intended exit points in the game: the structure and interdependence of players’ decisions in the sequential-move version mean that even the most cooperative player can never reach late exit nodes when paired with an early-defecting co-player. However, although the reduced normal form is likely to provide more accurate assessments of the prevalence of altruism in an experimental sample, it omits the sequential player interaction characteristic of the standard CG and reduces the length of time invested in each game. It presents a fundamentally different decision problem and may lead to significantly different behavior (Krockow et al., 2016a). Finally, no research to date has investigated CGs with different termination rules, including random game termination, even though these could provide informative insights into decision-making situations under the risk of premature termination. Hence, there is a need for the investigation of longer CG sequences with a variety of termination rules (Krockow et al., 2016a).

The present study investigated CGs with up to 24 moves (twice as long as Nagel & Tang’s 1998 version) and linearly increasing payoffs. Additionally, we investigated the effects of four different termination rules, including two novel rules of random termination with increasing and decreasing probabilities of game termination throughout the decision sequence, respectively. No study to date appears to have combined random termination rules with finite game horizons. However, the finite design offers an advantage in the CG inasmuch as it allows for the calculation of mean exit points, an index of cooperation widely used in the previous CG literature. Furthermore, as Selten, Mitzkewitz, and Uhlich (1997) pointed out, infinitely repeated games are not feasible in practice. Experimental subjects always know that the

game will have a finite duration, and the time slot they signed up for provides an effective upper bound. Consequently, no experimental game would ever be expected to be infinite.

The study reported below aimed to compare four CG conditions: A: no random game termination; B: random termination with a constant termination probability; C: random termination with increasing probability; and D: random termination with decreasing probability. These conditions were based on theoretical interest and their direct applicability to different real-life decision contexts.

Consider again the neighborly relationship of alternating childcare support which was presented as an example situation earlier. Random termination of the relationship through external factors beyond the neighbors’ control is possible and could follow several different functions. In its simplest form, the probability of the relationship being terminated by an external factor could take on a fixed value. For example, it is possible to imagine a lethal accident cutting the relationship short. Following each cooperative action by either neighbor, an accident could occur by chance, thus rendering either one of the neighboring families unable to engage in further baby-sitting. The probability of such an accident could be fixed (e.g., 1/4) and its value could depend on the general riskiness of the neighbors’ lifestyles.

In a slightly different variation of this scenario, one of the families could be living in a rented house from which the landlords could evict them at any time. The landlords may eventually use the property as their own future retirement home or as the prospective house for their children. In this scenario, the landlord’s choice would be the external factor potentially ending the neighbors’ relationship prematurely. Although the initial probability of the landlord evicting his tenants may be very low, the probability would increase over time.

Finally, consider this third variation of the baby-sitting scenario. The families may have moved to the neighborhood at an early age and with uncertain job prospects. Like many young professionals, they may initially depend on short-term work contracts or insecure temping jobs with zero-hour contracts. Given the initial job insecurity, a long-term stay in the area may be questionable, yielding a high early likelihood of forced relocation. Thus, job insecurity could be another external factor terminating the cooperative interaction between the neighbors. Over time and with increasing work experience, however, job security and financial stability are likely to improve, thus leading to a decreasing probability of the relationship being terminated by environmental factors. Each of the example scenarios maps onto one of our experimental conditions, with the first scenario corresponding to Condition B, the second to Condition C and the third to Condition D.

All conditions of the experiment shared the same maximum game length of 24 nodes but were designed to differ in their expected game lengths as based on the random termination probabilities. While Condition A without random termination had an expected game length of 24 nodes, all random termination conditions had lower expected lengths of approximately 4, 9, and 2 nodes, respectively. Previous literature reviewed above (e.g., Dal Bó, 2005) showed that random termination games of shorter expected lengths produced lower cooperation in the RPDG than games with longer expected lengths. Consequently, we hypothesized a similar decrease of cooperation in Centipede game conditions in which the computer was statistically more likely to end the game earlier. More specifically, we used the order of expected game lengths presented above to arrive at our predictions of cooperation levels in the individual conditions. Based on this order, Condition D with an expected length of just over two decision nodes was hypothesized to yield the lowest cooperation levels, followed by Condition B and then Condition C. Dal Bó (2005) reported that games with fixed lengths decreased cooperation compared to games with random termination rules. However, their treatment games were matched for expected game lengths. Given that our fixed-length game presented in Condition A was characterized by a comparatively high expected length of 24 decision nodes, we hypothesized that this condition would yield higher levels of cooperation than all random-termination conditions in the experiment.

Method

Subjects

A total of 148 undergraduate students from the University of Leicester with a mean age of 19.34 years (SD = 2.86 years) participated in the experiment (see Table 1). All were incentivized with a between-subjects random lottery

system. One person per testing session received the payoff from a randomly chosen game completed during the session. The mean cash remuneration of the selected subjects was £14.36 ($18.00). We chose to select one game for payment randomly rather than calculating an average across all games, because previous literature provided evidence that this method prevents subjects from responding to the individual game repetitions merely as parts of one large “supergame” (Bardsley et al., 2010; Bolle, 1990; Cubitt, Starmer, & Sugden, 1998). In particular, we wanted to ensure that subjects responded to every game as a separate decision context that could determine their total payoff in the experiment. Selecting only one subject per session for payment is common practice in research on experimental games, and informal feedback from subjects confirmed that they were sufficiently motivated by the chance of winning the money.

Design

Subjects were randomly allocated to one of four treatment conditions with different CGs. Each game offered a maximum of 24 subject moves, and the combined payoffs of both players at each node increased linearly from 4 at Node 1 to 100 at the natural end. The four treatment conditions varied only as regards the probability δ of random game termination by the computer, as follows. A: no random termination; B: constant termination probability $\delta_B = \tfrac{1}{4}$ following each subject move; C: increasing termination probability $\delta_C = 0, \tfrac{1}{44}, \tfrac{2}{44}, \ldots, \tfrac{21}{44}, \tfrac{22}{44}$; and D: decreasing termination probability $\delta_D = \tfrac{22}{44}, \tfrac{21}{44}, \ldots, \tfrac{2}{44}, \tfrac{1}{44}, 0$. In Condition C, the probability of game termination by the computer steadily increased from 0 at the first node to 1/2 at the last node. Hence, at the first node the computer never chose to terminate, and at the game’s end (i.e., the computer’s 24th decision node) it terminated the game in 50% of the cases. Conversely, in Condition D, the probability of game termination by the computer steadily decreased from 1/2 at the first node to 0 at the last node. Hence, at the first node the computer chose to terminate in 50% of the cases, and at the game’s end it never terminated. In both Conditions C and D, the mean value of δ is 1/4, which is why this value was chosen as the constant termination probability in Condition B. Based on the above probabilities, the expected termination points T by the computer were calculated to be as follows: Condition A, $T_A = 24.00$; Condition B, $T_B = 4.00$; Condition C, $T_C = 8.99$; Condition D, $T_D = 2.13$. The game trees displayed on screen for the different treatment conditions are shown in Figure 2. Detailed plots of the probability functions of games being randomly terminated by the computer at each exit point are provided in Figure 3.

As a general measure of cooperation, the subjects’ cooperation rates were calculated by dividing a player’s number of GO moves by the total number of moves that player made across all 20 game rounds. In the context of the present experiment, the proportion of GO moves provided a more accurate indication of individual cooperation levels than the mean exit points reported in previous studies (e.g., Krockow et al., 2016a), because it took into account the fewer decision opportunities in the three conditions with random termination rules, while also capturing the cooperative moves made in games which were prematurely terminated by the computer.

Additionally, players’ STOP probabilities were calculated for each individual decision node to estimate the likelihood of game termination at each point in the game. This was done by dividing the number of players who chose to STOP at each decision node by the total number of players who had reached the respective node.

Materials

The testing sessions were carried out in a large computer laboratory. Each subject was seated at a computer desk, with all desks generously spaced out in the laboratory to avoid

Table 1. Summary of session and subject details

| Condition | # of subjects | # of sessions | Subjects per session | Rounds per session |
|---|---|---|---|---|
| A: No random termination | 40 | 2 | 22, 18 | 20 |
| B: Constant δ | 34 | 2 | 18, 16 | 20 |
| C: Increasing δ | 40 | 2 | 22, 18 | 20 |
| D: Decreasing δ | 34 | 2 | 18, 16 | 20 |
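The expected termination points reported in the Design section can be checked from the termination probabilities. The sketch below is an illustration under stated assumptions rather than the authors' calculation: it takes the 23 listed values of δ for Conditions C and D (0, 1/44, ..., 22/44), uses the same number of computer nodes for Condition B, and counts a game the computer never stops as ending at position n + 1. The paper does not state how this truncation was handled, and the name `expected_termination_point` is hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence


def expected_termination_point(deltas: Sequence[Fraction]) -> float:
    """Expected position T at which the computer ends the game.

    deltas[k] is the termination probability at the (k+1)-th computer node.
    Assumption: if the computer never stops, T = len(deltas) + 1.
    Uses E[T] = sum over k >= 0 of P(T > k).
    """
    expected = Fraction(0)
    survival = Fraction(1)          # P(T > 0)
    for delta in deltas:
        expected += survival
        survival *= 1 - delta       # P(T > k) after passing node k
    expected += survival            # term for the last computer node
    return float(expected)


N_COMPUTER_NODES = 23  # assumption: one computer node per listed value of delta

DELTAS: Dict[str, List[Fraction]] = {
    "B": [Fraction(1, 4)] * N_COMPUTER_NODES,
    "C": [Fraction(k, 44) for k in range(N_COMPUTER_NODES)],
    "D": [Fraction(k, 44) for k in reversed(range(N_COMPUTER_NODES))],
}

EXPECTED = {name: expected_termination_point(d) for name, d in DELTAS.items()}
MEAN_DELTA_C = sum(DELTAS["C"]) / len(DELTAS["C"])
```

Under these assumptions the values in `EXPECTED` come out close to the reported 4.00, 8.99, and 2.13, and `MEAN_DELTA_C` equals exactly 1/4, in line with the stated rationale for Condition B.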

Fig. 2. Specific Centipede game trees used in the present experiment: (a) Game tree used for Condition A: a long Centipede game with 24 decision nodes and no random termination by the computer; (b) Game tree used for Conditions B, C, and D: a long Centipede game with 24 decision nodes and random termination by the computer (random termination rules varied across the three conditions).

any communication between subjects. For the anonymous game interaction, a custom-made web-based game application was used which provided real-time feedback about the subjects’ choices, the computer’s choices and the current round number. The subjects were presented with the game tree of their respective treatment condition. To visualize the computer’s options for random termination in the last three conditions, additional decision nodes with the label C for computer were inserted into the game tree following each player’s decision nodes. Several detailed instruction slides explained the payoff function and the random termination rule for the relevant treatment condition. For example, in Condition C (increasing termination probability), the instructions read:

> The Computer is programmed to make random choices, prefers neither participant, and gains nothing itself.
>
> The probability that the Computer chooses GO steadily decreases from 1 (at the first circle) to 1/2 (at the last circle). This means that in the beginning it always chooses GO and at the end it chooses GO in 1 out of 2 times.
>
> The probability that the Computer chooses STOP steadily increases from 0 (at the first circle) to 1/2 (at the last circle). This means that in the beginning it never chooses STOP and at the end it chooses STOP in 1 out of 2 times.

Subsequent screen displays did not include reminders about the computer’s specific termination probabilities at each node. We made this decision despite recent literature suggesting that subjects’ responses to linear probability functions may frequently be distorted, with subjects behaving as though the likelihoods of events with low probabilities were higher and the likelihoods of events with high probabilities lower than they actually are (e.g., Zhang & Maloney, 2012). Given that the computer’s termination probabilities in Conditions C and D either increased or decreased by 1/44 (0.0227) with each of the computer’s decision nodes passed, we believed that the small fractions or decimal numbers would impose an even greater challenge to the subjects’ adaptive learning than the linear probability functions explained in the instruction slides. The subjects saw eight player nodes at a time, and the display shifted by eight nodes once the game continued beyond the eighth node. The display shifted again to the game’s final set of eight decision nodes if the subjects reached the 16th node. We chose to shift the game tree by eight nodes at a time, because a previous experiment by Krockow, Colman, and Pulford (2017) suggested that subjects struggled with a constantly moving window that always displayed the next eight decision nodes. Additionally, the experiment included a paper-based comprehension test to check for the understanding of the game’s basic features as well as the different termination rules.

Fig. 3. Subjects’ exit percentages in the experiment and computer STOP probabilities. Graphs show the percentage of experimental games that were terminated by human subjects at each of the 25 exit nodes in our Centipede games. Additionally, the calculated probabilities of games being terminated by the computer are displayed at each node. Graphs A–D correspond to the four conditions with different types of random computer termination.

Procedure

For each of the four conditions, two testing sessions were conducted, each of which contained between 16 and 22 subjects and took approximately 50 min to complete (see Table 1). In each testing session, all subjects experienced the same condition, and they were informed about this fact. The subjects were instructed to focus only on their own materials and computer screens, and the

experimenters checked that these rules were followed at all times across all testing sessions. After completing the consent form, subjects were presented with detailed, animated instructions on their computer screens. They could work through the slides in their own time, and were given the opportunity to ask questions in private. Then, they were asked to fill in a short comprehension test. The experimenters checked all responses and corrected any misunderstandings. Subsequently, the experiment was started. The computer randomly assigned all subjects to a player role in which they remained for the entire testing session. The subjects were ignorant of the identity of their co-players, and they were randomly re-paired after each game round (i.e., after each game they completed). The re-pairing of players was randomized with replacement, meaning that the ideal of perfect stranger matching (i.e., never encountering the same co-player twice) was not achieved. However, given the relatively large size of our testing sessions (compared to other CG research including Rapoport, Stein, Parco, & Nicholas, 2003), we do not believe this to be a problem. The web application provided them with real-time feedback about all the moves made and on the outcome of each game. Once each subject had completed 20 rounds of Centipede games, one subject was drawn at random for the lottery prize. The winner received his or her outcome (in pounds sterling) of one randomly selected game which they completed during the session.

Results

The proportion of games ending at each exit node for the different conditions is shown in Figures 3 and 4. Figure 3 displays the proportions of games terminated by human players, and plots these results against the probability functions of random computer termination. Figure 4 omits the probability functions, and shows the computer’s actual game terminations instead. For examples of individual behavior, please see the Appendix.

As can be seen in Figure 4, a large proportion of games in the treatment conditions with random computer stopping were in fact terminated by the computer. In Condition B (constant δ) this proportion amounted to 0.65, in Condition C (increasing δ) it was 0.49, and in Condition D (decreasing δ) it was 0.68. Hence, in Condition C more than half of the games were terminated by the subjects, whereas in the other two treatment conditions with random termination, only around a third of the games were ended by either of the human subjects.

Taking a closer look at Figure 3, the distributions of subjects’ exit moves show marked differences across treatment conditions. Although in Condition A (no random termination) more than 50% of the games were stopped after the 20th exit point, not a single game in the other treatment conditions was stopped after the 20th exit point. In Condition B (constant δ), games stopped by subjects followed a near normal distribution, with most game exits occurring at the third or fourth decision node and no game continuing beyond the eighth decision node. In Condition C (increasing δ), the pattern also resembled a bell-shaped distribution but the dispersion was larger. Most subjects exited this treatment condition at Node 6, but some games continued for longer, with 19 being the latest exit point reached. Finally, the exit distribution of Condition D (decreasing δ) showed an almost linear decrease across exit points. The largest share of games (40%) that were exited by human subjects stopped at Node 1, 30% stopped at Node 2, 20% stopped at Node 3, and the final 10% stopped at Nodes 4, 5, 6 and 7. Interestingly, the exit distributions described above follow the probability function of game terminations by the computer. In Condition A with zero possibility of computer termination throughout the game, subjects’ defection levels remain very low across many decision nodes before suddenly spiking close to the game’s end. In Conditions B, C, and D, which were characterized by high game termination probabilities in the beginning, a much higher percentage of games were stopped at early exit nodes by the subjects. Condition D (decreasing δ) in particular shows a close match between the subjects’ exit distributions and the computer’s linearly decreasing termination probabilities.

The overview of mean exit points is complemented by the display of players’ conditional STOP probabilities at each node (see Fig. 5), showing percentages of individuals who reached each decision node and decided to defect at that node. In Condition A (no random termination), STOP probabilities are very low until Node 21, from which point

they steadily increase toward a mode of 100% at Node 25. In Condition B (constant δ), STOP probabilities are below 10% on Node 1, but increase almost steadily until Node 7, beyond which no game in this condition continued: The modal STOP probability was above 40% at Node 6. In Condition C (increasing δ), most STOP probabilities of subjects stayed below 20%, and the modal STOP probability was found at Node 19, where a third of all subjects stopped. Finally, in Condition D (decreasing δ), a small bell-curve of STOP probabilities was found: Starting with a percentage of approximately 10% at Node 1, STOP probabilities rise to almost 30% at Node 5 and then begin to fall again.

Fig. 4. Total exit percentages in the experiment. Graphs show the percentages of experimental games that were terminated at each of the 25 exit nodes in our Centipede games. The black bars represent the percentages of games stopped by experimental subjects. The grey bars represent the percentages of games ended by a computer move. Graphs A–D correspond to the four conditions with different types of random computer termination.

A Kruskal-Wallis H test was conducted to compare the normalized cooperation rates (i.e., the proportion of GO moves per total moves) per subject across conditions. Significant differences were found, χ²(3) = 14.95, p < .005, with a mean rank cooperation rate of

89.40 for Condition A, 84.49 for Condition C, 59.87 for Condition D, and 59.85 for Condition B (for mean cooperation rates see also Table 2). Pairwise comparisons using Mann–Whitney U tests showed that Condition A (no random termination) with a mean rank of 43.95 had a significantly higher cooperation rate than Condition B (constant δ) with a mean rank of 29.91 (U = 422, p < .005, r = .33). Condition A also had a significantly higher cooperation rate with a mean rank of 43.78 than Condition D (decreasing δ) with a mean rank of 30.12 (U = 429, p < .05, r = .32). Furthermore, Condition C (increasing δ) with a mean rank of 44.18 was found to have a significantly higher cooperation rate than Condition B (constant δ) with a mean rank of 29.65 (U = 413, p < .005, r = .34). Condition C also had a significantly higher cooperation rate with a mean rank of 42.99 than Condition D (decreasing δ) with a mean rank of 31.04 (U = 460.5, p < .05, r = .28).

Fig. 5. Subjects' STOP probabilities at each of the 24 decision nodes. Based on the experimental results, the graphs display calculated conditional probabilities (in percentages) of a subject choosing "STOP" assuming that they have reached the respective decision point. Graphs A–D correspond to the four conditions with different rules for random computer termination.

The mean percentages of GO moves per game round for all four conditions are

displayed in Figure 6. Only the graphs of Condition A (no random termination) and Condition C (increasing δ) show discernible temporal trends, indicating an increase of cooperation over rounds. In Condition A, the mean percentage of GO moves increased from a value of approximately 89% in Round 1 to a value of approximately 96% in Round 20. Time series analyses confirmed the learning pattern apparent in Condition A. The SPSS Expert Modeler identified an exponential smoothing Holt linear trend model with parameters of α (level smoother) = 0.20 and γ (trend smoother) = 1.00, indicating a linearly increasing score pattern. The stationary R² model fit statistic was calculated to estimate the model's goodness of fit. With an R² value of .75, the model can explain approximately 75% of the variance in the data and indicates a superior fit compared to a simple mean model used as a baseline for comparison. Additionally, the Ljung-Box statistic Q was calculated to test whether the model was correctly specified. The value of Q(16) = 18.38 (p = .302) showed that no significant temporal structure in the data set was unaccounted for by the Holt linear model identified.

In Condition C, the mean percentage of GO moves increased from a value of approximately 80% in Round 1 to values above 90% in later rounds. Again, time series analyses identified an exponential smoothing Holt linear trend model with parameters of α (level smoother) = 0.11 and γ (trend smoother) = 2.281E–6, indicating a linearly increasing score pattern. With a stationary R² value of .72, the model can explain approximately 72% of the variance of the data. Additionally, the Ljung-Box statistic Q was calculated; the value of Q(16) = 12.34 (p = .72) showed that no significant temporal structure in the data set was unaccounted for by our model.

Conditions B (constant δ) and D (decreasing δ) did not show any temporal trends. For both conditions, the SPSS Expert Modeler identified ARIMA (0,0,0), a model indicating nothing but white noise in the data across rounds and suggesting that no learning took place.

Discussion

This experiment aimed to extend previous research on repeated games with random termination rules by providing the first investigation of CGs with varying termination rules and long decision sequences. In particular, we used 24-node finite-horizon games and tested for effects of different rules of random computer termination (no random termination, constant, increasing, and decreasing termination probability) on human cooperation levels. All treatment conditions with random computer termination were controlled for average termination probability across the 24 decision nodes (the mean probability was 1/4 for each condition). However, the conditions varied regarding their expected computer termination points, ranging from T_D = 2.13 to T_C = 8.99.

Our results revealed large differences between the four treatment conditions, with subjects' mean exit points varying across conditions. Condition A (no random termination) yielded significantly higher mean exit points than Condition C (increasing δ), and both of these conditions yielded significantly higher means than Conditions B (constant δ) and D (decreasing δ). Matching the subjects' mean exit points with the respective expected game lengths (as based on the random computer termination rules), the values of mean exit points follow the same order as the values of the expected game length. More specifically, games with a higher expected game length were stopped later than those with a lower expected game length. Additionally, inspection of results showed a close match between the percentages of subjects' exit moves per decision node and the random termination probability associated with the respective node. This finding is in line with our hypotheses, and it supports previous experimental results (e.g., Dal Bó, 2005; Roth & Murnighan, 1978).

Table 2
Expected game length and cooperation rate

| Condition | No Termination | Constant δ | Increasing δ | Decreasing δ |
|---|---|---|---|---|
| Expected game length T | 24.00 | 4.00 | 8.99 | 2.13 |
| Cooperation rate, M (SD) | .92 (.10) | .86 (.10) | .92 (.06) | .80 (.22) |
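As a rough illustration of how an expected game length such as those in Table 2 could be derived from a termination rule, the following Python sketch sums over the nodes at which the computer might end the game. The function name `expected_termination_node`, the convention that the computer draws once per decision node, and the convention that a game without computer termination counts as lasting all 24 nodes are assumptions made for this sketch. They are not taken from the article, and the increasing and decreasing schedules are not reproduced because their exact values are not given in this section.

```python
from typing import Sequence


def expected_termination_node(stop_probs: Sequence[float]) -> float:
    """Expected node at which the computer ends the game (hypothetical helper).

    stop_probs[k] is the assumed probability that the computer terminates
    the game at node k + 1, given that the game has reached that node.
    Assumption: if the computer never terminates, the game is counted as
    lasting all len(stop_probs) nodes.
    """
    expected = 0.0
    survive = 1.0  # probability that the game has reached the current node
    for node, p in enumerate(stop_probs, start=1):
        expected += node * survive * p  # computer stops exactly at this node
        survive *= 1.0 - p              # game continues past this node
    expected += len(stop_probs) * survive  # no computer termination at all
    return expected


# Constant termination probability of 1/4 at each of the 24 nodes.
constant_delta = [0.25] * 24
print(round(expected_termination_node(constant_delta), 2))

# No random termination: the game runs for all 24 nodes.
no_termination = [0.0] * 24
print(expected_termination_node(no_termination))
```

Under these assumptions, the constant schedule gives a value very close to 4 and the schedule without termination gives 24, in line with the corresponding entries of Table 2. Human exit decisions are ignored here, so the result describes the computer's rule only.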

Fig. 6. Mean cooperation rates (percentage of GO moves) for each of the 20 game rounds. Graphs A–D correspond to the four conditions with different types of random computer termination. Black lines show the observed values (i.e., the data obtained experimentally). Dotted lines show the fit line indicating the temporal data trend.

Interestingly, however, the decrease of the mean exit points was less severe than what could have been expected from the drastic decrease of expected game length across conditions. For example, although Condition A's expected game length of 24 nodes was 12 times higher than the expected game length of Condition D (2 nodes), the mean exit point of subjects in Condition A

was only 7.07 times higher than in Condition D. This indicates that cooperativeness did not increase proportionately with the expected length of the games.

Indeed, the comparison of subjects' cooperation rates across treatment conditions confirmed this finding. Cooperation rates were surprisingly high across all conditions, with 98% of subjects choosing GO more than half of the time, and more than 10% always choosing GO. Significant differences in cooperation rates between conditions became apparent, but these differences did not follow the data patterns previously identified when using mean exit points as dependent variable in the analyses. Condition A (no random termination) and Condition C (increasing δ) yielded comparable mean cooperation rates of approximately .92. Condition B (constant δ) produced a mean rate of approximately .86, and Condition D (decreasing δ) generated the lowest cooperation rates (.80). However, due to comparatively high variances within groups, the only significant differences were found between Condition D on the one hand and Conditions A and C on the other hand, indicating that only Condition D, with the lowest expected game length T_D = 2.13, resulted in a significant decrease in subjects' cooperativeness compared to the control condition without random termination.

An explanation for the large variances within groups could be the importance of individual differences influencing cooperation rates. Although the treatment condition had an impact on behavior, other-regarding behavioral propensities (e.g., cooperative social value orientations) may have accounted for some of the variance (e.g., Krockow et al., 2016b; Pulford, Krockow, Colman, & Lawrence, 2016). Additionally, numeracy skills could have had an impact on decision making. The disproportionally large number of cooperative choices in conditions with shorter expected game lengths could be explained by the subjects' inability to anticipate likely computer exit points from the termination probabilities. In future investigations, any confounding effects of numeracy and mathematical ability could be reduced by informing subjects about the expected game length of their condition before the start of each experiment.

A limitation of the present study's research design concerns the simultaneous changes to both the expected game length and the computer's termination rules across conditions. Based on the present design, it is not possible to be certain of the reasons for differences in the cooperation rates across the different games, but we believe that they are jointly influenced by expected game length and termination rules. Future research could extend this study by controlling treatment conditions for the expected game length (rather than the mean termination probability), while comparing different termination rules. Additionally, it is possible that an increase in stimulus control could be achieved by announcing the computer's termination probabilities at each stage of the game.

When examining temporal data trends, it appears that learning occurred only in the treatment conditions with longer expected game lengths and either no random game termination or increasing probability of termination. In the standard 24-node game, cooperation rates increased linearly with increasing experience in the game, reaching very high rates of over 95% in the final game rounds. Hence, learning occurred in the opposite direction of equilibrium play. Similarly, in the condition with increasing termination probabilities, initial cooperation rates started at 83.3% and many reached percentages higher than 90 toward the final game rounds. This is an interesting finding, as the majority of experimental CG investigations reported decreases in cooperation over rounds (e.g., McKelvey & Palfrey, 1992; Rapoport et al., 2003). Our learning effects could be explained by the linear payoff function and comparatively low risk associated with each GO move in Condition A of the present study. Another reason may be the greater game length, which offers more opportunities for reciprocal cooperation (Krockow et al., 2016a).

Taken together, the findings suggest that CGs with far and finite horizons and linearly increasing payoff functions generate high levels of cooperation that increase with higher experience in the game. When these games are combined with different rules of random game termination by the computer, the subjects' mean exit points typically decrease. However, subjects' cooperativeness as assessed by the more accurate measure of cooperation rates may be affected only in conditions with very extreme conditions such as very low expected game lengths. In this experiment, only

Condition D, with decreasing termination probability and an expected game length of approximately two decision nodes, led to a significant decrease in cooperativeness relative to the control condition. Future research should investigate the effects that individual differences may have on cooperation levels in CGs and RPDGs with random termination rules. Interesting variables to investigate could be social value orientation and general numeracy skills. To increase external validity of the current study design further, follow-up research could dispense with the formal rules communicated to experimental subjects, because many real-life choices with probabilistic consequences are not presented with explicit probabilities. We tend instead in some situations to adapt our behavior to probabilities through learning. An experiment with learned instead of explicit probabilities would shift the experimental focus from rule-governed behavior (or instructional control) to a focus on contingency-shaped behavior (learned behavior) (e.g., Cerutti, 1989), which could correspond more closely to everyday experience.

Applying the findings of our abstract game context to the previous real-life examples of different baby-sitting scenarios presented in the introduction, it appears that mutual trust and reciprocal cooperation are common in prolonged decision contexts marked by a personal risk due to the other person's possible defection. Cooperation is maintained even under circumstances of increased uncertainty including the relationship's likely termination through an external force beyond the decision makers' control. Only very extreme conditions, such as an expected interaction length of only two encounters, appear to lead to a significant decrease of cooperation.

References

Andreoni, J. (1988). Why free ride? Strategies and learning in public good experiments. Journal of Public Economics, 37(3), 291–304.

Aumann, R. J. (1995). Backward induction and common knowledge of rationality. Games and Economic Behavior, 8(1), 6–19. https://doi.org/10.1016/S0899-8256(05)80015-6

Aumann, R. J. (1998). On the Centipede game. Games and Economic Behavior, 23(1), 97–105. https://doi.org/10.1006/game.1997.0605

Bardsley, N., Cubitt, R., Loomes, G., Moffatt, P., Starmer, C., & Sugden, R. (2010). Experimental economics: Rethinking the rules. Princeton, NJ: Princeton University Press.

Binmore, K. G. (1998). Game theory and the social contract: Just playing (Vol. 2). Cambridge, MA: MIT Press.

Bolle, F. (1990). High reward experiments without high expenditure for the experimenter. Journal of Economic Psychology, 11(2), 157–167. https://doi.org/10.1016/0167-4870(90)90001-P

Bornstein, G., Kugler, T., & Ziegelmeyer, A. (2004). Individual and group decisions in the Centipede game: Are groups more "rational" players? Journal of Experimental Social Psychology, 40(5), 599–605. https://doi.org/10.1016/j.jesp.2003.11.003

Cerutti, D. T. (1989). Discrimination theory of rule-governed behavior. Journal of the Experimental Analysis of Behavior, 51(2), 259–276. https://doi.org/10.1901/jeab.1989.51-259

Colman, A. M., Krockow, E. M., Frosch, C. A., & Pulford, B. D. (2016). Rationality and backward induction in Centipede games. In N. Galbraith, E. Lucas, & D. E. Over (Eds.), The thinking mind: A Festschrift for Ken Manktelow (pp. 139–150). London: Routledge.

Colman, A. M., & Pulford, B. D. (2015). Psychology of game playing: Introduction to a special issue. Games, 6(4), 677–684. https://doi.org/10.3390/g6040677

Cubitt, R., Starmer, C., & Sugden, R. (1998). On the validity of the random lottery incentive system. Experimental Economics, 1(2), 115–131. https://doi.org/10.1007/BF01669298

Dal Bó, P. (2005). Cooperation under the shadow of the future: experimental evidence from infinitely repeated games. American Economic Review, 95, 1591–1604. https://doi.org/10.1257/000282805775014434

Dal Bó, P., & Fréchette, G. R. (in press). On the determinants of cooperation in infinitely repeated games: A survey. Journal of Economic Literature.

Engle-Warnick, J., & Slonim, R. L. (2004). The evolution of strategies in a repeated trust game. Journal of Economic Behavior and Organization, 55, 553–573. https://doi.org/10.1016/j.jebo.2003.11.008

Fréchette, G. R., & Yuksel, S. (2017). Infinitely repeated games in the laboratory: Four perspectives on discounting and random termination. Experimental Economics, 20, 279–308. https://doi.org/10.1007/s10683-016-9494-z

Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin, 130, 769–792. https://doi.org/10.1037/0033-2909.130.5.769

Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272. https://doi.org/10.1901/jeab.1961.4-267

Jiborn, M., & Rabinowicz, W. (2003). Reconsidering the Foole's rejoinder: Backward induction in indefinitely iterated Prisoner's dilemmas. Synthese, 136(2), 135–157. https://doi.org/10.1023/A:1024731815957

Krockow, E. M., Colman, A. M., & Pulford, B. D. (2016a). Cooperation in repeated interactions: a systematic review of Centipede game experiments, 1992-2016. European Review of Social Psychology, 27, 231–282. https://doi.org/10.1080/10463283.2016.1249640

Krockow, E. M., Colman, A. M., & Pulford, B. D. (2016b). Exploring cooperation and competition in the Centipede game through verbal protocol analysis. European Journal of Social Psychology, 46, 746–761. https://doi.org/10.1002/ejsp.2226

Krockow, E. M., Colman, A. M., & Pulford, B. D. (2017). Far but finite horizons promote cooperation in the Centipede game. Unpublished manuscript, Department of Neuroscience, Psychology and Behaviour, University of Leicester, UK.

Krockow, E. M., Pulford, B. D., & Colman, A. M. (2015). Competitive Centipede games: Zero-end payoffs and payoff inequality deter reciprocal cooperation. Games, 6(3), 262–272. https://doi.org/10.3390/g6030262

McKelvey, R. D., & Palfrey, T. R. (1992). An experimental study of the Centipede game. Econometrica, 60, 803–836. https://doi.org/10.2307/2951567

McKelvey, R. D., & Palfrey, T. R. (1998). Quantal response equilibria for extensive form games. Experimental Economics, 1, 9–41. https://doi.org/10.1007/BF01426213

Myerson, J., Green, L., & Morris, J. (2011). Modeling the effect of reward amount on probability discounting. Journal of the Experimental Analysis of Behavior, 95, 175–187. https://doi.org/10.1901/jeab.2011.95-175

Nagel, R., & Tang, F. F. (1998). Experimental results on the Centipede game in normal form: An investigation on learning. Journal of Mathematical Psychology, 42(2/3), 356–84. https://doi.org/10.1006/jmps.1998.1225

Normann, H. T., & Wallace, B. (2012). The impact of the termination rule on cooperation in a prisoner's dilemma experiment. International Journal of Game Theory, 41(3), 707–718. https://doi.org/10.1007/s00182-012-0341-y

Palacios-Huerta, I., & Volij, O. (2009). Field centipedes. American Economic Review, 99, 1619–1635. https://doi.org/10.1257/aer.99.4.1619

Pulford, B. D., Krockow, E. M., Colman, A. M., & Lawrence, C. L. (2016). Social value induction and cooperation in the Centipede game. PLOS ONE, 11(3), 1–21. https://doi.org/10.1371/journal.pone.0152352

Rapoport, A., Seale, D. A., & Colman, A. M. (2015). Is tit-for-tat the answer? On the conclusions drawn from Axelrod's tournaments. PLOS ONE, 10(7), 1–11, e0134128. https://doi.org/10.1371/journal.pone.0134128

Rapoport, A., Stein, W. E., Parco, J. E., & Nicholas, T. E. (2003). Equilibrium play and adaptive learning in a three-person Centipede game. Games and Economic Behavior, 43, 239–265. https://doi.org/10.1016/S0899-8256(03)00009-5

Rosenthal, R. W. (1981). Games of perfect information, predatory pricing and chain store paradox. Journal of Economic Theory, 25, 92–100. https://doi.org/10.1016/0022-0531(81)90018-1

Roth, A. E. (1995). Introduction to experimental economics. In J. Kagel & A. E. Roth (Eds.), Handbook of experimental economics (pp. 3–109). Princeton, NJ: Princeton University Press.

Roth, A. E., & Murnighan, J. K. (1978). Equilibrium behavior and repeated play of the Prisoner's Dilemma. Journal of Mathematical Psychology, 17(2), 189–198. https://doi.org/10.1016/0022-2496(78)90030-5

Selten, R., Mitzkewitz, M., & Uhlich, G. R. (1997). Duopoly strategies programmed by experienced players. Econometrica, 65, 517–556. https://doi.org/10.2307/2171752

Selten, R., & Stoecker, R. (1986). End behavior in sequences of finite Prisoner's Dilemma supergames: A learning theory approach. Journal of Economic Behavior and Organization, 7(1), 47–70. https://doi.org/10.1016/0167-2681(86)9002-1

Tucker, A. (2001). A two-person dilemma (Unpublished notes, Stanford University). Reprinted in E. Rasmussen (Ed.), Readings in games and information (pp. 7–8). Malden, MA: Blackwell. (Original work published 1950)

Zhang, H., & Maloney, L. T. (2012). Ubiquitous log odds: a common representation of probability and frequency distortion in perception, action, and cognition. Frontiers in Neuroscience, 6, 1. https://doi.org/10.3389/fnins.2012.00001

Received: August 4, 2017
Final Acceptance: February 12, 2018

Appendix

[Figure A1: four panels, each plotting exit point (vertical axis, 0 to 25) against game round (horizontal axis, 1 to 20). Panel titles: Condition A, Participant ID 118, Player role 2; Condition A, Participant ID 505, Player role 2; Condition C, Participant ID 207, Player role 1; Condition D, Participant ID 811, Player role 1.]

Fig. A1. Examples of individual participant behavior. For each condition, decisions of one representative participant displaying typical behavior for that condition are shown. The exit points of these participants are displayed across the 20 game rounds. Those games terminated by the individual participant are marked by black circular shapes. Those games terminated by the other participant are marked by circular shapes with the letter "O". Those games terminated by the computer (only applicable in Conditions B, C, and D) are marked by square shapes with the letter "C".
