Dare you buy a Henry Moore on eBay? - Statistics can tell you what to avoid

Page created by Randy Lambert
 

When the rarefied world of modern art sales meets the digital age, almost anything is possible. You, too, can buy a Henry Moore on eBay. But it is risky. The old, high-commission auction-houses have rivals, but you will need statistics to guide you. Joseph Gastwirth and Wesley Johnson tell you where the fakes may be lurking.

March 2011. © 2011 The Royal Statistical Society

[Image: Mother and Child II (1983), Cramer, Grant and Mitchinson (CGM) catalogue 672]

Henry Moore’s sculptures – huge works in bronze or carved marble – are iconic. The public know them from museums and public places. Less well known is that Moore, the pre-eminent British sculptor of the 20th century, produced many smaller-scale sculptures, drawings, etchings and lithographs, and that these frequently come up for sale. One perhaps surprising place to find them is on eBay. Guide prices there range from £250 to tens of thousands of pounds. And such eBay Henry Moores are not at all uncommon. On one day in December 2010, no fewer than five were on offer, from apparently separate sellers, and all described as original.

The internet has provided consumers with new and easy ways to purchase goods – and the commissions charged by an internet auction host are a fraction of those of the major art houses. But it has allowed less scrupulous businesses and individuals to offer poor-quality or mislabelled items. Ideally, before buying works at auction one would have experts examine them, as is done at the big auction-houses like Sotheby’s and
Christie’s, but for items listed on eBay, which come from all over the world, this is impractical. One of us (JLG) has long been interested in art by Moore. After noticing on eBay a small sculpture, supposedly made by Henry Moore, that did not “look right” he checked with friends at the Moore Foundation.

[Image: Mother and Child VIII (1983), CGM catalogue 678]

They had received inquiries from buyers who had purchased works incorrectly attributed to Moore; so the question of estimating the prevalence of counterfeit artwork arose.

From our informal correspondence it appeared that a much higher percentage of “drawings” or “small sculptures” were dubious than was the case for signed etchings and lithographs (prints). This last detail suggested a statistical approach that we could use to estimate the proportion of fake Henry Moores – or “questionable works”, in the more cautious language of the art world – that were out there.

In medical and social science applications, where even the best method of classification is not a “gold standard”, the Hui–Walter method [1] can be used to estimate the accuracy rates of clinical tests [2] and survey classifications [3]. That method requires one to study two subpopulations, with a different prevalence of the trait in each. The high prevalence group might be individuals who had symptoms of deep vein thrombosis (DVT) while the low prevalence group consists of individuals at risk of DVT because they recently had surgery. The method also needs at least two evaluators of classifications or tests of success or failure. Furthermore, the evaluations should be independent of each other. In the case of screening for DVT, one test measures the level of antibodies while the other is based on different technologies [4].

Here we had our two subgroups – drawings and sculpture, where questionable works are common, versus lithographs and etchings, where they are rarer. In order to obtain a sufficient and representative sample of Moore’s work, almost all of the objects described as having been created by Henry Moore that appeared on eBay during the period from March 2005 to November 2007 (239 of them in all) were assessed. We needed not one but two independent evaluators: the first was Stephen Gabriel, an expert on Moore, and the second was one of the authors (JLG). Both have had a long-time interest in Moore’s art and have extensive libraries. A third collaborator, Dr H. Hikawa, downloaded the descriptions of each item, which typically included a digital photo, and provided the two evaluators with copies. The files were e-mailed to the first evaluator, while the second evaluator was given a printed version. To further ensure the independence of the evaluations, the two assessors did not discuss any of the art for sale during this period. Because a major objective of the study is to protect consumers against “misleading” descriptions, which suggest that an item is authentic when it is not, objects that were described as “similar” to or “related” to Moore were excluded. We analysed only objects that were claimed to be actually by Henry Moore.

The data and statistical model

The results of the study were summarized in two 2 × 2 tables reporting the matched pair classifications for the two groups of artwork (Table 1). The drawings and sculpture data, as we have said, were combined because the background information indicated that the prevalence of non-genuine items of both types was similar. Furthermore, the fractions of these two items that both evaluators thought dubious were similar. Table 1 shows, for each group, the items that both evaluators thought questionable; that Stephen Gabriel thought questionable but Gastwirth thought genuine; that Gabriel thought genuine but Gastwirth thought questionable; and that both evaluators thought genuine.

Table 1. Assessments of genuineness of Henry Moore’s art offered on eBay from March 2005 to November 2007 [5]

Prints
                                          Evaluator 2
                               Questionable    Genuine    Total
Evaluator 1    Questionable          6            10         16
               Genuine               1           149        150
               Total                 7           159        166

Sculptures and drawings
                                          Evaluator 2
                               Questionable    Genuine    Total
Evaluator 1    Questionable         59             6         65
               Genuine               2             6          8
               Total                61            12         73

Two things affect the numbers that appear in the tables: the actual prevalence of non-genuine objects in each of the two groups, and the accuracy of the evaluators. This is where having two independent evaluators is so vital: they provide a mutual cross-reference. Suitably statistically treated, each can provide a standard by which to judge the other. Furthermore, the evaluators’ accuracy has two parts. The first is their sensitivity. This is the probability that a
non-genuine object will be classified correctly – that they will spot a fake. The second part is their specificity, which is in some ways the reverse – that they will know a genuine article when they see one. These are very far from being the same thing. An evaluator who classified every item as genuine would have a perfect specificity, but a sensitivity of zero.

If one considers the classification of objects in the framework of classical statistics, where the null hypothesis is that the object is genuine and the alternative is that it is not, the Type I error rate equals 1 minus specificity and the Type II error rate is 1 minus sensitivity.

The Hui–Walter method takes the data of Table 1 and estimates the probability of genuineness for objects in each category, together with the accuracies of the evaluators. The virtue of the method is that it gives information both about the evaluators and the evaluated. It assumes that the accuracy rates of each evaluator are the same for both categories of art and that, conditional on the true status of an object, the evaluations are independent. Given that, it provides statistical estimates of the specificity and the sensitivity of each evaluator, and of the fraction of prints and the fraction of drawings that are questionable. The results, with their confidence intervals, are given in Table 2.

Table 2. Maximum likelihood estimates of the two prevalence parameters and accuracy rates of the two evaluators. Maximum likelihood estimates were obtained using the EM algorithm with standard errors based on the bootstrap using the program TAGS [6]

Parameter                                                       Mean     95% confidence interval
Se1, sensitivity of evaluator 1 (Stephen Gabriel)               0.968    (0.877, 0.992)
Se2, sensitivity of evaluator 2 (Joseph Gastwirth)              0.913    (0.810, 0.962)
Sp1, specificity of evaluator 1                                 0.941    (0.889, 0.969)
Sp2, specificity of evaluator 2                                 0.995    (0.939, 0.999)
Prev1, fraction of prints that are dubious                      0.041    (0.018, 0.089)
Prev2, fraction of sculptures and drawings that are dubious     0.915    (0.818, 0.962)

Although the confidence intervals for the accuracy rates overlap, they suggest that the evaluators had similar but not identical rates of accuracy. The first evaluator, Stephen Gabriel, was more sensitive, detecting more counterfeit items, while the second, Joseph Gastwirth, had a slightly higher specificity, correctly classifying legitimate items. What is more remarkable is the estimated prevalence of dubious drawings and sculptures: 91.5% of them are questionable. Even taking the lower end of a 95% confidence interval gives 82% of them questionable. This clearly means that government agencies concerned with consumer protection are justified in informing the public of potential authenticity issues. In contrast, only 4.1% of the signed prints appear to be of doubtful authenticity. The obvious first lesson is: if you are thinking of buying a Henry Moore on eBay, buy a print rather than a drawing or small sculpture.

While a number of authors have raised questions about the validity of the estimates from latent class models such as the Hui–Walter [7], most of the studies indicate that it is the estimates of sensitivity and specificity that are most affected by modest violations of its assumptions; the estimates of prevalence are more sturdy. Furthermore, the greater the difference in the prevalence of the characteristic in the two groups, the greater is the robustness of the prevalence estimate [8] – and here our difference is indeed great: between fake rates of 91.5% in drawings and 4.1% in prints lies a difference of 87.4 percentage points. We may therefore place some reliance on our conclusions. The key assumption is that each evaluator has the same sensitivity and specificity for artworks of both types, and that they are independent.

Although we took pains to ensure that the evaluators worked independently, there are two ways in which a modest degree of dependence could arise. Some sellers may offer multiple objects and, whether by design or ignorance, there is likely to be correlation in the status of the items put on eBay by the same seller. Also, both evaluators probably consulted many of the same definitive catalogues and books and might have compared the photograph on eBay with the same “reference photo”. To check the potential sensitivity of the results to possible dependence, a model allowing for such correlation was also fitted to the data [9]. The estimated correlation was 0.29, which is insufficient to result in a serious bias in the prevalence estimates.

Implications for buyers of artwork

Clearly the results indicate that consumers should not take for granted the authenticity of works by Moore, and probably other major artists, that are offered on eBay or by other internet sellers, and that they should carefully compare the digital photographs and related information provided by sellers with the corresponding information in the major catalogues. This also applies to Moore’s prints because several that were classified as non-genuine were from an unsigned version where a questionable signature was added. As in all observational studies, there is a possibility that some important covariates, such as provenance or prior ownership of the item, were not available. It is difficult to think, however, of a realistic covariate that could explain the very low prevalence of genuine drawings and small sculptures. The very high proportion of dubious drawings and small sculptures by Moore offered on eBay indicates that prospective buyers of art by other major artists, such as Picasso or Chagall, should also be very careful.

Potential applications in legal cases

After we began the project we became aware of several legal decisions in cases where eBay was sued for assisting the sale of counterfeit products. All the suits involved possible violations of intellectual property and trademark infringement, but the legal criteria used in different nations are not uniform. Moreover, eBay did have a process that allowed firms to report counterfeit items. Statistical evidence had a key role in many of the cases.

In the United States, eBay was found not to have contributed to trademark infringement in Tiffany v. eBay [10]. Tiffany presented a survey which claimed that about 75% of the items labelled as its product were counterfeit, while only 5% were surely genuine [11,12]. The courts decided that, even though eBay had general knowledge that counterfeit Tiffany silver jewellery was being sold, it was only required to take action if it had contemporary knowledge of which particular listings were infringing or would infringe in the future. Furthermore, the trial court noted significant flaws in Tiffany’s
survey. It was not probability-based, so one could not calculate a confidence interval for the fraction of non-genuine items. Furthermore, the search used to identify the items that were purchased and examined by two Tiffany experts included non-silver jewellery as well as the silver items that were at issue in the case. The sample sizes (186 in 2004 and 139 in 2005) were smaller than those specified by the survey designer. One reason for this shortfall was that Tiffany was unable to purchase some of the items that were supposed to be in their sample. It was quite likely that those “missing items” had a higher probability of being genuine than those that they were able to acquire, as knowledgeable individual buyers were also bidding for the genuine pieces but not for the fakes. Finally, Tiffany did not participate in eBay’s monitoring programme during this time, so that items that could have been removed from the site were not. Although eBay’s statistical expert agreed that a substantial amount (at least 30%) of Tiffany jewellery was counterfeit, this only helped establish that eBay had general knowledge that counterfeit products were being sold on its site.

In France, however, a study that was submitted by Christian Dior and Louis Vuitton estimated that 90% of items allegedly made by these designers were not genuine. This study was accepted by the court. This estimate is surprisingly similar to the 91.5% prevalence estimate for non-genuine Henry Moore drawings and small sculptures in our study. Partly on this basis, eBay was found liable for contributing to trademark infringement.

Although surveys have been used to estimate the proportion of potential consumers who are “confused” – a polite word for “deceived” – as to the source of a product because of the design or packaging or are misled by advertising, statisticians may not fully appreciate the potential for using statistical surveys and studies similar to ours in trademark infringement cases. The method we have used here might be adapted to help monitor the authenticity of items offered for sale on the internet.

[Image: Two Women Seated on Beach (1984), CGM catalogue 719]

Potential refinements and improvements to the study design

Our work can be regarded as a proof of principle: it is possible to obtain reasonable estimates of the prevalence of counterfeit items even when the evaluators do not examine the pieces individually. During the time the data were collected, the evaluators observed that some particular art objects came up repeatedly, and that items from some particular sellers, especially those who sold many items, were more likely not to be authentic. The approach could be improved by incorporating knowledge that is gained during a first phase, either about the type of items that are non-genuine or sellers of those products, into
a second phase study. That study might be a
probability-based buying programme that is
focused on a smaller group of likely sellers of
problematic objects.
      When it is possible to obtain a third in-
dependent evaluation the latent class approach
does not require two subpopulations and has
been successfully used to evaluate screening
tests and estimate the prevalence of disease in
animals. The three-evaluator version is well
suited to estimating the prevalence of counter-
feit jewellery, as a second subpopulation with a
low prevalence of fakes might not exist.
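
One way to see why a third evaluator can stand in for a second subpopulation is to count unknowns against observable quantities. The short Python sketch below does that bookkeeping. It is an illustration only: the function name is hypothetical, and it assumes, as in the Hui–Walter setting, that each evaluator has one sensitivity and one specificity shared across groups and that each group has its own prevalence. Having at least as many free cell probabilities as parameters is a necessary condition for estimation, not a guarantee of it.

```python
def parameter_count_check(n_evaluators, n_populations):
    """Compare observable cell probabilities with unknown parameters.

    Hypothetical helper for illustration. Each population yields a
    2**n_evaluators table of cross-classifications whose probabilities
    sum to 1, so it contributes 2**n_evaluators - 1 free quantities.
    """
    free_cells = n_populations * (2 ** n_evaluators - 1)
    # One sensitivity and one specificity per evaluator,
    # plus one prevalence per population.
    parameters = 2 * n_evaluators + n_populations
    return free_cells, parameters, free_cells >= parameters


for evaluators, populations in [(2, 1), (2, 2), (3, 1)]:
    cells, params, enough = parameter_count_check(evaluators, populations)
    print(f"{evaluators} evaluators, {populations} population(s): "
          f"{cells} free cells vs {params} parameters, enough={enough}")
```

With two evaluators and a single group there are more unknowns than observable quantities, whereas two evaluators with two groups, or three evaluators with one group, balance exactly.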
      One possible limitation of the method is
that an infringing seller might purchase one
expensive handbag, say, make counterfeit ver-
sions, but put a picture of the genuine bag on
the internet. Presumably, a disappointed pur-
chaser would complain to eBay, which would
inform the company about a particular seller
of infringing items. Trademark holders and
consumer protection agencies might still find
a broad-based study or survey that provided
a statistically reliable estimate of the fraction
of counterfeit products sold by internet sites
useful both in legal cases and to inform policy-
makers and the public of the magnitude of the
problem.
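
To make the estimation step concrete, the following Python sketch fits the two-evaluator, two-group latent class model to the counts in Table 1 with a plain EM iteration. It is a minimal illustration under the stated Hui–Walter assumptions (constant accuracy across groups, evaluations independent given true status), not the TAGS program used for the published estimates. The function names, starting values and iteration count are our own choices, and no standard errors are computed.

```python
# Counts from Table 1, in the order (evaluator 1, evaluator 2) =
# (Q, Q), (Q, G), (G, Q), (G, G), where Q = questionable, G = genuine.
TABLE1 = {
    "prints": [6, 10, 1, 149],
    "sculptures_and_drawings": [59, 6, 2, 6],
}
CELLS = [(1, 1), (1, 0), (0, 1), (0, 0)]  # 1 = called questionable


def cell_probs(r1, r2, se, sp):
    """Return P(r1, r2 | non-genuine) and P(r1, r2 | genuine).

    Assumes the two evaluations are independent given the true status.
    """
    p_fake = (se[0] if r1 else 1 - se[0]) * (se[1] if r2 else 1 - se[1])
    p_real = ((1 - sp[0]) if r1 else sp[0]) * ((1 - sp[1]) if r2 else sp[1])
    return p_fake, p_real


def hui_walter_em(tables, n_iter=5000):
    """EM sketch for two evaluators and any number of groups."""
    se, sp = [0.8, 0.8], [0.8, 0.8]  # illustrative starting values
    # Start each prevalence at the share evaluator 1 called questionable.
    prev = {g: (c[0] + c[1]) / sum(c) for g, c in tables.items()}
    for _ in range(n_iter):
        fake_total = real_total = 0.0
        fake_hits, real_hits = [0.0, 0.0], [0.0, 0.0]
        new_prev = {}
        for g, counts in tables.items():
            group_fake = 0.0
            for n, (r1, r2) in zip(counts, CELLS):
                # E-step: posterior probability that an item in this
                # cell is non-genuine, given the current parameters.
                p_fake, p_real = cell_probs(r1, r2, se, sp)
                num = prev[g] * p_fake
                w = num / (num + (1 - prev[g]) * p_real)
                group_fake += n * w
                real_total += n * (1 - w)
                for j, r in enumerate((r1, r2)):
                    if r:
                        fake_hits[j] += n * w        # fake, flagged
                    else:
                        real_hits[j] += n * (1 - w)  # genuine, passed
            fake_total += group_fake
            new_prev[g] = group_fake / sum(counts)
        # M-step: update parameters from the expected counts.
        prev = new_prev
        se = [h / fake_total for h in fake_hits]
        sp = [h / real_total for h in real_hits]
    return se, sp, prev


se, sp, prev = hui_walter_em(TABLE1)
print("sensitivities:", [round(x, 3) for x in se])
print("specificities:", [round(x, 3) for x in sp])
print("prevalences:", {g: round(p, 3) for g, p in prev.items()})
```

Because the model has six parameters and the two tables supply six free cell proportions, the fitted values should reproduce the observed counts closely, and the estimates should land near the values reported in Table 2.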

[Image: Two Reclining Figures in Yellow and Green (1967), CGM catalogue 74]

References
1. Hui, S. L. and Walter, S. D. (1980) Estimating the error rates of diagnostic tests. Biometrics, 36, 167–171.
2. Pepe, M. and Janes, H. (2007) Insights into latent class analysis of diagnostic test performance. Biostatistics, 8, 474–484.
3. Sinclair, M. D. and Gastwirth, J. L. (1996) On procedures for evaluating the effectiveness of reinterview survey methods: application to labor force data. Journal of the American Statistical Association, 91, 961–969.
4. Line, B. R., Peters, T. L. and Keenan, J. (1997) Diagnostic test comparisons in patients with Deep Venous Thrombosis. Journal of Nuclear Medicine, 38, 89–92.
5. Gastwirth, J. L., Johnson, W. O. and Hikawa, H. (2011) Estimating the fraction of “non-genuine” artwork by Henry Moore on eBay: application of latent class screening test methodology. Journal of the Royal Statistical Society, Series A, 174 (in press).
6. Pouillot, R., Gerbier, G. and Gardner, I. A. (2002) “TAGS”, a program for the evaluation of test accuracy in the absence of a gold standard. Preventive Veterinary Medicine, 53, 67–71.
7. Spencer, B. (2010) When do latent class models overstate accuracy for binary classifiers? (in press).
8. Sinclair, M. D. and Gastwirth, J. L. (2000) Properties of the Hui and Walter and related methods for estimating prevalence rates and error rates of diagnostic testing procedures. Drug Information Journal, 34, 605–615.
9. Dendukuri, N. and Joseph, L. (2001) Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics, 57, 158–167.
10. 576 F. Supp. 2d 463 (S.D.N.Y. 2008) and 600 F. 3d 93 (2d. Cir. 2010).
11. Goldwasser, K. (2010) Knock it off: An analysis of trademark counterfeit goods regulation in the United States, France and Belgium. Cardozo Journal of International and Comparative Law, 18, 207–238.
12. Levin, E. K. (2009) A safe harbor for trademark: Reevaluating secondary trademark liability after Tiffany v. eBay. Berkeley Technology Law Journal, 24, 491–527.

Joseph Gastwirth is Professor of Statistics and Economics at the George Washington University, Washington, DC, and Wesley Johnson is Professor of Statistics at the University of California at Irvine.

Acknowledgements
Grateful thanks are due to the Henry Moore Foundation (www.henry-moore.org) for their generosity in providing digital images of the artwork. A fuller, more technical version is to appear in the Journal of the Royal Statistical Society, Series A.
