How Scholarly Is Google Scholar? A Comparison to Library Databases

Page created by Christine Austin
 
CONTINUE READING
How Scholarly Is Google Scholar? A
Comparison to Library Databases
Jared L. Howland, Thomas C. Wright, Rebecca A.
Boughan, and Brian C. Roberts

        Google Scholar was released as a beta product in November of 2004.
        Since then, Google Scholar has been scrutinized and questioned by
        many in academia and the library field. Our objectives in undertaking this
        study were to determine how scholarly Google Scholar is in comparison
        with traditional library resources and to determine if the scholarliness of
        materials found in Google Scholar varies across disciplines. We found
        that Google Scholar is, on average, 17.6 percent more scholarly than
        materials found only in library databases and that there is no statistically
        significant difference between the scholarliness of materials found in
        Google Scholar across disciplines.

Originally presented June 30, 2008, at the             as a beta version, has not only become a
American Library Association’s Annual                  common fixture in library literature but is
Conference in Anaheim, California.                     also becoming ubiquitous in information-
                                                       seeking behavior of users. Google Scholar
           oogle Scholar was introduced                was initially met with curiosity and skep-
           to the world in November of                 ticism.1 This was followed by a period of
           2004 as a beta product. It has              systematic study.2 More recently, there has
           been embraced by students,                  been optimism about Google Scholar’s
scholars, and librarians alike. However,               potential to move us toward Kilgour’s
Google Scholar has received criticism                  goal of 100 percent availability of infor-
regarding the breadth and scope of                     mation.3 Librarians now find themselves
available content. We undertook this                   acknowledging users’ preferences for
study to answer two questions regarding                one-stop information shopping by giving
these common criticisms: (1) Are Google                Google Scholar ever-increasing visibility
Scholar result sets more or less scholarly             on their Web pages.4 Even as librarians
than licensed library database result sets?            begin to promote Google Scholar, the
and (2) Does the scholarliness of Google               debate continues within the information
Scholar vary across disciplines?                       community as to the advisability of guid-
                                                       ing users to this tool. The view of critics
Literature Review                                      like Péter Jascó, who use terms such as
Google Scholar, which is still branded                 “shallowness” and “artificial unintel-

Jared L. Howland is Electronic Resources Librarian, Thomas C. Wright is Collection Development Coordina-
tor, Rebecca A. Boughan is Electronic Resources Assistant, and Brian C. Roberts is Process Improvement
Specialist, all at Brigham Young University. E-mail: jared_howland@byu.edu, tom_wright@byu.edu,
rebecca_boughan@byu.edu, brian_roberts@byu.edu.

                                                 227
228 College & Research Libraries                                              May 2009

ligence” to describe the program,5 seems       be to include each citation in a scholarly
to be giving way to a landscape where          research paper at an academic institution.
respected publishers (like Cambridge)             This notion of scholarliness, that we
and platforms (for instance, JSTOR) are        attempted to encapsulate in the rubric,
now offering links out to Google Scholar       uses a common collection-assessment tool
for more citations.                            as outlined by Kapoun.9 This model con-
   Early studies of Google Scholar tried to    siders many factors, including accuracy,
match citations “hit to hit” in comparison     authority, objectivity, currency, and cov-
with traditional library databases. Jascó      erage. For the purposes of the study, we
even provided a Web site where the curi-       added relevancy because materials that
ous could compare search results between       met those five criteria were not always
Google Scholar and the likes of Nature,        relevant to the research topic.
Wiley, or Blackwell.6 More recently, stud-        The Kapoun model of evaluating
ies have appeared that track the “value-       scholarliness was based on his experience
added” open-access citations that appear       in evaluating print resources but was
uniquely in Google Scholar versus other        expanded for evaluating Web resources.
sources.7 However, is comprehensive-           Because this study was constructed
ness of content the primary indicator of       to compare library database results to
a resource’s usefulness?                       Google Scholar results, we had no way
   Every title from every database may         of knowing the breadth of materials the
not be in Google Scholar, but that should      rubric would be required to evaluate.
not be an indictment of Google Scholar’s       Google Scholar alone references materi-
inability to return scholarly results across   als in any format whether it is in print
disciplines. The algorithms Google Schol-      or electronic only and includes journals,
ar uses to return result sets cannot really    books, syllabi, and conference proceed-
be compared to library database algo-          ings. These are just a few examples of the
rithms. However, what is returned can be       disparate types of materials the rubric
judged for its relevancy and scholarliness.    would need to handle. By using a model
Up to this point, studies of Google Scholar    flexible enough to evaluate materials in
have followed the example of Neuhaus           any format, we have attempted to inject a
et al., which compared Google Scholar          qualitative value of Google Scholar results
content to forty-seven other databases.8       to the ongoing debate.
This title-by-title and citation-by-citation
comparison is a pure numerical measure,        Methodology
but it neglects to address the efficacy of     We selected seven subject librarians
any particular search or the scholarly         from Brigham Young University to cover
nature of content or algorithms in discov-     various academic disciplines: humani-
ering that content.                            ties, sciences, and social sciences. Each
   We felt that a different approach was       specialist was blind to the purpose of the
needed. Rather than measuring what has         study. We requested that they provide us
gone into the database, we have sought, to     (1) a sample question that they typically
some degree, to evaluate what comes out        receive from students, (2) a structured
as a result of search queries. We have done    query to search a library database, and
this by involving subject librarians with      (3) the library database they would use
knowledge of typical reference questions       for that particular query.
and using those questions to query both           We then used their data in two differ-
Google Scholar and discipline-specific         ent ways. First, we translated the library
databases. We then asked the same librar-      database query into an equivalent search
ians to judge the search results using a ru-   string used by Google Scholar. Using the
bric of scholarliness. In short, we wanted     original query and the query translated to
to determine how appropriate it would          work with Google Scholar, we searched
How Scholarly Is Google Scholar? 229

                                  Table 1
                     academic Representation in This Study
 academic         Database Query                  GS Query             library Database
 Discipline
 Science      (ACL or “anterior cruci-     ACL OR “anterior cruci-     SportDiscus
              ate ligament*”) and in-      ate ~ligament” ~in-
              jur* and (athlet* or sport   jury ~athlete OR sport
              or sports) and (therap*      ~therapy OR ~treatment
              or treat* or rehab*)         OR ~rehabilitation
 Science      lung cancer and (etiol*      lung cancer ~etiology       Medline
              or caus*) and (cigarette*    OR ~cause ~cigarette OR
              or smok* or nicotine*)       ~smoking OR ~nicotine
 Science      “dark matter” and            “dark matter” evidence      Applied Science
              evidence                                                 and Technology
                                                                       Abstracts
 Social       (“fast food” or mcdon-       “fast food” OR mc-      Business Source
 Science      ald’s or wendy’s or          donald’s OR wendy’s     Premier
              “burger king” or restau-     OR “burger king” OR
              rant) and franchis* and      restaurant ~franchise
              (knowledge n3 transfer       “knowledge transfer” OR
              or “knowledge manage-        “knowledge management”
              ment” or train*)             OR ~train
 Social       (“standardized test*” or     “standardized ~test”        PsycINFO
 Science      “high stakes test*”) and     OR “high stakes ~test”
              (“learning disabilit*”       “learning ~disability” OR
              or dyslexia or “learning     dyslexia OR “learning
              problem”) and accom-         problem” ~accommoda-
              modat*                       tion
 Humanities   (bilingual* or L2) and  ~bilingual OR L2 ~child          Linguistics and
              (child* or toddler) and OR toddler “cognitive            Language Behavior
              “cognitive development” development”                     Abstracts
 Humanities   (memor* or remem-         ~memor OR remembrance JSTOR
              brance or memoir*) and OR ~memoir holocaust
              (holocaust) and (Spiegel- Spiegelman OR Maus
              man or Maus)

both the library database and Google            in the library database. This allowed us to
Scholar and retrieved the citations and         calculate the overlap of citations between
full text for the first thirty results. We      the library databases and Google Scholar.
selected thirty results because research           We standardized the formatting of the
has shown that less than one percent of         citations and inserted them randomly into
all users ever go beyond a third page of        a spreadsheet, which contained a rubric
results and most search engines return          that was used to assign a scholarliness
about ten results per page.10                   score to each of the citations. The rubric
   Next, we took the citations from the         contained six criteria, based on Kapoun’s
library databases and determined if they        model of evaluating resources, to judge
could also be found using Google Scholar        scholarliness: (1) accuracy, (2) authority,
and took the citations from Google              (3) objectivity, (4) currency, (5) coverage,
Scholar to see if they could also be found      and (6) relevancy.11 These criteria were
230 College & Research Libraries                                                                May 2009

                                           Table 2
                                Rubric for Grading Scholarliness
 1 = Below Average Quality; 2 = Average Quality; 3 = Above Average Quality
 Citation        References         accuracy    authority   Objectivity   Currency   Coverage    Relevancy
 Number

 1          Barnes, J.E., &          123          123         123          123        123         123
            Hernquist, L.E.
            (1993) Computer
            models of col-
            liding galaxies.
            Physics Today,
            46, 54–61.
 2          Bergstrom, L.            123          123         123          123        123         123
            (2000) Nonbary-
            onic dark matter:
            Observational
            evidence and de-
            tection methods.
            Reports on Prog-
            ress in Physics,
            63(5), 793–841.

graded on a scale of 1 (below average) to               E = Effect due to “exclusivity” (i = 1, 2, 3)
3 (above average) and summed to create a                L = Effect due to librarian (j = 1, 2, …7)
total scholarliness score for each citation.            EL = Interaction between “exclusivity”
    We provided the subject librarians with             and librarian
the full text of each of the citations and              ε = Error term (k = degrees of freedom
asked them to use the rubric to evaluate                associated with the error term)
the scholarliness of the individual cita-                  Within the context of this formula,
tions. After the grading was completed,                 E controls for any effect due to the
we were able to group each citation from                “exclusivity” of the citation, where i
the subject librarian into one of three cat-            represents each of the three categories
egories: (1) the citation was available only            of “exclusivity” (that is, the citation
in the library database, (2) the citation was           was found only in the database, it was
available only in Google Scholar, or (3) the            found only in Google Scholar, or it was
citation was available in both the library              found in both the database and Google
database and Google Scholar. We have                    Scholar). Each librarian (L) also played a
used the term “exclusivity” to describe                 role in the total scholarliness score. One
the three categories.                                   librarian could have provided consis-
    Once we had grouped the citations by                tently low scores, with another having
category, we ran a statistical analysis that            a tendency toward higher scores. To
controlled for the effect of the individual             account for this disparity, each librar-
librarian on the total scholarliness score,             ian was treated as a factor in the total
for the effect of “exclusivity” and for any             scholarliness score, where j represents
interaction there may have been between                 each of the seven participants. In short,
both librarian and “exclusivity”:                       this formula allowed us to calculate a
                                                        measure of scholarliness while account-
total scholarliness score = µ + Ei + Lj + ELij + εijk   ing for differences in the location of
Where:                                                  citations as well as differences between
µ = Average total score                                 librarians.
How Scholarly Is Google Scholar? 231

                                     Table 3
     Scholarliness based on “exclusivity” (Maximum Scholarliness Score Is 18)
     Participant    Found Only      Found Only         Percent Change in         Found in both
                    in Database    in GS average   Scholarliness Score between   average Score
                      average          Score          the Database and GS
                       Score
 1                     11.7            16.1                  36.8%                   13.5
 2                     13.2            13.8                  4.5%                    14.6
 3                     N/A             12.0                   N/A                    15.6
 4                     10.0            13.5                  35.0%                   14.3
 5                     10.0            11.6                  16.0%                   11.5
 6                     11.7            12.8                  8.5%                    14.3
 7                     16.5            14.4                 –12.7%                   13.9
 least Squares         11.9            14.0                  17.6%                   14.2
 Mean

Results                                            and licensed library databases had a
The mean scholarliness score of citations          higher average score than citations found
found only in Google Scholar was 17.6              only in one or the other. Finally, there
percent higher than the score for citations        was no statistically significant difference
found only in licensed library databases.          found between the scholarliness score
In fact, across all but one of the tested          across disciplines within Google Scholar.
disciplines, citations found only in Google        Searching for either a humanities topic or
Scholar had a higher average scholarliness         a science topic yielded no difference in the
score than citations found only in licensed        scholarliness score of citations discovered
library databases. The one discipline with         in Google Scholar.
a lower score, however, had only two
unique citations in the library database,          Discussion of Results
so the exact significance of the scores for        It is interesting to note that there was very
that discipline is imprecise. Additionally,        little overlap between the initial thirty
the citations found in both Google Scholar         citations returned by the databases and
                                                   the initial thirty citations returned by
                 Table 4                           Google Scholar. In fact, only one query
            Overlap of Citations                   of the seven had any overlapping cita-
                                                   tions between Google Scholar and the
 Participant       Percent of      Percent of      database—an overlap of five citations
                   Database       GS Citations
                                                   from JSTOR that appeared within the first
                   Citations      in Database
                                                   thirty results in Google Scholar.
                     in GS
                                                       However, during the second phase of
 1                   76.7%           0.0%          the study, when we began to search for
 2                   83.3%           43.3%         specific citations, we found that Google
                                                   Scholar actually contained 76 percent of all
 3                  100.0%           96.7%
                                                   the citations found in the library databases,
 4                   96.7%           80.0%         while the library databases contained only
 5                   93.3%           28.0%         47 percent of the citations found in Google
 6                   0.0%            46.7%         Scholar. Despite the initial lack of overlap
                                                   in the search results, it was clear that
 7                   81.8%           34.5%         Google Scholar included a large portion of
 Average             76.0%           47.0%         the citations available in library databases.
232 College & Research Libraries                                                May 2009

    This seems to validate the decision        evancy search options seems to indicate
of many students to use Google first to        that Google Scholar got it right in the first
look for information. If Google Scholar        place. It appears that Google Scholar has
contains much of the content available in      done a better job of both precision and
library databases, why shouldn’t students      recall than library databases have.
begin where the most content exists? The           Many studies have compared content
argument is made that Google Scholar           in library databases to content in Google
will return millions of hits, many of          Scholar and found inconsistencies. The
which are spurious at best, while a library    purpose of both search systems, however,
database will only return a few thousand       is to discover relevant, scholarly content.
results that are more focused to the query.    Using our scholarliness model, we found
    However, the power of ordering results     that, across disciplines, Google Scholar
by relevancy, combined with the fact that      is generally superior to individual data-
very few people ever go beyond the third       bases in retrieving appropriate citations.
page of results, creates a searcher-imposed    As more publishers share their content
higher level of precision for any search       with Google Scholar, we would expect the
engine. This is particularly true of Google    effectiveness of a Google Scholar search
Scholar, where the most relevant and more      to increase.
scholarly material floats to the top of the
list, while the less precise material falls    Future Studies
to the bottom, where it is rarely seen. Hit    The statistical results from this study can
counts are of secondary importance in a        be extrapolated only to the specific topics
Google Scholar search; the key to Google       and subject librarians that were involved
Scholar’s success is relevancy ranking and     in the study. A more comprehensive sta-
a large universe of information.               tistical methodology would need to be
    A database is limited to its defined       constructed to make the results generally
title list of content, whereas Google          applicable. However, our results were
Scholar, by its very nature, is open to a      compelling enough to make us believe that
much broader set of content that aids the      the results would hold up to more strenu-
researcher. Business Source Premier, one       ous tests. Additionally, the rubric we used
of the library databases, was the only         in our study was only a three-point Likert
library database where we found more           scale. Finding statistically significant dif-
Google Scholar citations in the database       ferences would have been easier had we
than database citations in Google Scholar.     selected a seven or more point Likert scale.
However, even in this one instance, the           Additionally, our analysis used a vet-
scholarly score for citations found only       ted approach to evaluating scholarliness
in Google Scholar was higher than the          of resources. A more objective view of
score for citations found only in the data-    scholarliness could be obtained by using
base. The citations found in both Google       some variation of citation analysis (such
Scholar and the database received even         as citation counts or ISI impact factor). We
higher scores, and these citations were        started to do such an analysis but decided
only exposed through the first thirty hits     there were too many trade-offs to be ap-
in Google Scholar. This seems to indicate      propriate, given the methodology we
that, even when Google Scholar is return-      used for this study. For example, citation
ing fewer titles, as in this case with Busi-   counts are difficult to come by for materi-
ness Source Premier, it still returns cita-    als other than journal articles, and impact
tions that are more scholarly to the top.      factors are calculated for journals only
    Up to this point, many library data-       and not for specific articles.12 Alternate
bases have defaulted to sorting by date        methodologies might be able to overcome
rather than by relevancy. The fact that        or account for the shortcomings of using
many databases are now adding rel-             citation analysis to judge scholarliness.
How Scholarly Is Google Scholar? 233

    Our study used skilled librarians to            discovering even more scholarly materi-
create search queries and to judge the              als than what is currently discovered in
quality of the citations retrieved. Unlike          Google Scholar. Comparing Google Schol-
most students, the librarians used com-             ar to a future system that has completely
plex search queries to find more relevant           indexed all local content and content
results. Students would be more likely              available to libraries but provided by third
to use natural language queries to find             parties would be the ultimate comparison.
citations. Complex search queries could
return very different results from natural          Conclusion
language queries. Future studies will               Typical arguments against Google Scholar
need to address the potential differences           focus on citation counts and point to in-
to find out if the results we found hold            consistent coverage between disciplines.
across different types of searches.                 We felt the more appropriate analysis
    Finally, future studies need to look at         was to compare the scholarliness of re-
the appropriateness of comparing Google             sources discovered using Google Scholar
Scholar to individual library databases.            with resources found in library data-
It is probable that federated searching is          bases. This analysis showed that Google
more comparable to Google Scholar than              Scholar yielded more scholarly content
are individual library databases. Howev-            than library databases, with no statisti-
er, how users and librarians select which           cally significant difference in scholarliness
resources to use in a federated search and          across disciplines. Despite these findings,
how the federated search engine returns             Google Scholar is not in competition with
the results would still impact the discov-          library databases. In truth, without the
erability of scholarly resources. Some              cooperation of database vendors and pub-
studies have already started down this              lishers, Google Scholar would not exist as
road,13 but Google Scholar result sets have         it does today. Google Scholar is simply a
still not been carefully compared to result         discovery tool for finding scholarly infor-
sets from federated search products.                mation, while databases still perform the
    Libraries have begun to build local             function of providing access to the content
Google Scholars, using tools such as Primo          unearthed by a Google Scholar search.
(Ex Libris), AquaBrowser (Medialab Solu-            The enhanced discoverability of informa-
tions), and Encore (Innovative Interfaces),         tion in Google Scholar makes it a great
that have the potential to aid users in             tool for librarians as well as library users.

                                              Notes
     1. Jan Brophy and David Bawden, “Is Google Enough? Comparison of an Internet Search
Engine with Academic Library Resources,” Aslib Proceedings 57, no. 6 (2005): 498–512; Susan
Gardner and Susanna Eng, “Gaga Over Google? Scholar in the Social Sciences,” Library Hi Tech
News 22, no. 8 (2005): 42–45; Martin Kesselman and Sarah Barbara Watstein, “Google Scholar and
Libraries: Point/Counterpoint,” Reference Services Review 33, no. 4 (2005): 380–87.
     2. Nisa Bakkalbasi et al., “Three Options for Citation Tracking: Google Scholar, Scopus and
Web of Science,” Biomedical Digital Libraries 3, no. 7 (2006), available online at www.bio-diglib.
com/content/3/1/7 [accessed 24 March 2008]; Kayvan Kousha and Mike Thelwall, “Google Scholar
and Google Web/URL Citations: A Multi-Discipline Exploratory Analysis,” Journal of the American
Society for Information Science and Technology 58, no. 7 (2007): 1055–65; Mary L. Robinson and Judith
Wusteman, “Putting Google Scholar to the Test: A Preliminary Study,” Program: Electronic Library
& Information Systems 41, no. 1 (2007): 71–80; Chris Neuhaus et al., “The Depth and Breadth of
Google Scholar: An Empirical Study,” Portal: Libraries and the Academy 6, no. 2 (2006): 127–41.
     3. Jeffrey Pomerantz, “Google Scholar and 100 Percent Availability of Information,” Informa-
tion Technology and Libraries 25 (June 2006): 52–56.
     4. Laura Bowering Mullen and Karen A. Hartman, “Google Scholar and the Library Web
Site: The Early Response by ARL Libraries,” College and Research Libraries 67, no. 2 (2006): 106–22.
     5. Péter Jascó, “Deflated, Inflated and Phantom Citation Counts,” Online Information Review
234 College & Research Libraries                                                                  May 2009
30, no. 3 (2006): 297–309; Péter Jascó, “As We May Search: Comparison of Major Features of the
Web of Science, Scopus, and Google Scholar Citation-Based and Citation-Enhanced Databases,”
Current Science 89, no. 9 (2005): 1537–47; Péter Jascó, “Google Scholar: The Pros and the Cons,”
Online Information Review 29, no. 2 (2005): 208–14.
    6. Péter Jascó, “Side-by-side2: Native Search Engines vs Google Scholar” (2005). Available
online at www2.hawaii.edu/~jacso/scholarly/side-by-side2.htm. [Accessed 24 March 2008].
    7. Kayvan Kousha and Mike Thelwall, “How Is Science Cited on the Web? A Classification of
Google Unique Web Citations,” Journal of the American Society for Information Science and Technology
58, no. 11 (2007): 1631–44.
    8. Neuhaus et al., “Depth and Breadth of Google Scholar.”
    9. Jim Kapoun, “Teaching Undergrads Web Evaluation: A Guide for Library Instruction,”
College and Research Libraries News 59, no. 2 (1998): 522–23.
   10. Jakob Nielsen and Hoa Loranger, Prioritizing Web Usability (Berkeley, Calif.: New Riders, 2006).
   11. Kapoun, “Teaching Undergrads Web Evaluation.”
   12. Per O. Seglen, “Why the Impact Factor of Journals Should Not Be Used for Evaluating
Research,” BMJ 314, no. 7079 (1997): 498–502.
   13. Xiaotian Chen, “Metalib, Webfeat, and Google: The Strengths and Weaknesses of Federated
Search Engines Compared With Google,” Online Information Review 30, no. 4 (2006): 413–27.

50th Annual RBMS Preconference

Seas of Change: Navigating the Cultural and Institutional
Contexts of Special Collections
June 17-20, 2009 | Charlottesville, Virginia
With major sponsorship from the University of Virginia Library and the
Antiquarian Booksellers Association of America

As the 50th anniversary of the RBMS preconference, this year represents an important moment in
the history of RBMS and its affiliated professions. In addition to celebrating our achievements, we
will also look broadly at how special collections librarianship has evolved over the past half century
with respect to changes in social, cultural, technological, economic, and academic environments,
and – more importantly – how we will need to respond to such changes in the future.

A variety of workshops, seminars, short papers, discussion topics, tours, receptions and a
booksellers’ showcase will complement the main program. For complete schedule, registration,
and housing information, please visit:

http://rbms.info
Image of the H.M.S. Beagle from an original watercolor by Conrad Martens in the PaulVictorius Evolution
Collection, courtesy of Special Collections, University of Virginia Library
You can also read