The Semantic Web Revisited - The Semantic Web

Page created by Clinton Floyd
 
CONTINUE READING
The Semantic Web Revisited - The Semantic Web
The Semantic Web

     Editor: Steffen Staab
     University of Koblenz-Landau
     staab@uni-koblenz.de

                                        The Semantic Web Revisited
                                        Nigel Shadbolt and Wendy Hall, University of Southampton
                                        Tim Berners-Lee, Massachusetts Institute of Technology

                                                                                           entific American article assumed that this would be straight-
                      I   n the 50 years since the term AI was coined at the Dart-
                          mouth Conference, the digital world has evolved at a
                       prodigious rate. It has produced an information infrastruc-
                                                                                           forward, but it’s still difficult to achieve in today’s Web.
                                                                                              The article included many scenarios in which intelli-
                                                                                           gent agents and bots undertook tasks on behalf of their
                       ture that few would have anticipated—with the possible              human or corporate owners. Of course, shopbots and auc-
                                                                                           tion bots abound on the Web, but these are essentially
                       exception of Vannevar Bush,1 although even he might have            handcrafted for particular tasks; they have little ability to
                       thought the scale of achievement extraordinary. Today, the          interact with heterogeneous data and information types.
                       World Wide Web links 10 billion pages, and search engines           Because we haven’t yet delivered large-scale, agent-based
                       can divine themes embodied in the links to serve useful and         mediation, some commentators argue that the Semantic
                       relevant content almost instantaneously.                            Web has failed to deliver. We argue that agents can only
                           Fifty years ago it might have appeared audacious to build       flourish when standards are well established and that the
                       a global web of information, to deploy semantics on such            Web standards for expressing shared meaning have pro-
                       a scale, and to attempt inference over the resulting compo-         gressed steadily over the past five years. Furthermore, we
                       nents. Fifty years ago, even if you could have explained            see the use of ontologies in the e-science community pre-
                       it, a Semantic Web would have seemed as remote as general           saging ultimate success for the Semantic Web—just as the
                       AI. Yet today we believe that the Semantic Web is attain-           use of HTTP within the CERN particle physics commu-
                       able. We are seeing its first stirrings, and it will draw on        nity led to the revolutionary success of the original Web.
                       some key insights, tools, and techniques derived from 50
                       years of AI research.                                               A growing need for data integration
                                                                                              Meanwhile, the need has increased for shared seman-
                       From documents to data and information                              tics and a web of data and information derived from it.
                          The original Scientific American article on the Seman-           One major driver—one that this magazine has reported on
                       tic Web appeared in 2001.2 It described the evolution of a          extensively—has been e-science (IEEE Intelligent Sys-
                       Web that consisted largely of documents for humans to               tems, special issue on e-science, Jan. 2004). For example,
                       read to one that included data and information for comput-          life sciences research demands the integration of diverse
                       ers to manipulate. The Semantic Web is a Web of action-             and heterogeneous data sets that originate from distinct
                       able information—information derived from data through              communities of scientists in separate subfields. Scientists,
                       a semantic theory for interpreting the symbols. The seman-          researchers, and regulatory authorities in genomics, pro-
                       tic theory provides an account of “meaning” in which the            teomics, clinical drug trials, and epidemiology all need a
                       logical connection of terms establishes interoperability            way to integrate these components. This is being achieved
                       between systems. This was not a new vision. Tim Berners-            in large part through the adoption of common conceptual-
                       Lee articulated it at the very first World Wide Web Confer-         izations referred to as ontologies. In the past five years,
                       ence in 1994. This simple idea, however, remains largely            the argument in favor of using ontologies has been won—
                       unrealized.                                                         numerous initiatives are developing ontologies for biology
                          A Web of data and information would look very differ-            (for example, see http://obo.sourceforge.net), medicine,
                       ent from the Web we experience today. It would routinely            genomics, and related fields. These communities are devel-
                       let us recruit the right data to a particular use context—for       oping language standards that can be deployed on the Web.
                       example, opening a calendar and seeing business meet-                  Many other disciplines are adopting what began in the
                       ings, travel arrangements, photographs, and financial               life sciences. Environmental science is looking to integrate
                       transactions appropriately placed on a time line. The Sci-          data from hydrology, climatology, ecology, and oceanogra-

96                                                           1541-1672/06/$20.00 © 2006 IEEE                               IEEE INTELLIGENT SYSTEMS
                                                            Published by the IEEE Computer Society
Resource Description Framework
      RDF assigns specific Universal Re-
   source Identifiers (URIs) to its individ-
   ual fields. Figure A is an example RDF                                    http://www.w3.org/2000/10/swap/pim/contact#Person
   graph from the W3C RDF Primer
   (www.w3.org/TR/rdf-primer), show-                                                       http://www.w3.org/1999/02/22-rdf-syntax-ns#type
   ing a representation for a person
                                                     http://www.w3.org/People/EM/contact#me
   named Eric Miller. As we create an
   RDF graph of nodes and arcs, a URI
                                                                                           http://www.w3.org/2000/10/swap/pim/contact#fullName
   reference used as a graph node iden-
   tifies what the node represents; a URI
   used as a predicate identifies a rela-                                                                   Eric Miller
   tionship between the things identi-
   fied by the connected nodes. So, the                                            http://www.w3.org/2000/10/swap/pim/contact#mailbox
   RDF in Figure A represents
   • individuals—such as Eric Miller,                                       mailto:em@w3.org
     identified by http://www.w3.org/
     People/EM/contact#me;                                  http://www.w3.org/2000/10/swap/pim/contact#personalTitle
   • kinds of things—such as Person,
     identified by http://www.w3.org/                   Dr.
     2000/10/swap/pim/contact#Person;
   • properties of those things—such
     as mailbox, identified by http://       Figure A. An RDF graph representing Eric Miller.
     www.w3.org/2000/10/swap/pim/
     contact#mailbox; and
                                                                           
   • values of those properties—such as mailto:em@w3.org as
     the value of the mailbox property (RDF also uses character            
     types such as integers and dates as property values).
                                                                                
     RDF also provides an XML-based syntax called RDF/XML for
                                                                                  Eric Miller
   recording and exchanging graphs. Figure B shows a small chunk                  
   of RDF in RDF/XML corresponding to the graph in Figure A.                      Dr.
     The Figure B rendering is actually quite clumsy syntactically,             
   and its lack of transparency and readability might have been a
                                                                              
   factor inhibiting rapid adoption of RDF. However, there are
   alternative forms that are easier to interpret; for example, see        Figure B. A chunk of RDF in RDF/XML describing Eric Miller and
   the N3 notation (www.w3.org/DesignIssues/Notation3.html).               corresponding to the graph in Figure A.

phy (see http://marinemetadata.org/examples/      EU countries are developing similar pro-             languages for sharing meaning. These lan-
mmihostedwork/ontologieswork). The need           grams to implement the EU directive.                 guages provide a foundation for semantic
to understand systems across ranges of scale         Despite these and other significant drivers       interoperability.
and distribution is evident everywhere in sci-    in defense, business, and commerce, it’s still          In 1997, the W3C defined the first Resource
ence and presents a pressing requirement for      apparent that the Semantic Web isn’t yet with        Description Framework specification (see
data and information integration.                 us on any scale.                                     the related sidebar). RDF provided a simple
   Various e-government initiatives repre-           So let’s review what progress we’ve made          but powerful triple-based representation lan-
sent similar efforts. The United Kingdom          and consider the various impediments to its          guage for Universal Resource Identifiers
has developed an Integrated Public Sector         global adoption.                                     (URIs). It became a W3C recommendation
Vocabulary (www.esd.org.uk/standards/                                                                  by 1999—a crucial step in drawing attention
ipsv). The recently created UK Office of          Progress                                             to the specification and promoting its wide-
Public Sector Information (www.opsi.gov.             Consistent with the need for a Web seman-         spread deployment to enhance the Web’s
uk) is a response to an EU directive (2003/       tics, the user community, including standards        functionality and interoperability.
98/EC, http://www.ec-gis.org/document.            organizations like the Internet Engineering             The original Web took hypertext and made
cfm?id=486&db=document). OPSI aims to             Task Force and the World Wide Web Con-               it work on a global scale; the vision for RDF
exploit the considerable amounts of gov-          sortium (W3C), has directed major efforts            was to provide a minimalist knowledge repre-
ernment data for citizens’ benefit. Several       at specifying, developing, and deploying             sentation for the Web.

MAY/JUNE 2006                                              www.computer.org/intelligent                                                                 97
Universal Resource Identifiers                     stages for W3C recommendation status, is       still needs tools and software development
        URIs identify resources and so are cen-         designed to fulfill this requirement.          environments to support its production and
     tral to the Semantic Web enterprise.3 Using                                                       application. These are starting to appear but
     a global naming convention (however arbi-          RDF translation                                as yet we have few means to routinely and
     trary the syntax) provides the global net-            Other significant progress includes GRDDL   effortlessly generate Semantic Web annota-
     work effects that drive the Web’s benefits.        (Gleaning Resource Descriptions from           tions using this or other languages at the point
     URIs have global scope and are interpreted         Dialects of Languages, www.w3.org/2004/        of content use or creation.
     consistently across contexts. Associating a        01/rdxh/spec), which provides a means to
     URI with a resource means that anyone can          extract RDF from XML and XHTML docu-           Rules and inference
     link to it, refer to it, or retrieve a represen-   ments using transformations expressed in          But ontologies are only one part of the
     tation of it.                                      XSLT (Extensible Stylesheet Language)          representation picture. Rules and inference
        Given the Semantic Web’s aims, we want          and associated with the original content.      also need support. The OWL language itself
     to reason about relationships. URIs provide        This capability could potentially overcome     is designed to support various types of infer-
     the grounding for both our objects and rela-       the RDF bootstrap problem by generating        ence—typically, subsumption and classifica-
     tions. They underpin the Semantic Web,             sufficient RDF for serendipitous reuse to      tion—and a range of automated reasoners are
     allowing machines to process data directly.        occur. The amount of XML and XHTML             available (for example, see www.cs.man.ac.
     In this way, the Semantic Web shifts the           data on the Web, especially data generated     uk/~sattler/reasoners.html). Because it’s diffi-
     emphasis from documents to data. Much of           from back-end databases, is considerable       cult to specify a formalism that will capture
     the motivation for the Semantic Web comes          and offers good opportunities for RDF          all the knowledge in a particular domain,
     from the value locked in relational databases.     conversion.                                    there are other approaches to inference on
     To release this value, database objects must                                                      the Web. Work has begun on the Rule Inter-
     be exported to the Web as first-class objects                                                     change Format (www.w3.org/2005/rules), an
     and therefore must be mapped into a system                                                        attempt to support and interoperate across a
     of URIs.                                             URIs have global scope.                      variety of rule-based formats. RIF will ad-
        Languages have evolved to offer greater                                                        dress the plethora of rule-based formalisms:
     opportunities for encoding meaning that can          Associating a URI with a                     Horn-clause logics, higher-order logics, pro-
     support information integration and interop-                                                      duction systems, and so on.
     erability. RDF Schema became a recom-                resource means that anyone                      Moreover, AI researchers have extended
     mendation in February 2004. RDFS took the                                                         these various logics and modified them to
     basic RDF specification and extended it to
     support the expression of structured vocabu-
                                                          can link to it, refer to it, or              capture causal, temporal, and probabilistic
                                                                                                       knowledge. Causal logic, such as Glenn
     laries. It has provided a minimal ontology                                                        Shafer proposed,5 developed out of action
     representation language that the research
                                                          retrieve a representation of it.             logics in AI, and it’s intended to capture an
     community has adopted fairly widely.                                                              important aspect of commonsense under-
                                                                                                       standing of mechanisms and physical sys-
     Triple stores                                      Web Ontology Language                          tems. Temporal logic formalizes the rules
        As RDF and RDFS have gained ground,                For those who required greater expres-      for reasoning with propositions indexed to
     the need for repositories that can store RDF       sivity in their object and relation descrip-   particular times; Zhisheng Huang and
     content has grown. These so-called triple          tions, the OWL (Web Ontology Language,         Heiner Stuckenschmidt suggested tempo-
     stores vary in their capabilities. Some focus      www.w3.org/TR/2004/REC-owl-features-           ral-logic approaches for ontology version
     on providing a rich means to reason over           20040210) specification integrated several     management.6 Probabilistic logics are cal-
     the triples (for example, see http://jena.         efforts. The W3C recommendation pre-           culi that manipulate conjunctions of proba-
     sourceforge.net), while others focus on            sents three versions of OWL, depending on      bilities of individual events or states. Perhaps
     storing large quantities of data (see http://      the degree of expressive power required.       the most well-known of these are Bayesian,
     sourceforge.net/projects/threestore for an         OWL’s core idea is to enable efficient rep-    which you can use to derive probabilities for
     example from the open source community             resentation of ontologies that are also        events according to prior theories about
     and www.oracle.com/technology/tech/                amenable to decision procedures. It checks     how probabilities are distributed. Bayesian
     semantic_technologies/index.html for a             an ontology to see whether it’s logically      reasoning is commonplace in search engines.
     commercial example). Some operate as               consistent or to determine whether a partic-   In domains where reasoning under uncer-
     plug-ins to current Web browsers (http://          ular concept falls within the ontology.        tainty is essential, such as bioinformatics,
     simile.mit.edu/piggy-bank) and others as sys-         OWL uses the linking provided by RDF        Kenneth Baclawski and Tianhua Niu have
     tems that can operate with a range of existing     to allow ontologies to be distributed across   suggested using Bayesian ontologies to
     third-party databases (www.openrdf.org).           systems. Ontologies can become distrib-        extend the Web to include such reasoning.7
        As the stores themselves have evolved,          uted, as OWL allows ontologies to refer to
     the need has arisen for reliable and stan-         terms in other ontologies. In this way OWL     Data exposure and
     dardized data access into the RDF they             is specifically engineered for the Web and     viral uptake
     hold. The SPARQL language (www.w3.org/             Semantic Web.4                                   So far, we’ve focused on languages, for-
     TR/rdf-sparql-query), now in its final review         OWL is seeing increased adoption but        malisms, standards, and semantics. For this

98                                                              www.computer.org/intelligent                             IEEE INTELLIGENT SYSTEMS
we make no apologies. The Semantic Web            bases on proteins, but we can’t provide the      The issue for a Semantic Web built from
can’t exist without carefully developed and       URI for a Uniprot protein and then simply        these conventions is to know when parts
agreed standards, just as the existing Web        read off or determine its properties. Rather,    need revision.
couldn’t have existed without HTTP, HTML,         the server passes us a zipped bundle of data        This brings us to an often quoted concern
and XML. But languages and standards are          downloaded as a blob. Moreover, the Life         about the Semantic Web—the cost of ontol-
of no consequence without uptake, and             Science Identifier (http://lsid.sourceforge.     ogy development and maintenance. In some
uptake requires increasing the amount of          net) naming-scheme standards that life           areas, the costs—no matter how large—will
data exposed in RDF. (We identify RDF             scientists use aren’t HTTP compatible. A         be easy to recoup. For example, an ontology
because of the often-encountered principle        process is needed that routinely gives URIs      will be a powerful and essential tool in well-
of least power—the less expressive the            to such objects and entrusts their manage-       structured areas such as scientific applica-
language, the more reusable the data.)            ment to individuals and communities who          tions. In certain commercial applications,
   Uptake is about reaching the point where       care about consistent and explicit reference     the potential profit and productivity gain
serendipitous reuse of data, your own and         methods.                                         from using well-structured and coordinated
others’, becomes possible. We’ve mentioned                                                         vocabulary specifications will outweigh the
the development in the life sciences. Expe-       Ontology development and                         sunk costs of developing an ontology and
rience suggests that an incubator commu-          management                                       the marginal costs of maintenance.
nity with a pressing technology need is an           The challenges here are real. The ontolo-        In fact, given the Web’s fractal nature,
essential prerequisite for success. In the        gies that will furnish the semantics for the     those costs might decrease as an ontology’s
original Web, this community was high-            Semantic Web must be developed, man-             user base increases. If we assume that ontol-
energy physicists who needed to share large       aged, and endorsed by committed practice         ogy building costs are spread across user
document sets. It’s easier to mobilize 10                                                          communities, the number of ontology engi-
percent of a small but focused community                                                           neers required increases as the log of the user
than 10 percent of the general populace—                                                           community’s size. The amount of building
these early adopters are critical.                  The ontologies that will furnish               time increases as the square of the number of
   It’s also instructive to consider typical                                                       engineers. These are naïve but reasonable
Semantic Web projects of the past five years.       the semantics for the Semantic                 assumptions for a basic model. The conse-
They demonstrate a distinctive set of charac-                                                      quence is that the effort involved per user in
teristics. Typically, they generate new ontolo-     Web must be developed,                         building ontologies for large communities
gies for the application domain—whether it’s                                                       gets very small very quickly.10
information management in breast diseases8
or computer science research.9 They either
                                                    managed, and endorsed by                          Not all ontologies have the same charac-
                                                                                                   teristics and, in general, we can distinguish
import legacy data or else harvest and rede-                                                       deep from shallow ontologies. Deep ontolo-
posit it into a single, large repository. Then
                                                    practice communities.                          gies are often those encountered in science
they carry out inference on the RDF graphs                                                         and engineering, where considerable efforts
held within the repositories and represent                                                         go into building and developing the concep-
the information using a custom-developed          communities. Whether the subject is mete-        tualization. For domains such as proteomics
interface.                                        orology or bank transactions, proteins or        and medicine, the ontology is in a very real
   These projects have been important prov-       engine parts, we need concept definitions        sense the data of interest. This becomes
ing grounds for a number of techniques and        that we can use.                                 apparent when we use an ontology to clas-
methods. They show how to facilitate har-            Although some denotations are more            sify complex sets of properties as constitut-
vesting and semantic integration by using         persistent than others, we must recognize        ing certain sorts of object.
ontologies as mediators. They have served         that they aren’t fixed over all time. Even          Shallow ontologies comprise relatively
as a development context for RDF stores           terms used to classify medical diseases          few unchanging terms that organize very
and a whole range of important Semantic           change as new procedures and understand-         large amounts of data—for example, terms
Web middleware. In general, however, they         ing emerges. We need to regard such ontolo-      such as customer, account number, and
lack real viral uptake. Moreover, in most         gies as living structures. Some might endure     overdraft used in banking and financial
cases, we aren’t able to look up a URI and        over long periods—for example, terms             contexts or the basic relations that define
have the data returned. The data exposure         describing the elements of the periodic          geospatial information. Some might argue that
revolution has not yet happened.                  table. Others are much more volatile: the        we’ve spent rather too much time extolling
   URIs provide our symbol grounding in           18th-century concept of phlogiston doesn’t       the virtues of deep ontologies at the expense
the Web. An RDF triple as a triple of URIs        have a place in a modern ontology of chem-       of the shallow ones that deliver very large
should dereference to terms whose mean-           istry, but it was once thought to be essential   amounts of reusable data. Shallow ontolo-
ings are defined in ontologies. Often, how-       to explaining combustion and other chemi-        gies require effort but over much simpler
ever, the URIs refer to objects that aren’t so    cal reactions. Communities and practice          sets of terms and relations.
defined.                                          will change norms, conceptualizations, and
   Consider a life science example: Uniprot       terminologies in complex and sociologi-          Folksonomies: Web-scale tagging
(www.ebi.uniprot.org/index.shtml) is the          cally subtle ways. We shouldn’t be surprised        The complexity of deep ontologies has
world’s most comprehensive set of data-           or attempt to resist these reformulations.       led some to eschew ontologies altogether in

MAY/JUNE 2006                                             www.computer.org/intelligent                                                               99
favor of a different approach. Folksonomies       potential tasks in a domain, or to the opera-       This next wave of data ubiquity will pre-
  are a development generating considerable         tion of context.14 This perception might be      sent us with substantial research challenges.
  interest at the moment. They represent a          related to the idea of developing a single       How do we effectively query huge numbers
  structure that emerges organically when           consistent Ontology of Everything—like           of decentralized information repositories of
  individuals manage their own information          Cyc,15 for example. Such a wide-ranging          varying scales? How do we align and map
  requirements. Folksonomies arise when a           and all-encompassing ontology might well         between ontologies? How do we construct
  large number of people are interested in          have interesting applications, but it clearly    a Semantic Web browser that effectively
  particular information and are encouraged         won’t scale and its use can’t be enforced.       visualizes and navigates the huge connected
  to describe it—or tag it (they may tag self-         If the Semantic Web is seen as requiring      RDF graph? How do we establish trust and
  ishly to organize their own content retrieval     widespread buy-in to a particular point of       provenance of the content?
  or altruistically to help others). Rather than    view, then it’s understandable that emer-           Provenance—that is, the when, where, and
  a centralized form of classification, users       gent structures like folksonomies begin to       conditions under which data originated—has
  can assign keywords to documents or other         seem more attractive.16 But this isn’t a         become a key requirement in a range of appli-
  information sources.                              Semantic Web requirement. Ontologies             cations. We might well need the help of
     Well-known examples of applications            are a rationalization of actual data-sharing     researchers in areas as diverse as social
  that harness and exploit tagging are Flickr       practice. We can and do interact, and we do      network analysis17,18 and epidemiology to
  (www.flickr.com, a photography publica-           it without achieving or attempting to achieve    understand how information and concepts
  tion and sharing site) and del.icio.us (http://   global consistency and coverage. Ontologies      spread on the Web and how to establish
  del.icio.us, a site for sharing bookmarks).       are a means to make an explicit commitment       their provenance and trustworthiness.
  These applications, driven by decentralized       to shared meaning among an interested com-          We must not lose sight of the fact that
  communities from the bottom up, are some-                                                          the Web, and indeed many of our most
  times called Web 2.0 or social software.                                                           important digital environments, depends
     Tagging on a Web scale is certainly an                                                          fundamentally on certain general assump-
  interesting development. It provides a poten-       Ontologies are attempts to more                tions about social behavior. The Web relies
  tial source of metadata. The folksonomies                                                          on people serving useful content; it relies
  that emerge are a variant on keyword searches.      carefully define parts of the                  on content generally being on the end of
  They’re an interesting emergent attempt at                                                         links. We also require that people observe
  information retrieval. But folksonomies             data world and to allow                        copyright rules. Creative Commons (www.
  serve very different purposes from ontolo-                                                         creativecommons.org) is an RDF-based
  gies. Ontologies are attempts to more care-
  fully define parts of the data world and to
                                                      interactions between data                      representation of copyright policy to facili-
                                                                                                     tate and maximize appropriate reuse.
  allow mappings and interactions between                                                            Policy-aware research takes this further,
  data held in different formats. Ontologies
                                                      held in different formats.                     attempting to express the civic rules of
  refer by virtue of URIs; tags use words.                                                           behavior expected in a Semantic Web
  Ontologies are defined through a careful,                                                          environment.
  explicit process that attempts to remove          munity, but anyone can use these ontologies         The critical factors that led to the Web’s
  ambiguity. The definition of a tag is a loose     to describe their own data. Similarly, anyone    success will also be important to the suc-
  and implicit process where ambiguity might        can extend or reuse elements of an ontology      cess of our Semantic Web enterprise. As
  well remain. The inferential process applied      if they so wish.                                 we’ve seen, some of these factors are social;
  to ontologies is logic based and uses opera-                                                       others have their origin in elementary and
  tions such as join. The inferential process       The next wave                                    fundamental design decisions about the
  used on tags is statistical in nature and            The Semantic Web we aspire to makes           Web’s architectural principles. For exam-
  employs techniques such as clustering.            substantial reuse of existing ontologies and     ple, the URL concept embodied the princi-
     This doesn’t mean that tags will always        data. It’s a linked information space in which   ple that every Web address is equal and all
  replace shallow ontologies. Where a per-          data is being enriched and added. It lets        content one jump away. Other critical fea-
  ceived need for ontologies exists, light-         users engage in the sort of serendipitous        tures included the ability to let links fail (the
  weight but powerful ones do emerge and            reuse and discovery of related information       404 error).
  are widely used. Two examples are Friend-         that’s been a hallmark of viral Web uptake.         A great deal of the success relates to
  of-a-Friend11 and associated applications         We already see an increasing need and a          what we might call the ladder of authority.
  such as Flink.12 This fits in general with        rising obligation for people and organiza-       This is the sequence of specifications (URI,
  calls for the dual and complementary              tions to make their data available. This is      HTTP, RDF, ontology, and so on) and reg-
  development of Semantic Web technologies          driven by the imperatives of collaborative       isters (URI scheme, MIME Internet content
  and technologies that exploit the Web’s self-     science, by commercial incentives such as        type, and so on), which provide a means for
  organization.13                                   making product details available, and by         a construct such as an ontology to derive
     Some people perceive ontologies as top-        regulatory requirements. We believe this         meaning from a URI. Another example is
  down, somewhat authoritarian constructs—          could bring about a revolution in how, for       the construction of a standards body that’s
  unrelated, or only tenuously related, to          example, scientific content is managed           been able to promote, develop, and deploy
  people’s actual practice, to the variety of       throughout its life cycle.                       open standards.

100                                                         www.computer.org/intelligent                               IEEE INTELLIGENT SYSTEMS
T     hese reflections lead us to ask how
we understand the present Web and what
                                                  data mining tools, approaches to inference,
                                                  ontological engineering and knowledge rep-
                                                  resentation. All of these are fundamental to
                                                                                                                                 Nigel Shadbolt is a
                                                                                                                                 professor of artificial
                                                                                                                                 intelligence in the
developments we anticipate. This is a deep        pursuing a Web Science agenda and realizing                                    School of Electronics
                                                                                                                                 and Computer Sci-
question, and we believe the history of sci-      the Semantic Web.                                                              ence at Southampton
ence has something to teach us here. There                                                                                       University. Contact
was a time when our understanding of the                                                                                         him at nrs@ecs.
world was either a purely philosophical,                                                                                         soton.ac.uk.
reflective exercise or else craft-based and
                                                  References                                                                      Tim Berners-Lee is
rooted in hard-won experience. Empirical                                                                                         the director of the
methods eventually gave rise to the branches       1. V. Bush, “As We May Think,” Atlantic                                       World Wide Web
of natural philosophy that became physics,            Monthly, vol. 176, no. 1, 1945, pp. 101–108.                               Consortium, a senior
chemistry, and biology. Traces of this legacy                                                                                    researcher at the
                                                   2. T. Berners-Lee, J. Hendler, and O. Lassila,                                Massachusetts Insti-
can still be found: until quite recently the                                                                                     tute of Technology’s
study of physics at Oxford was termed                 “The Semantic Web,” Scientific Am., May
                                                      2001, pp. 34–43.                                                           Computer Science
experimental philosophy. More recently,                                                                                          and Artificial Intelli-
areas that were once considered amenable           3. T. Berners-Lee, R.T. Fielding, and L. Masin-        gence Laboratory, and a professor of com-
                                                      ter, “Uniform Resource Identifier (URI):            puter science in the Department of Electron-
only to analytic thought—areas such as
                                                      Generic Syntax,” IETF RFP 3986 (standards           ics and Computer Science at Southampton
epistemology and logic—are to some                                                                        University. Contact him at timbl@w3.org.
                                                      track), Internet Eng. Task Force, Jan. 2005;
extent operationalized in computers and               www.ietf.org/rfc/rfc3986.txt.
computer infrastructures. Knowledge rep-                                                                                          Wendy Hall is a
resentation and ontology engineering are           4. J.A. Hendler, “Frequently Asked Questions                                   professor of com-
                                                      on W3C’s Web Ontology Language (OWL),”                                      puter science in the
about trying to capture aspects of shared                                                                                         School of Electronics
conceptualizations.                                   W3C, 2004; www.w3.org/2003/08/owlfaq.
                                                                                                                                  and Computer Sci-
   As we build ever more complex compu-            5. G. Shafer, “Causal Logic,” Proc. 13th Euro-                                 ence at Southampton
tational artifacts and information infrastruc-        pean Conf. Artificial Intelligence (ECAI 98),                               University. Contact
                                                      John Wiley & Sons, 1998, pp. 711–720;                                       her at wh@ecs.soton.
tures, we observe that large-scale behavior
                                                      www.glennshafer.com/assets/downloads/                                       ac.uk.
emanates from small-scale and local regu-
                                                      article62.pdf.
larity. We need engineering methods to
ensure that our structures conform to reli-        6. Z. Huang and H. Stuckenschmidt, “Reason-
able and repeatable design requirements.              ing with Multi-Version Ontologies: A Tem-
We need scientific analysis to understand             poral Logic Approach,” The Semantic Web—         13. G.W. Flake, D.M. Pennock, and D.C. Fain,
                                                      ISWC 2005: 4th Int’l Semantic Web Conf.,             “The Self-Organized Web: The Yin to the
and predict the behaviors that result. When                                                                Semantic Web’s Yang,” IEEE Intelligent Sys-
                                                      LNCS 3729, Springer, 2005; www.cs.vu.
we build new opportunities for interaction,           nl/~heiner/public/ISWC05a.pdf.                       tems, July/Aug. 2003, pp. 72–86.
we’re engaged simultaneously in a syn-
thetic and an analytic project. New rules of       7. K. Baclawski and T. Niu, Ontologies for          14. K. Sparck-Jones, “What’s New about the
interaction such as peer-to-peer protocols            Bioinformatics, MIT Press, 2005.                     Semantic Web? Some Questions,” SIGIR
                                                                                                           Forum, vol. 38, no. 2, 2004; www.acm.org/
result in new macro behaviors—behaviors            8. S. Dasmahapatra et al., “Facilitating Multi-         sigir/forum/2004D/sparck_jones_sigirforum_
we can exploit and also analyze. These                Disciplinary Knowledge-Based Support for             2004d.pdf.
micro rules can occur at different levels of          Breast Cancer Screening,” Int’l J. Healthcare
abstraction—the rules of Wikipedia are                Technology and Management, vol. 7, no. 5,        15. D.B. Lenat, “Cyc: A Large-Scale Investment
                                                      2006, pp. 403–420; http://eprints.ecs.soton.         in Knowledge Infrastructure,” Comm. ACM,
beguilingly simple but lead to overall                                                                     vol. 38, no. 11, 1995, pp. 32–38.
                                                      ac.uk/8955/01/ijhtm03-miakt.pdf.
coherence. Local-scale changes in Web
architectures and resources can lead to            9. N. Shadbolt et al., “CS AKTive Space, or         16. C. Shirky, “Ontology Is Overrated: Cate-
large-scale societal and technical effects.           How We Learned to Stop Worrying and Love             gories, Links and Tags,” 2005; www.shirky.
How so?                                               the Semantic Web,” IEEE Intelligent Systems,         com/writings/ontology_overrated.html.
                                                      vol. 19, no. 3, 2004, pp. 41–47.
   We expect the developments, method-                                                                 17. D.J. Watts, P.S. Dodds, and M.E.J. Newman,
ologies, challenges, and techniques we’ve         10. T. Berners-Lee, “The Fractal Nature of the           “Identity and Search in Social Networks,” Sci-
discussed here to not only give rise to a             Web,” working draft, 1998–2005; www.w3.              ence, vol. 296, 2002, pp. 1302–1305.
Semantic Web but also contribute to a new             org/DesignIssues/Fractal.html.
                                                                                                       18. G. Kossinets and D.J. Watts, “Empirical
Web Science—a science that seeks to de-                                                                    Analysis of an Evolving Social Network,”
                                                  11. D. Brickley and L. Miller, “FOAF Vocabulary
velop, deploy, and understand distributed             Specification,” working draft, 2005, http://         Science, vol. 311, 2006, pp. 88–90.
information systems, systems of humans                xmlns.com/foaf/0.1.
and machines, operating on a global scale.
AI will be one of the contributing disciplines.   12. P. Mika, “Flink: Semantic Web Technology
                                                      for the Extraction and Analysis of Social Net-
AI has already given us functional and logic          works,” J. Web Semantics, vol. 3, no. 2, 2005,   For more information on this or any other com-
programming methods, ways to understand               pp. 211–223; www.websemanticsjournal.org/        puting topic, please visit our Digital Library at
distributed systems, pattern detection and            ps/pub/2005-20.                                  www.computer.org/publications/dlib.

MAY/JUNE 2006                                              www.computer.org/intelligent                                                                    101
You can also read