Institute for Informatics
Teaching and Research Unit
for Programming
and Software Engineering

                     Advanced Practical Course

     How to Build an RDF Based Wiki

                  Walter Christian Kammergruber

             Assigned by:       Professor Dr. Martin Wirsing
             Advisor:           Axel Rausmayer
             Submission date:   May 2006
Summary
It is a common fallacy to regard a wiki as merely a collection of text documents. A wiki page is
more than just ASCII text: on the one hand, there is implicit data that is blended with descriptive
text and is also stored elsewhere. This leads to consistency problems. On the other hand, there is
metadata about wiki pages that cannot be managed satisfactorily with existing wiki approaches. We
present an approach in which the data and metadata are stored and handled in an RDF database.
This avoids duplication and also makes different views of the data possible. Because RDF is a
widely supported standard, external data sources can be incorporated into the RDF database. We
also present a new approach to wiki syntax, a language named WITL. WITL does not use a 'search
and replace' style to render the text; instead, a syntax tree defined by an LL(k) grammar is built
and evaluated to generate the desired output format.

Abstract
It is a common fallacy to see a Wiki as just a collection of text documents. It is a network of
information. A Wiki page is more than just ASCII text: On the one hand, there is a lot of implicit
data tangled with descriptive text that is often a duplication of other data stored elsewhere. This
duplication leads to consistency problems. On the other hand, there is metadata about Wiki
pages (such as their name or author) that currently cannot be properly managed. We show an
approach in which this data and metadata are stored and managed by an RDF database. This
prevents duplication and allows us to publish different views on the same data. Additionally,
because RDF is a widely supported standard, one can also add data from external sources
to the database.
   We also show a new approach to Wiki syntax, a language called WITL. In WITL we do not use a
search-and-replace mechanism for rendering text written in Wiki style; instead, we build an abstract
syntax tree defined by an LL(k) grammar and walk this tree to generate the desired output format.
Contents

1 Introduction
  1.1 What Is Yet Another Wiki Good For?
  1.2 About RDF
      1.2.1 RDF as a Graph
      1.2.2 RDF as Triples
      1.2.3 Further Advantages

2 WITL: A Syntax for the Wiki
  2.1 Wiki Markup
  2.2 WITL Syntax
      2.2.1 WITL Syntax explained by Illustrative Examples
  2.3 Using ANTLR for Parsing WITL Code
      2.3.1 The Lexer
      2.3.2 The Parser
  2.4 Rendering with WITL
      2.4.1 Evaluating the Abstract Syntax Tree
      2.4.2 Defining Functions

3 Wikked Architecture
  3.1 Three Layer Architecture
  3.2 Components

4 Scenario: A Bookmark Collection Web Site
  4.1 Motivation
  4.2 The Wikked Three Layer Model in Practice
      4.2.1 Daniel the Data Manager
      4.2.2 Alice the Author
      4.2.3 Wayland the Web Designer

5 Related Work

6 Future Research

7 Conclusion

1 Introduction
1.1 What Is Yet Another Wiki Good For?
When you write an entry in a Wiki, you just type some lines of text in some Wiki syntax. You
may include links to other entries, books, web pages, images, articles, newspapers and so on. But
you do not have a handy way to access general data like your bookmark collection or your favorite
songs. You can simply quote them, but what happens when your taste in music changes? You
have to update all your entries. Therefore it would be nice to have the information separated from
the text. As an example, we show how you can include your bookmark collection in your Wiki. In
general, you can see our Wiki as a kind of RDF¹ browser. Everything is encoded in RDF, and
the Wiki is just a tool to make the information visible.
   Because RDF is a popular format for managing data structures, you can also include or import
data from external RDF-based sources. Such sources could be, for example, the Open Directory
Project created by the Netscape Communications Corporation [9] and its subprojects for music or
restaurants. There is also an increasing number of tools for RDF, e.g. for extracting information
from web pages. RDF, which forms the foundation of the Semantic Web, is a very promising
approach for modeling data structures, with an increasing number of applications.
   The primary goal of our wiki, called Wikked, is to display any data encoded in RDF.
The point of using RDF for managing information is the ability to state relations
between resources in a practical way. For example, when an entry links to another one, you can
express this fact with a statement such as “entry A links to entry B”. Fig. 1 shows a graph
representation of such statements, using an address book entry as an example. You can access
these pieces of information easily by formulating queries, e.g. in a query language like
RDQL [15]. These queries are an elegant way to get the information you want.
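
   To illustrate how statements of this kind can be queried, the following Python sketch stores
statements as (subject, predicate, object) tuples and matches them against a pattern in which None
acts as a wildcard. It is neither RDQL nor part of Wikked; the function names and the "wiki:"
identifiers are assumptions made only for this example.

```python
# Hypothetical statements; the "wiki:" names are invented for illustration.
STATEMENTS = [
    ("wiki:entryA", "wiki:links_to", "wiki:entryB"),
    ("wiki:entryA", "wiki:creator", "wiki:alice"),
    ("wiki:entryC", "wiki:links_to", "wiki:entryB"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Yield every statement that agrees with the pattern; None matches anything."""
    pattern = (subject, predicate, obj)
    for triple in statements:
        if all(p is None or p == value for p, value in zip(pattern, triple)):
            yield triple


def linking_entries(statements, target):
    """Return the subjects of all 'links to' statements that point at target."""
    return [s for s, _, _ in match(statements, predicate="wiki:links_to", obj=target)]


print(linking_entries(STATEMENTS, "wiki:entryB"))
```

A real RDF store would answer the same question through its query language; the point here is only
that "which entries link to entry B?" becomes a pattern over statements.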
   Storing and querying metadata can become quite complex. There are some handmade solutions
for attaching metadata to Wiki entries, but all of them seem to be makeshift. RDF is highly
developed and many tools are available for it. By expressing the information in RDF we try
to see everything as a node in a network. You can connect or “label” everything with anything.
An entry has a creator, and the creator has linked nodes or properties like names, addresses, phone
numbers and so on².
   Another big issue for Wikis is the Wiki syntax. Most Wikis use regular expressions to search
for and replace patterns, e.g. *bold* stands for bold text, and they provide blocks with
macros for extra functionality.
   In WITL, we only distinguish between functions and text. A Wiki document then consists of a
sequence of text and functions. A function may also contain nested functions and nested text, i.e.
when the document is parsed, the result is an abstract syntax tree. To preserve the idea behind
traditional Wiki syntax, namely fast-paced editing, we integrated syntactic sugar, which we call
wiki markup, e.g. **bold** for bold text. This syntactic sugar is replaced by WITL constructs
before the actual parsing of the Wiki document takes place. Thanks to the wiki markup, users with
no experience in programming languages can use Wikked as well. They can simply pick the few
functions they want to use and ignore the additional functionality.
   On the one hand, WITL is used as a language in addition to the wiki markup. On the other
hand, WITL can also be used as a template language for designing the web pages of the wiki itself.
It offers both dynamic and static ways to include constructs known from ‘normal’ programming
languages, but it is designed for working with heterogeneous data, especially text. Because of this
approach, Wikked does not depend on template engines such as Velocity from the Apache Jakarta
project [1], JavaServer Pages [16] or StringTemplate by Terence Parr [12]. The templates for the
pages in Wikked can also be edited in Wikked, because both the templates and the entries are pages
in Wikked. As a consequence of preprocessing wiki markup, templates can be created
quickly and without writing much HTML.

 ¹ RDF stands for Resource Description Framework. For details see section 1.2.
 ² You may have a look at the vCard MIME Directory Profile from the networking group [4] for more examples.

    (blank node) --[http://www.w3.org/2000/10/swap/pim/contact#fullName]--> "John Doe"
    (blank node) --[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]-->
                     http://www.w3.org/2000/10/swap/pim/contact#Person

Figure 1: This small RDF graph encodes part of an address book entry. The entry itself is the
         blank node; it contains the full name of the contact and has the
         RDF type http://www.w3.org/2000/10/swap/pim/contact#Person. Both edges are
         drawn with a URI label, but actually they are labeled with a node. Therefore, the label
         could also be a blank node.

1.2 About RDF
For our purposes, we view RDF as a universal data structure that stores graph-based information.
Its main virtues are its simplicity and its ability to integrate external data.

1.2.1 RDF as a Graph
Similar to usual graph definitions, an RDF graph consists of
    • Nodes: There are two kinds of nodes.

         – Resources: these are the main building blocks of the graph. If a node should be publicly
           addressable, it gets a URI as an identifier. If not, it remains anonymous and is called a
           blank node.
         – Literals: Resources can also be viewed as potentially containing URIs. In order to add
           arbitrary data to RDF, one attaches a literal to a resource. Literals can only exist in
           that role, as the target of an edge. They contain a text string.

    • Edges: edges are binary and directed, connecting a source and a target node. Every edge
      has a node as its type; a label, if you will.
   Fig. 1 shows how these basic constructs are used to build an RDF graph. Note how URIs are
used as a convenient public namespace for identifying nodes. But they can also be viewed as
references to entities on the Web (including the local file system). By constructing URIs as codes
for non-Web entities, one can also put references to real-world “objects” into an RDF graph. For
example, creating a resource with the URI mailto:john@doe.com and attaching other nodes to it
can mean³ that we are describing the real-world person John Doe.

 ³ Naturally, this is a matter of semantics, of how a group of people agrees on interpreting a particular RDF graph.

1.2.2 RDF as Triples
Internally, RDF is usually stored as a set of triples. Each triple is an edge and has the components
subject, predicate and object. The names of the components already hint at the fact that with
each triple we are making an assertion. That is why triples are also sometimes called statements.
The subject is the edge source and must be a resource. The predicate is the edge label and also
has to be a resource. The object is the edge target and can either be a resource or a literal. A
consequence of RDF only knowing about edges is that there cannot be single nodes. In practice,
this is not a problem, because RDF is so fine-grained that single resources are rarely enough for
expressing anything meaningful. The following shows how the RDF triples of the example in Fig. 1
are expressed as plain text. Note that in this format, blank nodes also need an identifier (which is
internal only).

_:1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
     http://www.w3.org/2000/10/swap/pim/contact#Person .
_:1 http://www.w3.org/2000/10/swap/pim/contact#fullName
     "John Doe" .

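The constraints on the three components can be made explicit in a small Python sketch. The class
and function names below are assumptions for illustration and do not come from Wikked or from any
RDF library; the data is the example from Fig. 1.

```python
from typing import NamedTuple, Optional, Union


class Resource(NamedTuple):
    uri: Optional[str] = None       # None marks a blank node
    blank_id: Optional[str] = None  # internal identifier of a blank node


class Literal(NamedTuple):
    text: str


class Triple(NamedTuple):
    subject: Resource
    predicate: Resource
    obj: Union[Resource, Literal]


def is_well_formed(subject, predicate, obj) -> bool:
    """Subject and predicate must be resources; only the object may be a literal."""
    return (isinstance(subject, Resource)
            and isinstance(predicate, Resource)
            and isinstance(obj, (Resource, Literal)))


RDF_TYPE = Resource("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
FULL_NAME = Resource("http://www.w3.org/2000/10/swap/pim/contact#fullName")
PERSON = Resource("http://www.w3.org/2000/10/swap/pim/contact#Person")
ENTRY = Resource(blank_id="1")  # the blank node written as _:1

GRAPH = {
    Triple(ENTRY, RDF_TYPE, PERSON),
    Triple(ENTRY, FULL_NAME, Literal("John Doe")),
}


def nodes(graph):
    """Collect every node that occurs as the source or target of an edge."""
    found = set()
    for triple in graph:
        found.add(triple.subject)
        found.add(triple.obj)
    return found
```

Because the graph is just a set of triples, a node exists only as long as at least one triple
mentions it, which is the "no single nodes" consequence described above.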
1.2.3 Further Advantages
Fortunately, the basics of RDF are simple. Further possibilities such as reification⁴ of statements,
schema languages, inferencing etc. are outlined in the RDF Primer [8]. Many aspects of RDF,
such as merging of graphs and multi-dimensionality, make it very useful for software engineering
applications [14].

 ⁴ Reifying a statement means making it available as a resource. Without reification, one can only annotate edge
    types, but not the instances of a type.

2 WITL: A Syntax for the Wiki
Finding the right syntax for a Wiki is a very important issue because users of the wiki have to
use it intensively. You can either resort to an existing rendering engine or create a new one.
We tried to use Radeox [5] as a rendering engine, but we quickly reached its limits. Adding
extra functionality beyond the wiki markup and macros led to awkward constructs and unclean
code design. For example, links to other Wiki pages are written with brackets in Radeox, e.g.
[someTitle] creates a link to a Wiki page with the title “someTitle”. We wanted to identify Wiki
pages by means other than their title⁵. Identifiers should be unique and independent of the title.
This independence also has the advantage that the identifier remains untouched when a title
changes. We wanted to distinguish between both sorts of links, i.e. wiki links to
unique ids and links to titles that refer either to a set of entries or to one explicit entry. In
Radeox, that distinction would only be possible with workarounds. Because we were unhappy
with the existing solutions, we finally decided to create a new and simple syntax with formal
semantics. Two syntactic concepts are involved: we differentiate between wiki markup and our
wiki language called WITL. The syntax of both languages is described in the following sections.

2.1 Wiki Markup
         “A Wiki website is a hypertext on steroids.” (Lars Aronsson [2])

   Making the writing of wiki entries as easy as possible is a key goal of wiki markup, especially
because typical HTML source code makes the actual text content very hard to read and edit for most
users⁶. Promoting plain-text editing with a few simple conventions for structure and style is
therefore advisable.
   With that in mind, we introduced a simple wiki markup. Wiki markup parsing is a separate
text-to-text transformation that is performed before the actual WITL parsing. The wiki markup
is compatible with WITL, because WITL has very few special symbols. Fig. 2 shows the wiki
markup in use.
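
   A minimal Python sketch of such a text-to-text pass is given below. It is an illustration
under assumptions, not the Wikked implementation: the WITL function names b, i and tt are invented
for this example (only h1 appears in the text), and only headings and inline styles are covered.

```python
import re

# (pattern, replacement) pairs for inline styles. The WITL function names
# b, i and tt are assumptions made for this sketch.
INLINE_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"{b:\1}"),
    (re.compile(r"~~(.+?)~~"), r"{i:\1}"),
    (re.compile(r"''(.+?)''"), r"{tt:\1}"),
]

# A heading line is framed by the same run of one to four '=' on both sides.
HEADING = re.compile(r"^(={1,4})\s*(.+?)\s*\1$")


def markup_to_witl(source: str) -> str:
    """Rewrite wiki markup line by line into WITL function applications."""
    out = []
    for line in source.splitlines():
        heading = HEADING.match(line.strip())
        if heading:
            level = 5 - len(heading.group(1))  # '====' is level 1, '=' is level 4
            line = "{h%d:%s}" % (level, heading.group(2))
        else:
            for pattern, replacement in INLINE_RULES:
                line = pattern.sub(replacement, line)
        out.append(line)
    return "\n".join(out)


print(markup_to_witl("==== Title Level 1 ====\n**bold** and ~~italic~~"))
```

Since the output is again WITL source, the parser only ever has to deal with one syntax, which is
the reason for running the markup pass first.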

2.2 WITL Syntax
Our syntax emerged from some aesthetic and practical considerations. We started by putting
a link in curly brackets, e.g. {http://www.ifi.lmu.de} or {ftp://ftp.leo.org}. Because of
the structure of a URI (see RFC 3986 [3] for details), which is defined as a scheme name followed
by a colon and other characters, it is straightforward to see the scheme as a function name and the
characters after the colon as an argument. You can consider this construct as a function with the
signature function(String schemeName, String arg), e.g. {http://www.ifi.lmu.de} stands
for function("http", "//www.ifi.lmu.de").
   In most cases a function needs not just one but several arguments, and therefore we defined
a separator between the arguments. We chose “^”, because “^” is not allowed in a URI, is rarely
used in text and is not needed for the wiki markup.
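
   The reading of a scheme as a function name can be sketched in a few lines of Python. The
helper name read_function is hypothetical, and the sketch deliberately ignores nested functions and
escaping, which the real parser has to handle.

```python
def read_function(expr: str):
    """Split a non-nested WITL function into its name and its argument list."""
    if not (expr.startswith("{") and expr.endswith("}")):
        raise ValueError("not a function: %r" % expr)
    inner = expr[1:-1]
    # Everything before the first colon is the function (or scheme) name.
    name, colon, rest = inner.partition(":")
    # Arguments are separated by '^'; without a colon there are no arguments.
    args = rest.split("^") if colon else []
    return name, args


print(read_function("{http://www.ifi.lmu.de}"))
print(read_function("{wikked.link:http://page1.com^page one}"))
```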
   WITL also has blocks of unparsed text. Unparsed blocks are convenient for writing
longer passages in which reserved characters may occur and where escaping every single character
is undesirable. These verbatim text blocks are surrounded by tripled square brackets, e.g. [[[ some
unparsed text ]]].

 ⁵ The title of an entry is not a good identifier. Just consider a title like “Hub”: there might be many
    entries with the same title.
 ⁶ This fact is nothing new but worth mentioning.

    Markup                                        Result
    --------------------------------------------  --------------------------------------------
    ==== Title Level 1 ====                       Title Level 1
    === Title Level 2 ===                         Title Level 2
    == Title Level 3 ==                           Title Level 3
    = Title Level 4 =                             Title Level 4
    **bold**, ~~italic~~, ''teletype''            bold, italic, teletype
      tabs lead to quoted paragraphs                  tabs lead to quoted paragraphs
    % comments start with a percent sign          (no output)
    for http:                                     for http:
    {http://www.ifi.lmu.de} creates a link to     http://www.ifi.lmu.de creates a link to
    "http://www.ifi.lmu.de"                       "http://www.ifi.lmu.de"
    - indent opens sublist                            indent opens sublist and this text is in the
       and this text is in the same item              same item
    + numbered sublist at same level                  1. numbered sublist at same level
    Paragraphs are separated by blank lines       Paragraphs are separated by blank lines

    This is a new paragraph .                     This is a new paragraph .
    : Definition list                             Definition list
      A list with terms                               A list with terms
    : Start term with colon                       Start term with colon
    || Table Heading | Table Heading |            Table Heading    Table Heading
    | one            |two            |            one              two
    | three          |four           |            three            four
    {wikked.link:http://page1.com^page one}       page one creates a link to "http://page1.com" but
    creates a link to "http://page1.com" but      displays "page one"
    displays "page one"
    The second argument can be left out:          The second argument can be left out:
    {wikked.link:http://page1.com^page one}       page one creates a link to "http://page1.com"
    creates a link to "http://page1.com"

                          Figure 2: This figure displays the used wiki markup.

   Abbreviated abstract syntax:

 seq                ::= (node)* .
 node               ::= (plain | verbatim | funapp) .
 funapp             ::= '{' ( (functionName (':' seq ('^' seq)*)?)
                            | ('*' (.)* '*')
                            | ('!' (.)*)
                            | ('?' (.)*) )? '}' .
 functionName       ::= ([a-z]|[A-Z]|'_'|'$') ([a-z]|[A-Z]|'_'|[0-9]|'$'|'+'|'-'|'.')* .
 plain              ::= all characters except '{' and '}' .
 verbatim           ::= "[[[" (.)* "]]]" .
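
   The abstract syntax can be turned into a small hand-written recursive-descent parser, sketched
here in Python for illustration. This is not the ANTLR-based parser Wikked uses. The tuple shapes
of the tree nodes are assumptions, escaping is omitted, plain text inside a function additionally
stops at '^', and the prefixes ?, : and ! are mapped to get, getEval and set.

```python
import re

NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$+\-.]*")
SUGAR = {"?": "get", ":": "getEval", "!": "set"}


class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self, literal):
        return self.text.startswith(literal, self.pos)

    def expect(self, literal):
        if not self.peek(literal):
            raise SyntaxError("expected %r at position %d" % (literal, self.pos))
        self.pos += len(literal)

    def seq(self, in_function=False):
        """seq ::= (node)* ; ends at '}' and, inside a function, at '^'."""
        nodes = []
        while self.pos < len(self.text):
            if self.peek("}") or (in_function and self.peek("^")):
                break
            nodes.append(self.node(in_function))
        return nodes

    def node(self, in_function):
        if self.peek("[[["):
            return self.verbatim()
        if self.peek("{"):
            return self.funapp()
        return self.plain(in_function)

    def verbatim(self):
        self.expect("[[[")
        end = self.text.find("]]]", self.pos)
        if end < 0:
            raise SyntaxError("unterminated verbatim block")
        body, self.pos = self.text[self.pos:end], end + 3
        return ("verbatim", body)

    def plain(self, in_function):
        start = self.pos
        while self.pos < len(self.text):
            if self.peek("{") or self.peek("}") or self.peek("[[["):
                break
            if in_function and self.peek("^"):
                break
            self.pos += 1
        return ("plain", self.text[start:self.pos])

    def funapp(self):
        self.expect("{")
        if self.peek("*"):  # {* comment *}
            end = self.text.find("*}", self.pos + 1)
            if end < 0:
                raise SyntaxError("unterminated comment")
            body, self.pos = self.text[self.pos + 1:end], end + 2
            return ("comment", body)
        if self.peek("}"):  # {} is the empty function
            self.pos += 1
            return ("funapp", "", [])
        args = []
        prefix = self.text[self.pos:self.pos + 1]
        if prefix in SUGAR:  # {?var}, {:var}, {!var^value}
            name = SUGAR[prefix]
            self.pos += 1
            args.append(self.seq(in_function=True))
        else:
            found = NAME.match(self.text, self.pos)
            if not found:
                raise SyntaxError("function name expected at %d" % self.pos)
            name, self.pos = found.group(), found.end()
            if self.peek(":"):
                self.pos += 1
                args.append(self.seq(in_function=True))
        while self.peek("^"):
            self.pos += 1
            args.append(self.seq(in_function=True))
        self.expect("}")
        return ("funapp", name, args)


def parse(text):
    parser = Parser(text)
    tree = parser.seq()
    if parser.pos != len(text):
        raise SyntaxError("unexpected %r at %d" % (text[parser.pos], parser.pos))
    return tree
```

Each argument of a function is itself a sequence, so nesting such as {h1:Hi {b:x}} falls out of
the recursion between seq and funapp.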

2.2.1 WITL Syntax explained by Illustrative Examples
Most of the text is considered “plain text”, except for the following constructs:
   • {h1:Heading on level one}: a Wikked function; most HTML commands are defined
   • {http://foo.com}, {ftp:www.leo.org}: standard URLs fit naturally into the syntax scheme
     as links
   • [[[verbatim text no functions are evaluated]]]: “verbatim” or unescaped text can be
     used to insert HTML (e.g. in web page templates) or source code (e.g. Java code for
     documentation)
   • {* Comments *}: produce no output
Variables:
   • {?var} gets a value (unevaluated!)
   • {:var} gets an evaluated value
   • {!var^value} sets a value
Attributes:
   • {?var^attrib} gets a value (unevaluated!)
   • {:var} gets an evaluated value
   • {!var^value} sets an attribute to a value
The {?}, {:} and {!} functions are syntactic sugar and could be expressed by {get}, {getEval} and
{set}.

2.3 Using ANTLR for Parsing WITL Code
For parsing WITL code we use ANTLR [11], a parser generator developed primarily by Terence Parr.
Its web site describes it as follows:
          “ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language
       tool that provides a framework for constructing recognizers, compilers, and translators
       from grammatical descriptions containing Java, C#, Python, or C++ actions. ANTLR
       is popular because it is easy to understand, powerful, flexible, generates human-readable
       output, and comes with complete source. ANTLR provides excellent support for tree
       construction, tree walking, and translation. There are currently over 5000 ANTLR
       source downloads a month.” (from: http://www.antlr.org/about.html visited on
       03/16/2006)

   ANTLR creates three main parts for language recognition:
    • A Lexer that divides the input into token classes.
    • A Parser that evaluates a token stream according to specified rules.
    • A TreeWalker that evaluates a syntax tree.
   For our purposes, we only use the lexer and the parser. We do not define a TreeWalker, because
we have our own specialized implementation of a syntax tree and do not use the default interfaces
supplied with ANTLR. Even if we used ANTLR's syntax tree interfaces, a TreeWalker
would be too inflexible for the evaluation.

2.3.1 The Lexer
In this section we present a shortened version of the lexer grammar we use. The lexer splits
the WITL code into token classes that the parser can process at a higher level of
abstraction. ANTLR uses a syntax similar to EBNF. Elements and characteristics of the target
language, here Java, can be included.

                                Listing 1: Lexer definition
    class WitlLexer extends Lexer;

    /* The opening tag for all functions such as
     * {<functionName>:, {!, {?, {: or the special cases {} and
     * {<functionName>} (no colon) */
    OTAG : ( options { generateAmbigWarnings=false; } :
               '{' '}' { $setType(EMPTY_FUNCTION); }
             | "{:" !
             | ( // uses a syntactic predicate to distinguish
                 // between '{' <functionName> ':'
                 // and '{' <functionName> '}'
                 ( '{' ! (COMNAME) ':' ! ) => '{' ! (COMNAME) ':' !
               | '{' ! COMNAME '}' ! { $setType(ZERO_ARG_FUNCTION); } )
             | "{!" !   // for "{set:"
             | "{?" !   // for "{get:"
           ) ;

    /* Closing tag for functions. */
    CTAG : '}' ;

    /* {* . *} matches everything in between */
    COMMENTTAG : "{*" ! ( . )* "*}" ! ;

    /* [[[ . ]]] matches everything in between
     * triple square brackets */
    VERBATIM : "[[[" ! ( . )* "]]]" ! ;

    /* Used to separate arguments in
     * a function, e.g. {fun:arg1^arg2} */
    ARGSEP : "^" ;

    /* everything except special signs */
    TEXT : ( { LA(2) != ']' || LA(3) != ']' }? ']'   // LA = look ahead
           | { LA(2) != '[' || LA(3) != '[' }? '['
           | CONTENT
           | ESC )+ ;

    /* Escaped signs {, }, :, ^, [, ] with leading '/' */
    protected ESC : "/" ! ( '{' | '}' | ':' | '^' | ']' | '[' ) ;

    /* Allowed plain content */
    protected CONTENT : ~( '{' | '}' | '\n' | '\r' | '^' | ']' | '[' ) ;

    protected COMNAME : ( 'a'..'z' | 'A'..'Z' | '_' | '$' )
                        ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' | '+' | '-' | '.' )* ;

    WS : ( ' ' | '\t' | '\f' ) ;
    NL : ( '\n' | "\r\n" | '\r' ) ;

   Listing 1 shows the source code of the lexer. We assume that the comments in the source code
describe most token classes sufficiently. Some tricky parts, however, are worth mentioning
explicitly:
    • Functions are split into an opening tag (the OTAG rule) and a closing tag (the CTAG rule).
      To distinguish between '{' <functionName> ':' and '{' <functionName> '}', ANTLR first has
      to check whether it finds an opening curly bracket followed by a functionName, followed by
      a colon. If it does not find a colon, it has to fall back and look for a closing curly
      bracket, i.e. it has to determine whether the function has zero arguments or one or more.
    • The TEXT token definition uses a look-ahead of two and three to find out whether a [ or ]
      belongs to a VERBATIM token class.
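
   Both decisions can be imitated outside ANTLR. The Python sketch below is a simplified
illustration, not generated lexer code; the function names and the returned tuples are assumptions,
while the token names OTAG and ZERO_ARG_FUNCTION are taken from Listing 1.

```python
import re

NAME = r"[A-Za-z_$][A-Za-z0-9_$+\-.]*"
WITH_ARGS = re.compile(r"\{(" + NAME + r"):")   # '{' <functionName> ':'
ZERO_ARGS = re.compile(r"\{(" + NAME + r")\}")  # '{' <functionName> '}'


def opening_tag(text, pos=0):
    """Try the colon form first, then fall back to the zero-argument form."""
    found = WITH_ARGS.match(text, pos)
    if found:
        return ("OTAG", found.group(1), found.end())
    found = ZERO_ARGS.match(text, pos)
    if found:
        return ("ZERO_ARG_FUNCTION", found.group(1), found.end())
    return None


def starts_verbatim(text, pos):
    """True if the '[' at pos is followed by two more '[' (look-ahead 2 and 3)."""
    return text[pos:pos + 3] == "[[["


print(opening_tag("{h1:Title}"))
print(opening_tag("{br}"))
```

Trying the longer alternative first and falling back on failure is what the syntactic predicate in
the OTAG rule expresses declaratively.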

2.3.2 The Parser
The parser code in ANTLR is analogous to the lexer code. It is very similar to EBNF but has the
features needed for processing with Java. Listing 2 shows a heavily simplified excerpt of the
parser code.

                                Listing 2: Parser definition
    class WitlParser extends Parser;

    /* Returns the root of the WITL document. */
    witl returns [Sequence seq] : ( node )* ;

    /* general node */
    private node returns [WitlExpression node] :
        ( plain | function | getCommand | setCommand | comment ) ;

    /* function node {functionName:witl text},
     * {functionName}, {} */
    private function returns [WitlExpression com] :
        ( OTAG witl ( ARGSEP witl )* CTAG
        | EMPTY_FUNCTION | ZERO_ARG_FUNCTION ) ;

    /* function node {?witl text} for Getter */
    private getCommand returns [WitlExpression com] :
        GET_FUNCTION witl ( ARGSEP witl )* CTAG ;

    /* function node {!witl text} for Setter */
    private setCommand returns [WitlExpression com] :
        SET_FUNCTION witl ( ARGSEP witl )* CTAG ;

    /* comment node {* my comment *} */
    private comment returns [WitlExpression node] : COMMENTTAG ;

    /* plain text or verbatim text */
    private plain returns [WitlExpression node] : text | verbatim ;

    private text returns [String text] : TEXT | NL | WS ;

    private verbatim returns [String text] : VERBATIM ;

   Some remarks:
    • Tokens delivered by the lexer are written in uppercase letters, e.g. VERBATIM.
    • Parser rules are written in lowercase or mixed case and may have a return value of any
      Java class. WitlExpression and Sequence are two examples of Java classes.

2.4 Rendering with WITL
2.4.1 Evaluating the Abstract Syntax Tree
Building a syntax tree is not an end in itself: in the end, you want to display some text in a
certain target language. In the case of our Wiki this is mostly HTML, but LaTeX is also
conceivable. A syntax tree by itself is just a structure and is independent of the grammar that
produced it. Extracting the information expressed by the tree is an important issue. Evaluating
markup functions is very easy and can be done by implementing some hard-coded functions, but WITL
has more to offer. You can also define your own functions in the Wiki text, as shown in Fig. 3.
The evaluation then becomes more complex. We define a general evaluation strategy that can handle
these cases. A tree is evaluated by evaluating every node of the tree. The result is again a
tree, but very often it now consists of only text nodes. The final evaluation step is to flatten
the tree via pre-order traversal to produce HTML output.
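
   The two steps, evaluating nodes and then flattening, can be sketched in Python for the simple
case of hard-coded markup functions. The tuple-based tree shape and the mapping from function names
to HTML tags are assumptions made for this illustration; user-defined functions are not covered.

```python
from html import escape

# Hard-coded markup functions; the mapping to HTML tags is an assumption.
HTML_TAGS = {"h1": "h1", "b": "b", "i": "i"}


def evaluate(node):
    """Replace function nodes by sequences that contain only text nodes."""
    kind = node[0]
    if kind == "plain":
        return ("plain", escape(node[1]))
    if kind == "seq":
        return ("seq", [evaluate(child) for child in node[1]])
    if kind == "funapp":
        _, name, body = node
        tag = HTML_TAGS[name]  # raises KeyError for unknown functions
        return ("seq", [("plain", "<%s>" % tag), evaluate(body), ("plain", "</%s>" % tag)])
    raise ValueError("unknown node kind: %r" % kind)


def flatten(node):
    """Pre-order traversal that concatenates the text of all plain nodes."""
    if node[0] == "plain":
        return node[1]
    return "".join(flatten(child) for child in node[1])


TREE = ("seq", [("plain", "a < b "), ("funapp", "b", ("seq", [("plain", "bold")]))])
print(flatten(evaluate(TREE)))  # prints: a &lt; b <b>bold</b>
```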
        Below, we give a more formal definition of evaluation.

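                                  EBNF name         term signature
                                  seq               $\mathrm{seq}\langle \mathit{head}, \mathit{tail} \rangle$
                                  funapp            $\mathrm{funapp}\langle \mathit{varname}, \mathit{arg} \rangle$
                                  plain             $\mathrm{plain}\langle \mathit{text} \rangle$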

       The evaluation function has the signature

                                      eval : Tree × Text → Tree × Text


The type Tree is defined as

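\[
\begin{aligned}
\mathit{Tree} \;=\;& \mathrm{funapp}\langle \mathit{Varname}, \mathit{Tree} \rangle \\
\mid\;& \mathrm{plain}\langle \mathit{Text} \rangle \\
\mid\;& \mathrm{seq}\langle \mathit{Tree}, \mathit{Sequence} \rangle
\end{aligned}
\]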

In the definition of eval, we use the type Text for storing variable bindings. Lookup is done via
function application where the instance bindings is viewed as having the signature

                                    bindings : Varname → Tree

Adding new bindings views bindings as a sequence of (varname, tree) pairs to which the . operator
prepends a pair. Lookup is always performed starting with the first element, returning the first
match if one is found.
  The evaluation of each possible alternative of a tree is defined as follows.
   • Sequence of Nodes:

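\[
\begin{aligned}
\mathrm{eval}(\mathrm{seq}\langle \mathit{head}, \mathit{tail} \rangle, \mathit{bindings}) &= \mathrm{seq}\langle \mathit{result}, \mathrm{eval}(\mathit{tail}, \mathit{bindings}') \rangle \\
&\quad \text{where } (\mathit{result}, \mathit{bindings}') = \mathrm{eval}(\mathit{head}, \mathit{bindings}) \\
\mathrm{eval}(\mathrm{seq}\langle\rangle, \mathit{bindings}) &= \mathrm{seq}\langle\rangle
\end{aligned}
\]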

   • Function Application: Without loss of generality, we define the function application just for
     one argument. Note that the function definition is stored as a function application whose
     functor part is called fun.

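\[
\begin{aligned}
\mathrm{eval}(\mathrm{funapp}\langle \mathit{funname}, \mathit{arg} \rangle, \mathit{bindings}) &= \mathrm{eval}(\mathit{body}, (\mathit{param}, \mathit{arg}) \,.\, \mathit{bindings}) \\
&\quad \text{where } \mathrm{funapp}\langle \mathrm{fun}, \mathit{param}, \mathit{body} \rangle = \mathit{bindings}(\mathit{funname})
\end{aligned}
\]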

   • Plain Text:
\[
\mathrm{eval}(\mathrm{plain}\langle \mathit{text} \rangle, \mathit{bindings}) = (\mathrm{plain}\langle \mathit{text} \rangle, \mathit{bindings})
\]

Except for the evaluation of sequences, this evaluation strategy is typical for functional languages
with lazy evaluation.
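
The three evaluation rules can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the Java implementation used by the Wiki: the class names Plain, FunApp, Seq and FunDef, the function names eval_tree and flatten, and the sample function "rule" are all hypothetical. Bindings are modelled as a list of (name, value) pairs with the newest pair first, so that lookup returns the first match.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plain:
    text: str


@dataclass(frozen=True)
class FunApp:
    name: str
    arg: object  # a tree; left unevaluated until needed


@dataclass(frozen=True)
class Seq:
    items: Tuple[object, ...]


@dataclass(frozen=True)
class FunDef:
    """Hypothetical stand-in for a stored definition funapp<fun, param, body>."""
    param: str
    body: object


def lookup(bindings, name):
    """Return the first value bound to name; newest bindings come first."""
    for key, value in bindings:
        if key == name:
            return value
    raise KeyError(name)


def eval_tree(tree, bindings):
    """Return (tree, bindings), following the three rules above."""
    if isinstance(tree, Plain):
        return tree, bindings
    if isinstance(tree, Seq):
        results = []
        for item in tree.items:
            # Bindings produced by one element are visible to the next one.
            result, bindings = eval_tree(item, bindings)
            results.append(result)
        return Seq(tuple(results)), bindings
    if isinstance(tree, FunApp):
        definition = lookup(bindings, tree.name)
        # The argument is bound unevaluated, then the body is evaluated.
        extended = [(definition.param, tree.arg)] + bindings
        return eval_tree(definition.body, extended)
    raise TypeError(f"not a tree node: {tree!r}")


def flatten(tree):
    """Pre-order traversal that concatenates the text nodes."""
    if isinstance(tree, Plain):
        return tree.text
    if isinstance(tree, Seq):
        return "".join(flatten(item) for item in tree.items)
    raise ValueError("tree still contains an unevaluated function application")


bindings = [("rule", FunDef("unused", Plain("<hr/>")))]
page = Seq((Plain("above"), FunApp("rule", Plain("")), Plain("below")))

result, _ = eval_tree(page, bindings)
html = flatten(result)

# A newer binding for the same name shadows the older one.
newer = [("rule", FunDef("unused", Plain("---")))] + bindings
result2, _ = eval_tree(page, newer)
shadowed = flatten(result2)
print(html, shadowed)
```

Representing the bindings as a prepend-only list keeps older definitions intact, which mirrors the "first match wins" lookup described above.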

2.4.2 Defining Functions
A primary goal of using WITL as a language for both entries and templates is that you
can define your own functions in the text in addition to using hard-coded functions. This makes
WITL both convenient and flexible. We demonstrate the mechanism with a short example.
  To define your own bold function, you can use code like this:

{def:{b:content}^
     [[[<b>]]]
     {var:content}
     [[[</b>]]]
}

The function def is a predefined function and has two arguments:


  1. The function to define: {b:content}
     The function has the name "b" and one argument with the variable name "content".
  2. The body of the function as second argument:
     It consists of HTML tags in triple brackets with, between them, the predefined function
     var, which tells the interpreter to fetch and evaluate its argument.

When a function application such as {b:someBold} is evaluated, the definition of b is looked up
first and the variable content is bound to "someBold". Then the body of function
b is evaluated. This body contains the function var, which tells the
interpreter to look up its argument "content" in the variable bindings and to insert the result in the
text at the position where var occurred. As a result, {b:someBold} is replaced by "<b>someBold</b>".
   Fig. 3 shows the syntax tree for the example above. Round nodes are functions such as
{b:content}. The only empty round node is a sequence. It is a container for the nodes in
the second argument of def. The square nodes stand for plain text. In addition to def,
eval and var, WITL also provides further Lisp-inspired functions such as {list:} and {for:}.

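[Figure 3 diagram: a tree rooted at def, with the function node b (argument content) and a
sequence node that holds plain text nodes and the function node var (argument content)]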

Figure 3: This syntax tree is a graphical representation of the definition for a bold function. Round
          nodes are functions, the empty round node is a sequence of nodes and the square nodes
          are plain text.
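
The interplay of def and var can be traced with a small sketch. It is written under our own assumptions and is not the Wiki's actual evaluator: nodes are encoded as Python strings (plain text), lists (sequences) and (name, args) tuples (function applications), the names lookup and evaluate are hypothetical, and <b> and </b> are taken as the HTML tags of the bold function.

```python
def lookup(bindings, name):
    """Return the first value bound to name (newest bindings come first)."""
    for key, value in bindings:
        if key == name:
            return value
    raise KeyError(name)


def evaluate(node, bindings):
    """Evaluate a node to text and return (text, bindings).

    A node is a str (plain text), a list (sequence) or a
    (name, args) tuple (function application).
    """
    if isinstance(node, str):
        return node, bindings
    if isinstance(node, list):
        parts = []
        for child in node:
            # Thread the bindings through, so a def is visible afterwards.
            text, bindings = evaluate(child, bindings)
            parts.append(text)
        return "".join(parts), bindings
    name, args = node
    if name == "def":
        # args has the shape [(function name, [parameter name]), body].
        (fun_name, params), body = args
        return "", [(fun_name, (params[0], body))] + bindings
    if name == "var":
        # Fetch the tree bound to the variable and evaluate it only now.
        return evaluate(lookup(bindings, args[0]), bindings)
    # User-defined function: bind the parameter, then evaluate the body.
    param, body = lookup(bindings, name)
    return evaluate(body, [(param, args[0])] + bindings)


# {def:{b:content}^[[[<b>]]]{var:content}[[[</b>]]]} followed by a use of b
definition = ("def", [("b", ["content"]), ["<b>", ("var", ["content"]), "</b>"]])
page = [definition, "This is ", ("b", ["someBold"]), "."]

html, _ = evaluate(page, [])
print(html)
```

The definition itself contributes no text; it only extends the bindings that the rest of the sequence is evaluated with.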


3 Wikked Architecture
3.1 Three Layer Architecture
In order to reduce complexity, we created an architecture with three layers, as shown in Fig. 4.
This is just an abstraction that gives a bird’s-eye view of the system.
   The data layer maintains most of the data to which the two layers above can refer. It is thus
a kind of service layer, responsible for creating, updating and deleting data where
possible.
   The presentation layer is the layer in the middle. This is the level where entries are situated.
From the author’s point of view, an entry corresponds to the traditional Wiki page. It is the actual Wiki
text, extended by the possibility of adding other information such as the language used or the date of
creation. When writing an entry, an author can access the underlying data by including a query or by
referring to meta data for this entry in the Wiki text.
   The meta presentation layer is the layer on top. It is used for designing the web pages:
templates defined here create a common look and feel. It is also used to create a particular view on
an entry. One reader might be interested in the text, title and date of creation of an entry
without caring about the language or other extra information. Somebody else
might be interested only in the title and creator, because he wants to create an overview of entries
that a friend of his has written. This top level makes it possible to choose which information
is displayed. The idea is to apply a template to an entry; by switching to another
template, the view can be changed while the entry stays the same. This also works the other way
round: you can keep the template and change the entry. There may also be entries that do not
have a Wiki text but only other properties. For example, there might be an entry for a person for which
some information about that person should be displayed. This can easily be done by using a
template that functions as a form which is filled with the data concerning the person.
[Figure 4 diagram: three stacked layers, labelled from top to bottom: meta layer,
presentation layer, data layer]

Figure 4: A three layer architecture. At the top: the meta presentation layer, building the framework
          for the Wiki, i.e. the web page templates. In the middle: the presentation layer, where
          authors write their entries. At the bottom: the data layer for managing information.
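
The separation between entries and templates can be illustrated with a short sketch. It rests on our own assumptions: an entry is modelled as a Python dictionary of properties, a template as a function over such a dictionary, and all property values as well as the names article_view and overview_view are made up for illustration.

```python
# Two entries, each described by a set of properties (values are invented).
java_entry = {
    "dc:title": "Java links",
    "dc:creator": "Alice",
    "dc:date": "2006-05-01",
    "dc:language": "en",
    "dc:source": "These are my Java links.",
}
wiki_entry = dict(java_entry, **{"dc:title": "Wiki links",
                                 "dc:source": "These are my Wiki links."})


def article_view(entry):
    """Template that shows title, date of creation and text."""
    return f"{entry['dc:title']} ({entry['dc:date']})\n{entry['dc:source']}"


def overview_view(entry):
    """Template that shows only title and creator."""
    return f"{entry['dc:title']} by {entry['dc:creator']}"


# Same entry, two templates: the view changes, the entry does not.
article = article_view(java_entry)
overview = overview_view(java_entry)

# Same template, two entries.
overviews = [overview_view(e) for e in (java_entry, wiki_entry)]
print(article, overview, overviews, sep="\n")
```

Neither template mentions dc:language, so that property stays in the data layer without being displayed.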

3.2 Components
This section gives a component-oriented view of Wikked's architecture. Fig. 5 describes the
interaction of the components involved in Wikked:


Figure 5: The components involved in the Wikked wiki. At the bottom there is
          the RDF database used by Hyena to maintain the data needed for Wikked, i.e. the wiki
          pages and other data they draw on. Wikked itself runs as a servlet in a servlet container
          and uses Hyena as a service to retrieve the desired information. The rendering engine
          parses the WITL source code, including wiki markup, and then generates the
          HTML output.


   • Hyena:
     Hyena is used as a service to manage the RDF data. This includes the wiki pages,
     which are themselves RDF nodes.
   • Wikked:
     Wikked runs as a servlet in a servlet container. It receives requests from clients and returns
     the corresponding HTML pages. The HTML pages are generated from WITL source code
     by the rendering engine.
   • Rendering engine:
     The rendering engine includes a preprocessor for translating wiki markup to WITL. WITL
     is then translated to HTML.
This architecture can be seen as an implementation of the MVC pattern⁷. The Wikked
servlet acts as the controller, which translates interactions with the view into actions to be performed
by the model. The model is the RDF graph managed by Hyena. Finally, the rendering engine is
the view: it renders the contents of the model, i.e. it generates HTML output by applying a
template to a wiki page.

7 Also known as the Model 2 architecture.
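
The division of roles in this MVC reading can be sketched as follows. Everything in the sketch is a hypothetical stand-in: the class Model replaces the RDF graph managed by Hyena with a plain dictionary, render_view replaces the rendering engine with Python string formatting, and Controller replaces the Wikked servlet.

```python
class Model:
    """Stand-in for the RDF graph managed by Hyena (here just a dict)."""

    def __init__(self, pages):
        self._pages = pages

    def get_page(self, name):
        return self._pages.get(name)


def render_view(template, page):
    """View: generate HTML by applying a template to a wiki page."""
    return template.format(**page)


class Controller:
    """Stand-in for the Wikked servlet: maps a request to model and view."""

    def __init__(self, model, template):
        self.model = model
        self.template = template

    def handle(self, page_name):
        page = self.model.get_page(page_name)
        if page is None:
            return 404, "Page not found"
        return 200, render_view(self.template, page)


model = Model({"Java": {"title": "Java", "text": "These are my Java links."}})
controller = Controller(model, "<h1>{title}</h1><p>{text}</p>")

found = controller.handle("Java")
missing = controller.handle("Unknown")
print(found, missing)
```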


4 Scenario: A Bookmark Collection Web Site
4.1 Motivation
In Wiki documents you often want to refer to a data source instead of simply quoting it. In cases
like “list all my Java books” or “list my ten favorite songs” it would be nice to have a mechanism
to query your database for this information. Your database in this case can be a text document,
an Excel file or, as the name implies, some kind of relational database. With such a mechanism
you no longer have to copy and paste the text, which is the plainest approach. This is especially
interesting when the data are subject to change: you may buy some new Java books, or your taste in
music may vary. You may also want to apply more advanced query conditions, e.g. “list my top ten Brit
pop CDs from 1998 until 2006”. With your data managed by the RDF database, you can easily
refer to it with query functions.
   Why did we choose an example with bookmarks? We suggest that nearly everybody has a huge
collection of bookmarks, but when you have to search for something you ask Google. When you
browse the web, you can also find a lot of web pages with only bookmarks on them. Nowadays, there
are tools such as del.icio.us⁸ for this purpose, but the bookmark example is very simple and easy to
understand. Hence we show an application of our concept with a use case in which three friends create
a web page for bookmark collections.

4.2 The Wikked Three Layer Model in Practice
The work of implementing the Wikked layer model can be split among three people. Let us
call them Daniel, the data manager; Alice, an author who presents her bookmarks; and Wayland,
the designer, who creates nice templates for the web pages. Each of them has a well-defined
function: Daniel has to create a database and an API for accessing it, Alice has to write
articles and Wayland has to design templates.

4.2.1 Daniel the Data Manager
Daniel’s work is to import some bookmarks into the RDF database. A bookmark in our case
consists of a title and a URI and belongs to one or more categories. You can see a visual representation
of a bookmark for java.sun.com in Fig. 6. The round nodes represent resources and the oblong
nodes literal values. There are also two blank nodes, called anonymous resources. The leftmost
one stands for the concept of “Sun’s Java home page”, the other one for the category with the title
“Java”. In this very simple example there are only three categories: News, Java and Wikis. In
each category there are ten bookmarks, all having the same structure as the example in Fig. 6.
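
To make the structure of such a bookmark concrete, the following sketch writes it down as a list of (subject, predicate, object) triples and selects all bookmarks of a category. It is a plain Python illustration under our own assumptions, not the RDF database or the RDQL queries used by Wikked: the blank node labels _:bookmark1 and _:category1 and the helper names objects, subjects and bookmarks_in are hypothetical, and the namespaces are abbreviated as in Fig. 6.

```python
# The bookmark for java.sun.com, one tuple per edge of the graph.
triples = [
    ("_:bookmark1", "rdf:type", "hg:bookmark"),
    ("_:bookmark1", "hg:title", "Java Technology"),
    ("_:bookmark1", "hg:source", "http://java.sun.com/"),
    ("_:bookmark1", "hg:category", "_:category1"),
    ("_:category1", "rdf:type", "hg:category"),
    ("_:category1", "hg:title", "Java"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def bookmarks_in(category_title):
    """Return (title, source) rows for all bookmarks in the named category."""
    rows = []
    for category in subjects("hg:title", category_title):
        if "hg:category" not in objects(category, "rdf:type"):
            continue  # something else happens to carry this title
        for bookmark in subjects("hg:category", category):
            if "hg:bookmark" in objects(bookmark, "rdf:type"):
                title = objects(bookmark, "hg:title")[0]
                source = objects(bookmark, "hg:source")[0]
                rows.append((title, source))
    return rows


java_rows = bookmarks_in("Java")
news_rows = bookmarks_in("News")
print(java_rows, news_rows)
```

Because the category is a node of its own, renaming it changes a single triple and every bookmark linked to it follows automatically.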

4.2.2 Alice the Author
Writing readable and informative entries is Alice’s job. In our example she writes only three entries:
one with her Java links, one with her News links and one with her Wiki links. Because she
wants to refer to them, all of her bookmarks were imported into the database by Daniel beforehand⁹.
Her entry for Java may look like this:

These are my ~~Java~~ links:
{queryTable:{book:cat:Java}
}

 8 http://del.icio.us/
 9 She had sent them to Daniel by email earlier.


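[Figure 6 diagram: an anonymous bookmark node with the edges hg:title → “Java Technology”,
hg:source → http://java.sun.com/, rdf:type → hg:bookmark and hg:category → an anonymous
category node, which in turn has the edges hg:title → “Java” and rdf:type → hg:category]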

Figure 6: A graph representation of an entry in a bookmark collection. To make the graph more
          readable, the namespaces are abbreviated. This bookmark has a title (“Java Technology”),
          a source (“http://java.sun.com/”), a type (“hg:bookmark”) and a category linked to a node
          with the title “Java” and the type “hg:category”.

This entry is rather short, but it is sufficient for demonstration purposes. It shows the
main aspects of our Wiki syntax:
    • function orientation: A function begins with an opening curly brace followed by the function
      name, a colon, one or more arguments separated by ^ and finally a closing curly brace. In the
      example there are two functions. The first one has the function name “queryTable” with
      “{book:cat:Java}” as its single argument. The second one is the argument of the first one.
      It has the function name “book” and the argument “cat:Java”. “queryTable” takes a query
      in RDQL as its argument and returns a table with the results. Here “book” is a predefined
      query function. It simply returns a query in RDQL according to its argument. In this case
      it returns a query that lists the title and source of all bookmarks whose category
      is “Java”.
    • nested functions: A function can also occur inside another function, which makes recursive
      calls possible. When the document is parsed, the structure of the text is translated to a tree as
      described in the previous section about syntax.
    • syntactic sugar: To keep the advantages of common Wiki syntax, we added some syntactic
      sugar, e.g. ~~italic~~ is translated to {i:italic} and is rendered in italics. For more
      examples refer to section 2.1. This is very useful for taking notes in plain ASCII text
      that are nevertheless rendered as HTML.
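
The three aspects can be tried out with a small hand-written sketch. It is not the ANTLR-generated parser or the preprocessor of the rendering engine: the function names desugar, parse and parse_function are hypothetical, the regular expression covers only the ~~italic~~ case, and the parser assumes that every function has the form {name:arguments}, i.e. that a colon follows the name.

```python
import re


def desugar(text):
    """Translate ~~x~~ into {i:x}; other wiki markup is not handled here."""
    return re.sub(r"~~(.+?)~~", r"{i:\1}", text)


def parse(text, pos=0, stop=""):
    """Parse from pos until a character in stop; return (nodes, new_pos).

    Plain text becomes a str, a function becomes a (name, args) tuple,
    where each argument is itself a list of nodes.
    """
    nodes, buffer = [], []
    while pos < len(text) and text[pos] not in stop:
        if text[pos] == "{":
            if buffer:
                nodes.append("".join(buffer))
                buffer = []
            node, pos = parse_function(text, pos)
            nodes.append(node)
        else:
            buffer.append(text[pos])
            pos += 1
    if buffer:
        nodes.append("".join(buffer))
    return nodes, pos


def parse_function(text, pos):
    """Parse one {name:arg^arg...} starting at the opening brace."""
    colon = text.index(":", pos)
    name = text[pos + 1:colon].strip()
    pos = colon + 1
    args = []
    while True:
        arg, pos = parse(text, pos, stop="^}")
        args.append(arg)
        if pos >= len(text):
            raise ValueError("missing closing brace")
        if text[pos] == "}":
            return (name, args), pos + 1
        pos += 1  # skip the argument separator ^


query_tree, _ = parse("{queryTable:{book:cat:Java}}")
sugar_tree, _ = parse(desugar("These are my ~~Java~~ links:"))
print(query_tree)
print(sugar_tree)
```

Only the first colon after the opening brace separates the name from the arguments, which is why “cat:Java” survives as a single argument of “book”.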

4.2.3 Wayland the Web Designer
Finally, Wayland, an experienced web designer, has to create some templates for the entries Alice
wrote. As already mentioned, he can also use WITL for this task.
  To this end he writes a master template:

{call:{header}}
{call:{body}}
{call:{footer}}


    This template includes several nested templates by means of the function call¹⁰:
     • header: A header template that contains some tags with formal definitions needed for layout
       reasons and to make the whole page valid HTML.
     • body: The definition of how to display a single entry. There is a sidebar on the left that holds
       the author, the date of creation and the keywords.
     • footer: In this example the footer created by Wayland just prints the closing tag of
       the HTML document. When the website of the three gets more sophisticated, it can
       also include, for example, copyright information.
    The code for the header:
[[[

    ]]]{:dc:title}[[[

   <!-- some CSS -->

 ]]]

    The code for the body:
[[[

   <!-- left sidebar, some infos -->
    ]]]
     {b:Author: {:dc:creator}}
[[[ ]]]
     {b:created on: {:dc:date}}
     {b:subjects: {:dc:subject}}
[[[

  <!-- centered contents -->

     ]]]
        {h1:Topic: {:dc:title}}
      [[[<!-- the wikiText -->]]]
        {:dc:source}
 [[[

 ]]]

    The code for the footer:
[[[
<!-- some other text -->
 ]]]

10 In most cases the key is a URL.


                   Figure 7: A screen shot of an entry displayed by the Wiki.

  Fig. 8 shows a scheme of how the templates are evaluated. At the top there is the content of the
main page template. Each sub-template is included and evaluated by the call function. Then the
functions included in the “called” templates are evaluated as well, until the desired
output, a web page coded in HTML, is obtained. Fig. 7 shows a screenshot of the result.


                  Page Template:
                      {call: header}
                      {call: body}
                      {call: footer}

                  Body Template:
                      {b:someBold}
                      {renderGet:dc:source}

                  Entry Content:
                      these are my ~~java~~ links:
                      {queryTable:{book:cat:News}}

                  HTML Output:
                      Topic: Java
                      these are my Java-links:
                      title
                      source
                      Radeox :: start
                      http://radeox.org/
                      ...
Figure 8: An extract of how a template (Page Template) is evaluated. First the
          body (Body Template) is looked up in the template list and then evaluated. In the body
          there is a function renderGet that fetches the property dc:source of the entry and tells
          the evaluator to render it. The text of the property (Entry Content) has embedded
          functions whose evaluation is simplified here to a single step. The entry is then rendered
          to HTML (HTML Output).
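
The evaluation scheme of Fig. 8 can be mimicked with a short sketch. It is an illustration under our own assumptions rather than the rendering engine itself: templates and the entry text are given as already parsed node lists, the function name get is a hypothetical stand-in for fetching a property without rendering it, and all HTML tags in the templates are invented.

```python
# Templates as lists of nodes: a str is literal output,
# a (name, argument) tuple is a function application.
templates = {
    "page": [("call", "header"), ("call", "body"), ("call", "footer")],
    "header": ["<html><body>"],
    "body": ["<h1>Topic: ", ("get", "dc:title"), "</h1>",
             ("renderGet", "dc:source")],
    "footer": ["</body></html>"],
}

# The entry: dc:source holds the parsed Wiki text, which contains a function.
entry = {
    "dc:title": "Java",
    "dc:source": ["these are my ", ("i", "java"), " links:"],
}


def render(nodes, entry):
    """Evaluate a node list to HTML text."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
            continue
        name, arg = node
        if name == "call":
            # Include another template and evaluate it as well.
            out.append(render(templates[arg], entry))
        elif name == "get":
            out.append(entry[arg])
        elif name == "renderGet":
            # Fetch a property and evaluate the functions embedded in it.
            out.append(render(entry[arg], entry))
        elif name == "i":
            out.append("<i>" + arg + "</i>")
        else:
            raise ValueError("unknown function: " + name)
    return "".join(out)


html = render(templates["page"], entry)
print(html)
```

Switching the master template only requires a different entry in the templates dictionary; the entry data stays untouched.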


5 Related Work
We are not aware of any RDF-based Wikis. As an example of a Wiki with meta tags we can
mention SnipSnap. SnipSnap [6] uses so-called labels to attach data to an entry,
such as a label for the creator of the entry, a person. A person can in turn have several labels, such as
"name: Smith" of a type like "NameLabel". A label has a type, a name and a value. This approach
is rather clever, but it has the disadvantage that it does not use an open standard. It is worth
mentioning that SnipSnap supports exporting an entry to RDF.
   Omnigator [10] provides a web interface for navigating topic maps. It supports importing RDF
models and browsing them. But Omnigator is not a Wiki.
   SEAL [7] is a framework for developing ontology-based portal applications. It differs in these
main points: it is not a wiki, it concentrates on semantics, which WITL does not, and it is
focused on portal applications. What it has in common with WITL is that both can provide views
(called lenses in SEAL) on RDF data.

6 Future Research
    • Blogs:
      Our metadata support makes this a very straightforward enhancement. The first step is
      to add metadata about the creation date. The second step is to display multiple small
      wiki pages on one web page and to sort them chronologically. Finally, one can add further
      convenience functions such as partitioning the blog entries into pages, a calendar wiki, an
      RSS feed etc.
    • Integration into the Hyena RDF editing infrastructure:
      There are two ways in which Hyena [13] will benefit Wikked:
        – RDF and Wiki syntax editing with a graphical user interface (GUI):
          One frontend for Hyena is implemented as a collection of plugins for the integrated
          development environment Eclipse. Having this frontend edit Wikked’s data will provide
          a nice alternative to purely Web-based editing.
        – Remote publishing:
          Hyena provides an infrastructure of distributed servers that can publish data to and
          subscribe to data from each other. One can therefore start one editor as a Hyena engine (with a
          GUI frontend) and Wikked as another (without a user interface). Publishing from the
          editor to Wikked uses a push protocol provided by Hyena.
    • Lightweight publishing for software engineers:
      We are currently extending Hyena with a set of tools that allows software engineers to manage
      development-related information (such as bug-tracking lists, documentation and source
      code) with Hyena, in one integrated RDF database. Using Wikked as a presentation layer
      for that database allows them to effectively publish and edit that information on the Web.
      For example, one can write a wiki page that documents a certain aspect of the system and
      includes code samples that are retrieved via RDQL queries.

7 Conclusion
In this paper, we presented an RDF-based Wiki. For this purpose, we designed a scripting language
called WITL. Templates for Wiki pages use the same language as entries in the Wiki. Both entries and
templates are also described and maintained with RDF. Furthermore, templates can also be edited
as a normal Wiki page. This ensures the highest flexibility for the look and feel of the Wiki. To keep
the advantages of ordinary Wikis, traditional Wiki markup is also supported. We developed a
three-layer architecture for browsing RDF data and showed how the components involved interact
with each other. As a proof of concept we presented a scenario in which a web site for maintaining
bookmark collections is developed.


References
 [1] Apache Jakarta Project. Velocity. http://jakarta.apache.org/velocity/.
 [2] L. Aronsson. Operation of a large scale, general purpose wiki website. Elpub 2002. Technology
     Interactions., 2002.
 [3] T. Berners-Lee, R. T. Fielding, and L. Masinter. Uniform Resource Identifier (URI): Generic
     Syntax. http://www.ietf.org/rfc/rfc3986.txt, January 2005.
 [4] F. Dawson and T. Howes. vCard MIME directory profile.
     ftp://ftp.isi.edu/in-notes/rfc2426.txt, September 1998.
 [5] Fraunhofer FIRST. Radeox. http://radeox.org/.
 [6] M. L. Jugel and S. J. Schmidt. SnipSnap. http://snipsnap.org/.
 [7] A. Maedche, S. Staab, N. Stojanovic, R. Studer, and Y. Sure. SEAL - a framework for developing
     semantic web portals. In BNCOD 18: Proceedings of the 18th British National Conference on
     Databases, pages 1–22, London, UK, 2001. Springer-Verlag.
 [8] F. Manola and E. Miller. RDF Primer, W3C Recommendation.
     http://www.w3.org/TR/rdf-primer/, 2004.
 [9] Netscape Communications Corporation. Open Directory Project. http://www.dmoz.org/.
[10] Ontopia. Omnigator. http://www.ontopia.net/omnigator/.
[11] T. Parr. ANTLR parser generator. http://antlr.org/.
[12] T. Parr. StringTemplate: Java Template Engine. http://www.stringtemplate.org/.
[13] A. Rauschmayer. Hyena: A Semantic Web Enabled Editor for Software Engineers. Submitted
     for publication.
[14] A. Rauschmayer and P. Renner. Knowledge-Representation-Based Software Engineering.
     Technical Report 0407, Ludwig-Maximilians-Universität München, Institut für Informatik,
     May 2004.
[15] A. Seaborne. RDQL - a query language for RDF. http://www.w3.org/Submission/RDQL/.
[16] Sun Microsystems. JSR-000152 JavaServer Pages 2.0 specification - final release.
     http://jcp.org/aboutJava/communityprocess/final/jsr152/.


List of Figures
   1    RDF Example: Address Book Entry
   2    Wiki Markup
   3    Abstract Syntax Tree for Bold Function
   4    Three Layer Architecture for Wikked
   5    Wikked Components
   6    A graph representation for an entry in a bookmark collection.
   7    A screen shot of an entry displayed by the Wiki.
   8    Template processing.
