"You say sea cow, I say dugong "1 :a usage scenario for the use of controlled vocabularies in a federated registry/repository environment

VocabUsageScenario.doc                                                                                   14/5/08

                        “You say sea cow, I say dugong …”1
             :a usage scenario for the use of controlled vocabularies
                 in a federated registry/repository environment

                                         Chris Blackall (APSR)2

Background to the Usage Scenario
This usage scenario arose from an impromptu discussion at a workshop on how controlled
vocabularies might be integrated into repository/registry applications.3 Afterwards, I agreed
to write up the usage scenario and circulate it to attendees.
The workshop was organised by Rob Atkinson and was held on 1 May 2008 at the CSIRO
IM&T offices at Yarralumla, Canberra.
Note that a usage scenario is not equivalent to a ‘use-case’ as defined in UML. Usage scenarios
are more discursive than UML use-cases and include narrative descriptions and other
information about users and their needs, which provide richer contexts for gathering user
requirements.

Scope of the Usage Scenario
The usage scenario addresses the generic need of researchers and other data producers to
lower the cost/effort required to create surrogate metadata records for research publications
and data before they are ingested into ‘repositories’ (defined broadly here to include long-
term data storage facilities).4
More specifically, it addresses the need to improve the accuracy of surrogate metadata records
by providing data producers with automated mechanisms to populate and/or validate the
relevant descriptive sections of metadata records; for example, by filling in or validating Web
page forms containing ‘subject’ information with controlled vocabularies (e.g. Field of
Research codes) taken from authoritative sources (e.g. the Australian and New Zealand Standard
Research Classification (ANZSRC) 2008).5
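As a minimal sketch of the validation step, the Python below checks a user-entered ‘subject’ value against a controlled vocabulary and normalises it to a code. It assumes the vocabulary has already been retrieved as a simple code-to-label mapping; the `FOR_CODES` excerpt is illustrative stand-in data and `validate_subject` is a hypothetical helper, not part of ePrints or any ABS service.

```python
# Illustrative excerpt of a Field of Research vocabulary (code -> label).
# Hypothetical stand-in data: a real form would load the full, current list
# from an authoritative source rather than hard-coding it.
FOR_CODES = {
    "06": "Biological Sciences",
    "0602": "Ecology",
    "0608": "Zoology",
}


def validate_subject(value):
    """Check a user-entered subject against the controlled vocabulary.

    Accepts either a code ("0602") or a label ("Ecology", matched
    case-insensitively) and returns (is_valid, code_or_error_message), so the
    form can store the normalised code instead of free text.
    """
    value = value.strip()
    if value in FOR_CODES:
        return True, value
    for code, label in FOR_CODES.items():
        if label.casefold() == value.casefold():
            return True, code
    return False, f"'{value}' is not a recognised Field of Research code or label"


for entry in ["0602", "ecology", "Marine biology"]:
    print(entry, "->", validate_subject(entry))
```

Returning the normalised code rather than a bare yes/no lets the form both reject unknown terms and store a consistent value for accepted ones.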
Additionally, the usage scenario covers using controlled vocabularies to improve the accuracy of
dataset metadata in conjunction with the semi-automated production of data product
specifications6, specifically those conforming to ISO 19131 Geographic information - Data
product specifications.7

1
  Sung to the tune of “Let's Call the Whole Thing Off” (originally performed by Ginger Rogers and Fred Astaire,
composed by George Gershwin and Ira Gershwin, for the 1937 film Shall We Dance)
http://www.youtube.com/watch?v=zZ3fjQa5Hls
2
  Chris Blackall, Business Analyst, Australian Partnership for Sustainable Repositories (APSR), W.K. Hancock
Building, Australian National University.
3
  https://www.seegrid.csiro.au/twiki/bin/view/AppSchemas/VocabularyBindingMechanismsWorkshop
4
  Simon Cox discussed this requirement at the open meeting the previous day.
5
  http://www.abs.gov.au/AUSSTATS/abs@.nsf/productsbyCatalogue/5D99AEA1DD8AA8E0CA2574180005421C?OpenDocument
6
  This is my attempt at capturing Rob Atkinson’s requirements. See
7
  http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=36760


Usage Scenario: One-click repository ingest and controlled vocabularies
Jill Page8, Professor of Environmental Science, James Cook University9, leads a
multidisciplinary team of researchers studying marine mammals, in particular the dugong
(species: Dugong dugon).10
One area where Prof. Page’s team has excelled is using new information and communications
technologies to remotely capture data about dugong movements and behaviours. For
example, they have developed techniques for attaching GPS (Global Positioning System)
transmitters to individual dugongs and recording their location and movement data for later
analysis with GIS (Geographic Information System) software.11 Furthermore, they have
pioneered the use of blimps12 and Unmanned Aerial Vehicles (UAVs)13 to remotely record
digital video of dugong populations and behaviours (see attached).
Thanks to these new data collection and analysis tools, Page and her team have created large
volumes of data that is stored across many computers, storage devices and media. Worryingly
for Page as team leader, this data has not been properly described, nor is it under long-term
management.
Despite the poor management of data, many of Page’s publications are stored in the
University institutional repository (JCU ePrints)14, which uses the ePrints software15.
Not only does Page encourage her team to submit research articles into the repository because
of the evidence that it improves the impact of their research and contributes to community
outreach, but also because she anticipates that the Australian government will eventually
mandate the submission of publicly-funded research publications and data and will possibly
allocate research funds partly based on the statistics provided by the repository through the
Excellence in Research for Australia (ERA) initiative.16 Hence, she concludes that creating
accurate metadata about research publications and data will be of major strategic importance
for her team.
Although Page is convinced of the importance of submitting research publications to the
repository, she wants better mechanisms to enable her research team to archive new
research publications and primary data sets and to ensure the metadata is accurate.
Moreover, she wants research publications and data to be linked so that users can discover
and download the publication and its data.
Finally, Page wants the whole submission process to be streamlined as much as possible—a
‘one-click’ process as she describes it.
Put simply, Page just wants to manually fill in the publication ‘title’ and ‘abstract’ fields in the
Web form; the other information should be entered automatically from stored data, or
entered from pick-lists, pull-down menus, and other user interface elements that are
populated with information from controlled vocabularies and other authoritative sources.

8
  Prof. Jill Page is a fictional identity; however, the character is largely based on the profile and work of Prof. Helene Marsh.
http://dugong.id.au/
9
  http://www.jcu.edu.au/ Although James Cook University is small by Australian and world standards, its
proximity to the Great Barrier Reef and its affiliations with leading marine research groups mean that it is an
important node within the worldwide network of marine mammal researchers.
10
   http://en.wikipedia.org/wiki/Dugong
11
   Pyper, Wendy. 2007. Getting a fast lock on dugong locations. Australian Antarctic Magazine 13:26.
12
   Hodgson, Amanda. 2007. “BLIMP-CAM”: Aerial video observations of marine animals. Marine Technology
Society Journal 41 (2):39-43.
13
   Pyper, Wendy. 2007. Population survey pilots unmanned aircraft. Australian Antarctic Magazine 13:15.
14
   http://eprints.jcu.edu.au/
15
   http://www.eprints.org/
16
   http://www.arc.gov.au/era/default.htm

Page’s requirements for a better repository submission process were based on her previous,
mostly negative, experiences of submitting research publications to the University repository.
Errors often arose because the submission Web form lacked basic data entry validation for
key metadata fields. To fill these fields in correctly, Page had to cut and paste from various
documents into the Web form, and even when she had finished, she lacked confidence that
the information was correct.
To fix these limitations, Page’s wish list includes:
     1. A ‘smart’ Web form for repository metadata
           That is, Web forms with pick-lists, pull-down menus, and other user interface
           elements that would help the user populate the form with information from
           controlled vocabulary registries. These might, for example, include information
           about:
          •   Researcher names, identifiers, affiliation information obtained from an
              institutional (LDAP) directory or a Researcher Name Registry17
          •   Field of Research (FOR) and Socio-Economic Objective (SEO) codes and
              descriptors obtained from an Australian and New Zealand Standard Research
              Classification (ANZSRC) registry18
          •   Unique identifiers for species obtained from a Life Science Identifiers (LSID)
              registry19
          •   Geospatial coverage and place names obtained from a national gazetteer
              service/registry
          •   Research collection information obtained from the Online Research Collections
              Australia (ORCA) Registry20
          •   …
     2. A data product specification ‘wizard’
           That is, a web application, or wizard, that guides users through the creation of a
           standard data product specification for submission to a repository as a Submission
           Information Package (SIP). The wizard would include controlled vocabularies to
           help users fill in specific metadata fields (as in 1 above). The resulting wizard
           configuration/profile information would be stored and associated with the user’s
           identity information so that the configuration/profile can easily be reused. Similarly,
           local instances of metadata schemas and profiles would be regularly updated and
           maintained through a central metadata registry.
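A rough sketch of the ‘smart’ form idea in item 1, in Python: each form field is bound to a vocabulary, and the pick-list is generated from that vocabulary rather than typed by hand. The `VOCABULARY_BINDINGS` table, its identifiers and the `render_pick_list` helper are all hypothetical; in the scenario the lists would come from the registries named above, not from in-memory data.

```python
from html import escape

# Hypothetical field-to-vocabulary bindings. In the scenario each list would be
# fetched from a registry (ANZSRC, a gazetteer, ORCA, ...); small in-memory
# stand-ins are used here so the sketch runs on its own.
VOCABULARY_BINDINGS = {
    "subject": [("0602", "Ecology"), ("0608", "Zoology")],
    "place": [("shoalwater-bay", "Shoalwater Bay"), ("shark-bay", "Shark Bay")],
}


def render_pick_list(field, selected=None):
    """Render an HTML <select> whose options come from the field's vocabulary.

    The stored value is the vocabulary identifier; the user only sees the label,
    so free-text spelling variations cannot enter the metadata record.
    """
    options = []
    for value, label in VOCABULARY_BINDINGS[field]:
        attrs = f' value="{escape(value)}"'
        if value == selected:
            attrs += " selected"
        options.append(f"  <option{attrs}>{escape(label)}</option>")
    return f'<select name="{escape(field)}">\n' + "\n".join(options) + "\n</select>"


print(render_pick_list("place", selected="shark-bay"))
```

Keeping the binding table separate from the rendering code means a field can be pointed at a different registry without changing the form itself.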

17
   Possibly as part of the Australian Access Federation, http://www.aaf.edu.au/
18
   Not yet developed, but suggested to the ABS as a service that they should develop.
19
   http://lsids.sourceforge.net/
20
   http://www.apsr.edu.au/orca/index.htm Note that the ORCA Registry is the basis of the proposed ANDS
Collections and Services Registry (see figure 1).


Who would be the beneficiaries in this usage scenario?
Three groups would primarily benefit:
    1. Producers and owners of the original research publications and data would have
       low-cost, low-effort methods of creating metadata, while at the same time fulfilling
       some of the administrative requirements of their host institutions and research
       funding bodies. The aggregation of this metadata by third parties would make their
       work visible at national and international levels via search engines and discovery
       services. This would potentially improve its impact, and certainly its reach. The
       development of controlled vocabularies by specific research communities would also
       assist researcher cohesion and collaboration through standardized use of terms,
       categories and concepts.
    2. End-users would have access to accurate information about research publications and
       data that were described and organised using controlled vocabularies. The use of
       controlled vocabularies would enable users to navigate and browse research
       collections using faceted browsing and navigation functions.
    3. Research funding organizations and managers would benefit through access to
       up-to-date information and statistics about research publications and data whose
       metadata uses controlled vocabularies to ensure reliability and consistency.
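Faceted browsing, mentioned for end-users above, depends on records sharing exactly the same values for the same concept. The Python below is a minimal sketch with hypothetical records: it counts records per facet value and narrows the set when a user selects a value. If one record said "ecology" and another "Ecology", the counts would silently split, which is the problem controlled vocabularies avoid.

```python
from collections import Counter

# Hypothetical records; subject and place values are assumed to come from
# controlled vocabularies, so identical concepts always share one spelling.
RECORDS = [
    {"id": "rec-1", "subject": "Ecology", "place": "Shoalwater Bay"},
    {"id": "rec-2", "subject": "Ecology", "place": "Shark Bay"},
    {"id": "rec-3", "subject": "Zoology", "place": "Shark Bay"},
]


def facet_counts(records, field):
    """Count how many records carry each value of one field (one facet)."""
    return Counter(record[field] for record in records)


def narrow(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(field) == value for field, value in selected.items())]


print(facet_counts(RECORDS, "subject"))
print(facet_counts(narrow(RECORDS, subject="Ecology"), "place"))
```

Each user selection becomes a keyword argument to `narrow`, and the remaining facets are recounted over the narrowed set.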

The Architectural Context of the Scenario
The reference architecture for this scenario is the one described in Towards the Australian
Data Commons (TADC), which details a federated architecture for a national network of
repositories and registries.21 Following the TADC architecture, the usage scenario assumes
that ‘repositories’ are functional entities separate from ‘registries’, although the two are
inseparable in terms of the services they provide to end-users of the federation (see Figure 1).22
In the context of the TADC, ‘repositories’ are typically either document-centric or
data-centric. Document-centric repositories (e.g. Fedora, DSpace and ePrints) typically hold
research publications (e.g. PDF files) and associated digital objects (e.g. image and audio
files), but little in the way of research ‘data’.23
Nevertheless, these repositories are evolving to operate in a service-oriented environment and
can thus communicate with any third-party ‘service’, including registries, via the standard
W3C/OASIS Web Services stack or via REST protocols and interfaces. In other words, they
can be easily integrated with data-centric repositories, provided that both support the same
interoperability frameworks and standards.
Hence, I am assuming that the controlled vocabulary registry applications implied in the
usage scenario would be ‘loosely coupled’ to repositories via Web Services/REST. It follows
that the Smart forms and Data Wizards would take advantage of Web 2.0 technologies (REST,
AJAX, etc.) to dynamically provide controlled vocabulary items to users when filling out Web
forms.
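To make the ‘loosely coupled’ arrangement concrete, here is a minimal Python sketch of the server side of an AJAX type-ahead: the browser sends what the user has typed (for example `?q=eco`) and receives matching vocabulary terms as JSON. The query parameter name `q`, the JSON shape and the `TERMS` data are assumptions for illustration, not a defined registry interface.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory copy of a vocabulary; a registry would hold the real one.
TERMS = [
    {"code": "06", "label": "Biological Sciences"},
    {"code": "0602", "label": "Ecology"},
    {"code": "0608", "label": "Zoology"},
]


def suggest(query_string, limit=10):
    """Answer a request such as '?q=eco' with a JSON list of matching terms.

    A browser-side form would call this as the user types and fill a drop-down
    with the results. Matching is a case-insensitive prefix test on either the
    code or the label; an empty query returns no results.
    """
    params = parse_qs(query_string.lstrip("?"))
    prefix = params.get("q", [""])[0].strip().casefold()
    matches = [
        term for term in TERMS
        if prefix and (term["code"].startswith(prefix)
                       or term["label"].casefold().startswith(prefix))
    ]
    return json.dumps({"query": prefix, "results": matches[:limit]})


print(suggest("?q=eco"))
```

The function is kept independent of any web server so that it could be mounted behind whichever HTTP framework (or the standard library's `http.server`) a registry actually uses.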

21
   ANDS Technical Working Group. 2007. Towards the Australian Data Commons: A proposal for an Australian
National Data Service. Canberra: Department of Education, Science and Training (DEST), Australian Government.
http://www.pfc.org.au/twiki/pub/Main/Data/TowardstheAustralianDataCommons.pdf
22
   Note that because the ebXML Registry specification combines repository and registry functions, this scenario
may need to be adapted to be more understandable to the ebXML community.
23
   Document-centric repositories generally follow the reference model established by the Consultative Committee
for Space Data Systems (CCSDS) in the Reference Model for an Open Archival Information System (OAIS). See
OAIS. 2001. Reference Model for an Open Archival Information System (OAIS).
http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html


Also, to be clear, this assumption does not preclude the option proposed by Rob Atkinson of
creating local proxy versions of vocabulary data: indeed, these strategies are complementary.
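One possible reading of how the two strategies complement each other is sketched below in Python: the form asks the remote registry first and refreshes a local proxy copy, falling back to that copy when the registry is unreachable. This is not presented as Rob Atkinson's design; `VocabularyProxy`, the JSON-list format and the registry URL are hypothetical.

```python
import json
import urllib.request
from typing import Callable, List, Optional


def fetch_remote(url: str) -> List[str]:
    """Download a vocabulary published as a JSON list of terms (assumed format)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


class VocabularyProxy:
    """Keep a local proxy copy of a remote vocabulary (hypothetical helper).

    Each lookup tries the authoritative registry first and refreshes the local
    copy; if the registry cannot be reached, the last copy obtained is served.
    """

    def __init__(self, url: str, fetch: Callable[[str], List[str]] = fetch_remote):
        self.url = url
        self.fetch = fetch
        self.local_copy: Optional[List[str]] = None

    def terms(self) -> List[str]:
        try:
            self.local_copy = self.fetch(self.url)
        except OSError:
            # Network failures from urllib (URLError, timeouts) are OSErrors.
            if self.local_copy is None:
                raise  # no proxy copy yet, so nothing to fall back on
        return self.local_copy


# Usage with a stand-in fetcher, so the sketch runs without a network:
# the first call succeeds, later calls behave as if the registry is down.
replies = [["Ecology", "Zoology"]]


def flaky_fetch(url: str) -> List[str]:
    if replies:
        return replies.pop()
    raise OSError("registry unreachable")


proxy = VocabularyProxy("https://registry.example.org/for-codes.json", fetch=flaky_fetch)
print(proxy.terms())  # fresh copy from the "registry"
print(proxy.terms())  # registry down: served from the local proxy copy
```

Injecting the `fetch` function keeps the fallback logic testable and allows the transport (REST, Web Services, a file share) to change without touching the proxy.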
In addition to some basic technical and administrative metadata, the metadata ingested into
document-centric repositories is mostly descriptive or bibliographic information that users rely
on for discovery and citation. The metadata standard typically used by document-centric
repositories is the ‘unqualified’ version of Dublin Core, the Dublin Core Metadata Initiative
(DCMI) Dublin Core Metadata Element Set, Version 1.1.24 However, the usage scenario described
above would require ‘qualified’ DC metadata, which in turn would require community
agreements about metadata profiles and interchange formats.
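The difference between unqualified and qualified DC can be shown with a single subject value. In the Python sketch below, the unqualified element carries a bare string, while the qualified element also names the vocabulary via an `xsi:type` encoding scheme, the convention found in DCMI's XML implementation guidelines. The scheme name `anzsrc:FOR2008` is hypothetical: DCMI defines no such scheme, and agreeing on one is exactly the kind of community profile decision mentioned above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in {"dc": DC, "dcterms": DCTERMS, "xsi": XSI}.items():
    ET.register_namespace(prefix, uri)

# Hypothetical encoding-scheme name for ANZSRC Field of Research codes. DCMI
# defines no such scheme, and the "anzsrc" prefix would itself need declaring
# in the agreed profile; that is the kind of community agreement required.
FOR_SCHEME = "anzsrc:FOR2008"


def subject_element(code, qualified):
    """Build a subject element, optionally naming the vocabulary it comes from."""
    if qualified:
        # Qualified DC: a DCMI Terms property plus an encoding scheme (xsi:type).
        element = ET.Element(f"{{{DCTERMS}}}subject", {f"{{{XSI}}}type": FOR_SCHEME})
    else:
        # Unqualified DC 1.1: a bare string; which vocabulary it belongs to is lost.
        element = ET.Element(f"{{{DC}}}subject")
    element.text = code
    return element


for qualified in (False, True):
    print(ET.tostring(subject_element("0602", qualified), encoding="unicode"))
```

In the unqualified record a harvester cannot tell whether "0602" is a FOR code or some other identifier; the qualified record makes that explicit.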
In contrast, the metadata required for data-centric repositories varies a great deal, as these
repositories are often run along community- or discipline-specific lines and adopt local or de
facto standards. A further complication is that many data-centric repositories support neither
the standard W3C/OASIS Web Services stack nor REST protocols and interfaces.

24
     http://dublincore.org/documents/dces/


Figure 1: Conceptual view of a (simplified) TADC Architecture including a generic vocabulary registry

Australian Antarctic magazine issue 13: 2007

Getting a fast lock on dugong locations

New generation satellite tag technology that can locate and record the position of tagged
animals faster and more efficiently than previous instrumentation, promises to vastly improve
scientific understanding of dugong movement and habitat use.

Through the Australian Centre for Applied Marine Mammal Science, Dr Ivan Lawler of James
Cook University, and Mr Dave Holley of Edith Cowan University, will test the ability of new
‘Fastloc®’ GPS (Global Positioning System) technology (developed by Wildtrack Telemetry
Systems Ltd, UK) to track the fine scale movements of dugongs in deep water and sub-tidal
seagrass meadows.

Dugongs have traditionally been tracked with standard GPS tags, which need to remain above
the water’s surface long enough to download ‘ephemeris’ data relating to the positions of the
passing GPS satellites. The longer a tag is submerged between one position fix and the next,
and the further the animal travels before resurfacing, the longer it takes to record the next
position. In practice, this often means that the dugong (and tag) re-submerges before a
location is calculated, leaving significant gaps in the data. Fastloc® tags, in contrast, do not
download ephemeris data and need only 0.02 seconds at the surface to record data that can be
processed to provide an animal’s position.

‘When dugongs are in deep water and/or moving quickly, we get fewer location fixes using
standard GPS technology, because the tags do not breach the surface for long enough,’
Dr Lawler says.

‘This introduces a serious bias that can interfere with modelling of dugong habitat use and our
ability to detect migratory corridors.

‘If we don’t know what routes dugongs take when they move between areas, we don’t know
what threats – such as nets – they could potentially be exposed to, and we can’t assess the
importance of deep water seagrass beds to the animals. This has implications for the
conservation and management of both dugongs and their habitat.’

The research team will test the effectiveness of Fastloc® tags in two very different habitats –
Shoalwater Bay in central Queensland and Shark Bay in Western Australia. Both areas are
important for dugong conservation. However, Shoalwater Bay has a high tidal range of 7-8 m
while Shark Bay has a tidal range of 1.7 m.

‘The habitat use of dugongs within inshore seagrass meadows is poorly understood at low tide
because the animals are in deeper water than at high tides when they move up into the
intertidal shallows,’ Dr Lawler says.

‘So fewer locations are received from dugongs at low tides than at high tides. We’ll compare
the frequency of location fixes between these two areas and if similar numbers of locations are
received in both habitats it will demonstrate that the Fastloc® system can acquire position
fixes from animals in deep water.’

The tags will also be tested for their ability to acquire location fixes from dugongs moving
rapidly between seagrass habitats in different bays.

The tag units will be deployed on five dugongs in each region for 2-3 months, along with
time-depth recorders to measure the animals’ dive profiles. Tags will be attached to the tail of
the dugong via a harness with a remotely triggered release. The Argos satellite system will
then be used to locate the tag and to decode the dugong location information recorded by it.

WENDY PYPER
Information Services, AAD

Photo (Wildlife Computers): A Fastloc® tag, similar to this one produced by Wildlife Computers
in the US, but with a dugong-specific housing that allows the tag to be tethered to dugongs’
tails, will be used to track the fine scale movements of dugongs in deep water and sub-tidal
seagrass meadows.

Photo (Judy Davidson): A dugong is restrained during attachment of a tag to its tail.

Photo (Paul Lavery): A dugong is released with its tag (a traditional GPS unit) attached.

MARINE MAMMAL SCIENCE

Population survey pilots unmanned aircraft

Robotic aircraft or ‘Unmanned Aerial Vehicles’ (UAVs) could soon take to the skies in the
name of marine mammal research, if a pilot project to test the technology succeeds.

Through the Australian Centre for Applied Marine Mammal Science, Dr Amanda Hodgson and
Dr Michael Noad, of the University of Queensland, will conduct and compare traditional
manned and UAV surveys of dugongs and humpback whales, to test whether UAVs can improve
the safety, cost-effectiveness and accuracy of marine mammal population surveys.

‘Aircraft hire and personnel costs mean that traditional manned aerial surveys are expensive,
and eight people have died over the past 20 years after aircraft crashed during aerial surveys,’
Dr Hodgson says.

‘So we want to determine whether UAVs offer a better way of monitoring marine mammal
populations, by reducing the cost and the risk, and by increasing the accuracy of species
detection, location and identification using on-board imaging technology.’

UAVs have been around since the 1950s and developed for a range of applications including
defence, weather research, and search and rescue. They are largely untested in wildlife
research, but they have the potential to be used at night – with infrared cameras attached – or
in extreme environments. Their lower cost would also enable more aerial surveys to be
conducted, improving population estimates.

The research team will use a large (5 m wingspan), commercially available UAV, supplied by
Aerocam Australia and equipped with video and still cameras.

‘A larger UAV can carry more equipment and a lot more fuel – allowing us to cover the greater
distances necessary for whale surveys,’ Dr Noad says.

The first phase of the project will test the basic capabilities of the UAV for viewing and
surveying marine mammals. It will ask a range of questions, including: does the UAV provide
video and still images that can be easily analysed by researchers or image analysis programs;
what is the optimal camera height and system for different species; can images be viewed in
real time to enable operators on the ground to alter the flight path when animals are sighted;
and how much post-flight analysis of images is required?

Dugongs and humpback whales are being targeted as they live in different environments, are
sighted using different cues from the air, and have very different movement habits and
aggregation patterns.

‘Dugongs sometimes congregate in large herds of up to 300 individuals, and need to be circled
to be counted,’ Dr Noad says.

‘Migrating humpback whales usually travel singly or in pairs, and often you just see their
blows before they submerge again. They’re spread out on a long migratory path, so you have
to cover quite a bit of ocean to find them.’

For dugongs, the UAV will fly transects over Moreton Bay and Hervey Bay, in south-east
Queensland, and when a herd is sighted – through the live video link – researchers will take
over the controls and circle the herd to get an accurate count.

Humpback whales will be located during their winter migration past North Stradbroke Island,
and the UAV will again be tested at varying heights above the animals. Still and video images
will then be compared to see if there is any advantage of one over the other.

‘Still images will likely have a better resolution than video images, but it may be easier to
detect whales from movement in the video,’ Dr Noad says.

If this first phase of the project proves successful, the researchers will move on to the second
phase – to directly compare the results of UAV surveys with manned surveys.

The scientists admit this is a high-risk project. But even if the technology does not prove
adequate today, with the pace of development, it may be in just a few years’ time.

‘In the medium to long term, smaller UAVs could reduce the cost of flights to just a few
dollars an hour, while better imaging software could negate the need for human analysis at
all,’ Dr Noad says.

WENDY PYPER
Information Services, AAD

Photo: Joshua Smith and Michael Noad

Photo (Aerocam Australia): Aerocam’s UAV ‘Shadow’

Aerocam’s ‘Shadow’ specs
  Wingspan       5.2 m
  Length         2.9 m
  Max weight     90 kg
  Fuel load      12-24 l
  Max range      1500 km
  Endurance      3-8 hr
  Speed          160-200 km/hr
  Max payload    25 kg