The Annodata Framework: A data-centric approach to enhance metadata - Deutsche Bundesbank

The Annodata Framework:
       A data-centric approach to enhance metadata
       Technical Report 2019-14
       S. Bender, J. Blaschke, H. Doll, A. Gordon, C. Hirsch, D. Hochfellner, J. Lane

Disclaimer
The views expressed in this technical report are personal views of the authors and
do not necessarily reflect the views of the Deutsche Bundesbank or the Eurosystem.

Citation
Bender, S., J. Blaschke, H. Doll, A. Gordon, C. Hirsch, D. Hochfellner, J. Lane
(2019). The Annodata Framework: A data-centric approach to enhance metadata.
Technical Report 2019-14, Deutsche Bundesbank, Research Data and Service
Centre.
Deutsche Bundesbank, Research Data and Service Centre
                                                                              14 October 2019

The Annodata Framework: A data-centric approach to enhance metadata

Stefan Bender¹, Jannick Blaschke¹, Hendrik Doll¹, Andrew Gordon², Christian Hirsch¹,
Daniela Hochfellner³, Julia Lane³

1.   INTRODUCTION
2.   EXISTING METADATA STANDARDS
3.   THE ANNODATA FRAMEWORK
4.   USE CASES: DATA PROVIDERS AND DATA USERS
5.   TECHNICAL DEPLOYMENT
6.   CONCLUSION
7.   REFERENCES

Abstract: The new availability of data through secure Research Data Centers (RDCs)
provides an opportunity to capture metadata based on data access and use, in addition to
metadata based on data production. This new information, which we call annodata, can be
used to improve the management of and access to restricted data. We identify two types of
annodata. The first is administrative metadata, such as legal requirements and data access
workflows. The second is usage metadata, such as the number of times a dataset is used and
the research outputs that are produced from it. This paper presents an integrated annodata
framework and describes how annodata can automate the management of confidential data for
RDC managers, improve dataset search and discovery for users, and increase knowledge of
dataset use for data providers.

Keywords: Annodata, administrative data, usage, metadata, machine-readable, data
stewardship, confidential data, data management, data description, big data

¹ Deutsche Bundesbank, email: {firstname}.{lastname}@bundesbank.de
² Columbia University
³ New York University

1.     Introduction

The increased availability of new types of data has transformed the practice of social
science research in many ways. One of the most important is that data access is now
much more likely to be through a secure facility, both because the data are more granular
and because it has become much easier to re-identify individuals and businesses. This
creates challenges for both data providers and users. Data providers have to find
appropriate ways of maintaining the highest privacy protection standards, but they often
do not have the tools necessary to comply with regulatory requirements such as those
introduced by the GDPR, particularly to keep track of who has access to what data. Data
users have to search for and find the datasets most appropriate for their research from an
increasing variety of data, with little information on what data have been used for which
research questions and by which other researchers.

We propose an approach inspired by the use of paradata in survey methodology (West,
2011). Paradata capture auxiliary information about the interview process, including
interviewer and respondent behaviors. Analogously, annodata refers to any
additional information on a dataset that is collected during the research cycle. This can be
administrative dataset information, such as legal requirements and data access workflows.
It can also include information collected on dataset usage, such as usage quantity or
research output. Just as paradata can be used to improve survey administration, annodata
on research access can produce more efficient data administration workflows, and annodata
on dataset usage can enable efficient user-to-user knowledge transfers.

2.     Existing metadata standards

To contextualize our work, we look at existing metadata approaches. It is important to note
here that the purpose of metadata overall is to serve the goals of the community that uses
it and the organizations that provide it (Willis, Greenberg & White, 2012). Different
communities and organizations have different goals that guide their collection, usage, and
sharing of data.

Many existing data repositories and archives have discussed their work in creating,
organizing, and disseminating descriptive metadata about datasets so that these
datasets can be discovered, shared, understood, and reused (Hancock, 2017; Dietrich,
2010; White, 2014; Moss et al., 2016; Rücknagel et al., 2015). That work has given rise to
widely used and flexible metadata schemas, such as schema.org, DataCite, and Dublin
Core, so that datasets can nowadays be described in a flexible and generally understood
way.

However, the information by which data are released, protected, and controlled, and by
which their access and usage are tracked and audited, is less well defined in the
literature, and there is no single, easy-to-implement metadata standard or framework.
Standards like the Metadata Encoding & Transmission Standard (METS) provide a scaffolding
by which to define administrative metadata pertaining to intellectual property and
copyrights, how objects were created and stored, and original source information, but might
not be granular enough to designate complicated usage restrictions by file, table, column,
or specific fields in datasets. Standards such as PROV-O and PREservation Metadata:
Implementation Strategies (PREMIS), another metadata standard stewarded by the Library of
Congress, provide guidance for defining lineage and provenance around the creation and
maintenance of preserved digital objects. Gunia and Sandusky (2010) provide a detailed
description of how PREMIS can be used to preserve Earth Science data. These standards,
however, are not aimed at the complicated chain of transformations that occur over the
lifetime of a dataset’s usage.

Chao, Cragin, and Palmer (2014), however, have proposed a standard data curation
vocabulary, which incorporates not just a description of the data but practical steps of how
the data are used by the researchers that produced and shared it. In addition, Gail and Uhlir
(2016) state the importance for research to “share, access, and reuse data”, which “requires
effective technical, syntactic, semantic and legal interoperability rules and practices”
through metadata. The International Rights Statements Working Group (2015a) introduce
a standardized vocabulary to describe usage terms and copyright status of intellectual work,
which they coin “rights statements”.

This brief review suggests that the current literature relies primarily on schemas whose
metadata describe data from the production side. We address this gap by
enhancing the classical metadata concept in two directions: data administration and usage.
The combination is conceptualized in the annodata framework.

3.     The annodata framework

In this section, we propose two new items to comprehensively describe datasets beyond
metadata: data on dataset administration and data on dataset usage. The annodata
framework builds on classical metadata, which are enhanced by administrative and usage
annodata in order to describe datasets extensively. Each of these three framework
attributes has a specific added value (as graphically depicted in Figure 1). “Classical”
metadata is well-established as a tool to categorize data. Administrative annodata brings
the added benefit of allowing automated workflows towards the dissemination of data. Usage
annodata allows fast and efficient knowledge transfers by systematically capturing what
others have done with the data.

                       Figure 1: Three types of dataset related metadata

3.1. The three-fold data centric approach

The proposed concept enhances rather than replaces the existing framework and suggests
additional attributes to be considered when designing data-centric systems so that
annodata can be integrated into existing standards. Existing standards such as the Data
Documentation Initiative (DDI), PREMIS, METS, DCAT, and Schema.org have varied
approaches to designating access rights, although these have not been widely deployed in
the context of a research data facility or other similar analytical environments.

Our experience is that a three-fold (classical, administrative, usage) informational basis is
both necessary and sufficient to implement virtually all processes along the data life cycle,
notably data handling, linkage and dissemination. Automated processes along the life cycle
are made possible through such standardized data descriptions. In the following, we
present each item of the three-fold framework in detail.

3.2. Classical metadata

Metadata from data producers can be used to describe and document how data is
produced. Such metadata is well described in many high-quality metadata
standards and schemas. A notable example of such a standard in the social sciences is
the Data Documentation Initiative (DDI). A non-exhaustive list of attributes that we
categorize under this type of metadata includes the description of collection
methodologies, how raw data is processed to obtain the standardized dataset, and modes of
data distribution. Table 1 provides a few selected examples of classic metadata.

  Table 1: Examples for classic metadata items

  Name                Name of the dataset.

  Creator             Name(s) of the institution, and/or division, and/or department
                      responsible for developing, collecting and/or managing the dataset. The
                      first creator provides the name of the institution. The name of the creator
                      may or may not be identical to item "Name of Institute" (Data Owner).
                      The second creator gives information on the department (but not
                      individual persons).

  Description         Short description of the dataset.

  Sampled universe    The elementary units about which inferences are to be drawn and to
                      which analytic results refer.

3.3. Administrative annodata

The efficient governance of micro data requires a clear and machine-readable set of rules,
which are consistent across different datasets and potentially across different types of
facilities and repositories. This is vital for the standardization of processes and,
wherever possible, the automation of tasks and decisions within the data management
process. Our proposed approach would enhance existing processes by automatically
collecting information on how data are regulated and governed; this would significantly
ease the work of data stewards. Administrative annodata in this context thus includes all
information that helps data resource management, such as information governing data
dissemination, for example license restrictions, authentication, and availability
(Medeiros et al., 2011).

It is worth noting that the Dublin Core Metadata Initiative already includes the term
administrative metadata (Koch and Weibel, 2000) as “metadata-about-metadata”
necessary to manage data assets; as does the National Information Standards
Organization (Riley, 2017) as the information needed to manage a resource. Our extension
to regulatory information can substantially reduce the administrative burden for data
stewards, since administrative annodata would hold information on which datasets can be
linked, which degree of anonymization is required, which laws apply, and who can approve
data access and any other bureaucratic requirements.

Data providers can also use administrative annodata to manage data access by constructing
fully automated processes that use annodata to determine a user’s access to a given dataset
(Yarkoni et al., 2019). Data users can find out everything they need to know about
access rights and conditions and can use this information to plan their data access
requests. Examples of administrative annodata items are given in Table 2.

  Table 2: Examples for administrative annodata items

  Data access regime       The legal framework under which this data is collected. From this,
                           many other attributes deterministically follow.

  User restrictions        All potential usage restrictions that apply to the usage of a
                           specific dataset depending on the type of researcher or project.

  Linkage restrictions     All dataset-specific restrictions that apply when combining it
                           with a selected second dataset.

  Analytical restrictions  Other dataset-specific restrictions, apart from user or linkage
                           restrictions, that may apply to analysis, e.g. restrictions on
                           using the dataset for certain topics.

  Anonymization            The classification of the dataset’s degree of anonymization.
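
One way to make the items in Table 2 machine-readable is to store them as a structured
record that can be serialized and exchanged between systems. The following is a minimal
sketch under stated assumptions, not the authors' schema: the class name, field names and
all example values are hypothetical and were chosen only to mirror the rows of the table.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AdministrativeAnnodata:
    """Hypothetical record mirroring the rows of Table 2."""
    dataset_id: str                  # illustrative identifier, e.g. a DOI
    data_access_regime: str          # legal framework under which the data is collected
    anonymization: str               # classification of the degree of anonymization
    user_restrictions: List[str] = field(default_factory=list)
    linkage_restrictions: Dict[str, str] = field(default_factory=dict)
    analytical_restrictions: List[str] = field(default_factory=list)


# All values below are made up for illustration.
record = AdministrativeAnnodata(
    dataset_id="example-dataset-1992-2017",
    data_access_regime="example-legal-framework",
    anonymization="fully anonymized",
    user_restrictions=["external researchers: on-site access only"],
    linkage_restrictions={"example-other-dataset": "linkage not permitted"},
)

# Serialize to JSON so that other systems can read the rules, then read it back.
serialized = json.dumps(asdict(record), indent=2, sort_keys=True)
restored = AdministrativeAnnodata(**json.loads(serialized))
```

A flat record like this is easy to validate and to store next to classical metadata; a
production schema would more likely extend an existing standard than define a new one.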

Figure 2 illustrates an example of administrative annodata designed to describe procedures
in a research data center that offers access to high quality administrative micro-level
data for research purposes. For this purpose, administrative annodata would fall into two
different layers. The first layer comprises access regime, database, and dataset, and
describes access to an individual dataset.

       Figure 2: Six types of annodata for the example of a research data center

The rationale behind this is as follows. Typically, a number of different modes are available
to access individual datasets. An access mode is a channel through which access to the data
can be granted. Examples include download of data or secure on-site access at the premises
of the data providing institution. Each access mode in turn may have a number of different
access protocols attached to it. Access protocols describe the criteria that have to be
fulfilled to be granted access under a specific access mode. The protocol criteria are often
imposed by a combination of the legal basis, the affiliation of the researcher (e.g. internal
vs. external) and the degree of anonymization (non-anonymized vs. fully anonymized) of
the requested data.
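
The relationship between access modes, access protocols and protocol criteria can be
sketched as a simple rule lookup. The example below is illustrative only: the class and
function names, the legal basis "example-act" and the listed protocols are assumptions
made for the sketch and do not describe the rules of any actual research data center.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessProtocol:
    """Criteria to be fulfilled for access under one access mode (hypothetical)."""
    access_mode: str      # e.g. "download" or "on-site"
    legal_basis: str
    affiliation: str      # e.g. "internal" or "external"
    anonymization: str    # e.g. "non-anonymized" or "fully anonymized"


@dataclass(frozen=True)
class AccessRequest:
    legal_basis: str
    affiliation: str
    anonymization: str


def permitted_modes(request: AccessRequest, protocols: List[AccessProtocol]) -> List[str]:
    """Return the access modes whose protocol criteria the request fulfils."""
    modes: List[str] = []
    for p in protocols:
        fulfilled = (
            p.legal_basis == request.legal_basis
            and p.affiliation == request.affiliation
            and p.anonymization == request.anonymization
        )
        if fulfilled and p.access_mode not in modes:
            modes.append(p.access_mode)
    return modes


# Made-up protocols: external researchers may download only fully anonymized data.
PROTOCOLS = [
    AccessProtocol("on-site", "example-act", "external", "non-anonymized"),
    AccessProtocol("on-site", "example-act", "external", "fully anonymized"),
    AccessProtocol("download", "example-act", "external", "fully anonymized"),
]

request = AccessRequest("example-act", "external", "fully anonymized")
print(permitted_modes(request, PROTOCOLS))  # ['on-site', 'download']
```

A request that matches no protocol yields an empty list, so access is denied by default
unless a rule explicitly allows it.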

Note that this example distinguishes between a dataset and a database. For example, the
national credit register is a database whereas the national credit register for the period
1992 to 2017 constitutes a dataset. It follows from this definition that an important
distinction between databases and datasets is the frequency with which they are updated.
While the former is updated frequently as new information arrives, the latter, once fixed,
is never updated, which makes datasets better suited to ensuring reproducibility of
research results. To reduce the burden of implementing administrative annodata for each
dataset separately, datasets should inherit as many attributes as possible from databases.
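
Attribute inheritance from a database to its datasets can be sketched with a layered
lookup. The snippet below uses `collections.ChainMap` from the Python standard library;
the attribute names and values are illustrative assumptions, not part of the framework.

```python
from collections import ChainMap

# Attributes defined once at database level (illustrative values).
database_annodata = {
    "name": "National credit register",
    "data_access_regime": "example-legal-framework",
    "anonymization": "non-anonymized",
}

# A dataset only states what differs from, or adds to, its database.
dataset_overrides = {
    "period": "1992-2017",
    "anonymization": "fully anonymized",
}

# ChainMap looks a key up in the first mapping and falls back to the second.
dataset_annodata = ChainMap(dataset_overrides, database_annodata)

print(dataset_annodata["data_access_regime"])  # inherited from the database
print(dataset_annodata["anonymization"])       # overridden by the dataset
```

Because the lookup is layered rather than copied, a change to a database-level attribute
is visible in every dataset that does not override it.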

To be able to construct fully automated processes, the dataset must be unambiguously tied
to an access regime. This requires, first, that the database be partitioned into datasets
based on the degree of anonymization of the requested data. Second, each dataset needs
to be unambiguously identifiable, for example by assigning persistent identifiers such as
Digital Object Identifiers (DOIs). One may also tie access regimes to databases. However,
two things are to be considered. First, different parts of a database may be governed by
different access regimes. Second, assigning persistent identifiers to databases may prove
conceptually hard, as they may be in constant updating cycles.

The second layer of the administrative annodata describes the access to multiple linked
datasets that may be requested by researchers when conducting research. To be able to
handle these cases, this layer comprises information on record linkage, restrictions
incurred when combining datasets, and decision rules that describe which data access
protocol must be applied in cases of multiple datasets.

Besides information on the technical feasibility, the record linkage element also includes
information on the methods applied and the underlying assumptions used when creating
the link. The latter is especially important to enable researchers to gauge the
quality of the link and decide whether they want to use it in their own research.

Information on any restrictions applying to specific datasets is collected in the combining
restriction element. Finally, to allow the possibility of linking datasets from different
access regimes, administrative annodata also needs to define unambiguous decision rules
that specify which access protocol applies in these cases. For example, different access
regimes may call for different contracts that the researcher has to sign before being
granted data access. For these cases, the decision rules element may specify a dedicated
protocol that contains a rule on which contract applies under the given circumstances.
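
A decision rule for linked datasets could, for instance, be expressed as a lookup from the
set of access regimes involved to the contract that applies. The sketch below is one
hypothetical design: the regime and contract names are invented, and refusing every
combination without an explicit rule is an assumption of this example, not a requirement
stated in the report.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical decision rules: which contract applies for a combination of regimes.
DECISION_RULES: Dict[FrozenSet[str], str] = {
    frozenset({"regime-A"}): "contract-A",
    frozenset({"regime-B"}): "contract-B",
    frozenset({"regime-A", "regime-B"}): "contract-AB-linked",
}


def applicable_contract(regimes: Iterable[str]) -> str:
    """Return the contract for the access regimes of the requested datasets.

    Raises KeyError if no rule is defined, so that an unspecified combination
    is never granted by accident.
    """
    key = frozenset(regimes)
    if key not in DECISION_RULES:
        raise KeyError(f"no decision rule for regimes {sorted(key)}")
    return DECISION_RULES[key]


# The order in which the linked datasets are requested does not matter.
print(applicable_contract(["regime-B", "regime-A"]))  # contract-AB-linked
```

Keying the rules on a set keeps them unambiguous: each combination of regimes maps to
exactly one contract, regardless of request order or duplicates.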

3.4. Data usage annodata

Annodata on data usage can track the activity of current and prior researchers and projects
and use that information to better understand potential use cases, identify dataset-specific
characteristics and provide guidance to new researchers. Usage annodata can be
converted into user recommendation systems, which suggest specific studies or datasets
that other researchers working in the same field have used, similar to what is being done
in private industry (Hu et al., 2006). A new effort has recently been launched to automate
the collection of context-specific annodata in social science research (Lane, Mulvany &
Nathan, 2019): these data can be used to report usage statistics and improve the quality
of data.

Annodata on data usage includes items such as publications that use a particular dataset,
the identification of data experts, code snippets, and user comments about the quality of
a given dataset. The combination of this information identifies datasets that are often
used together and provides additional information on possible data linkages. Such data
enrichment strategies can then be used to recommend data to new users, allowing more
efficient micro data usage and knowledge exchange with peers.
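
Identifying datasets that are often used together can be sketched as a co-occurrence count
over projects. This is a deliberately simple illustration, not the recommendation method
of any system cited above; the project and dataset names are made up, and the function
names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def co_usage_counts(projects: Dict[str, Set[str]]) -> Counter:
    """Count how often each pair of datasets is used in the same project."""
    counts: Counter = Counter()
    for datasets in projects.values():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(datasets), 2):
            counts[pair] += 1
    return counts


def recommend(dataset: str, projects: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Suggest the datasets most often used together with `dataset`."""
    related: Counter = Counter()
    for (a, b), n in co_usage_counts(projects).items():
        if a == dataset:
            related[b] += n
        elif b == dataset:
            related[a] += n
    # Sort by count (descending), then by name for a deterministic order.
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Illustrative usage annodata: project -> datasets used.
PROJECTS = {
    "project-1": {"credit-register", "balance-sheets"},
    "project-2": {"credit-register", "balance-sheets", "securities-holdings"},
    "project-3": {"credit-register", "securities-holdings"},
    "project-4": {"balance-sheets"},
}

print(recommend("balance-sheets", PROJECTS))  # ['credit-register', 'securities-holdings']
```

Frequent pairs found this way are candidates for documented linkages as well as for
"used together" recommendations to new users.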

Annodata on data usage can be obtained by applying text mining techniques to research
publications, by obtaining user feedback in the form of comments, and by analyzing
researcher code for empirical analyses. To analyze and learn from researchers’ comments,
an integrated system allowing researchers to comment and share information is currently
being implemented at NYU, in collaboration with the Bundesbank and others.

  Table 3: Examples for usage annodata items

  Publications           Publications for which the dataset was used.

  Code                   Code specifically written to process or analyse the dataset.

  User feedback          Dataset-specific feedback and comments from data users.

Table 3 presents examples of what we consider usage annodata. We continue by
outlining two use cases that benefit from, or are made possible by, our three-fold
produce-administer-use framework.

4.     Use cases: Data providers and data users

4.1. Data providers

A recent trend in empirical research is the proliferation of research data centers (RDCs),
which offer access to high quality administrative micro-level data for research purposes.
RDCs, as opposed to Research Data Repositories, are defined here as institutions
responsible for making their data available to researchers and their own staff in a manner
often governed by specific privacy or confidentiality regimes. Examples include the United
States Federal Reserve System or National Central Banks in the Eurosystem such as the
Deutsche Bundesbank.

RDCs provide researchers and analysts with access to selected micro data in the context
of independent scientific research projects. They have twin imperatives. The first of these
is administrative in nature: to maintain the highest privacy protection standards, since
access to micro data is subject to legal and data protection requirements. The second is to
provide the highest quality access to researchers, so that the best possible use is made of
the data. We believe that high quality annodata will enable RDCs to respond to each of
these imperatives.

The Federal Reserve Board has provided a useful overview of the administrative challenges
that arose from its attempts to set up a data service following the 2008 economic crisis
(Cannon, 2014). They quickly found that one-size-fits-all standards and technology for
automation of data management and data governance are lacking or non-existent. Indeed,
data management and data governance practices are not traditionally conceived as
technology, infrastructure and metadata challenges, but as challenges of business process
and operation (Datoo, 2019). In particular, RDCs must keep track of administrative dataset
attributes, such as who has access to data at which degree of granularity and
anonymization, and which data can be linked to which.

Administrative annodata on the dataset-specific legal, organizational or technical rules and
requirements that are needed for the performance of data stewardship related tasks will
support the data steward in RDC administration and help improve data access. For
example, the privacy principles and regulations that have been imposed by lawmakers can
be attached to a given dataset and the resulting annodata can be used by the RDC to
automate privacy protection mechanisms at all times during a research lifecycle. Thus,
administrative annodata that contain machine-readable and detailed information on
dataset-related access rules and restrictions can be used to design workflows providing
access to confidential micro data and support data governance in general.

Some work has already been done in this area. For example, the National Information
Exchange Model (NIEM) has proposed automating reasoning around access to datasets
pertaining to privacy policy (Yuan, 2016). In addition, established system architectures such
as Amazon Web Services Identity and Access Management (IAM), while not exactly a
metadata standard, provide a framework or model whereby granular user access and
permissions are established and employed as a set of roles that can be applied to users at
multiple levels of granularity (i.e., at resource level or even at levels such as columns in
databases).

Usage annodata can help support RDCs, which are also tasked with providing information
and advising researchers on data selection, data content and analytical approaches. For
this task, usage annodata (from publications, researcher code, and researcher comments)
may be used to enhance existing applications in a user-centric way, which leads to
refined services. Better services in turn allow better research, because the available
micro data is better described and can be used more effectively.

4.2. Harmonizing data access across institutions

One example of harmonized producer metadata is the INEXDA² Metadata
Schema, which is mainly based on the da|ra metadata schema (Helbig et al., 2014), which is
in turn in line with the DataCite Metadata Schema. Adapting an existing metadata schema to
fit the purpose of INEXDA provides a level of standardization for micro data coming from
different countries, institutions, and collection purposes. The schema facilitates a
comprehensive inventory of existing granular datasets conducted in the member
institutions.

The INEXDA metadata schema was designed specifically for micro data datasets where a
dataset is defined as a snapshot of a database at a certain point in time (INEXDA, 2019).
For example, information in the national credit register for the period 1992 to 2017 would
constitute a dataset. The national credit register without any restrictions on time would be
the database. Databases may also be described by the INEXDA metadata schema.
However, this may involve establishing conventions for some metadata items (e.g.
“Publication date”). This structure with smaller “datasets” inheriting attributes from larger
“databases” makes sense for our proposed annodata schema, too.

4.3. Data users

Implementing the proposed annodata framework makes it possible to establish machine-readable
rules to automate processes. For data users, these automated processes are a valuable
asset as well. A built-out module that displays transparent access requirements and
offers a way to easily request data and manage research projects will help users navigate
the research project at any given point in time.

Furthermore, recommendation systems built upon data usage annodata can automatically
advise new users on which datasets to use and who the dataset experts are. As with online
retailers, users see suggestions of the form “based on your preferences, you might also
like”.

Collecting user comments on specific datasets and providing code to harmonize
and clean data fosters knowledge exchange within the research community.

² The International Network for Exchanging Experience on Statistical Handling of Granular Data (INEXDA) provides a
    platform for exchanging experiences on statistical handling of granular data for central banks, national statistical institutes
    and international organisations. As such, it supports the G20 process, notably the Data Gaps Initiative 2 recommendation
    aiming to promote the exchange of (granular) data as well as metadata. The network was founded in January 2017. More
    information on INEXDA can be found at https://www.bis.org/ifc/publ/ifcwork18.htm

5.     Technical Deployment

The challenge is to build and deploy a system to administer the annodata framework. Our
approach has been to develop a “Data Stewardship” web application that can be used by
various data providers (Henke and Hochfellner, 2019). The goal of the application is to
provide an automated and structured workflow that can be used by Research Data Centers,
as well as other stewards of confidential data, to manage approvals, track data access and
manage use by analysts and researchers. It is also designed to automate the onboarding
of researchers in terms of dataset search and discovery and approval. It facilitates the data
access and approval process, reduces administrative time, and improves resource
utilization. It provides data stewards with reports on data access and usage metrics.

The main features of the module for data owners are (i) management of dataset policies
and data stewards, (ii) management of data access requests and approval workflows, (iii)
management of the legal paperwork, and (iv) reporting on dataset usage, listing projects,
and user-generated metadata. The main features of the module for researchers are (i)
automated data search and discovery, (ii) automated application process, (iii) transparent
approval process and (iv) automated export process.

This is a first step towards integrating the annodata framework into a research facility that
governs legally protected administrative data.

6.     Conclusion

In the current paper, we map out a three-fold approach for thinking about dataset attributes.
Besides classical metadata, we propose administrative and usage annodata as further
dataset characteristics. In existing metadata schemas, some items of these novel aspects
show up incidentally; however, we offer a structured way of thinking about the
categorization of these elements, and about which further items to include. It is
fundamental that these new elements can be combined with existing metadata schemas.

Our proposed “Produce, Administer, Use” three-fold schema maps well to the model of the
empirical knowledge generating process (Bender et al., forthcoming). The cycle describes
information flow in empirical research using confidential micro data. Information flows from
data services to researchers to publications and, if properly structured, back to data
services. Such information feedback from usage allows data services to improve. From
there on, the empirical knowledge generating process starts over, but on a higher level, with
improved data services allowing better research in turn.

The information flows generated by the knowledge generating process are encapsulated
in metadata and annodata. Data services to research include metadata from the producer
side, allowing the researcher to make an informed decision on whether to apply for the
confidential data. Administrative annodata both informs data services (informing
researchers) and governs research (degree of access restrictions). In turn,
researchers generate publications as well as implicit knowledge. If publications and implicit
knowledge are properly extracted and structured, we obtain usage annodata, which completes
the empirical knowledge generating process.

We argue that the three metadata and annodata items, Produce, Administer, and Use, are both
necessary and sufficient to inform the design of workflows governing automated
and user-centric data usage applications in fields such as research data centers, data
archives and institutional repositories, online retailers, and institutional data exchange.
Having the annodata framework allows easy and automated integration of information
for the management and use of new data.

7.     References

Bender, Stefan; Hausstein, Brigitte and Hirsch, Christian. An Introduction to INEXDA’s
Metadata Schema. IFC Bulletin No 29, 2019b.

Bender, Stefan; Doll, Hendrik Christian and Hirsch, Christian. Who’s Waldo: Conceptual
issues when characterizing data in empirical research. In: Where’s Waldo. Sage. 2019a.

Block, William C.; Bilde Andersen, Christian; Bontempo, Daniel E.; Gregory, Arofan;
Howald, Arofan; Kieweg, Douglas; Radler, Barry T. Documenting a wider variety of data
using the data documentation initiative 3.1. DDI Working Paper Series – Longitudinal best
practice, no.1. 2011.

Borgman, C. L. (2012). The conundrum of sharing research data. Journal of the American
Society for Information Science & Technology, 63(6), 1059-1078.
http://dx.doi.org/10.1002/asi.22634

Cannon, San and Degn Pan. 2016. First Forays into Research Data Dissemination: A
Tale from the Kansas City Fed, IASSIST Quarterly. pp. 35-39

Clement, Gail and Uhlir, Paul. Legal Interoperability of Research Data: Principles and
Implementation Guidelines. RDA-CODATA Legal Interoperability Interest Group, 2016.
DOI: 10.5281/zenodo.162241

Chao, T. C., Cragin, M. H. and Palmer, C. L. (2015), Data Practices and Curation
Vocabulary (DPCVocab). J Assn Inf Sci Tec, 66: 616-633. doi:10.1002/asi.23184

“Controlling Data Usage Using Structured Data Governance Metadata.” 2017.
http://search.ebscohost.com/login.aspx?direct=true&db=edspap&AN=edspap.20170308715&site=eds-live.

“Data Management for Combined Data Using Structured Data Governance Metadata.”
2018.
http://search.ebscohost.com/login.aspx?direct=true&db=edspap&AN=edspap.20180341780&site=eds-live.

Datoo, A. (2019). Data, Data Modelling and Governance. In Legal Data for Banking, A.
Datoo (Ed.). doi:10.1002/9781119357216.ch4

Dietrich, Dianne. 2010. Metadata Management in the Data Staging Repository, Journal of
Library Metadata, Vol. 10 Issue 2, pp.79-98 DOI: 10.1080/19386389.2010.506376

Henke, Graham & Daniela Hochfellner, The NYU Data Stewardship Module (2019),
(available at https://coleridgeinitiative.org/assets/docs/ADRF White Paper_ Data
Stewardship).

Gunia, Betsy and Sandusky, Robert J. (2010), Designing metadata for long‐term data
preservation: DataONE case study. Proc. Am. Soc. Info. Sci. Tech., 47: 1-2.
doi:10.1002/meet.14504701435

Hancock, Andrew. 2017. The Modernisation of Statistical Classifications to Knowledge
and Information Management Systems, The Electronic Journal of Knowledge
Management Vol. 15, Issue 2. pp. 126-144

Helbig, Kerstin, Hausstein, Brigitte, Koch, Ute, Meichsner, Jana, Kempf, Andreas Oskar.
da|ra Metadata Schema, gesis technical report 2014|17, 2014

Hu, Xiao; Downie, J. Stephen; Ehmann, Andreas F. Exploiting Recommended Usage
Metadata: Exploratory Analyses. In: ISMIR. 2006. pp. 19-22.

INEXDA. INEXDA–The granular data network. IFC Bulletin No 29, 2019.

International Rights Statements Working Group. Recommendations for Standardized
International Rights Statements, 2015a.

International Rights Statements Working Group. Requirements for the Technical
Infrastructure for Standardized International Rights Statements, 2015b.

Japec, L., Kreuter, F., Berg, M., Biemer, P., Decker, P., Lampe, C., ... & Usher, A. (2015).
Big data in survey research: AAPOR task force report. Public Opinion Quarterly, 79(4),
839-880.

Korhonen, Janne & Melleri, Ilkka & Hiekkanen, Kari & Helenius, Mika. (2012). Designing
Data Governance: An Organizational Perspective. The GSTF Journal of Computing. 2.
10.5176/2251-3043_2.4.203.

Lane, Julia & Mulvany, Ian & Nathan, Paco (2019) Rich Search and Discovery for
Research Datasets: Building the next generation of scholarly infrastructure, SAGE
publications.

Smith, Ken & Seligman, Len & Rosenthal, Arnon & Kurcz, Chris & Greer, Mary & Macheret,
Catherine & Sexton, Michael & Eckstein, Adric. (2014). "Big Metadata": The Need for
Principled Metadata Management in Big Data Ecosystems. 10.1145/2627770.2627776.

Staunton, C., Slokenberga, S., & Mascalzoni, D. (2019). The GDPR and the research
exemption: considerations on the necessary safeguards for research biobanks. European
Journal of Human Genetics, 1.

Koch, Traugott and Weibel, Stuart L. The Dublin Core Metadata Initiative. D-Lib Magazine,
Volume 6, Number 12. 2000.

Koch, Ute and Akdeniz, Esra and Meichsner, Jana and Hausstein, Brigitte and Harzenetter,
Karoline: da|ra Metadata Schema: Documentation for the Publication and Citation of Social
and Economic Data. Version 4.0. GESIS Papers 2017/25.

Medeiros, N., Bills, L., Blatchley, J., Pascale, C., & Weir, B. Managing administrative
metadata. Library Resources & Technical Services, 47(1), 28-35. 2011.

Moss, Elizabeth, Christin Cave, and Jared Lyle. 2015. "Sharing and citing research data: A
repository's perspective." West Academic Publishing.

Riley, Jenn. 2017 "Understanding metadata." Washington DC, United States: National
Information Standards Organization
(http://www.niso.org/publications/press/UnderstandingMetadata.pdf): 23.

Rücknagel, J., Vierkant, P., Ulrich, R., Kloska, G., Schnepf, E., Fichtmüller, D., Reuter, E.,
Semrau, A., Kindling, M., Pampel, H., Witt, M., Fritze, F., van de Sandt, S., Klump, J.,
Goebelbecker, H.-J., Skarupianski, M., Bertelmann, R., Schirmbacher, P., Scholze, F.,
Kramer, C., Fuchs, C., Spier, S., Kirchhoff, A. (2015): Metadata Schema for the Description
of Research Data Repositories: version 3.0, 29 p. DOI: http://doi.org/10.2312/re3.008

Schönberg, Tobias. Data Access to Micro Data of the Deutsche Bundesbank. Technical
Report 2018-01, Deutsche Bundesbank, Research Data and Service Centre, 2018.

Singson, Jaime (Cambridge, MA), and Xiaohua (Palo Alto, CA) Chen. 2005. “Method and
Apparatus for Facilitating Data Stewardship for Metadata in an ETL and Data Warehouse
System.”
http://search.ebscohost.com/login.aspx?direct=true&db=edspap&AN=edspap.20050044097&site=eds-live.

Taylor, Sean J. 2013. “Real Scientists Make Their Own Data.” Sean J. Taylor Blog,
January 25. Available at http://bit.ly/15XAq5X

West, Brady, Paradata in survey research. Surv. Pract. 4, 1–8 (2011)

White, Hollie C. 2014. Descriptive Metadata for Scientific Data Repositories: A
Comparison of Information Scientist and Scientist Organizing Behaviors, Journal of
Library Metadata, Vol. 14, Issue 1, pp. 24-51, DOI: 10.1080/19386389.2014.891896

Willis, Craig, Greenberg, Jane and White, Hollie. (2012), Analysis and synthesis of
metadata goals for scientific data. J Am Soc Inf Sci Tec, 63: 1505-1520.
doi:10.1002/asi.22683

Yuan, Ben Z. 2016. Adapting NIEM input for automated privacy policy reasoning. pp. 1-5.
DOI:10.1109/THS.2016.7568953
