AI Methods for Digital Heritage: An Introduction to the Workshop - Prof. Dr. Günther Görz Department Informatik, AG Digital Humanities, FAU ...
AI Methods for Digital Heritage: An Introduction to the Workshop Prof. Dr. Günther Görz Department Informatik, AG Digital Humanities, FAU Erlangen-Nürnberg
KÜNSTLICHE INTELLIGENZ 4 / 2009: Focus on Cultural Heritage and AI
● Building knowledge networks from cultural heritage data by federating databases of memory institutions
● Knowledge transfer and education (fostering group conversations in the museum café)
● Disclosure of texts through linguistic annotation and analysis with emphasis on their semantic content, also by means of virtual working environments
● Classifying named entities and time specifications in text corpora
● Conceptual modelling for the documentation of architecture
● Besides “close reading” there were first approaches to “distant reading”, but not yet for images / collections
G. Görz, FAU, Informatik DH 3
An attempt to explore the potential of AI for research in a humanities discipline from about the same time...
J. Barceló: Computational Intelligence in Archaeology (2009)
Interdisciplinarity: science and technology AND hermeneutics
Imagining an “Automated Archaeologist” relying essentially on machine learning
● Discovering the function of tools
● Reconstructing incomplete data
● Understanding what an archaeological site was
● Explaining ancient societies
● General common sense understanding
Current situation
● Mass digitization, indexing, networking, mediation:
● Dramatic increase of digital data corpora, both through retro-digitization and born-digital creation
  ● especially of image collections
  ● 3D models
  ● multimedia data
● In progress: Federation of corpora by integrators such as Europeana or PHAROS (photo archives), but still insufficient on the semantic side.
● A lot of manual annotation is still going on (cf. Amazon Mechanical Turk); training sets suffer from severe bias problems
● Discussion point: Rigorous and consistent use of controlled vocabularies / formal ontologies?
Current situation
● Great progress has been made in methods of Machine Learning (Deep Learning)
  ● applications in OCR for handwriting and deciphering of closed books / scrolls
  ● object recognition
  ● automatic annotation of works of art
● First approaches to Explainable AI, which in my opinion can only be achieved by hybrid systems (cf. modelling).
● Discussion point: Methodological problems with unsupervised learning
  ● what could the (theoretical) framework, or at least the terminology, for explanation be?
● Role of “curated knowledge” in cultural heritage (institutions)
● Labeling vs. semantics: semantics comes in with a reasoning framework
● “Sense-relational” encoding of meaning in structures
● Reasoning to reveal implicit knowledge, beyond previously stored associations in links
Current situation
● Modeling: Continuous development of CIDOC CRM (v. 7 → ISO), a general reference ontology for the cultural heritage sector with extensions for specific purposes
● Recent work is focusing on ontology design patterns (cordh, Linked Art, ...) and Linked Open Data
● Problems of vagueness, uncertainty and inconsistency in the sources
● Implementation with Semantic Web techniques opens up a potential for inferencing
  ● mass data cause significant performance problems
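The point about inferencing can be made concrete with a minimal sketch in plain Python. It is an illustration only, not the implementation used in any of the projects named here. The property name follows the CIDOC CRM naming pattern ("P89 falls within", treated here as transitive); the place identifiers and the helper function `transitive_closure` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, object) triples.
# The place identifiers are invented for this illustration.
TRIPLES = {
    ("place:StLorenz", "P89_falls_within", "place:Nuremberg"),
    ("place:Nuremberg", "P89_falls_within", "place:Franconia"),
    ("place:Franconia", "P89_falls_within", "place:Bavaria"),
}


def transitive_closure(triples, prop):
    """Return the triples plus every statement implied by treating `prop` as transitive."""
    successors = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            successors[s].add(o)

    inferred = set(triples)
    for start in list(successors):
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # also guards against cycles in the data
            seen.add(node)
            inferred.add((start, prop, node))
            stack.extend(successors.get(node, ()))
    return inferred


closure = transitive_closure(TRIPLES, "P89_falls_within")
# The link from the church to the region was never stored explicitly.
print(("place:StLorenz", "P89_falls_within", "place:Bavaria") in closure)
```

Three stored statements yield three additional implicit ones. On mass data this kind of closure grows quickly, which is one source of the performance problems mentioned above.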
Ontology-Based Knowledge Extraction: Ideal Case (VTM)
Linked Open Data Cloud
Making Data Fit for the Linked Open Data Cloud
● But ... for “big data”: data integrity and semantics
● Many resources with (sometimes slightly) different data models and vocabularies
● ... different spellings, naming conventions, time specifications, multilingualism, etc.
● AI methods could help a lot (pattern recognition, parsing, learning, etc.), but currently the steps are only semiautomatic (example taken from cordh/PHAROS, International Consortium of Photo Archives)
© Minadakis cordh
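As a rough illustration of what one such semiautomatic step might look like, the following Python sketch normalizes name spellings and a few time specifications. It is not taken from the PHAROS workflow; the function names, the handled patterns and the ±10-year tolerance for "circa" dates are all assumptions made for this example.

```python
import re
import unicodedata


def normalize_name(name):
    """Fold case, accents and 'Surname, Forename' order into one comparison key."""
    if "," in name:
        surname, forename = [part.strip() for part in name.split(",", 1)]
        name = f"{forename} {surname}"
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def parse_year_range(text):
    """Map a few time specifications to an inclusive (earliest, latest) year pair."""
    text = text.strip().lower()
    match = re.fullmatch(r"(\d{4})\s*[-–]\s*(\d{4})", text)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = re.fullmatch(r"(?:c\.|ca\.|circa)\s*(\d{4})", text)
    if match:
        year = int(match.group(1))
        return year - 10, year + 10  # assumed tolerance, not a standard
    match = re.fullmatch(r"\d{4}", text)
    if match:
        return int(text), int(text)
    return None  # unrecognized: leave for manual review


print(normalize_name("Dürer, Albrecht") == normalize_name("Albrecht DURER"))
print(parse_year_range("ca. 1500"))
```

Anything the rules do not recognize is returned as `None` rather than guessed, which is where the manual part of a semiautomatic pipeline would take over.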
Research Data, Research Questions and Knowledge Transfer
● Do research goals and research questions change with the availability of big Linked Open (Usable) Data?
● Up to now: Research questions in the humanities were primarily addressed by “close reading”, i.e. case studies, etc.
● What has changed with the addition of “distant reading”?
  ● First of all: degrees of granularity ... but not only
● New research goals and questions?
● Hybrid systems (cf. B. Ludwig, 3/2020)
● Operationalization problem: From high-level questions down to “data”
Computational Thinking
Foster 2011: How Computation Changes Research (in: Switching Codes)
Transdisciplinarity
● Already in our journal special issue (2009), transdisciplinarity in the true sense (Mittelstrass) was addressed with respect to decisive contributions of AI techniques
● Federation of cultural heritage and science data as a starting point for transdisciplinary research reaching beyond the capabilities of particular disciplines
● Modelling and simulation of complex systems such as medieval cities
● In contrast to interdisciplinary work, the disciplines involved will themselves change through synergies
● The treatment of difficult questions in a holistic dimension will lead to new problem-oriented ways of knowledge generation, development and transfer
Downright paradigmatic in our context
European Time Machine
● A few decisive steps towards a broad target portfolio, all of which require AI methods:
● Comprehensive digitization of a variety of historical sources requires a series of extraction processes, including document segmentation and “understanding”
● Alignment of named entities
● Simulation of hypothetical spatiotemporal 4D reconstructions
● Data acquisition goes hand in hand with modeling – in particular of events, actors, place, and time – and long term preservation.
● Important contributions of AI: computer vision and pattern recognition, natural language processing, machine learning, knowledge representation and processing, simulation
European Time Machine
© TMO 2020
Core Challenges
● Diverse data
● Uncertain data
  ● imprecise – imprecision vs. inaccuracy
  ● incomplete (unknown attribute values)
  ● ambiguous
  ● vague
  ● inconsistent
● Plurality of access methods and audiences
● Strategy for sustainability
  ● Availability and stability
  ● Long term: data formats, standards, software, hardware
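To illustrate how imprecise and incomplete values might be kept explicit rather than silently rounded away, here is a minimal Python sketch. It is an assumption-laden illustration, not a method from the talk: the class `YearInterval` and the function `definitely_before` are hypothetical names, dates are reduced to year intervals, and an unknown bound is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class YearInterval:
    """An imprecise date: the true year lies somewhere in [earliest, latest]."""
    earliest: Optional[int]  # None = unknown bound (incomplete data)
    latest: Optional[int]


def definitely_before(a, b):
    """Three-valued comparison: True, False, or None when the bounds do not decide."""
    if a.latest is not None and b.earliest is not None and a.latest < b.earliest:
        return True  # every possible year of a precedes every possible year of b
    if a.earliest is not None and b.latest is not None and a.earliest >= b.latest:
        return False  # no possible year of a precedes any possible year of b
    return None  # overlapping or unknown bounds: the data cannot answer this


print(definitely_before(YearInterval(1490, 1510), YearInterval(1520, 1530)))
print(definitely_before(YearInterval(1490, 1510), YearInterval(1500, 1520)))
```

Returning `None` instead of forcing a yes/no answer keeps the distinction between "known to be false" and "not known", which a two-valued comparison would hide.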
Cultural Heritage Research Data Ecosystem?
Diverse data: Challenges on the institutional side
Quoting Robert Sanderson, CNI Keynote 2020: On the institutional side, in particular with memory institutions (GLAM), there are still problems with the diversity of institutions, cultures and objects
● Libraries: Many non-unique information-carrying objects
● Archives: Many unique information-carrying objects
● Museums: Relatively few unique image-carrying objects
● Conservation (science): Activities to research and preserve (unique) objects
© Sanderson
Important Tasks
● Engineering effective and efficient hybrid systems (architectures) capable of dealing with big data
● Building hypotheses by finding “interesting” regularities, also by inductive reasoning, and testing them against reliable data
● Unified access to European history as Linked Open Data through the Semantic (“Epistemic”) Web
● Other fields of activity for AI lie in the mediation of culture:
  ● Education requires localized and customized data at greatly enhanced levels of detail
Important Tasks
● Need for new smart algorithms for meaningful extraction of information and creation of knowledge from noisy, heterogeneous and complex data at a massive scale
● Cultural tourism as an example: Information is required at different levels of granularity for the preparation of visits, the visits on site, and follow-up processing
● High expectations for plausible and reliable explanations (“Explainable AI”) in causal chains
  ● !! Carefully distinguish causality from correlation
● Coping with incompleteness, ambiguity, vagueness ... and errors will be our constant companion
guenther.goerz@fau.de https://wwwdh.cs.fau.de/ http://erlangen-crm.org/ http://wiss-ki.eu/