Communication Analysis through Visual Analytics:
                                                               Current Practices, Challenges, and New Frontiers
                                                 Maximilian T. Fischer, Frederik L. Dennig, Daniel Seebacher, Daniel A. Keim, and Mennatallah El-Assady

arXiv:2106.14802v1 [cs.HC] 28 Jun 2021

                                                [Figure 1 diagram: the four framework dimensions and their characterization]
                                                I   Input: Data and Information: information about the message and data; context and participant information; properties / strategies of the environment
                                                II  Processing and Models: analytical goals of the approach; scope of the models used; processing methods and properties
                                                III Visual Interface: properties of the representation; features of the interaction; particularities of the refinement process
                                                IV  Knowledge Generation: type of output; evaluation and factors for verification; intended knowledge gain and power

                                                Fig. 1. Four main dimensions of our conceptual framework for communication analysis systems along with their characterization,
                                                which is based on the visual analytics process model by Keim et al. [40] and adapted by us to the communication analysis domain.

                                                Abstract— The automated analysis of digital human communication data often focuses on specific aspects like content or network
                                                structure in isolation, while classical communication research stresses the importance of a holistic analysis approach. This work aims to
                                                formalize digital communication analysis and investigate how classical results can be leveraged as part of visually interactive systems,
                                                which offer new analysis opportunities to allow for less biased, skewed, or incomplete results. For this, we construct a conceptual
                                                framework and design space based on the existing research landscape, technical considerations, and communication research that
                                                describes the properties, capabilities, and composition of such systems through 30 criteria in four analysis dimensions. We make the
                                                case that visual analytics principles are uniquely suited for a more holistic approach by tackling the automation complexity and leveraging
                                                domain knowledge, paving the way to generate design guidelines for building such approaches. Our framework provides a common
                                                language and description of communication analysis systems to support existing research, highlights relevant design areas while
                                                promoting and supporting the mutual exchange between researchers. Additionally, our framework identifies existing gaps and highlights
                                                opportunities in research areas that are worth investigating further. With this contribution, we pave the path for the formalization of
                                                digital communication analysis through visual analytics.
                                                Index Terms—Communication analysis, visual analytics, conceptual framework, design space, state-of-the-art.

                                         1   INTRODUCTION
                                         Human communication has been fundamentally transformed within                             nication (in the following: communication analysis systems). For the
                                         the digital era. While digital forms of communication have slowly in-                     purpose of this paper, we define these systems as largely automated
                                         creased in the second half of the last century, the pace has fundamentally                applications that also employ visual components to interactively ana-
                                         accelerated within the last two decades. Nowadays, digital communica-                     lyze digital human communication. Most commonly, this can be chats,
                                         tion is the norm rather than the exception, offering cost-effectiveness                   e-mail, etc. We do not consider approaches that focus primarily on the
                                         and instant access anywhere, leading to a change in communication                         content itself (e.g., only sentiment analysis on a document collection
                                         behavior [57] – possibly permanently.                                                     alone), but the approach must thereby focus on distributed information
                                            With this transformation to digital communication [70], new research                   exchanges by several participants. Consequently, public exchanges
                                         opportunities have emerged in a wide variety of different domains, rang-                  (e.g., Twitter or forums), even though they are somewhat related, do
                                         ing from engineering to social sciences to business. For example, it                      not fit directly as all information is available to all participants directly
                                         has been studied how visualization can show the evolution of dynamic                      after a post (similar to a single group communication).
                                         communication networks [75], how discourse analysis for digital                              For such communication analysis systems, we notice a peculiarity:
                                         communication can be enhanced [34], or how team performance in business                   As we highlighted in previous work [18] and briefly summarize in
                                         settings can be evaluated based on communication behavior [20]. For                       this paragraph, the research into such systems is split into two
                                         such analyses, digital analysis methods are often used to aid and support                 factions: the vast majority of existing approaches focus on either the
                                         the (semi-)manual, domain-specific research methodologies.                                content of communication or on the network aspect in isolation in-
                                            In this paper, we focus on the field of interactive human communi-                     stead of taking into account the fundamental dynamics holistically. In
                                         cation analysis and specifically on automated and interactive commu-                      contrast, this distinction between content and network is not present
                                         nication analysis systems that primarily target written human commu-                      in the seminal works on human communication research [55, 72, 78].
                                                                                                                                   Neither does it occur in more recent textbooks [62] or in current com-
                                                                                                                                   munication research [20, 54]. Even when digital methods are used to
                                          • Maximilian T. Fischer, Frederik L. Dennig, Daniel Seebacher, Daniel A.                 aid the manual analysis, for example, in psychology, often a holistic
                                            Keim, and Mennatallah El-Assady are with the University of Konstanz.                   view is taken. Indeed, the holistic analysis of the network structure,
                                            E-mail: {max.fischer, frederik.dennig, daniel.seebacher, keim,                         the communication patterns as well as the content plays an integral
                                            mennatallah.el-assady}                                                part [68, 78] in this research field. As such, the analysis of the con-
                                         This is a preprint and pre-peer-reviewed version of this article.                         tent or the network/metadata aspect alone can sometimes lead to an
                                                                                                                                   incomplete or even biased view, skewing the analysis results. Using
independent approaches often introduces discontinuities, makes tasks         2 BACKGROUND
more complicated, requires more manual work, and makes the detection         Communication analysis can use a variety of different techniques to
of cross-matches more difficult. However, the development of such            analyze communication behavior. Just as communication itself is an
approaches is difficult as human communication data can be very com-         interactive exchange between sender and receiver, its analysis is not
plex, consisting of meta-data, network relations, and content, including     bound by a strict set of instructions. It builds upon the iterative analysis
explicit and implicit connotations, with some meaning only evident           and progressive understanding of the meaning of the communication
when considering everything together and in context. Nevertheless, this      and its context, which makes automation difficult. As such, the area of
comparison highlights a gap in such communication analysis systems.          semi-automated communication analysis fits very well with the concept
   We have begun to address these shortcomings in previous work [18]         of visual analytics, which is also based on the concept of iterative and
by presenting a prototype for one of the first holistic approaches to        interactive knowledge generation. In the following, we first describe the
communication analysis.                                                      previous work in human sensemaking using visual analytics principles
   However, the research in such systems has not yet received enough         and how it integrates with communication analysis as well as the origins
interest, in part because the area is not yet accurately                     and main concepts of communication analysis itself.
and systematically described. The need for such a formalization has
been recognized [76] in communication sciences, but so far the               2.1 Visual Analytics and Knowledge Generation
opportunities, challenges, and pitfalls have primarily been described from   Visual analytics [40] describes the concept of combining human
an application-oriented perspective in the social sciences. While an         analytical reasoning with computational data analysis through interactive
effort has recently been made to map digital communication systems           visualizations to gain knowledge. This knowledge generation process
as a whole [19], the focus was largely on content, infrastructure,           has been formalized by Sacha et al. [66] by explicitly describing the
and policy aspects, while leaving out the analysis and technical as-         human cognition as three circular loops that build on each other: (1)
pects. The same is true for the classical communication analysis re-         exploration, (2) verification, and (3) knowledge generation. In the first
search [55, 62, 72, 78]. One can base considerations on these historic       loop (exploration), the user explores the data and the initial results the
works, but on the one hand it involves many aspects with no direct           system presents. In communication analysis, this could, for example,
counterpart in digital communication. For example, facial expressions        encompass tasks such as finding the communication participants most
and gestures are not retained and accessible for the recipient of written    central in a network or finding those messages which contain specific
exchanges. On the other hand and on a more fundamental level, digital        semantic concepts. In the second loop (verification), a confirmatory
communication has also transformed the way we communicate and                analysis is conducted to either confirm or reject a hypothesis. Specific
the modalities we use [59]. Digital communication has its own                tasks here could include comparison between different results, further
particularities and characteristics, which are, naturally, not discussed     inspection, or aggregation. Finally, in the third loop (knowledge
in the seminal works. However, when we instead focus on approaches           generation), the user gains new knowledge and insights about the data and
describing actual systems including the technical aspects, many findings     the analysis. Every step in this iterative process is essential, as it allows
from communication research like the importance of a holistic view           the analysts to form connections between different analysis modalities.
are not heeded. Instead, the vast majority of approaches fall into the       This is especially important, because these different modalities are very
two factions discussed above, focusing either on concepts from natural       common in communication analysis [39], which makes visual analytics
language processing [53] for content analysis or discussing the field of     especially suited for a holistic, non-specialized analysis.
social network analysis [71] for the network aspects.
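
The exploration tasks named in Section 2.1 (finding the most central participants, or finding messages that contain specific concepts) can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems: the toy message list, the function names, and the choice of degree centrality and plain keyword matching are assumptions made for illustration only. Real systems would typically use richer centrality measures and semantic models.

```python
# Hypothetical toy data: (sender, recipient, text) triples.
MESSAGES = [
    ("alice", "bob", "Can you send the contract today?"),
    ("bob", "alice", "The contract is attached."),
    ("alice", "carol", "Lunch tomorrow?"),
    ("dave", "alice", "Please review the contract draft."),
]


def degree_centrality(messages):
    """Normalized degree centrality on the undirected contact graph."""
    neighbors = {}
    for sender, recipient, _text in messages:
        neighbors.setdefault(sender, set()).add(recipient)
        neighbors.setdefault(recipient, set()).add(sender)
    n = len(neighbors)
    if n < 2:
        return {person: 0.0 for person in neighbors}
    # Divide by the maximum possible number of distinct contacts (n - 1).
    return {person: len(ns) / (n - 1) for person, ns in neighbors.items()}


def find_messages(messages, keyword):
    """Case-insensitive keyword match, a stand-in for semantic concept search."""
    needle = keyword.lower()
    return [m for m in messages if needle in m[2].lower()]


# Network aspect: who has the most distinct contacts?
centrality = degree_centrality(MESSAGES)
most_central = max(centrality, key=centrality.get)

# Content aspect: which messages mention a given term?
contract_messages = find_messages(MESSAGES, "contract")
```

Both results are computed on the same message list, so a holistic system could link them, for example by highlighting which of the matching messages involve the most central participant.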
   In this work, we want to bridge the gap between communication             2.2 Communication Analysis
research and modern communication analysis system development. As            The origins of communication research can be traced back to ancient
evident in a few academic works [18, 28, 42, 79] and recent commercial       times, with the study of rhetoric and oratory as well as persuasion in
systems [13, 58, 60], visualization and interactive user steering are a      ancient Greece and Rome. However, the formal study of
promising way [33] to begin to tackle the gap between the two different      communication as a process did not really start until the early 20th century. Of
analysis modalities. So far, the field of communication analysis systems     particular importance in this early phase are the works of Cooley [11]
is not clearly defined from a scientific, technical perspective, lacking a   and Lippmann [48], which provided one of the first formal definitions
common language to describe it. Consequently, so far, it has not been        of communication and established the influence of social aspects in
systematically explored how visual analytics principles are – and could      communication, as well as the work of Simmel [72], studying social
be – employed in communication analysis, how such systems can be             relationships and their influence. These human interrelations were
described and categorized, for example, by their data, methodology, and      further studied by Moreno [55], who is often credited with laying the
information handling, and what a relevant taxonomy would look like.          foundations for social network analysis (SNA). These early works were
This lack of formalization makes developing such systems complicated         then further extended by Bavelas [5] and Leavitt [46], who introduced a
and hinders comparison between approaches.                                   taxonomy for communication patterns in groups. Only a few years later,
   As part of this work, we aim to formalize digital communication           Savage and Deutsch [68] paved the way to analyze communication
analysis by constructing a design space of interactive digital communi-      through computer aided modeling and processing. However, manual
cation analysis systems, making the following contributions:                 analysis and research was still the leading methodology. In the coming
                                                                             years, Watzlawick et al. [78] developed a theory of communication,
                                                                             while other work [69] further described the particularities of human
   • The creation of a conceptual framework of communication anal-           communication. Around this time, the field was further extended by
     ysis systems, based on communication research, technical consid-        Roger [65] and adapted to the modern understanding of communication
     erations, and systematically reviewing the properties of existing       networks. Recent works highlight the main findings [62].
     approaches and the associated tasks.                                       With the advent of digital processing and computational power came
   • A survey and comparison of existing approaches to assess their          the shift from laborious manual analysis [25] first to digitally
     maturity and coverage, using our framework.                             supported [50] and later to highly automated analysis. All three levels are
   • A discussion on the open challenges and a list of future research       fully justified and cover important areas in their fields. However, it is
     opportunities on communication analysis systems.                        noticeable, while not completely surprising, that the completeness of
                                                                             the analysis developed in the opposite direction of automation level,
                                                                             with solutions being ever more specialized and focusing on just specific
   Our work fills the gap in the formalization of communication analy-       aspects the more they are automated. For example, modern systems
sis systems. With this contribution, we aim to better guide the research     allow us to analyze communication behavior and social ties using cen-
when assessing and developing approaches for communication analysis          trality measures [52] or describe complete artificial networks in social
using visual analytics principles.                                           sciences [6]. Specialized toolkits have been developed to analyze such
network structures, like Pajek [4] or Gephi [3]. All these approaches
primarily focus on the network aspects, omitting most of the metadata            [Figure 2 diagram: Automatic Filtering (1285), Manual Filtering (58),
and especially the content. Other approaches instead focus only                  Manual Coding (44), Manual Validation (37)]
on it, like approaches that use keyword-based searches [81] to filter
communication for certain content, or those which aim to improve the             Fig. 2. The paper collection and coding process. It consists of four
understanding of the meaning of communication using concepts such                main steps: (1) Automated filtering, (2) manual filtering, (3) manual
as sentiment analysis [61] or topic modeling [64].                               coding, and (4) manual validation.
   From a visual perspective, several approaches primarily follow a
node-link-diagram-based approach, like Gephi [3], and many commer-
cial solutions like IBM’s i2 Analyst’s Notebook [38], Pajek [4], Palantir            Venue                               #Coll.     #Filtered     #Coded            #Final
Gotham [60], DataWalk [13], and Nuix Discover and Nuix Investi-
gate [58]. Another class of approaches uses matrix-based approaches                  IEEE TVCG                               790        35           27              23
to analyze the communication relations, for example, MatrixExplorer                  Computer Graphics Fo.                   495        17           11               8
[31] or NodeTrix [32]. Another set of approaches uses timeline designs               Commercial Systems                       -         6            6                6
like CloudLines [43], while others like Fu et al. [24] modify graph pre-             Total                                   1285       58           44              37
sentations through multiple planes. For a detailed discussion, we refer
to previous work [18], where we discuss the existing visual analytics            Table 1. Publications per venue and paper count for collection step.
research landscape for communication analysis systems extensively.
The goals and methods employed in communication analysis are diverse             these approaches would go beyond the scope of this paper, given its
and manifold, as illustrated by the broadness of existing approaches.            size constraints.
In the following, we aim towards tackling the central question of a                 To broaden the perspective, in a second step, we consulted with eight
common description of communication analysis: How can the different              domain experts about approaches used in practice. The experts be-
approaches in communication analysis systems be described within a               long to the field of law enforcement, working for various European law
common, conceptual framework?                                                    enforcement agencies, and each has extensive experience with digital
    Framework Basis — We propose to base such a framework on three               investigations, including communication analysis, working in the field
areas of consideration: (1) Communication research offers decades                from ten to over 30 years. They contributed a collection of six actual
of research on the particularities of (often non-technical) communica-           systems [3, 4, 13, 38, 58, 60] applied in the field (including commercial).
tion analysis. For this work, we integrate concepts primarily from later         Those approaches were included in the Selection Methodology from
summary works [62, 69, 78]. We discuss the concepts and ideas which              Step 2 onward and were not simultaneously discovered during the seed
are based on communication research in detail in the relevant parts              paper selection. However, some approaches [3, 38] can be considered
of Section 4. (2) Technical considerations of possible approaches al-            to be universally known in the community.
low for production-based selection criteria, taking into account design             Selection Methodology — For the actual paper selection method-
properties such as analyzable data types, data representation, and flow,         ology, we follow a four-step approach (see also Figure 2 and Table 1).
or limitations like scalability from a technical standpoint. These are           First, we conducted a keyword-based seed search for the words commu-
closely linked to the last category. (3) The existing research landscape         nication and analysis on the titles, abstract, index terms, and contents
of interactive communication analysis systems provides a foundation              of publications in each of the venues described above. Secondly, we
for the classification of approaches based on measures such as analyt-           went through all these papers’ titles and abstracts manually, discarding
ical goals, visualization and interaction methods, or the power of the           those which clearly are not concerned with communication analysis
knowledge generation process. While the first two considerations are             systems, reducing the selection significantly. In this step, we included
referenced directly when describing the framework in Section 4, the              approaches suggested by the domain experts. Third, we manually
third requires a thorough preparation:                                           looked at the remaining papers and decided if they indeed describe
    Seed Papers for Existing Research Landscape — To analyze                     a communication analysis system. In the final step, we validated the
state-of-the-art interactive communication analysis systems and contribute       results by checking borderline cases, consequently removing seven
a first set of classification criteria, we conducted a seed literature           papers. Our final collection includes 37 approaches.
survey to get an initial idea of a large set of existing approaches.
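
The counts reported in Table 1 and Figure 2 can be cross-checked with a few lines of Python. The sketch below only restates the published numbers; the variable and function names are hypothetical, and encoding the "-" entry for commercial systems as None is an assumption made for this illustration.

```python
# Per-venue counts from Table 1: (collected, filtered, coded, final).
# None marks the "-" entry: commercial systems were suggested by domain
# experts and did not come from the keyword-based collection step.
COUNTS = {
    "IEEE TVCG": (790, 35, 27, 23),
    "Computer Graphics Forum": (495, 17, 11, 8),
    "Commercial Systems": (None, 6, 6, 6),
}

# Totals as printed in the last row of Table 1.
REPORTED_TOTALS = (1285, 58, 44, 37)


def column_totals(counts):
    """Sum each pipeline step over all venues, treating None as zero."""
    totals = [0, 0, 0, 0]
    for row in counts.values():
        for step, value in enumerate(row):
            totals[step] += value or 0
    return tuple(totals)


totals = column_totals(COUNTS)

# Papers dropped between manual coding and manual validation.
removed_in_validation = totals[2] - totals[3]
```

The computed totals agree with the reported row, and the difference between the coded and final counts matches the seven papers removed during validation.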
    First, we collected papers using a keyword-based search, and as we           4     CONCEPTUAL FRAMEWORK
are considering only approaches from visual data analysis, restricted this
search to the following high-quality journals and conferences:                   In the following section, we aim to construct a possible conceptual
                                                                                 framework that encompasses discerning aspects of communication
   • IEEE Transactions on Visualization and Computer Graphics                    analysis systems. As with any taxonomy, the development took dozens
     (TVCG, including IEEE VIS proceedings)                                      of iterations, and we cannot cover its complete evolution here. Instead,
   • Computer Graphics Forum (including EuroVis and EuroVA)                      we highlight the main considerations and, for each property, give a
                                                                                 justification. For the complete conceptual framework, see Figure 3. In
We consider approaches published in the last 15 years (from 2006 to              the following, we focus on the argumentation and reasoning, categoriz-
2021) to focus on recent developments for all venues. Note that we               ing the existing approaches just in an exemplary manner. For the full
intentionally restricted our search space to these venues, considering           classification of the selected approaches from above, consult Table 2.
only well known information visualization publications. The goal of
this paper is not to construct a complete survey of approaches – we aim             Main Considerations When designing a conceptual framework,
for a large and diverse enough sample to make justifiable conclusions            the structuring methodology is of central importance. One standard
(for a more detailed discussion and a possible future survey, see also           methodology is to use a task-based grouping. However, when analyzing
Section 5). Indeed, we are well aware that other valid approaches                the different approaches, we realized that sometimes very different
exist in other venues. For example, we did not include                           methods are employed for the same task. For example, to discover key
papers from human–computer interaction issues (e.g., CHI [22, 83]).              persons within a communication network, one can either use an SNA-
Also, the survey on text visualization by Kucher et al. [44], especially         based approach using node-link visualizations and centrality measures
the group discourse analysis, provides some further approaches that              or instead leverage a geometric deep learning model that is interactively
have been published in IUI [36, 37] or TiiS [21]. Further, other venues          visualized using a matrix-based method. These methods have very
like PacificVis [24, 83] or domain specific journals like Digital Inves-         different side effects, visualization, and interaction techniques and
tigation [28] also contributed valid approaches. The inclusion of all            make very different assumptions and requirements on the data.
The primary goal is to design a conceptual framework of commu-             sentiment analysis) while judging support for explicit (e.g., keyword-
nication analysis in visual analytics from a communication theoretical        based search like [38]) and also implicit (e.g., named entity recognition
standpoint, but one which can be easily adapted and applied to such ap-       like [15]) content. Most approaches consider the explicit level, and
proaches. Therefore, we base our framework on the established visual          several, especially text-based ones, also consider the implicit level.
analytics process model by Keim et al. [40] to cover the main areas.
We slightly modify the terminology, proposing four main dimensions,           4.1.2   Communication Participants
which are characterized further in Figure 1: (I) Input: Data and Infor-       A second, very important aspect in communication research, is the
mation (4.1), (II) Processing and Models (4.2), (III) Visual Interface        relationship of the communication participants. It is usually analyzed
(4.3), and (IV) Knowledge Generation (4.4).                                   from the perspective of either scope or power relationship.
   In our work, we develop these dimensions specifically for commu-              Target Scope — The target scope can be oneself (e.g., in a note to
nication analysis. To achieve this, we include properties relevant in         oneself), a single person, or a group of persons. The analysis of the
communication research as well as discerning technical properties,            existing approaches shows that most systems focus on a single target.
based on the existing research landscape. In the following, we discuss           Power Relationship — It comes again from classical communica-
each dimension individually. As a note of caution, we stress that many        tion research [78], which proposes a distinction into symmetrical (equal
properties we present in the following are themselves multifaceted and,       grounds) or complementary (dependence). The relationship context is
for a detailed analysis, benefit from a more nuanced discussion in more       rarely taken into account explicitly in existing research (e.g., [8]).
detail than we can include within the limited space available here.
                                                                              4.1.3   Environment
4.1     Input: Data and Information                                           The context of communication is very important [78], because it can
This category focuses on information, context and environment of the          influence the exchange itself, such that message properties have to be
communication. In contrast to the process model by Keim et al. [40],          interpreted differently.
we specifically understand this category to focus on the information             Collection Type — The collection type defines how the data was
theoretical aspects and the collection process itself, not so much on any     acquired and how it is leveraged. We propose to categorize it into
preprocessing steps. As such, parts of the structuring are partly based       a quadrant, consisting of the targeting methodology on the one side
on classical communication research [62, 69, 78], which has extensively       and the anonymity level on the other side (see the relevant part in
discussed the content and meaning, context, and relationship aspects          Figure 3). The former can be either targeted (specific communica-
of communication. Building on this, we propose three main subcat-             tion from a restricted set of users due to a trigger event) or untargeted
egories: message, communication participant, and environment. We              (unfound bulk collection). The latter can either be high (anonymized
complement these aspects with properties relevant for digital systems,        or pseudo-anonymized) or low (identifiable). Regarding the analy-
like the data type and collection properties.                                 sis, this can make a significant difference. For example, the targeted
                                                                              analysis of identifiable communication participants can focus on the
4.1.1    Message                                                              actual exchange and leverage context and relationship information.
                                                                              The untargeted analysis of pseudo-anonymized communication instead
Communication can be regarded as the exchange of symbols. In com-             often results in a search for the needle in the hay-stack and can not al-
puter science, the term message is more common. The main considera-           ways leverage background information. This results in vastly different
tions are which meaning it contains and how it is transported:                application development, scalability, and inference capabilities.
   Data Type — The data type refers to the content type of the data              Analysis Coupling — Closely related is the analysis coupling,
from a technical point of view. When looking at data classification in        which can influence the communication coding due to privacy con-
information visualization [10], we can identify several data types which      siderations. Again, we categorize it into a quadrant, this time between
are relevant for communication analysis: text, audio, image/video,            (expected) knowledge about the analysis, which can be either known or
network, and time-series. Based on the usage in current approaches            unknown to be true. Note that the inverse, known to be false, is only
(see Section 3), the two most relevant ones are text data (e.g., [12])        achievable with a specific degree of certainty. The coupling refers to
and network (structure) data (e.g., [3]), encompassing the content and        the effect that the (expected) analysis can influence the measurement.
the communication network, respectively. However, the other data              If participants expect their communication to be analyzed, they might
types are also relevant. For example, communication can happen via            communicate differently, being less honest, implicit, or communicate
audio (e.g., telephone or VoIP) or via video chats, comprising audio          not at all. This effect also depends on the targeting: If it is known that
and moving images, i.e., video data. While our framework focuses              all e-mails in the world are collected and analyzed (untargeted), they
primarily on this written (i.e., text) communication, it is agnostic to the   might have less influence than knowing that their e-mails are monitored
data type of information. For example, it would be possible to include        for sure (targeted), as this can influence the degree of scrutiny and so-
the detection of facial expressions using deep learning [56] to analyze       phistication of the analysis employed. For example, diplomatic cables
the analogical code and set it in context. The last data type is time-        are often very frank, mainly because they are not expected to be leaked
series data (e.g., [43]), which is often counted to the meta-data and, for    (assume untargeted, identifiable), while public policy discussions are
example, can contain information about communication regularity.              much more considerate (targeted, identifiable).
   Code/Intent — The meaning is the central aspect of the commu-                 Confidentiality — The previous point relates to the confidentiality
nication, as evident by the plethora of approaches that aim to learn it.      of the communication channel, which can influence the communication
Based on Watzlawick et al. [78], the communication can be regarded            coding, and therefore should be taken into account.
as code, either digital or analogical (sic!). In a simplified expression,        Strategy — A technical consideration which influences the analysis
the digital code refers to the actual meaning of the transmitted infor-       possibilities is the collection strategy. Is the data leveraged just a one-
mation in a symbolic system, while the analogical code refers to how          off dataset that contains past information only (static S ) like [4], or
something is communicated, including cues, and context information.           does the data collection reach the present, possibly is actively running,
While the former is, by definition, present in virtually all approaches       and the information increases over time (dynamic D ) like [60]. They
we compared (e.g., [31, 42]), analogical analysis is rare (e.g., [35]).       differ as the extracted knowledge might (significantly) change over
   Expressiveness — Similar, but conceptually slightly different, is the      time with new information available, affecting the analytical process.
expressiveness, which describes the nature of the information extracted          Continuity — Along with the previous consideration, one can con-
from the content. It can be explicit factual information that is evident      sider the technical realization of a dynamic data collection. The options
directly from the content or information implicitly contained and must        are either a one time only O approach, where a fixed dataset is loaded
be inferred, for example from the semantic. Code and expressiveness           once (e.g., [23]), a batch B approach that supports an extension with
together allow to classify systems by their capability to leverage both       a new data (e.g., [38]) or, a streaming S approach for a continuous
digital (e.g., actual content information) and also analogical codes (e.g.,   integration (e.g., [8]). However, most approaches only support O .
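As an illustration only, the properties of the Input dimension discussed in Section 4.1 could be captured in a small data structure for classifying a system. The following minimal sketch is not part of the framework itself: the class and field names (such as `InputProfile` and `collection_quadrant`) are hypothetical, and the example system at the end is invented for demonstration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class DataType(Enum):
    TEXT = "text"
    AUDIO = "audio"
    IMAGE_VIDEO = "image/video"
    NETWORK = "network"
    TIME_SERIES = "time-series"


class Code(Enum):
    DIGITAL = "digital"        # what is transmitted
    ANALOGICAL = "analogical"  # how it is communicated


class Expressiveness(Enum):
    EXPLICIT = "explicit"  # factual, evident from the content
    IMPLICIT = "implicit"  # must be inferred


class TargetScope(Enum):
    SELF = "self"
    SINGLE = "single person"
    GROUP = "group"


class Targeting(Enum):
    TARGETED = "targeted"
    UNTARGETED = "untargeted"


class Anonymity(Enum):
    HIGH = "high"  # anonymized or pseudo-anonymized
    LOW = "low"    # identifiable


class Strategy(Enum):
    STATIC = "S"
    DYNAMIC = "D"


class Continuity(Enum):
    ONE_TIME = "O"
    BATCH = "B"
    STREAMING = "S"


@dataclass(frozen=True)
class InputProfile:
    """Hypothetical record of how one system covers the Input dimension."""
    data_types: FrozenSet[DataType]
    codes: FrozenSet[Code]
    expressiveness: FrozenSet[Expressiveness]
    target_scope: TargetScope
    targeting: Targeting
    anonymity: Anonymity
    strategy: Strategy
    continuity: Continuity

    def collection_quadrant(self) -> str:
        # Collection type: targeting methodology combined with anonymity level.
        return f"{self.targeting.value}/{self.anonymity.value}"

    def covers_both_codes(self) -> bool:
        # True only if digital and analogical codes are both supported.
        return self.codes == frozenset(Code)


# Invented example system (not taken from the surveyed approaches).
example = InputProfile(
    data_types=frozenset({DataType.TEXT, DataType.NETWORK}),
    codes=frozenset({Code.DIGITAL}),
    expressiveness=frozenset({Expressiveness.EXPLICIT}),
    target_scope=TargetScope.SINGLE,
    targeting=Targeting.TARGETED,
    anonymity=Anonymity.LOW,
    strategy=Strategy.STATIC,
    continuity=Continuity.ONE_TIME,
)
print(example.collection_quadrant(), example.covers_both_codes())
```

Sets are used for the properties where a system may support several values at once (data type, code, expressiveness), while single values are used where the text describes mutually exclusive options; this split is a design assumption of the sketch.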
[Figure 3 (diagram): four columns, Input: Data and Information (→ Section 4.1), Processing and Models (→ Section 4.2), Visual Interface (→ Section 4.3), and Knowledge Generation (→ Section 4.4), each listing its properties and sub-properties. Recoverable group labels: Message, Comm. Participants, Environment; Analytical Goal, Model Scope, Methods and Properties; Representation, Interaction, Model-View Relationship; Output, Verification Approach, Knowledge Gain.]

Fig. 3. Our conceptual framework of communication analysis systems. It consists of the four main dimensions Input: Data and Information,
Processing and Models, Visual Interface, and Knowledge Generation, described in more detail in Section 3 and also in Figure 1. Each category
contains several properties and sub-properties, which allow for a systematic analysis of such systems. Note that the graphic highlights some of the
most relevant aspects for individual properties, which can be used for a simplified comparison. However, the properties themselves are multifaceted
and, for a detailed analysis, should be discussed with more nuance and in more detail than indicated by these examples.

Table 2. Communication analysis system classification summarizing the different properties of the approaches. The classification criteria are
a subset of the conceptual framework developed in Section 4 and shown in Figure 3. The selected approaches and
most categories are also available on a dedicated website (for details, see Section 5.2).

Generic Properties: The approach supports a specific property, does not support it (#), or supports it only in a very basic or limited form (G#; e.g., shown as an associated entry without analysis).
†: Indicates approaches which are commercial systems or widely used in the industry.
The following encodings have special symbols:
Strategy: The approach supports a static (S) or a dynamic (D) environment.
Continuity: The approach supports either a one-time only data loading (O), iterative batches of data (B), or a data stream (S).
Collection: The approach supports either the analysis of single environments (S) or of multiple environments (M) simultaneously.
Group Communication: The approach supports different types of group communication analysis, as encoded in a 3x3 matrix. The rows (source) and columns
(target) specify individual, group, and nested groups, respectively. For example, an entry can encode that the approach supports only the analysis of communication
between individuals and, additionally, from individuals to groups, while groups as source and nested groups are not supported at all.
Latency: The approach supports either the analysis of in-person communication (I), the live (L) analysis of communication, or a delayed (D) analysis a posteriori.
Scalability: The approach supports the analysis of tens (I), thousands (II), or millions (IIII) of cases.
Evaluation: Case study or example, comparison with other techniques, and expert study.
Time Dimensionality: The approach supports different time dimensionalities, as encoded in a 2x3 matrix. The basis of the knowledge is encoded in the rows
(either past or present), and the prediction columns specify if knowledge about the past, present, or future is inferred. For example, one encoding indicates that the approach
only uses past knowledge to learn about the past, e.g., to find a historic pattern. Another encoding instead indicates that the approach uses past and present data to
predict future developments.
Predictive Power: The approach's predictive power encodes in a 2x3 matrix the type of output (explanation or transition, as rows) in relation to the time
dimensionalities past, present, and future, as columns. For example, one encoding indicates an explanation about the past, while another can also give explanations about the
future. A further encoding indicates that the approach supports both an explanation and a transition function to explain past as well as classify present
and predict future developments.
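
The matrix encodings described above can be read mechanically. The following minimal sketch shows one possible way to decode them; it assumes that each matrix is stored as nested tuples of booleans (rows first), which is an illustrative choice and not the representation used in Table 2. The names `decode`, `SOURCES`, `TARGETS`, `BASIS`, and `INFERRED` are hypothetical.

```python
# Row and column labels as given in the legend of Table 2.
SOURCES = ("individual", "group", "nested group")   # Group Communication rows
TARGETS = ("individual", "group", "nested group")   # Group Communication columns
BASIS = ("past", "present")                         # Time Dimensionality rows
INFERRED = ("past", "present", "future")            # Time Dimensionality columns


def decode(matrix, row_labels, col_labels):
    """Return the (row, column) label pairs whose cell is marked as supported."""
    if len(matrix) != len(row_labels) or any(
        len(row) != len(col_labels) for row in matrix
    ):
        raise ValueError("matrix shape does not match the labels")
    return [
        (r, c)
        for r, row in zip(row_labels, matrix)
        for c, cell in zip(col_labels, row)
        if cell
    ]


# Legend example: individual-to-individual and individual-to-group only.
group_example = (
    (True, True, False),
    (False, False, False),
    (False, False, False),
)

# Legend example: past and present data used to predict future developments.
time_forecast = (
    (False, False, True),
    (False, False, True),
)

print(decode(group_example, SOURCES, TARGETS))
print(decode(time_forecast, BASIS, INFERRED))
```

The same function also covers the Predictive Power matrix if the row labels are replaced by the two output types (explanation and transition).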

[Table 2, first rows. Only the letter-coded columns are recoverable. The columns with support symbols or matrix encodings (Data Type, Code / Intent, Power Rel., Analysis Coup., Group Comm., Operation M., Model Tuning, Data Tuning, Active Learning, Machine Model, Mental Model, Transition Func., Time Dim., Predictive Power) cannot be reconstructed from the extracted layout.]

Approach        Ref.   Strategy   Continuity   Collection   Latency   Scalability
Pajek†          [4]    S          O            S            D         IIII II
Gephi†          [3]    S          B            M            D         IIII II
SaNDVis         [63]   S          O            S            D         IIII II
Ghani et al.    [26]   S          B            S            D         II II
Elzen et al.    [77]   S          B            S            D         II II
                 NEREx              [15]               # # # #                                    #                                         # # S                            O                                                               #                        #                    S              #                            D           IIII II                            # #                                                                                                    #                                                     #                         #                                     # # #
                 i2 Analyst’s NB† [38]                 #
                                                       G #
                                                         G                      #
                                                                                G                 #                             # # # S                                      B                                                                                                             M                                           D           IIII I                             # #                                                                                                    #                                                                                                                     #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       G                       -
                 Palantir Gotham† [60]                 #
                                                       G #
                                                         G                      #
                                                                                G                 #                             # # # D                                      S                                                                                                             M                                           L D         IIII I                             # #                                                                                                    #                                                                                                                     #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       G                       -
                 DataWalk†        [13]                 #
                                                       G #
                                                         G                      #
                                                                                G                 #                             # # # S                                      B                                                                                                             M                                           D           IIII I                             # #                                                                                                    #                                                                               #                                     #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       G                       -
                 Nuix D./I.†        [58]               #
                                                       G #
                                                         G                      #
                                                                                G                 #                             # # # D                                      B                                                                                                             M                                           L D         IIII II                            # #                                                                                                                                                                                                                          #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     G #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       G                       -

                 Recurrence Plots    [1]                # # # #                                    #
                                                                                                  G                             # # # S                                      O                                                               # # #                                         S              #                            D           II II                              # #                                 # #                                          #                     # # # #                                               #                         #                                     # # #

                 GestaltMatrix      [7] # # #                                   #                 #                             # # # S                                      O                                                # # #                                              # S                      #                            D           II                   I             # # #                 # # # # # # # # # #                                                                                                                 # # #                        #                                     # # #
                 SmallMultiPiles    [2] # # #                                   #                 #                             # # # S                                      O                                                               # #                                 # S                      #                            D           II II                              # #                                 #                                                                  #                     #                               # #                       #                                     # # #
                 HyperMatrix        [17]               # #                      #                 #                             # #
                                                                                                                                  G # S                                      O                                                                            #
                                                                                                                                                                                                                                                          G                                S              #                            D           II                   I             # #                                 #
                                                                                                                                                                                                                                                                                                                                                                                                                          G                                                                                                                                    #                                                                          #

                 TextFlow           [12]               # # #                                      #
                                                                                                  G                             # # # S                                      O                                                               #                        #                    S              #                            D           II II                              # #                                                                              #                     #                     #                               #                         #                                     # # #
                 CloudLines         [43]               # # #                                      #                             # # # S                                      O                                                               #                        # #
                                                                                                                                                                                                                                                                        G S                               #                            D           IIII I                             # #                   # #                                                        #                     #                     #                            # # #                        #                                     # # #
                 DecisionFlow       [27] #
                                         G # # #                                                  #                             # # # S                                      O                                                               #                        # #
                                                                                                                                                                                                                                                                        G S                               #                            D           II II                              # #                                 #                                                                  #                     #                                           #             #                                     # # #
                 OpinionFlow        [79]               # # # #
                                                             G                                    #
                                                                                                  G                                         # # S                            O                                                                            #
                                                                                                                                                                                                                                                          G #                              S              #                            D           IIII I                             # #                   # #                                                                              #                     #                               #                                                               # # #

                 Han et al.         [30] # # # #                                                  #
                                                                                                  G #
                                                                                                    G                                       # # S                            O                                                                            #
                                                                                                                                                                                                                                                          G # #
                                                                                                                                                                                                                                                              G S                                         #                            D           IIII II                            # #                                 #                                            #                     #                     #                               # #                       #                                     # # #
                 Liu et al.         [49]               # # #                                      #                             # # # S                                      O                                                               #                        #                    S              #                            D           II                   I             # #                   # #                                                        #                     #                     #                               # #                       #                                     # # #
                 ThreadReconst.     [16]               # # #                                      #                                         # # S                            O                                                               #                        #                    S              #                            D           IIII II                            # #                                                                                                                                                          #                                                               #                      #
                 netflower          [74]               # #                                        #                             # # # S                                      O                                                               #                        # #
                                                                                                                                                                                                                                                                        G S                               #                            D           II                   I             # #                                 #                                                                  #                     #                                                         #                                     # # #
                 WeSeer             [47]               # #
                                                         G #
                                                           G                                      #
                                                                                                  G                             # # # S                                      O                                                                                        #
                                                                                                                                                                                                                                                                      G                    S              #                            D           IIII II                            # #                                                                                                    #                     #                                                         #                                     # # #

                 MatrixExplorer     [31]               # #                      #                 #                             # # # S                                      O                                                               #                                   # S                      #                            D           II                   I             # #                                                                                                    #                     #                                           #             #                                                  # #
                 NodeTrix           [32]               # #                      #                 #                             # # # S                                      O                                                               #                                   # S                      #                            D           II                   I             # #                                                                                                    #                     #                                           #             #                                                  # #
                 Hadlak et al.      [29] # # #                                                    #                             # # # S                                      O                                                               #                                   # S                      #                            D           II                   I             # #                                                                              #
                                                                                                                                                                                                                                                                                                                                                                                                                                                                       G                     #                     #                                           #             #                                     # # #

                 ConVis             [35]               # #                                                                                             # S                   O                                                               #                                             S              #                            D           II II                              # #                                                                                                    #                     #                                                         #                                     # # #
                 TargetVue          [9]                # #
                                                         G                      #
                                                                                G                 #
                                                                                                  G                                         # # S                            O                                                               #                                             S              #                            D           II II                              # #                                                                                                    #                     #                                                         #                                     # # #
                 MediaDiscourse     [51]               # # #                                      #                             # # # S                                      O                                                               #                        #                    S              #                            D           II                   I             # #                                                                                                    #                     #                                                         #                                     # # #
                 VisOHC             [45]               # # #                                      #
                                                                                                  G                                         # # S                            O                                                               #                        #                    S              #                            D           II                   I             # #                                                                                                                          #                                                         #                                                  # #
                 iForum             [23]               # #                      #
                                                                                G                 #
[Table 2 (excerpt): coding of the surveyed approaches according to the conceptual framework. Rows in this part of the table: VASSL [41], CommAID [18], Zhao et al. [82], Whisper [8], ConToVi [14], Beagle [42]. The per-cell glyphs are not recoverable.]

4.2 Processing and Models
This category focuses on the particularities of the processing and the models, mainly from a
technical perspective in visual interactive data analysis, in three subcategories, namely the
analytical goal (why?), the model scope (what?), and the processing (how?).

4.2.1 Analytical Goal
One essential distinguishing criterion between approaches is the set of supported methodologies
to achieve the analytical goal. We align this classification with the standard definition of
analytical tasks in visual analytics [67]. This includes the category representation for a fixed
analysis task, confirmatory analysis for a directed search, and exploratory analysis for an
undirected search (supported by most approaches). Another goal in communication analysis involves
predictive analysis (e.g., [79]), which could also be regarded as a sub-goal of all methodologies.

4.2.2 Model Scope
Here, we discuss generic capabilities of the analysis models.
   Modality — The analysis modality refers to the employed concepts and is categorized into three
common aspects: meta-data (e.g., [43]), network (e.g., [3]), and content (e.g., [16]), as used in
many technical descriptions of communication analysis.
   Collection — The different approaches provide various degrees of integration regarding multiple
collection environments. For example, many systems can only consider a single dataset (e.g., a set
of e-mails) and therefore support only a single environment (S) (e.g., [26]), while a few also
support the integration of multiple data sources, which makes them multi-environment (M) systems
(e.g., [38]).
   Data-Mapping — While most systems require a fixed data format, some also support a flexible
import system which allows mapping properties and, in case they support multiple environments,
correlating different aspects, for example e-mail senders and usernames.

4.2.3 Processing
Due to the breadth of different methods, we focus on generic aspects, namely the support for group
communication analysis, and considerations for latency and scalability.
   Group Communication — One rather interesting aspect we discovered during the analysis of the
approaches is the varying support of group communications. Practically all approaches support an
individual-to-individual analysis. We propose to structure the approaches based on their support
for analyzing the communication between a source and a target in a 3x3 matrix, e.g., individual to
individual is encoded as one cell of this matrix (e.g., [9]). These entities can be either
individuals, groups, or groups within groups. The difference between the two latter is subtle but
involves the support of boundary restrictions on sub-groups instead of overlapping group
assignments. One might expect the matrix to be symmetric, but this might not always be the case.
   Latency — Research [62] indicates that latency can significantly affect communication. The two
primary options are (nearly) instantaneous communication, like in an active live (L) chat (e.g.,
[60]), or delayed (D) communication (e.g., [45]), such as e-mail or a document. Note that the
choice is not purely categorical, and from a technical perspective, this could also be considered
a numerical property. Differentiation into these two groups [62] is often enough for most
differences in reaction and behavior, although the latency can, of course, play a role (e.g.,
answering under time pressure). In consideration of analogical codes, however, it makes sense to
include a third, orthogonal category: In-person (I) communication (a subgroup of live
communication) can exhibit markedly different results compared to, for example, live chat
conversations.
   Scalability — The scalability can be defined on two levels: First, on the data-ingress level,
which defines the amount of input data the system can import, analyze, and visualize initially.
The second, more subtle aspect is the scalability on the search and analysis side, for example,
during exploratory analysis: how many results can be shown simultaneously? For both aspects, we
roughly categorize them into few (less than ten, I), medium (order of hundreds to thousands, II),
and huge (more than 10k, IIII).

4.3 Visual Interface
Here, we describe the visual interface on an abstract level.

4.3.1 Representation
The central aspect of a visualization system is its visualizations.
   Method — We follow the established nomenclature of visualization techniques for the
visualization method. However, we only chose the three most common ones, which account for the
vast majority of approaches, coming up with the following selection: node-link-based (e.g., [4]),
timeline-based (e.g., [27]), and matrix-based (e.g., [17]). Alternatively, there is an option for
other (e.g., [14]) techniques and multiple (e.g., [23]) for multi-paradigm approaches.
   Pane — The different visualization methods can be employed in different visualization panes.
We consider the three major ones, namely 2D, 3D, and VR. For example, a communication network can
be visualized as a node-link diagram in 2D or in 3D, or in a virtual environment, and each choice
may influence the interaction concepts.

4.3.2 Interaction
The interaction methods are of central importance in visual analytics.
   Operation Method — We classify the approaches based on their interaction method according to
the classification developed by Yi et al. [80], namely Select, Explore, Reconfigure, Encode,
Abstract/Elaborate, Filter, and Connect. In visual analytics, some are extremely common, while
others like Encode depend on the capabilities.
   Manipulation — The manipulation of the elements can be direct, for example when interacting
with data or visual objects, or indirect, for example when only modifying parameters. Most
approaches support both manipulation methods.

4.3.3 Refinement
In addition to the interaction concept, other discerning factors are the particularities of the
refinement. For these concepts, we also align with standard visual analytics concepts.
   Goal — Two primary goals can be differentiated: is the goal to tune an underlying model (e.g.,
[17]) or the data (e.g., [49])?
   Strategy — Depending on the approach, the strategy can be very different: does it follow an
active learning approach (e.g., [17]), or is the process either iterative (e.g., [43]) or
progressive (e.g., [14])?

4.4 Knowledge Generation
The last category is concerned with the particularities of the knowledge that can be generated and
learned. We propose the three subcategories output, verification approach, and knowledge gain.

4.4.1 Output
Based on the classification of Spinner et al. [73], we propose two distinct categories for the
learned knowledge type: An explanation can consist, for example, of numerical (e.g., [3]), textual
(e.g., [74]), or graphical representations (e.g., [7]). It represents knowledge, but in a factual
representation that is not easily transferable and can be regarded as a (final) result of the
existing data. In contrast to this, another type of result can be a transition function, which is
closer to an actual model, one example being the analyst's mental model. Another type is a machine
model (e.g., [79]), for example, a trained classifier that encapsulates learned knowledge and can
be applied to new data.

4.4.2 Verification Approach
The presentation of knowledge raises the question of its verification.
   Factors — Visual analytics is very well suited to address important factors like confidence,
trust, and privacy. For example, probability scores could be used to estimate the confidence of
results stemming from automatic processes (e.g., [17]) and visualize it to the expert. Other
examples include analysis logfiles, integrity protection, traceability, and verifiability, as well
as a provenance history. Lack of such certainty measures might exclude systems from sensitive
areas. While essential, many approaches do not consider this as much as would be desirable.
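As a minimal sketch of the confidence factor described for the verification approach, the following Python snippet maps a probability score from an automatic process to a coarse confidence level that a visual interface could encode, for example as color or opacity. The names ScoredResult, confidence_level, and annotate, as well as both thresholds, are illustrative assumptions; the framework itself does not prescribe any of them.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen for illustration only; the framework
# does not define how probability scores map to confidence levels.
LOW_THRESHOLD = 0.6
HIGH_THRESHOLD = 0.85


@dataclass(frozen=True)
class ScoredResult:
    """One automatic result together with the model's probability score."""
    label: str
    probability: float  # expected to lie in [0, 1]


def confidence_level(probability: float) -> str:
    """Bucket a probability score into 'low', 'medium', or 'high'."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= HIGH_THRESHOLD:
        return "high"
    if probability >= LOW_THRESHOLD:
        return "medium"
    return "low"


def annotate(results):
    """Attach a confidence level to every result for later visual encoding."""
    return [(r.label, r.probability, confidence_level(r.probability))
            for r in results]


if __name__ == "__main__":
    demo = [ScoredResult("relevant", 0.93), ScoredResult("relevant", 0.41)]
    for label, probability, level in annotate(demo):
        print(label, probability, level)
```

Keeping the raw probability next to the derived level lets the interface show both the coarse encoding and the exact value on demand, which is one possible way to support the expert's judgment of automatic results.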
   Evaluation — To evaluate approaches, the paper authors' own evaluation can be considered. It
can be described as an in-situ analysis of the approach itself. Several options are possible: The
first is a convincing example or a case study (e.g., [32]). A second option would be to conduct a
comparison with similar, existing approaches (e.g., [17]). A third option would be a qualitative
interview (e.g., [18]). A fourth option would be a quantitative user study.

4.4.3 Knowledge Gain
As a final step in the learning process, the question arises which knowledge is actually gained
and how powerful the process is.
   Time Dimensionality — The time dimensionality describes the relationship between data and
knowledge generation. A 2x3 matrix shows the possible combinations of data basis and prediction
type, each with the entries past, present, and future. For example, a system can use past data and
predict past data, for instance for a search (e.g., [29]). Another example (e.g., [47]) would be
an article analysis and prediction system which has been trained on past data to analyze a text,
either an existing one or one on the fly in the present and future. Yet another example for a
future prediction would be a model that forecasts communication activity based on past events.
Note that the future cannot be the basis due to causality.
   Predictive Power — A second important consideration describes the predictive power of the
knowledge generated, which is represented as a 2x3 matrix, where the result (explanation or
transition function) and the time are combined. For example, can a system explain (i.e., show)
past events (virtually all systems)? Or is it able to provide factual information for future
events (e.g., [47], where the internal model is inaccessible)? A more powerful example is a model
that can be controlled and which can explain and predict (e.g., [79]).
   Domain-specific Aspects — Last, but not least, depending on different analysis tasks, more
specific options might be of interest. Listing these here would be out of scope and space, as
discussed above. However, we could imagine this as future work (see Section 5).

This section discusses the main findings and lessons learned before reflecting on the difficulties
in creating a conceptual framework for communication analysis. In particular, we discuss the
potential implications and opportunities for future research while highlighting the still existing
shortcomings in the current formalization.
   We have defined four main dimensions containing 30 different properties that provide a
conceptual framework to describe interactive communication analysis systems which employ visual
analytics principles.
   We imagine several applications for this framework: Its primary goal is to structure the
research field and provide a common language for the community. Simultaneously, it can aid
researchers in identifying research gaps, thereby supporting the transfer of theoretical concepts
into practical systems. Further, it allows practitioners to choose the best fitting approach
depending on their needs or to check out similar approaches, while enabling developers to judge
the domain coverage and maturity of their systems. Lastly, it can form the foundation for a more
detailed survey in the field of automated communication analysis.

5.1 Survey Findings and Research Opportunities
Visual analytics is especially suited to support semi-automatic communication analysis [33, 39].
The complexity, multi-modality, and ambiguity of communication make it an ideal companion to
interactively combine domain knowledge from experts and computing power. Concepts like interactive
learning allow refining models, while visualizations that support uncertainty awareness support
the judgment of automatic results, fostering user trust and possibly identifying bias.
   Findings — To apply the framework, we have taken the 37 selected approaches (see Section 3) and
coded them according to our conceptual framework. Based on the results in Table 2, we can discover
several interesting aspects. For one, for the data type, while the analysis of text data seems
mostly universal across representation methods, this is not the case for the other data types.
Somewhat unsurprisingly, when network data is included, the visualization is often node-link-based
or multi-paradigm, while for time-series, it is either timeline or multi-paradigm. Given the scope
of the survey, the lack of audio and image is not surprising. Given that all the approaches belong
to the category of visual analytics, it is also unsurprising that virtually all support
representative, confirmatory, and exploratory analysis as their analysis methodology, that their
operation methods consist of many of the possible options, and that their explanation is at least
always graphical.
   More interesting, however, are the differences and the research opportunities we can conclude
from their discrepancies, which we highlight in the following for each category (see Figure 1). We
mark the, in our consideration, top five opportunities with a star (★).

★ I.1 Analysis of the Meaning and Analogical Code
Only a subset of approaches analyzes the implicit meaning of the communication (e.g., [15, 23,
35]). However, almost none analyze the analogical code of the communication, which can offer
important cues and influence the communication significantly, and is an important research gap.

I.2 Include Power Relations
Again, almost no approach considers the power relations between the participants, which can
similarly influence the communication semantics, meaning, and modalities.

I.3 Dynamic Analysis
While some might consider this a production problem, the development of systems that support the
dynamic analysis of communication data and batch/stream approaches also sets considerable hurdles
to established analysis and visualization methods, which makes it an interesting academic research
problem.

I.4 Research the Analysis Coupling
There is practically no research on how a system can mitigate a possible analysis coupling.

II.1 Multi-Environment Inclusion
Many approaches can only handle a single data source (environment) and offer no data mapping
(like, e.g., [38]). Instead, these steps are outsourced to preprocessing. Their inclusion could
make leveraging multiple data sources simultaneously less complicated.

★ II.2 Analyze Group Communication
Only a few approaches support the analysis of group communication (e.g., [14]), and almost none
support nested groups. However, much communication actually happens inside or between groups,
which can involve specific particularities.

III.3 Explore Visualization Panes
Practically all approaches use the 2D pane for visualizations. It might be worthwhile to explore
3D and VR concepts. For network analysis in particular, VR might offer benefits.

IV.1 Model / Transfer Function / Knowledge Gain
Few approaches contain an actual, powerful machine model to analyze communication (e.g., [17,
58]). Using such models can potentially support the analysis, for example through active learning,
intelligent filtering, or confidence-based predictions. Transfer functions would allow for a more
universal machine learning, applying knowledge to new problems and increasing the predictive
power.

★ IV.2 Confidence, Trust, and Privacy
These factors are insufficiently considered by the majority of approaches, leading to a black-box
analysis. Here it would be possible to include confidence estimates (e.g., [17]), logs, provenance
histories (e.g., [18]), data minimization, and other concepts.

IV.3 Comparisons and Quantitative User Studies
While many approaches include case studies and (qualitative) expert interviews, almost none do
actual comparisons with related approaches or conduct quantitative user studies.

★ O.1 Holistic Approaches
Only a few approaches work towards a holistic analysis by considering multiple analysis aspects
together and in context, covering all modalities. Even fewer, for example, support cross-matches.
As communication analysis is highly heterogeneous, such a perspective can considerably increase
the analysis capability.

★ O.2 Context / Analysis Reference Window
Similar to the holistic analysis, a specific focus on the communication context in reference to
each other should be explored further for both inter- and intra-modality analysis. Few approaches
consider other modalities or external factors to explain particularities. For example, a break in
a communication sequence might appear as a gap, but when combined with location information (e.g.,
same building) might indicate that the participants have met for lunch and continued their
conversation there, might indicate an air travel period, etc. In summary, the correct
interpretation of communication is extremely context-dependent. This context, however, is variable
and many different possible contexts exist. It seems worthwhile to explore concepts to support
such analyses better, because existing support is often very basic.

5.2 Limitations and Future Work
While our conceptual framework offers exciting research opportunities for more powerful
communication analysis systems and the application of visual analytics concepts, the area itself
is under active research with several limitations and unsolved challenges.
   One problem with the described taxonomy is the basis it is designed upon (completeness). We are
confident we included the essential results from classical communication research, technical
considerations, and many aspects from the surveyed approaches. However, one issue is the
completeness of the surveyed approaches. A significant problem in this research area is that
relevant approaches are rarely labeled as belonging to communication analysis and are not easily
definable by a set of keywords. We initially thought about compiling a list of domain-specific
keywords to select papers by, for example, social network analysis, sentiment analysis, e-mail
analysis, etc. However, we found it highly likely that such a selection would be strongly biased
by our knowledge of relevant topics, which is why we decided to go for a more extensive drag-net
search to include previously unknown areas. However, while we took great care when defining the
seed search and then selecting the approaches, it is likely that we inadvertently missed
individual ones, not least by restricting the target journals and conferences. Also, it could
happen that a few approaches fell through our automatic or manual search pattern (see Section 3),
while still being applicable to communication analysis. Some missing approaches may contain novel
aspects which might widen the framework and introduce new elements or pose problems for existing
ones. However, as this paper is not a complete survey and the collection of approaches is
primarily used as one of three parts to design a helpful formalization scheme, the collection only
has to cover most techniques and different methods to still be beneficial for many approaches.
Nevertheless, to address the issue of missing approaches, we created an accompanying survey-like
website, which lists the approaches we considered and also allows readers to submit methods they
consider relevant and that have not yet been included.
   While the previous limitations were concerned with the selected approaches, another possible
limitation concerns our framework itself. It may contain overlaps, and the basis it provides is
not entirely orthogonal. Given the complexity of the topic, we think it is challenging to create a
wholly consistent yet easily usable taxonomy, and there is a need to balance the different
trade-offs. The choices we made for selecting the categories are based on existing research and
approaches and justified when required. However, given sparse taxonomy and non-standardized
vocabulary, some groupings and namings could arguably have been chosen differently with the same
validity. To advance research in this area, we decided to propose our framework as a first
possible draft and one step towards a universally accepted framework. We, therefore, invite the
research community to give feedback to stimulate the scientific discussion and extend the
framework in further iterations, which can be enhanced further by more input from diverse research
communities. As part of this process, the individual, multi-faceted aspects can also be formalized
in more detail.
   Another aspect is the extension of the framework to non-human communication. Several aspects of
the framework could similarly be applied to communication in general, for example, machine to
machine. Indeed, nothing in the framework is specifically tailored to a human communicator, as all
the properties could also apply to other types of communication. However, human communication is
often more nuanced than machine communication, making parts of the framework less relevant, while
other features (e.g., structuring, exchange content scope) might be missing so far.

6 Conclusion
In the last decades, communication analysis has experienced a shift away from manual analysis to
computer-aided or even highly automated approaches. However, to the same extent as automation
levels increased, the analysis itself has often become more specialized, moving away from an
overarching exploration. This trend is in contradiction to traditional communication research,
which stresses the importance of a holistic approach to capture the full meaning and context of
communication. As a result, many modern digital communication analysis systems are highly adapted
to a narrow range of tasks, often either in the area of content or in network analysis. While this
might be perfectly sufficient and suitable for their intended use, such an isolated analysis can
sometimes lead to a less effective exploration and to incomplete or biased results. Using separate
approaches requires more manual work, often complicates analysis tasks, can introduce domain
discontinuities, and increases the struggle domain experts face when trying to integrate their
domain knowledge. Further, isolated analysis may not be sufficient to capture the full available
information and can make the automatic as well as the manual detection of cross-matches more
difficult. The development of more holistic and advanced approaches for automated communication
analysis systems is thereby hindered by the lack of a clear framework and the absence of a common
language which combines both technical aspects and results from traditional communication
research.
   We address this challenge by developing and formalizing a design space for digital
communication analysis systems based on the existing tool landscape and communication research,
while making the case for how visual analytics principles can be employed for a more holistic
approach. By systematically discussing and structuring the different analysis areas and aspects of
the design space, we arrive at a conceptual framework to provide an overview and assess the
maturity of communication analysis systems. As part of an initial survey, we have also categorized
a large set of existing approaches using our framework.
   By bridging the gap in the formalization of digital communication analysis systems by
describing a design space for communication analysis, we aim to provide researchers with a common
language, provide guidelines for building and assessing the maturity of such approaches, as well
as point out gaps in the literature which offer exciting research opportunities. The results of
this work are widely applicable in a variety of domains which are concerned with communication
analysis, like civil security, the digital humanities, or business intelligence, both from a
theoretical point of view and for the development of more powerful communication analysis systems.
