A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv

Page created by April Craig
 
CONTINUE READING
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
arXiv Report 2021                                                                                                                           Volume 0 (1981), Number 0

                                              A Bounded Measure for Estimating the Benefit of Visualization:
                                                        Case Studies and Empirical Evaluation

                                                                               Min Chen1           , Alfie Abdul-Rahman2           , Deborah Silver3 , and Mateu Sbert4
arXiv:2103.02502v1 [cs.HC] 3 Mar 2021

                                                           1 University   of Oxford, UK,      2   King’s College London, UK,   3   Rutgers University, USA, and   4   University of Girona, Spain

                                        Figure 1: The London underground map (right) is a deformed map. In comparison with a relatively more faithful map (left), there is a
                                        significant amount of information loss in the deformed map, which omits some detailed variations among different connection routes between
                                        pairs of stations (e.g., distance and geometry). One common rationale is that the deformed map was designed for certain visualization tasks,
                                        which likely excluded the task for estimating the walking time between a pair of stations indicated by a pair of red or blue arrows. In one
                                        of our experiments, when asked to perform such tasks using the deformed map, some participants did rather well. Can information theory
                                        explain this phenomenon? Can we quantitatively measure some relevant factors in this visualization process?

                                                 Abstract
                                                 Many visual representations, such as volume-rendered images and metro maps, feature a noticeable amount of information loss.
                                                 At a glance, there seem to be numerous opportunities for viewers to misinterpret the data being visualized, hence undermining
                                                 the benefits of these visual representations. In practice, there is little doubt that these visual representations are useful. The
                                                 recently-proposed information-theoretic measure for analyzing the cost-benefit ratio of visualization processes can explain
                                                 such usefulness experienced in practice, and postulate that the viewers’ knowledge can reduce the potential distortion (e.g.,
                                                 misinterpretation) due to information loss. This suggests that viewers’ knowledge can be estimated by comparing the potential
                                                 distortion without any knowledge and the actual distortion with some knowledge. In this paper, we describe several case studies
                                                 for collecting instances that can (i) support the evaluation of several candidate measures for estimating the potential distortion
                                                 distortion in visualization, and (ii) demonstrate their applicability in practical scenarios. Because the theoretical discourse
                                                 on choosing an appropriate bounded measure for estimating the potential distortion is yet conclusive, it is the real world
                                                 data about visualization further informs the selection of a bounded measure, providing practical evidence to aid a theoretical
                                                 conclusion. Meanwhile, once we can measure the potential distortion in a bounded manner, we can interpret the numerical
                                                 values characterizing the benefit of visualization more intuitively.

                                        1. Introduction                                                                              and technological advancements, but also encountered some seri-
                                                                                                                                     ous contentions due to instrumental, operational, and social con-
                                        This paper is concerned with the measurement of the benefit of                               ventions [Kle12]. While the development of measurement systems,
                                        visualization and viewers’ knowledge used in visualization. The                              methods, and standards for visualization may take decades of re-
                                        history of measurement science shows that the development of                                 search, one can easily imagine their impact to visualization as a
                                        measurements in different fields has not only stimulated scientific                          scientific and technological subject.

                                        © 2021 The Author(s) with LATEX template from
                                        Computer Graphics Forum © 2021 The Eurographics Association and John
                                        Wiley & Sons Ltd. Published by John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

   “Measurement ... is defined as the assignment of numerals to ob-            2. Related Work
jects or events according to rules” [Ste46]. Rules may be defined
based on physical laws (e.g., absolute zero temperature), observa-             This paper and its preceding paper (in the supplementary materials)
tional instances (e.g., the freezing and boiling points of water), or          are concerned with information-theoretic measures for quantifying
social traditions (e.g., seven days per week). Without exception,              aspects of visualization, such as benefit, knowledge, and potential
measurement development in visualization aims to discover and                  misinterpretation. The preceding paper focuses its review on previ-
define rules that will enable us to use mathematics in describing,             ous information-theoretic work in visualization. In this section, we
differentiating, and explaining phenomena in visualization, as well            focus our review on previous measurement work in visualization.
as predicting the impact of a design decision, diagnosing shortcom-            Measurement Science. There is currently no standard method
ings in visual analytics workflows, and formulating solutions for              for measuring the benefit of visualization, levels of visual abstrac-
improvement.                                                                   tion, the human knowledge used in visualization, or the potential
   In a separate and related paper (see the supplementary materi-              to misinterpret visual abstraction. While these are considered to be
als), Chen and Sbert examined a number of candidate measures for               complex undertakings, many scientists in the history of measure-
assigning numerals to the notion of potential distortion, which is             ment science would have encountered similar challenges [Kle12].
one of the three components in the information-theoretic measure                  In their book [BW08], Boslaugh and Watters describe measure-
for quantifying the cost-benefit of visualization [CG16]. They used            ment as “the process of systematically assigning numbers to objects
several “conceptual rules” to evaluate these candidates, narrowing             and their properties, to facilitate the use of mathematics in study-
them down to five candidates. In this work, we focus on the remain-            ing and describing objects and their relationships.” They empha-
ing five candidate measures and evaluate them based on empirical               size in particular measurement is not limited to physical qualities
evidence. We use two synthetic case studies and two experimental               such as height and weight, but also abstract properties such as in-
case studies to instantiate values that may be returned by the candi-          telligence and aptitude. Pedhazur and Schmelkin [PS91] assert the
dates. The main “observational rules” used in this work include:               necessity of an integrated approach for measurement development,
                                                                               involving data collection, mathematical reasoning, technology in-
a. Does the numerical ordering of observed instances match with
                                                                               novation, and device engineering. Tal [Tal20] points out that mea-
   the intuitively-expected ordering?
                                                                               surement is often not totally “real”, involves the representation of
b. Does the gap between positive and negative values indicate a
                                                                               ideal systems, and reflects conceptual, metaphysical, semantic, and
   meaningful critical point (i.e., zero benefit in our case)?
                                                                               epistemological understandings. Schlaudt [Sch20] goes one step
Although one might consider observational rules are subjective and             further, referring measurement as a cultural technique.
thus undesirable, we can appreciate their roles in the development
                                                                                  This work is particularly inspired by the historical develop-
of many measurement systems used today (e.g., number of months
                                                                               ment of temperature scales and seismic magnitude. The former at-
per year, and sign of temperature in Celsius). It would be hasty to
                                                                               tracted the attention of many well-known scientists, benefited from
dismiss such rules, especially at this early stage of measurement
                                                                               both experimental observations (e.g., by Celsius, Delisle, Fahren-
development in visualization.
                                                                               heit, Newton, etc.) and theoretical discoveries (e.g., by Boltzmann,
   In addition, we use the data collected in two visualization case            Thomson (Kelvin), etc.). The latter started not long ago as the
studies to explore the relationship between the benefit of visualiza-          Richter scale was outlined in 1935, and since then there have
tion and the viewers’ knowledge used in visualization. As shown in             been many schemes proposed relating different physical properties.
Figure 1, in one case study, we asked participants to perform tasks            Many scales in both applications are related to some logarithmic
for estimating the walking time (in minutes) between two under-                transformations one way or another.
ground stations indicated by a pair of red or blue arrows. Although
                                                                               Metrics Development in Visualization.               Behrisch et al.
the deformed London underground map was not designed to per-
                                                                               [BBK∗ 18] presented an extensive survey of quality metrics for in-
form visualization tasks, many participants performed rather well,
                                                                               formation visualization. Bertini et al. [BTK11] described a sys-
including those that had very limited experience of using the Lon-
                                                                               temic approach of using quality metrics for evaluating visualiza-
don underground. We use different candidate measures to estimate
                                                                               tion in high-dimensional data visualization focusing on scatter plots
the supposed potential distortion. When the captured data shows
                                                                               and parallel coordinates plots. A variety of quality metrics have
that the amount of actual distortion is lower than the supposed dis-
                                                                               been proposed to measure many different attributes, such as ab-
tortion, this indicate that the viewers’ knowledge has been used in
                                                                               straction quality [Tuf86, CYWR06, JC08], quality of scatter plots
the visualization process to alleviate the potential distortion. While
                                                                               [FT74, TT85, BS04, WAG05, SNLH09, TAE∗ 09, TBB∗ 10], quality
we use the experiments to collect practical instances to evaluate the
                                                                               of parallel coordinates plots [DK10], cluttering [PWR04, RLN07,
candidate measures empirically, they also demonstrate that we are
                                                                               YSZ14], aesthetics [FB09], visual saliency [JC10], and color map-
getting closer to be able to estimate the “benefit” and “potential
                                                                               ping [BSM∗ 15, MJSK15, MK15, GLS17]. In particular, Jänicke et
distortion” of practical visualization processes.
                                                                               al. [JC10] first considered a metric for estimating the amount of
   Readers are encouraged to consult the preceding paper in the                original data that is depicted by visualization and may be recon-
supplementary materials for information about the mathematical                 structed by viewers. Chen and Golan [CG16] used the abstract form
background, the formulation of some new candidate measures, and                of this idea in defining their cost-benefit ratio. While the work by
the conceptual evaluation of the candidate measures. Nevertheless,             Jänicke et al. [JC10] relied computer vision techniques to recon-
this paper is written in a self-contained manner.                              struction, this work focused on collecting and analysing empirical

                                                                                                                           © 2021 The Author(s) with LATEX template from
                                                                                   Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

data because human knowledge has a major role to play in infor-                                  a1
                                                                                                                                                    Rules
mation reconstruction.                                                                                                                        
Measurement in Empirical Experiments. Almost all empiri-                                                                                       
cal studies in visualization involve measuring participants’ perfor-                                                                       
mance in visualization processes. Such measured data allow us to
                                                                                                 b1                b2                b3
assess the benefit of visualization or potential misinterpretation,
typically in terms of accuracy and response time. Many uncon-
trolled empirical studies also collect participants’ experience and
opinions qualitatively. The empirical studies related to the key com-
ponents are those on the topics of visual abstraction and human
knowledge in visualization. Isenberg [Ise13] presented a survey of                               c1                c2                c3
evaluation techniques on non-photorealistic and illustrative render-
ing. Isenberg et al. [INC∗ 06] reported an observational study com-
paring hand-drawn and computer-generated non-photorealistic ren-
dering. Cole et al. [CSD∗ 09] performed a study evaluating the ef-
fectiveness of line drawing in representing shape. Mandryk et al.                               c4                c5                c6                 c7
[MML11] evaluated the emotional responses to non-photorealistic
generated images. Liu and Li [LL16] presented an eye-tracking
study examining the effectiveness and efficiency of schematic de-
sign of 30◦ and 60◦ directions in underground maps. Hong et al.
[HYC∗ 18] evaluated the usefulness of distance cartograms map “in                               c8                c9                c10               c11
the wild”. These studies confirmed that visualization users can deal
with significant information loss due to visual abstraction in many
situations.
   Tam et al. [TKC17] reported an observational study, and discov-
ered that machine learning developers entered a huge amount of                                  c12               c13               c14               c15
knowledge (measured in bits) into a visualization-assisted model
development process. Kijmongkolchai et al. [KARC17] reported a
study design for detecting and measuring human knowledge used
in visualization, and translated the traditional accuracy values to
information-theoretic measures. They encountered some undesir-
                                                                                         Figure 2: Three alphabets illustrate possible metro maps (letters)
able properties of the Kullback-Leibler divergence in their cal-
                                                                                         in different grid resolutions.
culations. In this work, we collect empirical data to evaluate the
mathematical solutions proposed to address the issue encountered
in [KARC17].                                                                             version is:
                                                                                             Benefit Alphabet Compression − Potential Distortion
                                                                                                     =                                                       (1)
3. Overview, Notations, and Problem Statement                                                 Cost                     Cost

Brief Overview. Whilst hardly anyone in the visualization com-                              Appendix A provides more detailed explanation of this measure,
munity would support any practice intended to deceive viewers,                           while Appendix B explains in detail how tasks and users are con-
there have been many visualization techniques that inherently cause                      sidered by this measure in abstract.
distortion to the original data. The deformed London underground
                                                                                         Mathematical Notations. Consider a simple metro map consists
map in Figure 1 shows such an example. The distortion in this ex-
                                                                                         of only two stations in Figure 2. We consider three different grid
ample is largely caused by many-to-one mappings. A group of lines
                                                                                         resolutions, with 1 × 1 cell, 2 × 2 cells, and 4 × 4 cells respectively.
that would be shown in different lengths in a faithful map are now
                                                                                         The following set of rules determine whether a potential path is
shown with the same length. Another group of lines that would be
                                                                                         allowed or not:
shown with different geometric shapes are now shown as the same
straight line. In terms of information theory, when the faithful map                     • The positions of the two stations are fixed on each of the three
is transformed to the deformed, a good portion of information has                          grids and there is only one path between the red station and the
been lost because of these many-to-one mappings.                                           blue station;
                                                                                         • As shown on the top-right of Figure 2, only horizontal, and di-
   The common phrase that “the appropriateness of information
                                                                                           agonal path-lines are allowed;
loss depends on tasks” is not an invalid explanation. Partly by a
                                                                                         • When one path-line joins another, it can rotate by up to ±45◦ ;
similar conundrum in economics “what is the most appropriate res-
                                                                                         • All joints of path-lines can only be placed on grid points;
olution of time series for an economist”, Chen and Golan proposed
an information-theoretic cost-benefit ratio for measuring various                         For the first grid with 1 × 1 cell, there is only one possible path.
factors involved in visualization processes [CG16]. Its qualitative                      We define an alphabet A to contain this option as its only letter a1 ,

© 2021 The Author(s) with LATEX template from
Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

i.e., A = {a1 }. For the second grid with 2 × 2 cells, we have an               has the PMF Qu , we have DKL (P||Qu ) ≈ 1.562 bits; and (ii) if the
alphabet B = {b1 , b2 , b3 }, consisting of three optional paths. For           original design alphabet B has the PMF Qv , we have DKL (P||Qv ) ≈
the third grid with 4 × 4 cells, there are 15 optional paths, which             0.138 bits. There is more divergence in the case (i) than case (ii).
are letters of alphabet C = {c1 , c2 , . . . , c15 }. When the resolution of    Intuitively we can guess this as P appears to be similar to Qv .
the grid increases, the alphabet of options becomes bigger quickly.
                                                                                   Recall the qualitative formula in Eq. 1. In [CG16], the benefit of
We can imagine it gradually allows the designer to create a more
                                                                                a visual analytics process is defined as:
faithful map. To ensure the calculation below is easy to follow, we
consider only the first two grids below.                                           Benefit = AC − PD = H (Zi ) − H (Zi+1 ) − DKL (Z0i ||Zi )                          (2)
   To a designer of the underground map, at the 1 × 1 resolution,               where Zi is the input alphabet to the process and Zi+1 is the output
there is only one choice regardless how much the designer would                 alphabet. Z0i is an alphabet reconstructed based on Zi+1 . Z0i has the
like to draw the path to reflect the actual geographical path of the            same set of letters as Zi but likely a different PMF.
metro line between these two stations. At the 4 × 4 resolution, the
designer has many options. Hence there is more uncertainty associ-                In terms of Eq. 2, we have Zi = B with PMF Qu or Qv , Zi+1 = A
ated with the third grid. This uncertainty can be measured by Shan-             with PMF Q, and Z0i = B0 with PMF P. We can thus calculate the
non entropy, which is defined as:                                               benefit in the two cases as:
                      n                                   n
                                                                                        Benefit of case (i) = H (B) − H (A) − DKL (B0 ||B)
       H (Z) = − ∑ pi log2 pi        where pi ∈ [0, 1], ∑ pi = 1
                    i=1                                  i=1                                                     = H (Qu ) − H (Q) − DKL (P||Qu )
where Z is an alphabet, and can be replaced with A, B, or C. To                                                  ≈ 1.585 − 0 − 1.562 = 0.023bits
calculate Shannon entropy, the alphabet Z needs to be accompanied
by a probability mass function (PMF), which is written as P(Z).                        Benefit of case (ii) = H (Qv ) − H (Q) − DKL (P||Qv )
Each letter zi ∈ Z is thus associated with a probability value pi ∈ P.
                                                                                                                 ≈ 0.569 − 0 − 0.138 = 0.431bits
Note: In this paper, to simplify the notations in different contexts,
for an information-theoretic measure, we use an alphabet Z and its              In the case (ii), because the viewers’ expectation is closer to the
PMF P interchangeably, e.g., H (P(Z)) = H (P) = H (Z). Ap-                      original PMF Qv , there is more benefit in the visualization process
pendix C provides more mathematical background about informa-                   than the case (i) though the case (ii) has less AC than case (i).
tion theory, which may be helpful to some readers.                                 However, DKL has an undesirable mathematical property. If we
   Let first consider the single-letter alphabet A and its PMF Q.               consider a third case, (iii), where the original PMF Qw is strongly
Because n = 1 and q1 = 1, we have H (A) = 0 bits. The alphabet                  in favour of b2 , such as q1 = ε, q2 = 1 − ε, q3 = ε, where 0 < ε < 1
is 100% certain, reflecting the fact that the designer has no choice.           is a small positive value. If ε = 0.001, DKL (P||Qw ) = 9.933 bits.
                                                                                If ε → 0, DKL (P||Qw ) → ∞. Since the maximum entropy (uncer-
   The alphabet B has three design options b1 , b2 , and b3 . If they           tainty) for B is only about 1.585, it is difficult to interpret that view-
have an equal chance to be selected by the designer, we have a                  ers’ divergence can be more than that maximum, not mentioning
PMF Qu with q1 = q2 = q3 = 1/3, and thus H (Qu (B)) ≈ 1.585                     the infinity.
bits. When we examine the three options in Figure 2, it is not un-
reasonable to consider a second scenario that the choice may be in              Problem Statement. When using DKL in Eq. 1 in a relative or
favour of the straight line option b1 in designing a metro map ac-              qualitative context (e.g., [CGJM19, CE19]), the unboundedness of
cording to the real geographical data. If a different PMF Qv is given           the KL-divergence does not pose an issue. However, this does be-
as q1 = 0.9, q2 = q3 = 0.05, we have H (Qv (B)) ≈ 0.569 bits. The               come an issue when the KL-divergence is used to measure PD in
second scenario features less entropy and is thus of more certainty.            an absolute and quantitative context.
   Consider that the designer is given a metro map designed using                  In the preceding paper (see the supplementary materials), Chen
alphabet B, and is asked to produce a more abstract map using al-               and Sbert showed that conceptually, it is the unboundedness is
phabet A. To the designer, it is a straightforward task, since there is         not consistent with a conceptual interpretation of KL-divergence
only one option in A. When a group of viewers are visualizing the               for measuring the inefficiency of a code (alphabet) that has a fi-
final design a1 , we could give these viewers a task to guess what              nite number of codewords (letters). They propose to find a suitable
may be the original map designed with B. If most viewers have                   bounded divergence measure to replace the DKL term. They exam-
no knowledge about the possible options b2 and b3 , and almost all              ined eight candidate measures, analysed their mathematical prop-
choose b1 as the original design, we can describe their decisions               erties with the aid of visualization, and narrowed down to five mea-
using a PMF P such that p1 = 0.998, p2 = p3 = 0.001. Since P is                 sures using multi-criteria decision analysis (MCDA) [IN13]. The
not the same as either Qu or Qv , the viewers’ decisions diverge from           upper half of Table 1 shows their conceptual evaluation based on
the actual PMF associated with B. This divergence can be measured               five criteria. In this work, we continue their MCDA process by in-
using KL-divergence:                                                            troducing criteria based on the analysis of instances obtained when
                             n                            n                     using the remaining five candidate measures in different case stud-
                                                                      pi
   DKL (P(Z)||Q(Z)) =       ∑ pi (log2 pi − log2 qi ) = ∑ pi log2 qi            ies, which correspond to criteria 6 – 9 respectively in Table 1.
                           i=1                           i=1
                                                                                   For self-containment, we give the mathematical definition of
  Using DKL , we can calculate (i) if the original design alphabet B            the five candidate measures below. In this paper, we treat them as

                                                                                                                            © 2021 The Author(s) with LATEX template from
                                                                                    Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

 Table 1: A summary of multi-criteria analysis. Each measure is scored against a criterion using an integer in [0, 5] with 5 being the best.
Criteria                                    Importance       0.3DKL  DJS H (P|Q) Dnew     k=1   Dnew
                                                                                                   k=2 Dncm
                                                                                                        k=1 Dncm
                                                                                                              k=2  Dk=2
                                                                                                                    M                                         Dk=200
                                                                                                                                                               M
1. Boundedness                                 critical         0      5         5        5         5   5      5     3                                          3
  I 0.3DKL is eliminated but used below only for comparison. The other scores are carried forward.
2. Number of PMFs                            important          5      5         2        5         5   5      5     5                                          5
3. Entropic measures                         important          5      5         5        5         5   5      5     1                                          1
4. Curve shapes                                helpful          5      5         1        2         4   2      4     3                                          3
5. Curve shapes                                helpful          5      4         1        3         5   3      5     2                                          3
  I Eliminate H (P|Q), DM   2 , D 200 based on criteria 1-5
                                 M                            sum:    24        14       20        24   20    24    14                                          15
6. Scenario: good and bad (Figure 3)           helpful          −      3        −         5         4    5     4    −                                           −
7. Scenario: A, B, C, D (Figure 4)             helpful          −      4        −         5         3    2     1    −                                           −
8. Case Study 1 (Section 5.1)                important          −      5        −         1         5    5     5    −                                           −
9. Case Study 2: (Section 5.2)               important          −      3        −         1         5    3     3    −                                           −
  I Dnew
       k=2 has the highest score based on criteria 6-9 (1-9)  sum: 15(39)              12(32) 17(41) 15(35) 13(37)

black-box functions, since they have already undergone the concep-                             which captures the non-commutative property of DKL .
tual evaluation in the preceding paper. For more detailed concep-
                                                                                                  As DJS , Dnew
                                                                                                             k , and D k
                                                                                                                       ncm are bounded by [0, 1], if any of them is
tual and mathematical discourse on these five candidate measures,
                                                                                               selected to replace DKL , Eq. 2 can be rewritten as
please consult the preceding paper in the supplementary materials.
                                                                                                     Benefit = H (Zi ) − H (Zi+1 ) − Hmax (Zi )D(Z0i ||Zi )      (6)
   The first candidate measure is Jensen-Shannon divergence
[Lin91]. The conceptual evaluation yield a promising score of 24                               where Hmax denotes maximum entropy, while D is a placeholder
in Table 1. It is defined as:                                                                  for DJS , Dnew
                                                                                                          k , or D k .
                                                                                                                  ncm
                   1
   DJS (P||Q) = DKL (P||M) + DKL (Q||M) = DJS (Q||P)
                                              
                   2
                                                             (3)                               4. Synthetic Case Studies
                   1 n
                                                       
                               2pi                2qi
                = ∑ pi log2           + qi log2
                   2 i=1      pi + qi           pi + qi                                        The work in this section and the next section is inspired by the
                                                                                               historical development of temperature scales, in which many pio-
where P and Q are two PMFs associated with the same alphabet Z                                 neering scientists collected observational data to help reason about
and M is the average distribution of P and Q. Each letter zi ∈ Z is                            the candidate scales. We first consider two synthetic case studies,
associated with a probability value pi ∈ P and another qi ∈ Q. With                            which allow us to define idealized situations, from which collected
the base 2 logarithm as in Eq. 3, DJS (P||Q) is bounded by 0 and 1.                            data do not contain any noise. We use these case studies to see if
   The second and third candidate measures are two instances of a                              the values returned by different divergence measures make sense.
new measure Dnewk   proposed by Chen and Sbert in the preceding                                In many ways, this is similar to testing a piece of software using
work. The two instances are Dnewk (k = 1) and D k (k = 2). They                                pre-defined test cases. Nevertheless, these test cases feature more
                                                  new
received scores of 20 and 24 respectively in the conceptual evalua-                            complex alphabets than those considered by the conceptual evalu-
tion. Dnew
       k is defined as follows:                                                                ation reported in the preceding paper. In our analysis, we followed
                                                                                               the best-worst method of MCDA [Rez15]. Our main “observational
                                 1 n                                                           rules” are:
            Dnew
             k
                                       (pi + qi ) log2 |pi − qi |k + 1
                                                                       
                 (P||Q) =          ∑                                                     (4)
                                 2 i=1
                                                                                               a. Does the numerical ordering of observed instances match with
where k > 0. Because 0 ≤ |pi − qi ≤ 1, we have|k                                                  the intuitively-expected ordering?
                                                                                               b. Does the gap between positive and negative values indicate a
1 n                                       1 n
  ∑   (pi +qi ) log2 (0+1) ≤ Dnew
                              k
                                  (P||Q) ≤ ∑ (pi +qi ) log2 (1+1)                                 meaningful critical point (i.e., zero benefit in our case)?
2 i=1                                     2 i=1
                                                                                               Criterion 6. Let Z be an alphabet with two letters, good and
Since log2 1 = 0, log2 2 = 1, ∑ pi = 1, ∑ qi = 1, Dnew
                                                   k (P||Q) is thus
                                                                                               bad, for describing a scenario (e.g., an object or an event), which
bounded by 0 and 1.                                                                            has the probability of good is p1 = 0.8, and that of bad is p2 = 0.2.
                                                                                               In other words, P = {0.8, 0.2}. Imagine that a biased process (e.g.,
   The fourth and five candidate measures are two instances of a
                                                                                               a distorted visualization, an incorrect algorithm, or a misleading
non-commutative version of Dnew k . It is denoted as D k , and the
                                                       ncm                                     communication) conveys the information about the scenario always
two instances are Dncm
                    k (k = 1) and D k (k = 2). They also received
                                      ncm                                                      bad, i.e., a PMF Rbiased = {0, 1}. Users at the receiving end of the
scores of 20 and 24 respectively in the conceptual evaluation. Dncm
                                                                k
                                                                                               process may have different knowledge about the actual scenario,
is defined as follows:
                                                                                               and they will make a decision after receiving the output of the pro-
                                       n
                                                                                               cess. For example, there are five users and we have obtained the
                 Dncm
                  k
                                                     |pi − qi |k + 1
                                                                    
                      (P||Q) =        ∑ pi log2                                          (5)
                                      i=1                                                      probability of their decisions as follows:

© 2021 The Author(s) with LATEX template from
Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

1.0                                                                                                                                                    ground truth distribution P
      Divergence                                                               Contributions to the divergence from different letters                        misleading Rbiased
0.9                                                                                                                                                                 reconstructed Q
                                                                                                    Good        Bad
0.8
                                                                                                                                                                    LD: a little
0.7                                                                                                                                                                  doubt
0.6                                                                                                                                                                 FD: a fair
                                                                                                                                                                     amount
0.5                                                                                                                                                                  of doubt
0.4
                                                                                                                                                                    RG: random
0.3                                                                                                                                                                  guess
0.2
                                                                                                                                                                    UC: under-
0.1                                                                                                                                                                  correction
0.0
      LD DF RG UC OC            LD DF RG UC OC             LD DF RG UC OC              LD DF RG UC OC                 LD DF RG UC OC                                OC: over-
                                                                                                                                                                     correction
              DJS                    Dncm (k=1)                 Dncm (k=2)                  Dnew (k=1)                    Dnew (k=2)

Figure 3: An example scenario with two states good and bad has a ground truth PMF P = {0.8, 0.2}. From the output of a biased process
that always informs users that the situation is bad. Five users, LD, DF, RG, UC, and OC, have different knowledge, and thus different
divergence. The five candidate measures return different values of divergence. We would like to see which set of values are more intuitive. The
illustration on the right shows two transformations of the alphabets and their PMFs, one by the misleading communication and the other by
the reconstruction.
0.4                                                                                                                                                    ground truth distribution P
      Divergence                           Contributions to the divergence from different letters                                                            correct/biased R
                                                          A       B      C       D                                                                                  reconstructed Q
                                                                                                                                                                    CG: correct
0.3                                                                                                                                                                  pr. & guess
                                                                                                                                                                    CU: correct
                                                                                                                                                                     pr. & useful
                                                                                                                                                                     knowledge
0.2                                                                                                                                                                 CB: correct
                                                                                                                                                                     pr. & biased
                                                                                                                                                                     reasoning
                                                                                                                                                                    BG: biased
0.1                                                                                                                                                                  pr. & guess
                                                                                                                                                                    BS: biased
                                                                                                                                                                     pr. & small
                                                                                                                                                                     correction
0.0
                                                                                                                                                                    BM: biased
      CG CU CB BG BS BM         CG CU CB BG BS BM         CG CU CB BG BS BM           CG CU CB BG BS BM             CG CU CB BG BS BM                                pr. & major
              DJS                    Dncm (k=1)                 Dncm (k=2)                  Dnew (k=1)                    Dnew (k=2)                                 correction

Figure 4: An example scenario with four data values: A, B, C, and D. Two processes (one correct and one biased) aggregated them to two
values AB and CD. Users CG, CU, CB attempt to reconstruct [A, B, C, D] from the output [AB, CD] of the correct process, while BG, BS,
and BM attempt to do so with the output from the biased processes. The bar chart shows divergence values of the six users computed using
the five candidate measures. The illustration on the right shows two transformations of the alphabets and their PMFs, one by the correct or
biased process (pr.) and the other by the reconstruction.

• LD — The user has a little doubt about the output of the process,                   slightly more divergence than UC (0.014 vs. 0.010). DJS returns
  and decides bad 90% of the time, and good 10% of the time, i.e.,                    relatively low values than other measures. For UC and OC, DJS ,
  with PMF Q = {0.1, 0.9}.                                                            Dncm
                                                                                         k=1 , and D k=2 return small values (< 0.02), which are a bit dif-
                                                                                                     new
• FD — The user has a fair amount of doubt, with Q = {0.3, 0.7}.                      ficult to estimate.
• RG — The user makes a random guess, with Q = {0.5, 0.5}.
                                                                                         Dncm
                                                                                           k=1 and D k=2 show strong asymmetric patterns between good
                                                                                                     ncm
• UC — The user has adequate knowledge about P, but under-
                                                                                      and bad, reflecting the probability values in Q. In other words,
  compensate it slightly, with Q = {0.7, 0.3}.
                                                                                      the more decisions on good, the more good-related divergence.
• OC — The user has adequate knowledge about P, but over-
                                                                                      This asymmetric pattern is not in anyway incorrect, as the KL-
  compensate it slightly, with Q = {0.9, 0.1}.
                                                                                      divergence is also non-commutative and would also produce much
                                                                                      stronger asymmetric patterns. An argument for supporting commu-
We can use different candidate measures to compute the diver-
                                                                                      tative measures would point out that the higher probability of good
gence between P and Q. Figure 3 shows different divergence val-
                                                                                      in P should also influence the balance between the good-related
ues returned by these measures, while the transformations from P to
                                                                                      divergence.
Rbiased and then to Q are illustrated on the right margin of the figure.
Each value is decomposed into two parts, one for good and one for                        We decide to score DJS 3 because of its lower valuation and its
bad. All these measures can order these five users reasonably well.                   non-equal comparison of OU and OC. We score Dncm   k=1 and D k=1
                                                                                                                                                   new
The users UC (under-compensate) and OC (over-compensate) have                         5; and Dncm and Dnew 4 as the values returned by Dncm
                                                                                                k=2       k=2                             k=1 and D k=1
                                                                                                                                                   new
the same values with Dnewk and D k , while D considers OC has
                                    ncm            JS                                 are slightly more intuitive.

                                                                                                                                  © 2021 The Author(s) with LATEX template from
                                                                                          Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

Criterion 7. We now consider a slightly more complicated sce-                             Question 5: The image on the right depicts a computed
                                                                                          tomography dataset (arteries) that was rendered using a
nario with four pieces of data, A, B, C, and D, which can be de-                          maximum intensity projection (MIP) algorithm. Consider
                                                                                          the section of the image inside the red circle (also in the
                                                                                          inset of a zoomed-in view). Which of the following
fined as an alphabet Z with four letters. The ground truth PMF is                         illustrations would be the closest to the real surface of
P = {0.1, 0.4, 0.2, 0.3}. Consider two processes that combine these                       this part of the artery?

into two classes AB and CD. These typify clustering algorithms,
downsampling processes, discretization in visual mapping, and so                          A                                B
                                                                                                    Curved,                             Curved,
on. One process is considered to be correct, which has a PMF for                                 rather smooth                 with wrinkles and bumps

AB and CD as Rcorrect = {0.5, 0.5}, and another biased process
with Rbiased = {0, 1}. Let CG, CU, and CH be three users at the                           C                                D
receiving end of the correct process, and BG, BS, and BM be three                                     Flat,
                                                                                                 rather smooth
                                                                                                                                         Flat,
                                                                                                                               with wrinkles and bumps
other users at the receiving end of the biased process. The users
with different knowledge exhibit different abilities to reconstruct                      Figure 5: A volume dataset was rendered using the MIP method. A
the original scenario featuring A, B, C, D from aggregated infor-                        question about a “flat area” in the image can be used to tease out
                                                                                                                                               Image by Min Chen, 2008
mation about AB and CD. Similar to the good-bad scenario, such                           a viewer’s knowledge that is useful in a visualization process.
abilities can be captured by a PMF Q. For example, we have:
•   CG makes random guess, Q = {0.25, 0.25, 0.25, 0.25}.
                                                                                         compression. Some major algorithmic functions in volume visual-
•   CU has useful knowledge, Q = {0.1, 0.4, 0.1, 0.4}.
                                                                                         ization, e.g., iso-surfacing, transfer function, and rendering integral,
•   CB is highly biased, Q = {0.4, 0.1, 0.4, 0.1}.
                                                                                         all facilitate alphabet compression, hence information loss.
•   BG makes guess based on Rbiased , Q = {0.0, 0.0, 0.5, 0.5}.
•   BS makes a small adjustment, Q = {0.1, 0.1, 0.4, 0.4}.                                  In terms of rendering integral, maximum intensity projection
•   BM makes a major adjustment, Q = {0.2, 0.2, 0.3, 0.3}.                               (MIP) incurs a huge amount of information loss in comparison with
Figure 4 compares the divergence values returned by the candi-                           the commonly-used emission-and-absorption integral [MC10]. As
date measures for these six users, while the transformations from                        shown in Figure 5, the surface of arteries are depicted more or less
P to Rcorrect or Rbiased , and then to Q are illustrated on the right.                   in the same color. The accompanying question intends to tease out
We can observe that Dncm   k  and Dnew
                                     k=2 return values < 0.1, which                      two pieces of knowledge, “curved surface” and “with wrinkles and
seem to be less intuitive. Meanwhile DJS shows a large portion                           bumps”. Among the ten surveyees, one selected the correct answer
of divergence from the AB category, while Dncm   k=1 and D k=2 show                      B, eight selected the relatively plausible answer A, and one selected
                                                           ncm
more divergence in the BC category. In particular, for user BG,                          the doubtful answer D.
Dncm
  k=1 and D k=2 do not show any divergence in relation to A and B,
             ncm                                                                            Let alphabet Z = {A, B, C, D} contain the four optional answers.
though BG clearly has reasoned A and B rather incorrectly. Dnew   k=1
                                                                                         One may assume a ground truth PMF Q = {0.1, 0.878, 0.002, 0.02}
and Dnew show a relatively balanced account of divergence associ-
      k=2
                                                                                         since there might still be a small probability for a section of
ated with A, B, C, and D. On balance, we give scores 5, 4, 3, 2, 1                       artery to be flat or smooth. The rendered image depicts a mis-
to Dnew
     k=1 , D , D k=2 , D k=1 , and D k=2 respectively.
            JS   new      ncm        ncm                                                 leading impression, implying that answer C is correct or a false
                                                                                         PMF F = {0, 0, 1, 0}. The amount of alphabet compression is thus
5. Experimental Case Studies                                                             H (Q) − H (F) = 0.225.
To complement the synthetic case studies in Section 4, we con-                              When a surveyee gives an answer to the question, it can also be
ducted two surveys to collect some realistic examples that feature                       considered as a PMF P. With PMF Q = {0.1, 0.878, 0.002, 0.02},
the use of knowledge in visualization. In addition to providing in-                      different answers thus lead to different values of divergence as fol-
stances of criteria 8 and 9 for selecting a bounded measure, the                         lows:
surveys were also designed to demonstrate that one could use a few                       Divergence for:                                          DJS    Dnew
                                                                                                                                                          k=1     Dnew
                                                                                                                                                                   k=2    Dncm
                                                                                                                                                                           k=1    Dncm
                                                                                                                                                                                   k=2
simple questions to estimate the cost-benefit of visualization in re-
lation to individual users.                                                              A (Pa = {1, 0, 0, 0}, Q):                              0.758    0.9087   0.833   0.926   0.856
                                                                                         B (Pb = {0, 1, 0, 0}, Q):                              0.064    0.1631   0.021   0.166   0.021
                                                                                         C (Pc = {0, 0, 1, 0}, Q):                              0.990    0.9066   0.985   0.999   0.997
5.1. Volume Visualization (Criterion 8)                                                  D (Pd = {0, 0, 0, 1}, Q):                              0.929    0.9086   0.858   0.986   0.971
This survey, which involved ten surveyees, was designed to collect
some real-world data that reflects the use of knowledge in view-                            Without any knowledge, a surveyee would select answer C, lead-
ing volume visualization images. The full set of questions were                          ing to the highest value of divergence in terms of any of the three
presented to surveyees in the form of slides, which are included                         measures. Based PMF Q, we expect to have divergence values in
in the supplementary materials. The full set of survey results is                        the order of C > D > A  B. DJS , Dnew  k=2 , D k=1 , and D k=2 have
                                                                                                                                        ncm         ncm
given in Appendix C. The featured volume datasets were from                              produced values in that order, while Dnew
                                                                                                                                k=1 indicates an order of A

“The Volume Library” [Roe19], and visualization images were ei-                          > D > C  B. This order cannot be interpreted easily, indicating
ther rendered by the authors or from one of the four publications                        a weakness of Dnew
                                                                                                          k=1 .

[NSW02, CSC06, WQ07, Jun19].
                                                                                            Together with the alphabet compression H (Q) − H (F) =
   The transformation from a volumetric dataset to a volume-                             0.225 and the Hmax of 2 bits, we can also calculate the informative
rendered image typically features a noticeable amount of alphabet                        benefit using Eq. 6. For surveyees with different answers, the lossy

© 2021 The Author(s) with LATEX template from
Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation
Benefit for:                                           DJS             Dnew
                                                                        k=1          Dnew
                                                                                      k=2   Dncm
                                                                                             k=1   Dncm
                                                                                                    k=2
                                                                                                           Meanwhile, the voxels for soft tissue and muscle are not depicted at
A (Pa = {1, 0, 0, 0}, Q): −0.889 −1.190 −1.038 −1.224 −1.084                                               all, which can also been regarded as using a hollow visual channel.
B (Pb = {0, 1, 0, 0}, Q):  0.500 0.302 0.586 0.296 0.585                                                   The visual representation has been widely used, and the viewers
C (Pc = {0, 0, 1, 0}, Q): −1.351 −1.185 −1.097 −1.369 −1.366                                               are expected to use their knowledge to infer the 3D relationships
D (Pd = {0, 0, 0, 1}, Q): −1.230 −1.189 −1.088 −1.343 −1.314                                               between the two iso-surfaces as well as the missing information
 Question 3: The image on the right depicts a computed
                                                                                                           about soft tissue and muscle. The question that accompanies the
 tomography dataset (head) that was rendered using a
 ray casting algorithm. Consider the section of the image
                                                                                                           figure is for estimating such knowledge.
 inside the orange circle. Which of the following
 illustrations would be the closest to the real cross
 section of this part of the facial structure?                                                                Although the survey offers only four options, it could in fact of-
                                                                                                           fer many other configurations as optional answers. Let us consider
                                                                                                           four color-coded segments similar to the configurations in answers
 A                               B
                                                                                                           C and D. Each segment could be one of four types: bone, skin, soft
                                                                                                           tissue and muscle, or background. There are a total of 44 = 256 con-
                                                                                                           figurations. If one had to consider the variation of segment thick-
 C                              D                                                                          ness, there would be many more options. Because it would not be
                                                                                                           appropriate to ask a surveyee to select an answer from 256 options,
                                                                                                           a typical assumption is that the selected four options are represen-
            background          bone         skin           soft tissue and muscle
                                                                                                           tative. In other words, considering that the 256 options are letters
Figure 6: Two iso-surfaces of a volume dataset wereImage by Min Chen, 1999
                                                      rendered        us-                                  of an alphabet, any unselected letter has a probability similar to one
ing the ray casting method. A question about the tissue configura-                                         of the four selected options.
tion in the orange circle can tease out a viewer’s knowledge about                                           For example, we can estimate a ground truth PMF Q such that
the translucent depiction and the missing information.                                                     among the 256 letters,
                                                                                                           •   Answer A and four other letters have a probability 0.01,
depiction of the surface of arteries brought about different amounts
                                                                                                           •   Answer B and 64 other letters have a probability 0.0002,
of benefit:
                                                                                                           •   Answer C and 184 other letters have a probability 0.0001,
   All five sets of values indicate that only those surveyees who                                          •   Answer D has a probability 0.9185.
gave answer C would benefit from such lossy depiction produced
by MIP, signifying the importance of user knowledge in visualiza-                                          We have the entropy of this alphabet H (Q) = 0.85. Similar to the
tion. However, the values returned for A, C, and D by Dnew k=1 are                                         previous example, we can estimate the values of divergence as:
almost indistinguishable and in an undesirable order.                                                      Divergence for:                            DJS      Dnew
                                                                                                                                                                k=1       Dnew
                                                                                                                                                                           k=2       Dncm
                                                                                                                                                                                      k=1       Dncm
                                                                                                                                                                                                 k=2

   One may also consider the scenarios where flat or smooth sur-                                                     .... .... ....
                                                                                                           A: P = {1, 4 , 0, 64 , 0, 184 , 0}        0.960      0.933      0.903      0.993      0.986
faces are more probable. For example, if the ground truth PMF were                                                   .... .... ....
                                                                                                           B: P = {0, 4 , 1, 64 , 0, 184 , 0}        0.999      0.932      0.905      1.000      1.000
Q0 = {0.30, 0.57, 0.03, 0.10} and H (Q0 ) = 1.467, the amounts of                                                    .... .... ....
                                                                                                           C: P = {0, 4 , 0, 64 , 1, 184 , 0}        0.999      0.932      0.905      1.000      1.000
benefit would become:                                                                                                .... .... ....
                                                                                                           D: P = {0, 4 , 0, 64 , 0, 184 , 1}        0.042      0.109      0.009      0.113      0.010
Benefit for:                  DJS Dnew k=1 D k=2
                                                new   Dncm
                                                         k=1  Dncm
                                                                k=2

A (Pa = {1, 0, 0, 0}, Q0 ):  0.480 0.086                                             0.487 −0.064 0.317    where ....
                                                                                                                   n denotes n zeros. DJS , Dnew , Dncm and Dncm returned
                                                                                                                                               k=2    k=1          k=2

B (Pb = {0, 1, 0, 0}, Q0 ):  0.951 0.529                                             1.044 0.435 0.978     values indicating the same order of divergence, i.e., C ∼ B > A 
C (Pc = {0, 0, 1, 0}, Q0 ): −0.337 −0.038                                            0.212 −0.489 −0.446   D, which is consistent with PMF Q0 . Only Dnew  k=1 returns an order

D (Pd = {0, 0, 0, 1}, Q0 ): −0.049 −0.037                                            0.257 −0.385 −0.245   A > B ∼ C  D. This reinforces the observation by the previous
                                                                                                           example (i.e., Figure 5) about the characteristics of ordering of the
Because the ground truth PMF Q0 would be less certain, the knowl-                                          five measures.
edge of “curved surface” and “with wrinkles and bumps” would
                                                                                                              For both examples (Figs. 5 and 6), because both DJS , Dnew k=2 ,
become more useful. Further, because the probability of flat and
                                                                                                           Dncm
                                                                                                            k=1  and Dncm have consistently returned sensible values, we give
                                                                                                                        k=2
smooth surfaces would have increased, an answer C would not be
                                                                                                           a score of 5 to each of them in Table 1. Dnew
                                                                                                                                                     k=1 appears to be often
as bad as when it is with the original PMF Q. Among the five mea-
                                                                                                           incompatible with other measures in terms of ordering, we score
sures, DJS , Dnew
              k=2 , D k=1 , and D k=2 returned values indicating the
                     ncm          ncm                                                                      Dnew
                                                                                                             k=1 1.
same order of benefit, i.e., B > A > D > C, which is consistent
with PMF Q0 . Only Dnewk=1 orders C and D differently.

   We can also observe that these measures occupy different ranges                                         5.2. London Underground Map (Criterion 9)
of real values. Dnew
                  k=2 appears to be more generous in valuing the
                                                                                                           This survey was designed to collect some real-world data that re-
benefit of visualization, while Dncm
                                  k=1 is less generous. We will ex-
                                                                                                           flects the use of knowledge in viewing different London under-
amine this phenomenon with a more compelling example in Sec-
                                                                                                           ground maps. It involved sixteen surveyees, twelve at King’s Col-
tion 5.2.
                                                                                                           lege London (KCL) and four at University of Oxford. Surveyees
  Figure 6 shows another volume-rendered image used in the sur-                                            were interviewed individually in a setup as shown in Figure 7.
vey. Two iso-surfaces of a head dataset are depicted with translu-                                         Each surveyee was asked to answer 12 questions using either a
cent occlusion, which is a type of visual multiplexing [CWB∗ 14].                                          geographically-faithful map or a deformed map, followed by two

                                                                                                                                                       © 2021 The Author(s) with LATEX template from
                                                                                                               Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

       Figure 7: A survey for collecting data that reflects the use of some knowledge in viewing two types of London underground maps.

further questions about their familiarity of a metro system and Lon-                     approximate the PMF as Q = {qi | 1 ≤ i ≤ 256}, where
don. A £5 Amazon voucher was offered to each surveyee as an ap-                                   
preciation of their effort and time. The survey sheets and the full                               
                                                                                                   0.01/236 if 1 ≤ i ≤ ξ − 8          (wild guess)
                                                                                                  
                                                                                                  0.026       if ξ − 7 ≤ i ≤ ξ − 3    (close)
                                                                                                  
set of survey results are given in Appendix D.                                                    
                                                                                                  
                                                                                              qi = 0.12        if ξ − 2 ≤ i ≤ ξ + 2    (spot on)
                                                                                                  
                                                                                                  0.026       if ξ + 3 ≤ i ≤ ξ + 12 (close)
                                                                                                  
                                                                                                  
   Harry Beck first introduced geographically-deformed design of                                  
                                                                                                  
                                                                                                    0.01/236 if ξ + 13 ≤ i ≤ 256      (wild guess)
                                                                                                  
the London underground maps in 1931. Today almost all metro
maps around the world adopt this design concept. Information-                            Using the same way in the previous case study, we can estimate the
theoretically, the transformation of a geographically-faithful map                       divergence and the benefit of visualization for an answer in each
to such a geographically-deformed map causes a significant loss of                       range. Recall our observation of the phenomenon in Section 5.1
information. Naturally, this affects some tasks more than others.                        that the measurements by DJS, Dnew  k=1 , D k=2 , D k=1 and D k=2 oc-
                                                                                                                                     new    ncm       ncm
                                                                                         cupy different ranges of values, with Dnew k=2 be the most generous

                                                                                         in measuring the benefit of visualization. With the entropy of the
   For example, the distances between stations on a deformed map                         alphabet as H (Q) ≈ 3.6 bits and the maximum entropy being 8
are not as useful as in a faithful map. The first four questions in the                  bits, the benefit values obtained for this example exhibit a similar
survey asked surveyees to estimate how long it would take to walk                        but more compelling pattern:
(i) from Charing Cross to Oxford Circus, (ii) from Temple and Le-
icester Square, (iii) from Stanmore to Edgware, and (iv) from South                      Benefit for:      DJS       Dnew
                                                                                                                      k=1        Dnew
                                                                                                                                  k=2       Dncm
                                                                                                                                             k=1        Dncm
                                                                                                                                                         k=2

Rulslip to South Harrow. On the deformed map, the distances be-                          spot on        −1.765      −0.418      0.287      −3.252     −2.585
tween the four pairs of the stations are all about 50mm. On the                          close          −3.266      −0.439      0.033      −3.815     −3.666
faithful map, the distances are (i) 21mm, (ii) 14mm, (iii) 31mm,                         wild guess     −3.963      −0.416     −0.017      −3.966     −3.965
and (iv) 53mm respectively. According to the Google map, the es-
timated walk distance and time are (i) 0.9 miles, 20 minutes; (ii) 0.8                   Only Dnew
                                                                                                 k=2 has returned positive benefit values for spot on and

miles, 17 minutes; (iii) 1.6 miles, 32 minutes; and (iv) 2.2 miles, 45                   close answers. Since it is not intuitive to say that those surveyees
minutes respectively.                                                                    who gave good answers benefited from visualization negatively,
                                                                                         clearly only the measurements returned by Dnew   k=2 are intuitive. In

                                                                                         addition, the ordering resulting from Dnew k=1 is again inconsistent

   The average range of the estimations about the walk time by the                       with others.
12 surveyees at KCL are: (i) 19.25 [8, 30], (ii) 19.67 [5, 30], (iii)
                                                                                           For instance, surveyee P9, who has lived in a city with a metro
46.25 [10, 240], and (iv) 59.17 [20, 120] minutes. The estimations
                                                                                         system for a period of 1-5 years and lived in London for several
by the four surveyees at Oxford are: (i) 16.25 [15, 20], (ii) 10 [5,
                                                                                         months, made similarly good estimations about the walking time
15], (iii) 37.25 [25, 60], and (iv) 33.75 [20, 60] minutes. The values
                                                                                         with both types of underground maps. With one spot on answer
correlate better to the Google estimations than what would be im-
                                                                                         and one close answer under each condition, the estimated benefit
plied by the similar distances on the deformed map. Clearly some
                                                                                         on average is 0.160 bits if one uses Dnew k=2 and is negative if one
surveyees were using some knowledge to make better inference.
                                                                                         uses any of the other four measures. Meanwhile, surveyee P3, who
                                                                                         has lived in a city with a metro system for two months, provided all
                                                                                         four answers in the wild guess category, leading to negative benefit
   Let Z be an alphabet of integers between 1 and 256. The range
                                                                                         values by all five measures.
is chosen partly to cover the range of the answers in the survey, and
partly to round up the maximum entropy Z to 8 bits. For each pair                          Similarly, among the first set of four questions in the survey,
of stations, we can define a PMF using a skew normal distribution                        Questions 1 and 2 are about stations near KCL, and Questions 3
peaked at the Google estimation ξ . As an illustration, we coarsely                      and 4 are about stations more than 10 miles away from KCL. The

© 2021 The Author(s) with LATEX template from
Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation - arXiv
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

local knowledge of the surveyees from KCL clearly helped their                    This cost-benefit measure was developed in the field of visual-
answers. Among the answers given by the twelve surveyees from                  ization, for optimizing visualization processes and visual analytics
KCL,                                                                           workflows. Its broad interpretation may include data intelligence
                                                                               workflows in other contexts [Che20]. The measure has now been
• For Question 1, four spot on, five close, and three wild guess —
                                                                               improved by using visual analysis and with the survey data col-
  the average benefit is −2.940 with DJS or 0.105 with Dnewk=2 .
                                                                               lected in the context of visualization applications.
• For Question 2, two spot on, nine close, and one wild guess —
  the average benefit is −3.074 with DJS or 0.071 with Dnewk=2 .                  The history of measurement science [Kle12] informs us that pro-
• For Question 3, three close, and nine wild guess — the average               posals for metrics, measures, and scales will continue to emerge in
  benefit is −3.789 with DJS or −0.005 with Dnew k=2 .                         visualization, typically following the arrival of new theoretical un-
• For Question 4, two spot on, one close, and nine wild guess —                derstanding, new observational data, new measurement technology,
  the average benefit is with −3.539 DJS or 0.038 with Dnewk=2 .               and so on. As measurement is one of the driving forces in science
                                                                               and technology. We shall welcome such new measurement devel-
   The average benefit values returned by Dnewk=1 , D k=1 , and D k=2
                                                     ncm         ncm           opment in visualization.
are all negative for these four questions. Hence, unless Dnew  k=2 is

used, all other measures would semantically imply that both types                 The work presented in this paper and the preceding paper does
of the London underground maps would have negative benefit. We                 not indicate a closed chapter, but an early effort to be improved
therefore give Dnew
                 k=2 a 5 score and D , D k=1 , and D k=2 a 3 score
                                     JS    ncm         ncm                     frequently in the future. For example, future work may discover
each in Table 1. We score Dnewk=1 1 as it also exhibits an ordering            measures that have better mathematical properties than Dnewk   and
issue.                                                                         DJS , or future experimental observation may evidence that DJS of-
                                                                               fer more intuitive explanations than Dnew
                                                                                                                      k  in other case studies. In
   When we consider answering each of Questions 1∼4 as perform-                particular, we would like to continue our theoretical investigation
ing a visualization task, we can estimate the cost-benefit ratio of            into the mathematical properties of Dnew
                                                                                                                     k .
each process. As the survey also collected the time used by each
surveyee in answering each question, the cost in Eq. 1 can be ap-                 “Measurement is not an end but a means in the process of de-
proximated with the mean response time. For Questions 1∼4, the                 scription, differentiation, explanation, prediction, diagnosis, deci-
mean response times by the surveyees at KCL are 9.27, 9.48, 14.65,             sion making, and the like” [PS91]. Having a bounded cost-benefit
and 11.40 seconds respectively. Using the benefit values based on              measure offers many new opportunities of developing tools for aid-
Dnew
  k=2 , the cost-benefit ratios are thus 0.0113, 0.0075, -0.0003, and          ing the measurement and using such tools in practical applications,
0.0033 bits/second respectively. While these values indicate the               especially in visualization and visual analytics.
benefits of the local knowledge used in answering Questions 1 and
2, they also indicate that when the local knowledge is absent in the
case of Questions 3 and 4, the deformed map (i.e., Question 3) is              References
less cost-beneficial.                                                          [BBK∗ 18] B EHRISCH M., B LUMENSCHEIN M., K IM N. W., S HAO L.,
                                                                                 E L -A SSADY M., F UCHS J., S EEBACHER D., D IEHL A., B RANDES U.,
                                                                                 P FISTER H., S CHRECK T., W EISKOPF D., K EIM D. A.: Quality metrics
6. Conclusions                                                                   for information visualization. Computer Graphics Forum 37, 3 (2018),
                                                                                 625–662. 2
This paper is a follow-on paper that continues an investigation                [BS04] B ERTINI E., S ANTUCCI G.: Quality metrics for 2D scatterplot
into the need to improve the mathematical formulation of an                      graphics: Automatically reducing visual clutter. In Smart Graphics, Butz
information-theoretic measure for analyzing the cost-benefit of vi-              A., Krüger A., Olivier P., (Eds.), vol. 3031 of Lecture Notes in Computer
sualization as well as other processes in a data intelligence work-              Science. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 77–
flow [CG16]. The concern about the original measure is its un-                   89. 2
bounded term based on the KL-divergence. The preceding paper                   [BSM∗ 15] B ERNARD J., S TEIGER M., M ITTELSTÄDT S., T HUM S.,
(in the supplementary materials) studied eight candidate measures                K EIM D. A., KOHLHAMMER J.: A survey and task-based quality as-
                                                                                 sessment of static 2d color maps. In Proc. of SPIE 9397, Visualization
and use conceptual evaluation to narrow the options down to five,                and Data Analysis (2015). 2
providing important evidence to the multi-criteria analysis of thees
                                                                               [BTK11] B ERTINI E., TATU A., K EIM D.: Quality metrics in high-
candidate measures.
                                                                                 dimensional data visualization: An overview and systematization. IEEE
   From Table 1, we can observe the process of narrowing down                    Trans. Visualization & Computer Graphics 17, 12 (2011), 2203–2212. 2
from eight candidate measures to five measures, and then to one.               [BW08] B OSLAUGH S., WATTERS P. A.: Statistics in a Nutshell: A
Taking the importance of the criteria into account, we consider that             Desktop Quick Reference Paperback. O’Reilly, 2008. 2
candidate Dnew
             k (k = 2) is ahead of D , critically because D often
                                     JS                      JS                [CE19] C HEN M., E BERT D. S.: An ontological framework for sup-
yields negative benefit values even when the benefit of visualization            porting the design and evaluation of visual analytics systems. Computer
                                                                                 Graphics Forum 38, 3 (2019), 131–144. 4, 13, 16
is clearly there. We therefore propose to revise the original cost-
benefit ratio in [CG16] to the following:                                      [CE20] C HEN M., E DWARDS D. J.: ‘isms’ in visualization. In Founda-
                                                                                 tions of Data Visualization, Chen M., Hauser H., Rheingans P., Scheuer-
   Benefit Alphabet Compression − Potential Distortion                           mann G., (Eds.). Springer, 2020. 13
          =
    Cost                        Cost                                           [CG16] C HEN M., G OLAN A.: What may visualization processes opti-
                                                                    (7)
            H (Zi ) − H (Zi+1 ) − Hmax (Zi )Dnew
                                             2 (Z0 ||Z )
                                                 i    i                          mize? IEEE Trans. Visualization & Computer Graphics 22, 12 (2016),
          =                                                                      2619–2632. 2, 3, 4, 10, 13, 14, 15
                                Cost

                                                                                                                           © 2021 The Author(s) with LATEX template from
                                                                                   Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
Min Chen et al. / A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

[CGJM19] C HEN M., G AITHER K., J OHN N. W., M C C ANN B.: Cost-                         [KL51] K ULLBACK S., L EIBLER R. A.: On information and sufficiency.
  benefit analysis of visualization in virtual environments. IEEE Trans.                   Annals of Mathematical Statistics 22, 1 (1951), 79–86. 15
  Visualization & Computer Graphics 25, 1 (2019), 32–42. 4                               [Kle12] K LEIN H. A.: The Science of Measurement: A Historical Survey.
[Che18] C HEN M.: The value of interaction in data intelligence.                           Dover Publications, 2012. 1, 2, 10
  arXiv:1812.06051, 2018. 15                                                             [KS14] K INDLMANN G., S CHEIDEGGER C.: An algebraic process for
[Che20] C HEN M.: Cost-benefit analysis of data intelligence – its broader                 visualization design. IEEE Transactions on Visualization and Computer
  interpretations. In Advances in Info-Metrics: Information and Informa-                   Graphics 20, 12 (2014), 2181–2190. 14
  tion Processing across Disciplines, Chen M., Dunn J. M., Golan A., Ul-                 [Lin91] L IN J.: Divergence measures based on the shannon entropy.
  lah A., (Eds.). Oxford University Press, 2020. 10                                         IEEE Transactions on Information Theory 37 (1991), 145–151. 5
[CSC06] C ORREA C., S ILVER D., C HEN M.: Feature aligned volume                         [LL16] L IU Z., L I Z.: Impact of schematic designs on the cognition
  manipulation for illustration and visualization. IEEE Trans. Visualization               of underground tube maps. ISPRS - International Archives of the Pho-
  & Computer Graphics 12, 5 (2006), 1069–1076. 7                                           togrammetry, Remote Sensing and Spatial Information Sciences (2016),
[CSD∗ 09] C OLE F., S ANIK K., D E C ARLO D., F INKELSTEIN A.,                             421–423. 3
  F UNKHOUSER T., RUSINKIEWICZ S., S INGH M.: How well do line                           [MC10] M AX N., C HEN M.: Local and global illumination in the vol-
  drawings depict shape? In ACM SIGGRAPH 2009 Papers (2009), SIG-                          ume rendering integral. In Scientific Visualization: Advanced Concepts,
  GRAPH ’09. 3                                                                             Hagen H., (Ed.). Schloss Dagstuhl, Wadern, Germany, 2010. 7
[CTBA∗ 11] C HEN M., T REFETHEN A., BANARES -A LCANTARA R.,                              [MJSK15] M ITTELSTÄDT S., J ÄCKLE D., S TOFFEL F., K EIM D. A.:
  J IROTKA M., C OECKE B., E RTL T., S CHMIDT A.: From data analysis                       ColorCAT : Guided design of colormaps for combined analysis tasks.
  and visualization to causality discovery. IEEE Computer 44, 11 (2011),                   In Eurographics Conference on Visualization (EuroVis) : Short Papers
  84–87. 15                                                                                (2015), Bertini E., (Ed.), pp. 115–119. 2
[CWB∗ 14] C HEN M., WALTON S., B ERGER K., T HIYAGALINGAM J.,                            [MK15] M ITTELSTÄDT S., K EIM D. A.: Efficient contrast effect com-
  D UFFY B., FANG H., H OLLOWAY C., T REFETHEN A. E.: Visual mul-                          pensation with personalized perception models. Computer Graphics Fo-
  tiplexing. Computer Graphics Forum 33, 3 (2014), 241–250. 8                              rum 34, 3 (2015), 211–220. 2
[CYWR06] C UI Q., YANG J., WARD M., RUNDENSTEINER E.: Mea-                               [MML11] M ANDRYK R. L., M OULD D., L I H.: Evaluation of emo-
  suring data abstraction quality in multiresolution visualizations. IEEE                  tional response to non-photorealistic images. In Proceedings of the ACM
  Trans. Visualization & Computer Graphics 12, 5 (2006), 709–716. 2                        SIGGRAPH/Eurographics Symposium on Non-Photorealistic Animation
[DCK12] DASGUPTA A., C HEN M., KOSARA R.: Conceptualizing vi-                              and Rendering (2011), NPAR ’11, p. 7–16. 3
  sual uncertainty in parallel coordinates. Computer Graphics Forum 31,                  [NSW02] NAGY Z., S CHNEIDE J., W ESTERMAN R.: Interactive volume
  3 (2012), 1015–1024. 13                                                                  illustration. In Proc. Vision, Modeling and Visualization (2002). 7
[DK10] DASGUPTA A., KOSARA R.: Pargnostics: Screen-space metrics                         [PS91] P EDHAZUR E. J., S CHMELKIN L. P.: Measurement, Design, and
  for parallel coordinates. IEEE Trans. Visualization & Computer Graph-                    Analysis: An Integrated Approach. Lawrence Erlbaum Associates, 1991.
  ics 16, 6 (2010), 1017–26. 2                                                             2, 10
[FB09] F ILONIK D., BAUR D.: Measuring aesthetics for information                        [PWR04] P ENG W., WARD M. O., RUNDENSTEINER E.: Clutter reduc-
  visualization. In IEEE Proc. Information Visualization (Washington, DC,                  tion in multi-dimensional data visualization using dimension reordering.
  2009), pp. 579–584. 2                                                                    In IEEE Proc. Symp. Information Visualization (2004), pp. 89–96. 2
[FT74] F RIEDMAN J. H., T UKEY J. W.: A Projection Pursuit Algorithm                     [Rez15] R EZAEI J.: Best-worst multi-criteria decision-making method.
  for Exploratory Data Analysis. IEEE Trans. Computers 23, 9 (1974),                       Omega 53 (2015), 49–57. 5
  881–890. 2
                                                                                         [RLN07] ROSENHOLTZ R., L I Y., NAKANO L.: Measuring visual clut-
[GLS17] G RAMAZIO C. C., L AIDLAW D. H., S CHLOSS K. B.: Col-                              ter. Journal of Vision 7, 2 (2007), 1–22. 2
  orgorical: Creating discriminable and preferable color palettes for infor-
                                                                                         [Roe19] ROETTGER S.: The volume library. http://schorsch.efi.fh-
  mation visualization. IEEE Trans. Visualization & Computer Graphics
                                                                                           nuernberg.de/data/volume/, last accessed in 2019. 7
  23, 1 (2017), 521–530. 2
                                                                                         [Sch20] S CHLAUDT O.: Measurement. In Online Encyclopedia Philos-
[HYC∗ 18] H ONG S., YOO M.-J., C HINH B., H AN A., BATTERSBY S.,
                                                                                            ophy of Nature, Kirchhoff T., (Ed.). 2020. doi:10.11588/oepn.
  K IM J.: To Distort or Not to Distort: Distance Cartograms in the Wild.
                                                                                            2020.0.76654. 2
  2018, p. 1–12. 3
                                                                                         [SEAKC19] S TREEB D., E L -A SSADY M., K EIM D., C HEN M.:
[IN13] I SHIZAKA A., N EMERY P.: Multi-Criteria Decision Analysis:
                                                                                           Why visualize? untangling a large network of arguments. IEEE
   Methods and Software. John Wiley & Sons, 2013. 4
                                                                                           Trans. Visualization & Computer Graphics early view (2019).
[INC∗ 06] I SENBERG T., N EUMANN P., C ARPENDALE S., S OUSA                                10.1109/TVCG.2019.2940026. 13
   M. C., J ORGE J. A.: Non-photorealistic rendering in context: An obser-
                                                                                         [SNLH09] S IPS M., N EUBERT B., L EWIS J. P., H ANRAHAN P.: Select-
   vational study. In Proc. 4th International Symp. on Non-Photorealistic
                                                                                           ing good views of high-dimensional data using class consistency. Com-
   Animation and Rendering (2006), p. 115–126. 3
                                                                                           puter Graphics Forum 28, 3 (2009), 831–838. 2
[Ise13] I SENBERG T.: Evaluating and Validating Non-photorealistic and                   [Ste46] S TEVENS S. S.: On the theory of scales of measurement. Science
   Illustrative Rendering. Springer London, London, 2013, pp. 311–331. 3                    103, 2684 (1946), 677–680. 2
[JC08] J OHANSSON J., C OOPER M.: A screen space quality method for                      [TAE∗ 09]  TATU A., A LBUQUERQUE G., E ISEMANN M., S CHNEI -
   data abstraction. Computer Graphics Forum 27, 3 (2008), 1039–1046. 2                    DEWIND   J., T HEISEL H., M AGNOR M., K EIM D.: Combining auto-
[JC10] JÄNICKE H., C HEN M.: A salience-based quality metric for vi-                       mated analysis and visualization techniques for effective exploration of
   sualization. Computer Graphics Forum 29, 3 (2010), 1183–1192. 2                         high-dimensional data. In IEEE VAST (2009), pp. 59–66. 2
[Jun19] J UNG            Y.:                         instantreality       1.0.           [Tal20] TAL E.: Measurement in science. In The Stanford Encyclopedia
   https://doc.instantreality.org/tutorial/volume-rendering/, last accessed in              of Philosophy, Zalta E. N., (Ed.). 2020. 2
   2019. 7                                                                               [TBB∗ 10] TATU A., BAK P., B ERTINI E., K EIM D., S CHNEIDEWIND
[KARC17] K IJMONGKOLCHAI N., A BDUL -R AHMAN A., C HEN M.:                                 J.: Visual quality metrics and human perception: An initial study on
  Empirically measuring soft knowledge in visualization. Computer                          2D projections of large multidimensional data. In Proc. Int. Conf. on
  Graphics Forum 36, 3 (2017), 73–85. 3                                                    Advanced Visual Interfaces (Roma, Italy, 2010), pp. 49–56. 2

© 2021 The Author(s) with LATEX template from
Computer Graphics Forum © 2021 The Eurographics Association and John Wiley & Sons Ltd.
You can also read