Eurographics Conference on Visualization (EuroVis) 2020
Volume 39 (2020), Number 3
M. Gleicher, T. Landesberger von Antburg, and I. Viola (Guest Editors)

Evaluating Reordering Strategies for Cluster Identification in Parallel Coordinates

Michael Blumenschein1, Xuan Zhang2, David Pomerenke1, Daniel A. Keim1, and Johannes Fuchs1
1 University of Konstanz, Germany    2 RWTH Aachen University, Germany

[Figure 1 panels: (a) Medium clutter, Similar; (b) High clutter, Similar; (c) Medium clutter, Dissimilar; (d) High clutter, Dissimilar]

Figure 1: A dataset with three clusters and two different clutter levels is sorted by similarity (a–b) and dissimilarity (c–d) of neighboring axes pairs. Clusters are more salient when arranging dissimilar dimensions next to each other. We show that in cluttered datasets, participants are more accurate and more confident when performing cluster identification tasks on such a layout.
Abstract
The ability to perceive patterns in parallel coordinates plots (PCPs) is heavily influenced by the ordering of the dimensions. While the community has proposed over 30 automatic ordering strategies, we still lack empirical guidance for choosing an appropriate strategy for a given task. In this paper, we first propose a classification of tasks and patterns and analyze which PCP reordering strategies help in detecting them. Based on our classification, we then conduct an empirical user study with 31 participants to evaluate reordering strategies for cluster identification tasks. We particularly measure time, identification quality, and the users' confidence for two different strategies using both synthetic and real-world datasets. Our results show that, somewhat unexpectedly, participants tend to focus on dissimilar rather than similar dimension pairs when detecting clusters, and are more confident in their answers. This is especially true when increasing the amount of clutter in the data. As a result of these findings, we propose a new reordering strategy based on the dissimilarity of neighboring dimension pairs.

CCS Concepts
• Human-centered computing → Empirical studies in visualization;

1. Introduction

Parallel coordinates plots (PCPs) [Ins85, Ins09b] are a popular and well-researched technique to visualize multi-dimensional data. Dimensions are represented by vertical, equally spaced axes. Data records are encoded by polylines connecting the respective values on each axis. PCPs have been applied in practical applications across various domains [JF16], such as finance [AZZ10], traffic safety [FWR99], and network analysis [SC∗05]. As discussed by Andrienko and Andrienko [AA01], PCPs are suited for a multitude of analysis tasks, such as cluster, correlation, and outlier analysis.

Compared to other visualizations for multi-dimensional data (e.g., RadVis [HGM∗97], MDS and PCA projections, scatter plots, and scatter plot matrices), PCPs have the advantage that data records and patterns can be traced across a large set of dimensions. Empirical studies have shown that PCPs outperform scatter plots in clustering, outlier, and change detection tasks [KAC15], but are less suited for correlation analysis [LMvW10, HYFC14] and value retrieval [KZZM12].

A major challenge of visualizations is visual clutter, which influences the perception of visible patterns [SKKAJ15]. This problem is particularly pronounced in PCPs, as line crossings and overplotting distort salient structures. Therefore, the community has proposed a multitude of enhancements, such as sampling [ED06], edge bundling [MM08], interactive highlighting [MW95], and the use of transparency [JLJC05] to reduce the impact of visual clutter.

The ordering of axes plays a fundamental role in the design of a PCP and has a strong effect on the overall pattern structure [JJ09]. In contrast to data preprocessing, sampling, dimension filtering, and other enhancements, reordering does not remove data from the PCP [PWR04, PL17], but changes the visual structure among neighboring axes. Depending on the user's analysis goal, some patterns are more interesting than others [DK10]. As a result, more than 30 different ordering strategies have been developed by the community to support a multitude of tasks. Some of these strategies group similar dimension pairs [ABK98], try to avoid line crossings [DK10], or put the most important dimensions first [YPWR03].


However, our community lacks empirical guidance and recommendations for choosing an appropriate strategy for a given task [BBK∗18]. In this paper, we address this limitation by summarizing the state-of-the-art in axes reordering strategies, as well as presenting a first empirical user study that measures the performance of two ordering methods for cluster identification in PCPs. Our study focuses on cluster analysis, as the majority of ordering strategies are designed for this task.

We claim two main contributions. First, we provide guidance in selecting reordering strategies based on their intended patterns. Many existing algorithms follow similar concepts but differ in their implementation. To support users, we introduce a classification of the existing layout algorithms, group them according to their inner workings, and summarize their intended patterns and meta-characteristics. For more practical support, we implemented a set of 14 strategies in JavaScript and made them, along with the source code, available on our website for testing: http://subspace.dbvis.de/pcp.

Second, we measure the performance of two reordering strategies for cluster identification tasks in an empirical user study with 31 participants. We realized that the often proposed similarity-based axes arrangement (e.g., [ABK98, YPWR03, AdOL06]) is not always the most effective solution to identify clusters. As shown in Figure 1, arranging axes with a high dissimilarity next to each other produces more salient clusters, in particular in cluttered and noisy datasets. A reason for this effect is that lines with strong slopes move closer together, making clusters visually more prominent [PDK∗19]. To find out whether this arrangement is more useful than a similarity-based layout, we conducted a user study and measured performance with respect to cluster quality, completion time, and users' confidence using synthetic and real-world datasets. Our results show that participants tend to focus on dissimilar axes pairs when selecting clusters and are more accurate and confident when doing so.

As a secondary contribution, we provide a benchmark dataset with 82 synthetic and real-world datasets for clustering analysis. For reproducibility, we make all our material and results, statistical analysis, and source code publicly available at https://osf.io/zwm69.

The remainder of this paper is structured as follows: In the next section, we summarize important related work. Then, in Section 3, we survey existing reordering strategies for PCPs and classify them based on their intended patterns and inner workings (first contribution). Afterwards, in Section 4, we describe our user study design and report the statistical analysis results in Section 5 (second contribution). Finally, we discuss our findings and derive design considerations for axes orderings in cluster identification tasks.

2. Background and Related Work

In the following, we summarize the challenges of axes reordering, the results of existing user studies, and the relation of automatic ordering to interactive and semi-automatic analysis support.

2.1. Challenges of Axes Reordering of Parallel Coordinates

Linear ordering of an n-dimensional dataset in PCPs faces two main challenges. First, computing and evaluating all dimension permutations is computationally expensive. Ankerst et al. [ABK98] show that ordering the axes according to some useful objective function is NP-complete. Therefore, the exhaustive search for a useful ordering is tedious, even for a modest number of dimensions [PWR04]. Second, the usefulness of a particular ordering highly depends on the analysis task of the user [DK10, PL17], and is influenced by the complexity of the data [TAE∗11]. More importantly, optimizing the axes ordering to highlight a particular pattern may even obstruct other patterns [JJ09] which are relevant in a different scenario. Therefore, it is vital to carefully choose an appropriate strategy to arrange the axes in parallel coordinates.

More than 30 reordering strategies have been developed (see Section 3), many of which follow similar concepts but differ in their implementation, affecting, for example, the runtime and quality of the results. Quality metrics and layout algorithms for PCPs have been summarized before: Heinrich and Weiskopf [HW13] give a comprehensive overview of the state-of-the-art in PCP research, including manual and automatic reordering approaches. Bertini et al. [BTK11] and Behrisch et al. [BBK∗18] summarize quality metrics to optimize the visual representation. Ellis and Dix [ED07] discuss reordering from a clutter perspective. While Behrisch et al. [BBK∗18] group the quality metrics by their analysis task, the literature still misses a summary of the different PCP patterns and a discussion of which reordering algorithms favor or avoid particular patterns [TAE∗11]. In our paper, we close this gap by introducing a classification along with a characterization of the reordering algorithms.
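To make the combinatorial challenge mentioned at the beginning of this subsection concrete, the following minimal Python sketch (ours, not from the paper; the pairwise metric is a hypothetical placeholder) scores an axis order by the average of a pairwise quality over neighboring axes, the framing most strategies in Section 3 share, and shows how quickly the number of candidate orders grows.

```python
import math
import numpy as np

def order_score(data, order, pairwise_quality):
    """Quality of an axis order: average of a pairwise metric over
    neighboring axes (the framing shared by most reordering strategies)."""
    return np.mean([pairwise_quality(data[:, a], data[:, b])
                    for a, b in zip(order, order[1:])])

# Example pairwise metric (placeholder): negative Euclidean distance,
# so more similar neighboring dimensions score higher.
similarity = lambda a, b: -np.linalg.norm(a - b)

data = np.random.rand(100, 8)
print(order_score(data, range(8), similarity))

# Why exhaustive evaluation does not scale: the number of distinct orders
# (ignoring reversals) grows factorially with the number of dimensions n.
for n in (8, 12, 16):
    print(n, math.factorial(n) // 2)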
                                                                                old for the identification of patterns. However, none of these studies
2.2. Evaluation of Axes Reorderings and Empirical Studies

There is a lack of empirical studies that measure the performance of specific axes orderings for different analysis tasks [JF16]. Most strategies are "evaluated" using examples of synthetic or real-world data (see Table 1) instead of being compared to previous approaches. Exceptions are the works by Ferdosi & Roerdink [FR11] and Tatu et al. [TAE∗11], which compare the resulting orders with competing approaches. However, no feedback from real users is provided.

Many reorderings claim to be useful for cluster analysis, but we do not yet know which patterns are most effective for identifying clusters. There is no user study that compares different reorderings for cluster identification in particular or for different analysis tasks in general. Therefore, we want to push PCP reordering towards an empirically-driven research field by evaluating two axes reordering techniques for cluster identification. The works most closely related to ours are the empirical studies by Holten & van Wijk [HvW10] (measuring response time and cluster identification correctness for nine PCP variations), Kanjanabose et al. [KAC15] (measuring response time and clustering accuracy in PCPs, scatter plots, and classical tables), and the study by Johansson et al. [JFLC08] evaluating clutter thresholds for the identification of patterns. However, none of these studies consider different axes orderings as an independent variable.

2.3. Relation to Interactive and Semi-Automatic Analysis

Besides axes reordering, countless enhancements have been developed to support the understanding of patterns in parallel coordinates. A comprehensive overview is out of the scope of this paper but can be found in the surveys by Ellis & Dix [ED07], Bertini et al. [BTK11], Heinrich & Weiskopf [HW13], and Behrisch et al. [BBK∗18]. Many techniques involve users within an interactive exploration workflow or combine the representation with automatic algorithms for pattern detection. Examples are the usage of clustering algorithms [FWR99, JLJC05, Mou11], automatic sampling techniques [ED06], and interactive highlighting [MW95].


Figure 2: Comparison of visual patterns in parallel coordinates and their scatter plot representation.

Inselberg [Ins09a] and Hurley & Oldford [HO10] propose to clone and arrange dimensions such that all pairwise permutations are visible, similar to a scatter plot matrix. Based on this initial view, the user can then start the exploration.

While the usefulness of such interactive and visual analytics approaches has been shown in many user studies, they face two challenges: First, most algorithms rely on sensitive parameters which influence the quality of the result. For example, k-means clustering [HKP11] requires the number of clusters as user input, which is typically unknown for a new dataset. Second, interactive exploration and highlighting are difficult if users do not know what they are searching for, and the initial configuration of a PCP does not show (parts of) interesting patterns. Often, this results in trial-and-error interactions, in which patterns are only detected 'by accident'. This is particularly true if the dataset contains a large number of dimensions and relevant patterns only exist in smaller subspaces.

Therefore, we need methods that give analysts good starting conditions for their (interactive) analysis. Hereby, an important aspect is the arrangement of axes, as it significantly changes the visual patterns among neighboring axes [JJ09]. In our paper, we provide a categorization of reordering algorithms and their intended patterns, which helps analysts make an educated selection.

3. Classification of Reordering Strategies

This section addresses two questions: Which patterns are emphasized by which reordering strategy? And which algorithms have been implemented to solve the reordering problem? Before answering these questions, we provide an overview of important patterns.

3.1. Visual Patterns

Figure 2 shows five groups of the most common patterns in parallel coordinates and their representation in scatter plots:

Clusters & Neighbors (A–E). Typical cluster structures show one or more groups of dense lines in a similar direction. While A and B seem similar in scatter plots, the visible structure in a PCP differs significantly. C shows clusters that change their density (cluster compactness), and D a cluster that splits up into sub-clusters. Structures preserving neighborhood information are a special case of clusters: a (small) set of data records that are similar (close) to each other in one dimension are also similar in the neighboring dimension, which results in groups of parallel lines (E).

Correlations (F–H). Positive and negative correlations look similar in a scatter plot. However, the PCP patterns differ: lines are parallel for positive (F) and form a star-like pattern for negative correlations (G). Variations of non-linear correlations may look different in both scatter plots and PCPs. H shows only an example, as the pattern depends on the type and degree of change in both dimensions.

Outliers (I). Outliers squeeze the majority of PCP lines together, resulting in a cluster-like pattern that hides the underlying structure.

Dimension Properties (J–K). These show patterns of dimensions ordered by variance and skewness. The lines' slope indicates whether patterns stay consistent (parallel) or change across axes.

Clutter & Noise. Randomly distributed data without a clear pattern is considered noise or clutter in the data. The lines in the PCP cross without any particular structure (M). The fan pattern (L) describes a cluster transitioning into clutter, a special case of density change (C).

Our selection of patterns is based on the work by Dasgupta & Kosara [DK10], Wegman [Weg90], Heinrich & Weiskopf [HW13, HW15], and Zhou et al. [ZYQ∗08]. The cluster variations (i.e., patterns A–D) are inspired by Pattern Trails [JHB∗17], which introduces a taxonomy of pattern transitions between multi-dimensional subspaces. These patterns can be adapted to PCPs, as two neighboring axes show a transition between one-dimensional subspaces. Finally, we add variance (J) and skewness (K), which are produced by the algorithms described in [LHZ16, SCC∗18, YPWR03]. We limit our patterns to 2D PCPs and ignore patterns in 3D PCPs (e.g., discussed in [PL17, AKSZ13]). We also consider only patterns between two axes; multi-dimensional patterns can be achieved by concatenating multiple two-dimensional patterns.

3.2. Ordering Strategies

To find ordering strategies, we took the 502 references of recent state-of-the-art reports [HW13, JF16, BBK∗18] and combined them with 497 papers resulting from a search on the ACM, IEEE Xplore, EG, and DBLP digital libraries (see keywords and details in the supplementary material). We recursively scanned references and excluded papers that (1) did not propose an automatic axes ordering strategy (e.g., purely interactive approaches), (2) "just" apply a reordering method which has been proposed before, or (3) do not focus on "standard" parallel coordinates (e.g., 3D PCPs). Using this approach, we collected 18 papers with 32 different strategies.

Table 1 summarizes all reordering strategies, grouped by their ordering concept: strategies transforming the reordering into an optimization problem (Section 3.3), strategies implementing efficient or sophisticated algorithms (Section 3.4), and approaches focusing on properties of single dimensions (Section 3.5). For each approach, we indicate the favored patterns, the involved axes, and the performed evaluation to show the usefulness (see the caption of Table 1 for details).

3.3. Optimization Problem and Objective Functions

The largest group of reordering strategies transforms the axes arrangement into an optimization problem. These approaches measure the quality of a particular ordering by an objective function, which is then either minimized or maximized by an optimization algorithm.


Table 1: Reordering Classification summarizing the inner workings of reordering strategies for parallel coordinates. For each technique, we mark if it favors or avoids a particular pattern, if present in the data. Empty cells mean that the technique is not designed for this pattern and produces/avoids it primarily by chance. Approaches are grouped by their concept and sorted according to the similarity of patterns that they favor and avoid. Additionally, we indicate the number of dimensions that are taken into account for the reordering and mark techniques which can be combined with dimension filters and subspace analysis. Finally, we mark the evaluation type used in the paper. Due to space limitations, we only show the name of the first author.
Patterns: the algorithm favors or avoids (⊗) a pattern, or it depends on data properties and algorithm parameters.
Axes considered for ordering: each dimension separately, two neighboring dimensions, or the majority of dimensions.
The technique can be combined with: dimension filters (DR) and subspace analysis (S).
Evaluation: case study or example, comparison with other techniques, or empirical study.

Reordering techniques listed in Table 1, by group (the per-pattern, axes, and evaluation marks of the original table could not be recovered from this extraction; DR/S annotations are given where legible):

Objective Function & Optimization Algorithm: Tatu (no label) [TAE∗11]; Tatu (given label) [TAE∗11]; Long (class consist.) [Van15]; Zhou (cluster trace) [ZYYC18]; Peltonen (neighbor) [PL17]; Dasgupta (overpl.) [DK10]; Dasgupta (parall.) [DK10]; Ankerst (similar) [ABK98]; Yang (similarity) [YPWR03] (DR+S); Xiang (clus. interac.) [XFJ∗12]; Ankerst (correl.) [ABK98]; Artero (clutter) [AdOL06]; Blumenschein (dissimil.) (proposed in this paper); Dasgupta (cross.) [DK10]; Dasgupta (angle) [DK10]; Dasgupta (mut. i.) [DK10]; Makwana (struc.) [MTJ12]; Peng (outlier) [PWR04]; Dasgupta (entropy) [DK10]; Dasgupta (diverg.) [DK10].

Reordering Algorithm: Johansson (cluster) [JJ09] (DR); Ferdosi (subspace) [FR11] (DR+S); Huang (set theory) [HHJ11]; Artero (similar) [AdOL06] (DR); Johansson (correl.) [JJ09] (DR); Lu (non-lin. corr.) [LHH12]; Johansson (outlier) [JJ09] (DR).

Dimension QM: Schloerke (class) [SCC∗18]; Schloerke (outlier) [SCC∗18]; Lu (svd) [LHZ16] (DR); Yang (variance) [YPWR03] (DR+S); Schloerke (skewn.) [SCC∗18].

3.3.1. Objective Functions Measuring Cluster Structures

Tatu et al. [TAE∗09, TAE∗11] argue that clusters consist of lines with a similar position and direction (patterns A, B, and D). The authors take a rendered image of a PCP and apply a Hough space transformation [Hou62]. Each PCP line segment is mapped to one point within the Hough space, whose location represents the position and slope of the line segment. The objective function measures dense areas (clusters) of points in the Hough space. Long [Van15] first computes a centroid for all given clusters. Then, for each data record, the nearest centroid is identified (using the area between the lines as similarity function), and the objective function measures the ratio of correctly classified records. Cluster patterns A and B are highlighted, while a cluster split (D) is avoided. Zhou et al. [ZYYC18] aim at clusters that can be followed across neighboring axes (A and B). They compute a hierarchical clustering on every dimension and use the cluster similarity as quality. Dasgupta & Kosara [DK10] introduce seven different metrics, known as pargnostics. A metric aiming for clusters like A and B is overplotting, which measures the number of pixels that are not visible due to overlapping lines. When maximizing this measure, there is a high information loss, but also a high density of lines, i.e., clusters. Finally, Xiang et al. [XFJ∗12] try to avoid intersecting clusters (B) by measuring the crossing of clusters among axes. This results in horizontal cluster structures (A).

Peltonen & Lin [PL17] aim to preserve the neighborhood distribution of records (pattern E). The objective function measures the similarity of nearest neighbors for all data records across two dimensions. Clusters like A and B are a special case of neighborhood relationships and are therefore considered as well.

3.3.2. Similarity-based Metrics for Clusters and Correlation

The main idea of the following approaches is to arrange similar dimensions next to each other, which results in cluster and correlation patterns. The definition of similarity differs across the techniques. Ankerst et al. [ABK98] use a Euclidean distance and Pearson correlation for the similarity computation. The approach by Yang et al. [YPWR03] follows the same idea; however, they structure the dimensions into a hierarchy using a hierarchical clustering algorithm to also highlight clusters in subspaces of the dataset. The hierarchical structure also helps to speed up the computation, as each subtree can be sorted independently. Depending on the similarity function, and on whether the objective function is minimized or maximized, these approaches aim for the patterns A, B, F, and G.

Other metrics try to order axes such that lines are either maximally parallel or diverge a lot. These patterns can help to identify correlations, but may also favor clusters to some extent. Artero et al. [AdOL06] propose the total length of polylines as a metric for pattern F. Similarly, there are four pargnostics [DK10] measures: (1) maximizing the number of line crossings to identify inverse relationships (G) in the data, (2) maximizing the angle of crossings (G), (3) maximizing parallelism, resulting in less cluttered PCPs which highlight positive correlations (F), and (4) maximizing the mutual information, which measures the dependency between variables, i.e., optimizing for positive (F), negative (G), and non-linear correlations (H).

3.3.3. Objective Functions for Clutter and Outliers

The pargnostics metric maximizing divergence results in the fan pattern (L), which helps to identify cluster-to-noise relationships. Maximizing the entropy of neighboring axes corresponds to a high information density, highlighting many line crossings and inverse relationships. The metric does not favor specific patterns, but results in busy yet very readable charts, according to the authors [DK10]. The metric by Makwana et al. [MTJ12] differs from the previous metrics: the authors propose to order dimensions such that neighboring axes contain lines with different slopes, resulting in cluttered (M) PCPs.

Peng et al. [PWR04] interpret outliers as data points that do not belong to a cluster. They measure the ratio of outliers against the number of data points. When maximized, outliers are highlighted (pattern I); when minimized, outliers will not be highlighted.

3.3.4. Optimization Algorithms for Objective Functions

Except for [AdOL06], all approaches measure the quality between neighboring axes and use the average as the objective function. To minimize or maximize this function, various heuristics are applied: random swapping (particularly useful for very large datasets) [YPWR03, PWR04], ant optimization [ABK98], A* search [TAE∗09, TAE∗11], nearest-neighbor-based heuristics [PWR04], branch and bound optimization [DK10, MTJ12], non-linear optimization algorithms [PL17], and backtracking [ZYYC18].
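As an illustration of the random-swapping heuristic mentioned above, the sketch below (our own simplification, not code from any of the cited papers; the pairwise metric is again a placeholder) repeatedly swaps two randomly chosen axes and keeps the swap only if the averaged neighboring-axes objective improves.

```python
import random
import numpy as np

def order_score(data, order):
    """Averaged pairwise objective between neighboring axes.
    Placeholder metric: negative Euclidean distance (higher = more similar)."""
    return np.mean([-np.linalg.norm(data[:, a] - data[:, b])
                    for a, b in zip(order, order[1:])])

def random_swap_search(data, iterations=5000, maximize=True, seed=0):
    """Hill climbing by random axis swaps: accept a swap only if it improves
    the objective. Cheap per step, but may stop in a local optimum."""
    rng = random.Random(seed)
    order = list(range(data.shape[1]))
    score = order_score(data, order)
    for _ in range(iterations):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]        # propose a swap
        new_score = order_score(data, order)
        if (new_score > score) if maximize else (new_score < score):
            score = new_score                          # keep the improvement
        else:
            order[i], order[j] = order[j], order[i]    # undo the swap
    return order, score

order, score = random_swap_search(np.random.rand(200, 10))
```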
3.4. Reordering by Algorithms

The second class of strategies arranges dimensions based on layout algorithms. Compared to optimization procedures, this has two advantages: (1) Algorithms which approximate the underlying objective lead to more efficient, but potentially less accurate, results (e.g., [LHH12, AdOL06, JJ09]). (2) Objective functions are typically defined only between neighboring axes; more advanced algorithms (e.g., based on subspace clustering) lead to PCPs which aim for higher-dimensional patterns [FR11].

3.4.1. Algorithms for Efficient Reordering

Artero et al. [AdOL06] and Johansson & Johansson [JJ09] speed up the similarity- and correlation-based axes arrangement originally presented by Ankerst et al. [ABK98]. Both algorithms are identical except for the similarity function: Artero et al. use a Euclidean distance, Johansson & Johansson a Pearson correlation coefficient. The algorithm starts with the most similar dimension pair in the center of the PCP. Iteratively, the next most similar dimension is appended to the left or right side. While this approach is efficient, it also has the advantage that the most salient structure (the most similar dimensions) typically ends up close to the center of the PCP, which users are most attracted to [NVE∗17]. Lu et al.'s approach [LHH12] orders dimensions based on correlation. They use the nonlinear correlation coefficient (NCC), which is sensitive to any relationship F–H (not only linear ones) and can be used for partial similarity matching as well [ABK98]. The proposed algorithm combines the ordering by (non-linear) correlations with an importance-driven arrangement. The first (left) axis in the PCP is chosen based on the highest singular value after a singular value decomposition (SVD; highest contribution to the dataset). Afterwards, all dimensions are arranged from left to right according to their NCC similarity. The approach by Huang et al. [HHJ11] maximizes the uniform line crossings of clusters. Their approach is based on Rough Set Theory [Paw12], and the algorithm sorts the dimensions based on alternating sizes of high and low cardinality of the equivalence classes, leading to cluster patterns A–C.
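The following Python sketch mirrors the greedy center-out procedure described above; it is our reading of that description, not the authors' code. It seeds the layout with the most similar dimension pair and then repeatedly attaches the dimension closest to the current left or right end, so the most similar dimensions end up near the center. Swapping the Euclidean distance for a Pearson-based dissimilarity would give the [JJ09] variant.

```python
import numpy as np

def dissimilarity(a, b):
    """Euclidean distance between two dimensions (Artero-style similarity)."""
    return np.linalg.norm(a - b)

def greedy_center_out_order(data):
    """Greedy axis ordering: start with the most similar pair in the center,
    then iteratively attach the dimension closest to either end."""
    n = data.shape[1]
    d = np.array([[dissimilarity(data[:, i], data[:, j]) for j in range(n)]
                  for i in range(n)])
    np.fill_diagonal(d, np.inf)

    # Seed: globally most similar (smallest distance) dimension pair.
    i, j = np.unravel_index(np.argmin(d), d.shape)
    order, remaining = [i, j], set(range(n)) - {i, j}

    while remaining:
        left, right = order[0], order[-1]
        # Candidates most similar to the current left and right ends.
        best_left = min(remaining, key=lambda k: d[left, k])
        best_right = min(remaining, key=lambda k: d[right, k])
        if d[left, best_left] < d[right, best_right]:
            order.insert(0, best_left); remaining.remove(best_left)
        else:
            order.append(best_right); remaining.remove(best_right)
    return order

print(greedy_center_out_order(np.random.rand(150, 8)))
```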


3.4.2. Subspace Algorithms for Higher-dimensional Structures

Ferdosi & Roerdink [FR11] use a subspace search algorithm [FBT∗10] to identify higher-dimensional clusters with patterns A–D. The quality of a subspace is based on a density distribution: subspaces containing multiple clearly separated clusters are considered of high quality. First, the algorithm computes all one-dimensional subspaces and places the one with the highest quality on the very left of the PCP. Afterwards, all two-dimensional subspaces which contain the first subspace are computed, and the highest-ranked one is attached as the second axis. The algorithm continues until all dimensions are part of the PCP or no more subspaces can be computed. Johansson & Johansson [JJ09] apply the MAFIA algorithm [NGC01], resulting in a set of subspaces along with cluster structures and quality measures. The ordering algorithm then finds the longest sequence of connected variables shared by all detected subspace clusters. It starts with all dimensions of the first subspace (no specific ordering). Further subspaces are iteratively added based on their quality, but only if they share a substantial set of dimensions with the current PCP. The authors use the same algorithm to identify patterns with (multi-dimensional) outliers (pattern I).

3.5. Reordering by Dimension-wise Quality Metrics

The third group of reordering techniques computes a quality for each dimension separately and sorts the axes accordingly. Assuming the quality can be computed efficiently, reordering can be done in linear time. The techniques can also be extended by dimension filtering (e.g., considering only dimensions with a quality above a threshold). Relations between dimensions are not considered; therefore, patterns may be scattered in different parts of the PCP [PL17].

Lu et al. [LHZ16] sort the axes based on each dimension's contribution to the dataset. They compute an SVD and sort the dimensions according to their singular values. Yang et al. [YPWR03] propose a similar approach but sort the dimensions by variance. Both reorderings result in similar patterns (J–L); however, Lu et al.'s approach takes the distribution of the entire dataset into account.

Schloerke et al. [SCC∗18] propose three different dimension metrics: (1) They use skewness for reordering, resulting in a K pattern. (2) The dimensions are sorted by one of the Scagnostics [WAG05] measures; in particular, the Outlying measure is useful to highlight outliers in the data (pattern I). (3) Finally, the authors order the dimensions such that existing clusters or classes are separated as well as possible. They compute an ANOVA on every dimension based on a given set of class labels and order the dimensions by the F-statistic. Intuitively, the dimensions are ordered according to how well the given clusters are separated (patterns A–C).
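A dimension-wise reordering reduces to computing one score per axis and sorting, as in this small sketch of the general idea (assuming variance as in [YPWR03] and skewness as in [SCC∗18]; it is not the authors' implementation).

```python
import numpy as np
from scipy.stats import skew

def order_by_dimension_metric(data, metric="variance", descending=True):
    """Score every dimension independently and sort the axes by that score."""
    if metric == "variance":
        scores = data.var(axis=0)       # per-dimension variance
    elif metric == "skewness":
        scores = skew(data, axis=0)     # per-dimension skewness
    else:
        raise ValueError(f"unknown metric: {metric}")
    order = np.argsort(scores)
    return order[::-1] if descending else order

data = np.random.rand(300, 8)
print(order_by_dimension_metric(data, "variance"))
print(order_by_dimension_metric(data, "skewness"))
```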
3.6. Summary

In Table 1, we provide an overview of 32 different reordering approaches to arrange the axes of parallel coordinates. During our analysis, we made a few observations: (1) Many reordering algorithms follow similar concepts but differ in their implementation and the applied metric. The main reason for this is that axes reordering is computationally complex, and more efficient approaches are necessary for interactive applications. (2) There seems to be a different understanding of the most important area in a PCP: while some reordering approaches try to put the most important dimensions upfront, others try to arrange them in the center. The latter is in line with the study by Netzel et al. [NVE∗17], who found that people pay the most attention to the center part of a PCP. (3) The evaluation of novel reordering algorithms is primarily achieved by use cases and example applications. We are not aware of empirical user studies that compare different orderings for a particular analysis task. With our paper, we want to close this gap and provide the first empirical study to evaluate ordering approaches for a particular analysis task.

4. User Study Design: Reordering for Cluster Identification

Our reordering classification in Table 1 reveals that the majority of strategies are designed to support cluster analysis. Therefore, we select this task as the focus of our user study. In particular, we want to assess the performance of cluster identification, as this is the foundation for more sophisticated clustering analyses.

For cluster analysis, similarity-based layouts are proposed most often. Clusters, if present, can be followed across many axes, as the algorithms try to minimize their variance. However, this strategy does not necessarily highlight clusters. This is especially true if the dataset contains noise or clutter, as shown in Figure 1a: while we can identify the clusters, they are visually less salient. In this paper, we define the term clutter as data records that do not contribute to a particular pattern (e.g., randomly distributed records), often also called noise. Cluttered datasets often end up in visually cluttered PCPs due to many line crossings and overplotting.

Through experiments with our implemented reordering algorithms, we realized that polylines and clusters with strong slopes are visually more prominent than horizontal ones. There are two reasons, as discussed by Pomerenke et al. [PDK∗19]: (1) With an increasing slope, the distance between polylines decreases and less whitespace (background) is visible; hence, neighboring lines have higher contrast. (2) Compared to horizontal lines, diagonal lines need more pixels to encode a single data point, resulting in a low data-to-ink ratio [Tuf01]. Both geometric effects make neighboring lines more easily perceived as a group or cluster. Interestingly, strong slopes are produced when dimensions are ordered by dissimilarity. An example can be found in Figure 1c: it shows the same data as Figure 1a, but with strong slopes due to the reordering. Often this results in a zig-zag-like pattern, which makes the visual representation more complex but also yields more salient cluster structures.

4.1. Hypotheses

We address the question of whether there is a difference between a similarity-based (SIM) and a dissimilarity-based (DIS) axes ordering for a cluster identification task, and if so, which ordering should be used and why. As the majority of real-world datasets contain noise and clutter, we also want to investigate its influence generally and in combination with the axes ordering. Hence, we analyze two independent variables: ordering method and clutter level.

To measure the performance, we use three dependent variables: (i) time to identify clusters, (ii) quality of manually selected clusters based on similarity to ground truth clusters, and (iii) the confidence of the users after the cluster identification. Additionally, we analyze the axes pairs which help to identify the clusters; in particular, we investigate whether users select clusters in similar or dissimilar axes pairs. For our study, we formulate the following three hypotheses:

H1. With an increasing amount of clutter, the cluster identification performance drops (independent of the ordering) as cluster structures are less salient in the PCP. Therefore, we expect users to be (a) slower, (b) less accurate, and (c) less confident.


H2. Without clutter, users perform better in a cluster identification task when the axes are ordered by SIM instead of DIS, as clusters can be followed more easily. In particular, we expect users to be (a) faster, (b) more accurate, and (c) more confident with SIM.

H3. With clutter, users perform better in a cluster identification task when the axes are ordered by DIS instead of SIM, as clusters are visually more prominent. In particular, we expect users to be (a) faster, (b) more accurate, and (c) more confident with DIS.

4.2. Benchmark Dataset and Ground Truth

To evaluate our hypotheses, benchmark datasets with ground truth information and increasing clutter levels are needed. We are not aware of such datasets for a cluster identification task. Therefore, we developed our own benchmark, consisting of ten popular real-world and 72 synthetically created datasets along with the ground truth information. We make our dataset available in order to overcome the limitation of publicly available benchmark datasets [SNE∗16] and to support the evaluation of PCP enhancements and reordering techniques in the future. For comparison, we also present a PCP for each dataset and reordering strategy in the supplementary material.

Synthetic datasets. We limit the dimensionality of all synthetic datasets to eight. This allows us to create complex cluster structures while keeping the expected time for the study within a reasonable time frame. We alternated the number of clusters between one and four and varied the structures of the clusters, ranging from linear clusters towards a high variance on all scales of the different axes.

Using the PCDC tool [BHvLF12], we manually created 24 base datasets ({1, 2, 3, 4} clusters × 6 variations) which fulfill the following properties: (i) clusters are clearly visible and separated from each other, (ii) there is only one clustering result per dataset, (iii) each cluster is present in all eight dimensions, and (iv) no outliers are added, as they would distort the existing patterns [AA01]. In up to two dimensions, we merged two or more clusters such that participants need to investigate all dimensions to identify a cluster. To make the clusters comparable across datasets, we kept the cluster size constant with small randomization in the range of 45–50 data points and varied the diameter of every cluster in each dimension randomly in the range 0.15–0.30. All dimensions are normalized to 0.0–1.0.

Next, we designed different clutter levels. Pomerenke et al. [PDK∗19] show that random clutter (randomly and equally distributed records in all axes) produces visible patterns in PCPs which look similar to clusters (ghost clusters). In order to not accidentally include 'fake patterns' in our dataset, but also be fair w.r.t. random clutter, we use a mixture of 30% random and 70% linear clutter (for every record: a uniform random distribution in one dimension, ± 0–0.15 in all other dimensions). Based on several pilot experiments, this setting seemed to be complex enough, but also without any undesired patterns. For each base dataset, we created two copies with different clutter levels, one with 150 (150N) and one with 300 data points (300N). We used the same clutter datasets for all base datasets to make them comparable and ensure we do not encode additional patterns in some of the datasets. After finalizing all 72 datasets (24 base datasets × {0N, 150N, 300N}), we randomized the order of the records to remove potential effects in the drawing process.
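The clutter description above leaves some room for interpretation; the sketch below shows one possible reading (our assumption, not the authors' generator): 30% of the clutter records are drawn uniformly at random in all dimensions, while the remaining 70% take a uniformly drawn base value and stay within ±0.15 of it across the dimensions, yielding roughly 'linear' records.

```python
import numpy as np

def make_clutter(n_points, n_dims=8, random_share=0.3, jitter=0.15, seed=42):
    """One possible clutter generator following the description in the text:
    a mix of fully random records and 'linear' records that stay close to a
    single uniformly drawn base value."""
    rng = np.random.default_rng(seed)
    n_random = int(round(n_points * random_share))
    n_linear = n_points - n_random

    # Random clutter: uniform in every dimension.
    random_part = rng.uniform(0.0, 1.0, size=(n_random, n_dims))

    # Linear clutter: one base value per record, jittered per dimension.
    base = rng.uniform(0.0, 1.0, size=(n_linear, 1))
    offsets = rng.uniform(-jitter, jitter, size=(n_linear, n_dims))
    linear_part = np.clip(base + offsets, 0.0, 1.0)

    return np.vstack([random_part, linear_part])

clutter_150 = make_clutter(150)   # 150N level
clutter_300 = make_clutter(300)   # 300N level
```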
Real-world datasets. We added ten frequently used datasets to see the performance in real settings. We selected the datasets based on their common usage in PCP reordering (i.e., by choosing datasets used in the techniques described in Table 1). Hence, the number of dimensions and records differs compared to the synthetic datasets: dimensions range between 4–13 and the number of records between 32–515. Examples are the wine, mt-cars, and ecoli datasets. If present, we removed categorical dimensions and outliers. We used Ward's method [WJ63] to retrieve a hierarchical clustering and a visual inspection to determine the clusters in the data.

4.3. Implementation

To compare the 'optimal' SIM and DIS layouts, we used Ankerst et al.'s reordering algorithm [ABK98] with an exhaustive search to find the axes ordering. For the SIM layout, we used the Euclidean distance and minimized the sum of distances. For DIS, we used the same algorithm but maximized the distances. We pre-computed the orderings for all datasets in advance. To run the study, we developed a web application that is available at http://subspace.dbvis.de/pcp-study. The parallel coordinates plots have a size of 960 × 500 pixels and use color for the polylines to separate them from the axes, which are colored black. We did not add any design variations to the chart (e.g., transparency or edge bundling) to avoid confounding factors.
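A compact sketch of how such SIM and DIS layouts can be computed (our re-implementation of the described setup, not the study code): the objective is the sum of Euclidean distances between neighboring dimension columns, minimized over all permutations for SIM and maximized for DIS.

```python
import itertools
import numpy as np

def neighbor_distance_sum(data, order):
    """Sum of Euclidean distances between neighboring axes for a given order."""
    return sum(np.linalg.norm(data[:, a] - data[:, b])
               for a, b in zip(order, order[1:]))

def sim_dis_orders(data):
    """Exhaustive search (feasible for the 8-dimensional study datasets):
    SIM = order minimizing the summed distances, DIS = order maximizing them."""
    perms = list(itertools.permutations(range(data.shape[1])))
    scores = [neighbor_distance_sum(data, p) for p in perms]
    sim_order = perms[int(np.argmin(scores))]
    dis_order = perms[int(np.argmax(scores))]
    return sim_order, dis_order

sim, dis = sim_dis_orders(np.random.rand(100, 6))
```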
each dataset and reordering strategy in the supplementary material.
Synthetic datasets. We limit the dimensionality of all synthetic                         4.4. Tasks and Data Randomization
datasets to eight. This allows us to create complex cluster structures                   Our study consisted of 21 trials per participant, which are grouped
while keeping the expected time for the study in a reasonable time frame. We alternated the number of clusters between one and four and varied the structure of the clusters, ranging from linear clusters to clusters with a high variance across the different axes.

Using the PCDC tool [BHvLF12], we manually created 24 base datasets ({1, 2, 3, 4} clusters × 6 variations) which fulfill the following properties: (i) clusters are clearly visible and separated from each other, (ii) there is only one clustering result per dataset, (iii) each cluster is present in all eight dimensions, and (iv) no outliers are added, as they would distort the existing patterns [AA01]. In up to two dimensions, we merged two or more clusters such that participants need to investigate all dimensions to identify a cluster. To make the clusters comparable across datasets, we kept the cluster size constant with a small randomization in the range of 45–50 data points and varied the diameter of every cluster in each dimension randomly in the range 0.15–0.30. All dimensions are normalized to the range 0.0–1.0.

Next, we designed different clutter levels. Pomerenke et al. [PDK∗19] show that random clutter (randomly and equally distributed records in all axes) produces visible patterns in PCPs which look similar to clusters (Ghost clusters). In order not to accidentally include such ‘fake patterns’ in our datasets, while still being fair w.r.t. random clutter, we use a mixture of 30% random and 70% linear clutter (for every record: uniform random distribution in one dimension, ±0–0.15 in all other dimensions). In several pilot experiments, this setting proved to be complex enough without introducing undesired patterns. For each base dataset, we created two copies with different clutter levels, one with 150 clutter points (150N) and one with 300 (300N). We used the same clutter datasets for all base datasets to make them comparable and to ensure we do not encode additional patterns in some of the datasets. After finalizing all 72 datasets (24 base datasets × {0N, 150N, 300N}), we randomized the order of the records to remove potential effects in the drawing process.
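To make the clutter construction concrete, the sketch below generates such a 30%/70% mixture. It is a minimal illustration under our own reading of ‘linear clutter’ (one uniformly distributed driving dimension per record, all other dimensions within ±0.15 of a common base value); the function and parameter names are ours, and the actual study datasets were not generated with this code.

```python
import numpy as np

def make_clutter(n_points, n_dims=8, random_ratio=0.3, spread=0.15, seed=0):
    """Sketch of the clutter mixture: 30% fully random records and 70%
    'linear' records that are uniform in one driving dimension and vary
    only within +/- spread in all other dimensions."""
    rng = np.random.default_rng(seed)
    n_random = int(round(random_ratio * n_points))
    n_linear = n_points - n_random

    # Random clutter: uniformly distributed in every dimension.
    random_part = rng.uniform(0.0, 1.0, size=(n_random, n_dims))

    # Linear clutter: keep all dimensions close to a common base value,
    # then overwrite one randomly chosen driving dimension per record.
    base = rng.uniform(spread, 1.0 - spread, size=(n_linear, 1))
    linear_part = base + rng.uniform(-spread, spread, size=(n_linear, n_dims))
    drive_dim = rng.integers(0, n_dims, size=n_linear)
    linear_part[np.arange(n_linear), drive_dim] = rng.uniform(0.0, 1.0, size=n_linear)

    return np.clip(np.vstack([random_part, linear_part]), 0.0, 1.0)

clutter_150 = make_clutter(150)  # 150N
clutter_300 = make_clutter(300)  # 300N
```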
Real-world datasets. We added ten frequently used datasets to see how the reordering strategies perform on real-world data.

The study is divided into three tasks that build on top of each other. The tasks were executed in order of increasing difficulty: Tasks 1, 2, and 3. Between two trials of a task, we showed a white screen with the term ‘break time’, and participants had to click a button to continue with the next trial.

4.4.1. Task 1 (Similarity of Axes-Pairs)

We wanted to find out which visual structures support users in a cluster identification task. In particular, we were interested in whether users find neighboring axes with a high similarity or a high dissimilarity more useful. In each trial, we showed the participants a PCP in which the number of clusters had to be counted. Users selected the number they identified using four radio buttons (i.e., 1, 2, 3, 4) and a can’t tell option. After the selection was confirmed, we showed a single radio button between each pair of neighboring axes (see Figure 3 left) and asked the participants to select the pair which supported them best. Only one pair could be selected.

Figure 3: User study interface for Task 1 (left) and Task 2 (right).

Randomization. We randomly picked three synthetic datasets with a different number of clusters. In the first trial, we showed 0N clutter, in the second 150N, and finally 300N (increasing difficulty). As we are interested in whether participants prefer (dis-)similar neighboring axes, we arranged the dimensions such that the PCP contains both similar and dissimilar axes pairs. To do so, we computed the similarity of neighboring axes using the Euclidean distance and used the maximal variance of similarities (MaxVar) as an objective function (see the example in Figure 4 and the sketch below). In summary:

    3  levels of clutter (0N, 150N, 300N)  ×
   31  participants                        =
   93  trials in total

Figure 4: Dimensions are ordered by MaxVar (maximizing the variance of similarities among neighboring axes): (a) no clutter (0N), (b) 150 clutter points (150N). The result combines similar and dissimilar dimension pairs in one PCP.
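As a rough illustration of this arrangement step, the sketch below computes pairwise axis distances and exhaustively searches for the permutation whose neighboring similarities have the highest variance. It is our own minimal reconstruction (the array layout, function names, and the exhaustive search are assumptions), not the implementation used in the study.

```python
import itertools
import numpy as np

def axis_distances(data):
    """Euclidean distance between every pair of normalized dimensions;
    small values indicate similar axes, large values dissimilar ones."""
    n_dims = data.shape[1]
    d = np.zeros((n_dims, n_dims))
    for i in range(n_dims):
        for j in range(i + 1, n_dims):
            d[i, j] = d[j, i] = np.linalg.norm(data[:, i] - data[:, j])
    return d

def maxvar_order(data):
    """Pick the axis order whose neighboring-pair (dis)similarities have
    the highest variance (MaxVar); exhaustive search over permutations
    is feasible for eight dimensions."""
    d = axis_distances(data)
    n_dims = data.shape[1]
    best_order, best_var = None, -1.0
    for perm in itertools.permutations(range(n_dims)):
        neighbors = [d[perm[k], perm[k + 1]] for k in range(n_dims - 1)]
        var = float(np.var(neighbors))
        if var > best_var:
            best_order, best_var = list(perm), var
    return best_order
```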
Post-processing. We collected the time to identify the number of clusters along with the similarity value of the selected axes pair. For comparison across datasets, we applied a linear min-max normalization to the similarity values of all neighboring axes pairs within each dataset. Pairs with the highest similarity are represented by 0.0, while highly dissimilar pairs are represented by a value close to 1.0.
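A minimal sketch of this normalization, assuming `neighbor_distances` holds the Euclidean distances of all neighboring axis pairs of one dataset (the function name is ours):

```python
import numpy as np

def normalize_pair_similarities(neighbor_distances):
    """Linear min-max normalization per dataset: the most similar
    neighboring pair maps to 0.0, the most dissimilar one to 1.0."""
    d = np.asarray(neighbor_distances, dtype=float)
    return (d - d.min()) / (d.max() - d.min())

# Example: four neighboring pairs of one PCP.
print(normalize_pair_similarities([0.12, 0.40, 0.25, 0.61]))
```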
4.4.2. Task 2 (Cluster Identification and Selection)

We wanted to find out whether participants are better and more confident when using a particular ordering strategy. In each trial, we presented the participant one PCP, which was sorted by either SIM or DIS. The participant had to mark all clusters by choosing a pair of neighboring axes and marking every cluster on both axes using a brush feature (Figure 3 right). Brushing is applied by pressing the mouse button and marking the cluster along the axis. The selection can be moved, resized, or deleted. We do not highlight any data lines during or after brushing. After confirming their selections, participants rated their confidence in a correct clustering on a 5-point Likert scale.

Randomization. We selected 12 synthetic base datasets, three for each number of clusters, and randomized their order. Then, we distributed the datasets into three equal-sized groups: 0N, 150N, and 300N. Finally, we added four randomly selected real-world datasets in a new group RW. Within each group, we randomly applied twice a SIM and twice a DIS ordering. Participants worked on each group in order of increasing difficulty (i.e., clutter level). In summary:

    4  levels of clutter (0N, 150N, 300N, RW)  ×
    2  repetitions                             ×
    2  ordering strategies (SIM, DIS)          ×
   31  participants                            =
  496  trials in total

Post-processing. We collected the time to mark the clusters and divided it by the number of clusters to be comparable across datasets. We also collected the selections and confidence levels.

We ignored clutter for the quality computation. We checked whether participants selected clusters in two neighboring dimensions and whether the number of clusters is consistent between them. In 132 trials, this was not the case, and we removed them from the data. The results are, however, still trustworthy, as the removed trials are not skewed towards a particular reordering (66 trials each) or a clutter level (32, 28, 28, and 44 trials). For all correct trials, we then mapped the clusters between the selected axes together: first, we compute the Overlap coefficient [Szy34] between all cluster combinations and then merge the clusters with the highest overlap. For each merged combination, we keep the intersected set of data records as cluster members. The Overlap coefficient measures the overlap of the members of two clusters Ci and Cj: overlap(Ci, Cj) = |Ci ∩ Cj| / min(|Ci|, |Cj|).
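A compact sketch of this matching step, assuming each axis selection is given as a list of record-index sets; the greedy pairing and all names are our own reading of the procedure, not the study code.

```python
def overlap(ci, cj):
    """Overlap (Szymkiewicz-Simpson) coefficient of two record sets."""
    if not ci or not cj:
        return 0.0
    return len(ci & cj) / min(len(ci), len(cj))

def merge_axis_selections(clusters_axis_a, clusters_axis_b):
    """Greedily pair the clusters marked on two neighboring axes by
    highest overlap and keep the intersected records as members."""
    pairs = sorted(
        ((overlap(a, b), i, j)
         for i, a in enumerate(clusters_axis_a)
         for j, b in enumerate(clusters_axis_b)),
        reverse=True,
    )
    used_a, used_b, merged = set(), set(), []
    for score, i, j in pairs:
        if i in used_a or j in used_b or score == 0.0:
            continue
        used_a.add(i)
        used_b.add(j)
        merged.append(clusters_axis_a[i] & clusters_axis_b[j])
    return merged

# Example: two clusters brushed on each of the two selected axes.
axis_a = [{1, 2, 3, 4}, {10, 11, 12}]
axis_b = [{2, 3, 4, 5}, {10, 11, 13}]
print(merge_axis_selections(axis_a, axis_b))  # [{2, 3, 4}, {10, 11}]
```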
The quality of an entire clustering is based on the Jaccard index [Jac01] between each selected cluster Ci and the corresponding ground-truth cluster Gi. The Jaccard index measures the similarity between the clusters (record sets) Ci and Gi on the level of data records: jaccard(Ci, Gi) = |Ci ∩ Gi| / |Ci ∪ Gi|. As participants can also select too few or too many clusters, our quality computation is a two-step process: first, we compute the average Jaccard index of each selected cluster to its best match in the ground truth; second, we compute the average Jaccard index of every ground-truth cluster to its best match in our selection. The final clustering quality is the average of both scores.
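The two-step score can be sketched as follows, assuming the selection and the ground truth are lists of record-index sets (the names are ours):

```python
def jaccard(ci, gi):
    """Jaccard index of two record sets."""
    return len(ci & gi) / len(ci | gi)

def clustering_quality(selected, ground_truth):
    """Average the best-match Jaccard scores in both directions
    (selection -> ground truth and ground truth -> selection) and
    return the mean of the two averages."""
    sel_to_gt = sum(max(jaccard(c, g) for g in ground_truth)
                    for c in selected) / len(selected)
    gt_to_sel = sum(max(jaccard(g, c) for c in selected)
                    for g in ground_truth) / len(ground_truth)
    return (sel_to_gt + gt_to_sel) / 2

# Example: two marked clusters against three ground-truth clusters;
# the second averaging step penalizes the cluster that was never marked.
selected = [{1, 2, 3}, {7, 8}]
truth = [{1, 2, 3, 4}, {7, 8, 9}, {20, 21}]
print(round(clustering_quality(selected, truth), 2))
```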

4.4.3. Task 3 (Ordering Strategy Preferences)

We wanted to find out whether participants have preferences for a particular reordering and why this is the case. We presented them two PCPs of the same dataset next to each other – one with SIM and one with DIS ordering. Using two radio buttons, participants had to select the preferred plot and then explain their choice in a free-text field.

Randomization. We randomly picked two synthetic datasets with a different number of clusters. In the first trial, we did not show any clutter (0N); in the second, we used either 150N or 300N (equally balanced across the participants). In the first trial, we used SIM ordering in the left plot and DIS ordering in the right plot. In the second trial, we swapped the positions. In summary:

    2  levels of clutter (0N, (150N ∨ 300N))  ×
   31  participants                           =
   62  trials in total

Post-processing. We stored the preferred ordering and the explanation text for each trial. Four participants reported not seeing any preference between the options in one of the trials. We removed these participants from the statistical analysis, but report their choices in Section 5.3.

4.5. Participants and Procedure

Prior to the study, we conducted several pilot runs in order to determine appropriate clutter levels and the number of trials for each task.

Participants. To have participants with basic knowledge of information visualization and parallel coordinates, we conducted our user study during two lectures at the University of Konstanz, Germany. Both courses teach foundations of information visualization, one for undergraduates, the other for graduates. The courses were taught by the same lecturer (not the authors), who also introduced and discussed the PCP technique two weeks prior to the study. We recruited 31 participants (17 male, 13 female, 1 NA). Their ages ranged from 19–31 years (median age 23). Each participant had finished high school, and 17 held a Bachelor’s degree. The academic background was in the area of data analysis, with 24 computer science and 7 social and economic data analysis students. All participants reported having normal or corrected-to-normal vision.
Figure 5: Similarity of preferred neighboring dimensions (Task 1). With clutter, axes pairs with dissimilarity are preferred. Without clutter (0N), the preferences are almost equally balanced.

Figure 6: Time to select clusters (Task 2).

Training and Procedure. All participants had to fill out a data privacy form in which we describe the data collected during the study. The participants sat scattered across the room and were not able to talk to each other. One of the authors started the study with a 30-minute recap on PCPs, comparing their visual patterns with those of scatter plots and discussing the advantages and disadvantages of the two techniques as well as the effects of clutter, noise, and outliers. During the training, we did not provide strategies on how to identify clusters (or any other pattern) in a PCP. Instead, we showed patterns in scatter plots and let all participants draw the respective patterns in a PCP (see training material). After this recap, we started the training: all participants opened their laptops and used a browser of their choice to access our online study. We provided a training platform including all three tasks, but only two trials per task. We explained to the participants how to interact with the tool and let them play around with the different trials. After answering the remaining questions, we made sure that all participants activated the full-screen mode within their browser and checked that the entire study could be conducted without scrolling for the different tasks. Participants needed between 20–30 minutes to complete the study.

5. User Study Results

We now report the summary statistics and highlight significant results (p < .05) in the data. For all tests, we checked the necessary preconditions, which can be found in the supplementary material along with the R scripts to reproduce the results. We used a one-sample Kolmogorov-Smirnov test to check if the data follows a normal distribution and Mauchly’s test to check for sphericity.

5.1. Task 1 (Similarity of Axes-Pairs)

We used a repeated-measures ANOVA for the analysis of completion time. The post hoc analysis was done with a Bonferroni corrected t-test for dependent samples. As the similarity of axes pairs was not normally distributed, a non-parametric Friedman’s test was used.
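For orientation, the analysis for this task could be reproduced roughly as follows. The study’s actual analysis scripts are in R (see supplementary material); this Python sketch uses our own data layout (a long-format data frame and per-level dictionaries) and omits the Bonferroni-corrected post hoc t-tests.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.anova import AnovaRM  # repeated-measures ANOVA

def analyze_task1(times_by_level, similarity_by_level, df_long):
    # Precondition: one-sample Kolmogorov-Smirnov test against a normal
    # distribution fitted to each sample.
    for level, x in times_by_level.items():
        x = np.asarray(x, dtype=float)
        print(level, stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))))

    # Completion time: repeated-measures ANOVA with clutter as the
    # within-subject factor (df_long has columns participant/clutter/time).
    print(AnovaRM(df_long, depvar="time", subject="participant",
                  within=["clutter"]).fit().anova_table)

    # The similarity of the selected axis pair is not normally distributed,
    # so compare the three clutter levels with Friedman's test.
    print(stats.friedmanchisquare(similarity_by_level["0N"],
                                  similarity_by_level["150N"],
                                  similarity_by_level["300N"]))
```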
Efficiency to Identify the Number of Clusters

There was a significant effect of clutter on completion time (F(2, 60) = 6.07, p < .01, η² = .10). Post hoc comparisons revealed that completion time was significantly lower for the clutter condition 0N (µ = 9.44s) compared to 150N (p < .01, µ = 16.67s) and 300N (p < .01, µ = 17.47s), but not between 150N and 300N (p = 1.0).

Similarity of Selected Axes-Pair

No significant results can be reported (χ²(2) = 4.77, p = .09). As shown in Figure 5, the means of the distances for the different clutter conditions were 0N (µ = .53), 150N (.71), and 300N (.75).

5.2. Task 2 (Cluster Identification and Selection)

For the comparison between clutter levels (independent of the ordering), we used a Kruskal-Wallis test and a Bonferroni corrected Wilcoxon signed-rank test for post hoc analysis. For the confidence, we applied a Pearson’s Chi-square test. To analyze the differences between the ordering strategies within each clutter level, we used a Wilcoxon signed-rank test for the analysis of completion time, cluster quality, and confidence. Data were split according to levels of clutter to compare the differences between SIM and DIS.
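The within-level comparisons for this task can be sketched in the same spirit (again Python instead of the original R; the variable names and the effect-size approximation are ours):

```python
import numpy as np
from scipy import stats

def analyze_task2(time_by_level, paired_by_level, confidence_table):
    # Clutter levels (independent of the ordering): Kruskal-Wallis H test.
    print(stats.kruskal(*time_by_level.values()))

    # SIM vs. DIS within each clutter level: Wilcoxon signed-rank test on the
    # paired per-participant values, with the effect size r = |Z| / sqrt(N).
    for level, (sim, dis) in paired_by_level.items():
        res = stats.wilcoxon(sim, dis)
        z = stats.norm.ppf(res.pvalue / 2)   # approximate Z from the two-sided p
        print(level, res, "r =", abs(z) / np.sqrt(len(sim)))

    # Confidence ratings: Pearson's chi-square test on the
    # clutter-level x rating contingency table.
    print(stats.chi2_contingency(confidence_table))
```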

Efficiency to Identify and Mark Clusters

Between clutter levels, the medians of completion time for 0N, 150N, 300N, and RW were 10.79s, 9.53s, 10.26s, and 15.02s, respectively. A Kruskal-Wallis test showed a significant effect of clutter level (χ²(3) = 23.31, p < .001). A post hoc test using Wilcoxon signed-rank tests showed significant differences only between RW and 0N, 150N, and 300N (all p < .01).

As shown in Figure 6, the medians of completion time in the 0N clutter condition were 8.9s for SIM and 12.35s for DIS. A Wilcoxon signed-rank test showed a significant effect of ordering strategy (W = 1, Z = −2.46, p < .05, r = .25). For the other clutter conditions, no significant results can be reported: for 150N, the medians of completion time for SIM and DIS were 10.67s and 8.58s, respectively (p = .05); for 300N, they were 11.12s and 9.62s (p = .05); and for RW, they were 15.07s and 14.79s (p = .82).

Quality of Identified and Marked Clusters

Between clutter levels, the medians of quality for 0N, 150N, 300N, and RW were .98, .91, .85, and .37, respectively. A Kruskal-Wallis test showed a significant effect of clutter level (χ²(3) = 181.56, p < .001). A post hoc test using Wilcoxon signed-rank tests showed significant differences between 0N and 150N (p < .001), 300N (p < .001), and RW (p < .001). Also, there were significant effects between 150N and 300N (p < .05), and RW (p < .001). Finally, 300N and RW were also significantly different (p < .001).

The results of the cluster quality are summarized in Figure 7. For the 150N clutter condition, the medians of quality for SIM and DIS were .87 and .94, respectively. A Wilcoxon signed-rank test showed a significant effect of ordering strategy (W = 1, Z = −2.62, p < .001, r = .27). For the 300N clutter condition, the medians of quality for SIM and DIS were .79 and .90, respectively. A Wilcoxon signed-rank test showed a significant effect of ordering strategy (W = 1, Z = −3.36, p < .001, r = .34). The other levels of clutter did not show a significant difference. The medians of the quality score in the 0N clutter condition were .99 for SIM and .97 for DIS (p = .33), and in the RW condition .37 for both SIM and DIS (p = .25).

Figure 7: Quality of selected clusters (Task 2).

Figure 8: Preference of reordering strategy (Task 3). No preference for datasets without clutter. Participants strongly preferred a dissimilarity-based layout with an increasing amount of clutter. The figure shows both clutter levels combined (‘clutter’) and separately.

Confidence of Marked Clusters

An overview of the participants’ confidence is shown in Figure 9. Between clutter levels, the medians of confidence for 0N, 150N, 300N, and RW were 2, 1, 1, and 0, respectively. A Pearson Chi-square test showed a significant effect of clutter level on confidence (χ²(12) = 120.97, p < .001). Post hoc analysis revealed significant differences between 0N and 150N (p < .001), 300N (p < .001), and RW (p < .001); between 150N and 300N (p < .05), and RW (p < .001); and between 300N and RW (p < .005).

For the 150N clutter condition, the medians of confidence for SIM and DIS were both 1. A Wilcoxon signed-rank test showed a significant effect of ordering strategy (W = 1, Z = −2.52, p < .05, r = .26). For the 300N clutter condition, the medians of confidence for SIM and DIS were 0 and 1, respectively. A Wilcoxon signed-rank test showed a significant effect of ordering strategy (W = 1, Z = −3.75, p < .001, r = .38). The remaining levels of clutter did not show a significant difference between ordering strategies, with the same medians for SIM and DIS (0N = 2, p = .25; RW = 0, p = .48).
5.3. Task 3 (Understanding Preferences)

The distribution of preferences is shown in Figure 8. Two participants selected SIM and ten participants DIS for both clutter conditions. Twelve participants preferred SIM without clutter and changed their preference to DIS for the second trial, which included clutter. Vice versa, three participants changed from DIS to SIM. A binomial test showed a significant difference (p < .05) in the proportion of preferences (χ²(1, N = 27) = 12). The probability of success was .8.
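A quick way to reproduce this kind of preference test is a two-sided binomial test against the null hypothesis of no preference; the sketch below is hedged, and the counts only illustrate the call, they are not the study data.

```python
from scipy.stats import binomtest

def preference_test(n_dis, n_total):
    """Two-sided binomial test against the null of no preference (p = 0.5)."""
    return binomtest(n_dis, n_total, p=0.5, alternative="two-sided")

# Illustrative counts only (not taken from the study data):
res = preference_test(22, 27)
print(res.pvalue, 22 / 27)
```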
Three out of the four removed participants (see Section 4.4.3) did not have a preference for 0N, but preferred DIS for cluttered datasets. One participant preferred SIM for clutter-free datasets and had no preference for the dataset with clutter.

6. Discussion

We discuss the participants’ performance according to our hypotheses and highlight findings from the qualitative feedback of Task 3.

Influence of Clutter (H1). Increasing the amount of clutter has a negative effect on the quality of the cluster identification and the confidence of participants. These findings confirm H1 (b) and (c). While we see an increasing completion time for higher clutter levels in Task 1, we cannot verify this finding in the second task. Hence, we cannot make a final judgment on H1 (a). As expected, clutter negatively influences cluster identification. Patterns may vanish due to overlapping data lines, making the identification more difficult. Therefore, visualization experts need to carefully design PCPs and reduce the amount of clutter if possible (e.g., by sampling).

Reordering for Clutter-free Datasets (H2). Similarity-based ordering strategies are a good choice for datasets without clutter. Our results show that participants perform the identification of clusters more efficiently when working with a SIM layout, which confirms H2 (a). It seems as if participants are faster in combining straight data lines into clusters than data lines with strong slopes. A possible explanation could be the Gestalt law of continuation [Wer23, War20], which could help participants in tracking data lines across dimensions. Kellman and Shipley [KS91] support this argument: the angular parameters determining the grouping of lines into clusters may support the ability to find clusters across multiple sets of axes. The qualitative feedback also confirms our findings. Participants reported, for example, that “The structure of clusters is clearer”, “[clusters] don’t cross very often”, or that they prefer SIM “[...] since they do not intersect with each other in the majority of areas between each two dimensions [...]”. We cannot support hypotheses H2 (b) and (c). There are no significant differences in the cluster identification quality or the confidence of the participants (see also Figures 7 and 9). Also, participants did not have a subjective preference for a particular reordering strategy, as shown in Figure 8.

Reordering for Cluttered Datasets (H3). For datasets with clutter, there is strong evidence that DIS layout strategies are more suitable. The quality of marked clusters is significantly better when participants used a DIS ordering strategy, confirming H3 (b). This finding coincides with the reported confidence of participants (see Figure 9), who are also significantly more confident when working with DIS in clutter conditions. Even if both options are available (SIM and DIS), there is statistical evidence that participants will choose a DIS ordering in clutter conditions, providing evidence for H3 (c). The Gestalt law of grouping by orientation similarity [Wer23, War20] might be a reason for this preference. The orientation of the lines is more salient in the DIS ordering, which facilitates a stronger grouping compared to a SIM ordering.

We cannot see significant differences in the similarity values of
