SpotiVis Bachelor's degree Project - Finding new ways of visualizing the spread of

Bachelor’s degree Project

SpotiVis
- Finding new ways of visualizing the spread of
popular music

                                   Author: Dennis Fredsson
                                   Supervisor: Rafael Messias Martins
                                   Semester: Spring 2021
                                   Subject: Computer Science

Abstract
Simply by reading data and statistics on the charting positions of popular songs
on global and national music charts, it is hard to understand how the popularity
of songs, albums, or artists within pop music truly behaves over time. However,
analyzing the data using visualizations as a means of communication might
provide us with new points of view and new insights into how the popularity
of contemporary popular music behaves over a longer period. This is the
hypothesis that we intend to investigate in this thesis. An interactive
visualization application (presented as a website) has been developed based on
data from “Daily Top 200” lists provided by Spotify. A survey was then used
to evaluate the application, with the results suggesting that new and interesting
insights into the trends in the popularity of music can be gained from the
proposed prototype.

   Keywords: Visualization, Music, Spotify, Pop, Popular music, Chart,
Streaming Service

Contents
1 Introduction
  1.1 Background
  1.2 Related work
  1.3 Problem formulation
  1.4 Motivation
  1.5 Results
  1.6 Scope/Limitation
  1.7 Target group
  1.8 Outline
2 Method
  2.1 Research Project
  2.2 Research methods
    2.2.1 Gathering data
    2.2.2 Analyzing and visualizing data
    2.2.3 Validating
  2.3 Reliability and Validity
  2.4 Ethical Considerations
3 Theoretical Background
4 Research project – Implementation
  4.1 Gathering Data
  4.2 Visualizing data
    4.2.1 Presentation layer
    4.2.2 Data access layer
5 Results
  5.2 Aggregated results
6 Analysis
7 Discussion & Future Work
8 Conclusions
References
Appendix A
  Link to the visualization tool
Appendix B
  Value of Visualization form
Appendix C
  Gathering participants message (note: translated from Swedish)
Appendix D
  Google forms ICE-T questionnaire

1     Introduction
This is a Bachelor’s thesis in Computer Science, with a focus on visualization.
The thesis aims to develop a visualization system to provide new insights into
the "pulse of popular music" on a national as well as global level, using
temporal data from the music listening service Spotify.

Merely looking at the data from national music charts, or “top lists”, around
the world, day by day, might not be an effective way of obtaining the big
picture of how a song or an artist performs over time on national charts around
the world. This is partly due to the large amount of data available.
SpotifyCharts.com contains more than 30 million data points across its
timeline starting on 1 January 2017, considering the different Top 200/Viral
50 lists, updated daily/weekly, in the 70 countries that have this data available.
Exploring and interpreting such a large data set, given its geospatial
characteristics, can be hard without appropriate support [1] [2].

Today, music labels use quite simple metrics to improve and analyze their data
[3]. This thesis aims to investigate how visualization can be used to interpret
and analyze data gathered from Spotify, in an attempt to gain a new perspective
on how popular music spreads. Although we only take a small
initial step into this area, we believe that the use of interactive visualization
can, in the long term, potentially help music professionals use this type of data
to their advantage to optimize or maximize the spread and popularity of new
releases.

1.1           Background
Spotify is one of the world’s leading audio and music streaming services,
originating from Sweden. As of today, over 30 million songs are available for
listening, supported on all the most popular platforms (desktop and mobile).
Since their advent, streaming services such as Spotify have provided easy
access to music for the listeners. A computer or a phone with an internet
connection is all that is necessary to access the massive music library which
Spotify brings to its users. In the U.S., the introduction of music streaming
technology increased the number of people listening to music from 2016 to
2017 and, in each year from 2015 to 2017, the amount of music consumed by
each individual also increased dramatically [4] [5].

To measure which songs are currently most popular, music charts provided by
websites/magazines such as Billboard [6] in the U.S. or Sverigetopplistan [7]
in Sweden gather metrics and compile them into a list of the current most
popular songs. These metrics historically included sales of singles or albums,
but with the advent of the music streaming industry, they now include digital
metrics as well, such as digital sales, downloads, and streams. In this thesis,
the charts used as the foundation for the data gathered are curated by Spotify
itself and thus employ only digital streams as a metric.
Spotify provides these charts through their website SpotifyCharts.com [8].

Visualization is any technique for creating images, diagrams, or animations to
communicate a message [9]. This visual imagery provides an effective way of
communicating both abstract and concrete ideas. Transforming temporal daily
chart data into imagery through visualization to find new pathways of
conveying chart performance of a piece of popular music is the challenge we
are dealing with in this thesis.

1.2           Related work
Music visualization is a large area of research, with a diverse range of subareas
touching on different aspects of the challenge [10]. Some works focus, for
example, on the structure of music itself, such as highlighting notable features
in modern musical compositions [11] or conveying information about interval
quality, chord quality, or chord progression in digital music [12]. These are not
directly related to this thesis, since we focus on the visualization of a large-
scale music collection instead of individual songs.

Regarding the visualization of music collections, the existing works mainly
focus on various attributes of musical information aiming to provide new
perspectives on personal musical archives that go beyond simple plain file lists
[10]. They assist in tasks such as editing, exploring, and navigating these
collections. Muelder et al. [13] use visualization to provide a graph-based
visual interface of a music listener’s digital music collection, based on the
content of the music itself instead of pre-defined tag information (since it
can often be incorrect or misleading). Songrium, a music browsing assistance
service, uses visualization to explore what is referred to as a “Web of Music”.
This Web of Music showcases the relationship between original pieces of
music and derivative songs, offering a way to discover new music for the
listener [14]. This thesis differs from these examples in that the features that
we use for the visualizations do not come from the music attributes themselves,
but from the geospatial trends in popularity of the songs and artists (i.e. their
ranks, over time, in the top lists around the world).

There has also been research done in visualizing music collections regarding
popularity. Mashima et al. [15] gathered data from last.fm (a music
recommendation service) from one point in time (July 2009) and visualized the
popularity of the top 250 artists by mapping their similarity to 2D coordinates
and using font size to represent popularity. Sprague et al. [16] designed a
“democratic music jukebox” with the purpose of giving all individuals present
at social gatherings an equal influence over the music played. The collected
votes were then visualized to group participants with similar music taste
together, spreading social awareness. Zhang et al. [17] and Baur et al. [18]
[19] worked with individual listeners’ data to develop visualizations of each
user’s music listening history. These papers are
related to this thesis, but they either do not include large-scale temporal or
geospatial factors, or use popularity in terms of one user’s (or a small group of
people’s) preferences. In this thesis, we define popularity as how popular a
song is from a music chart perspective, including data from all Spotify listeners
in a region (or on a global level).

As a summary of the discussed related work, the majority of visualization
methods for musical information are founded upon differing types of content-
or context-based attributes or quantifying the similarity between various pieces
of music. Geographical data is rarely taken into consideration, and visualizing
listening habits in conjunction with geospatial data in a scientific manner is not
a common method (as noted by Hauger et al. [20]). The solution proposed in
this thesis is based on the visualization of music popularity trends in time and
space, which is a subarea of information visualization. A discussion on this
background is presented in Chapter 3.

1.3           Problem formulation
The service SpotifyCharts.com offers a large amount of useful data to music
enthusiasts, but it does not offer visualization or any other type of aggregation
or summarization of their data. This means that record companies, artists, or
anyone who has an interest in the area have to rely on browsing the charts
individually, in their original textual form, to gain an understanding of how,
for example, a single by an artist performs on the music charts nationally, and
around the world. Individual charts can show, at any single time, a small
number of the most popular songs for a specific geographical region (or a
global aggregation) in a short time span—either a day or a week. This interface
does not facilitate other more complex exploratory tasks, such as: getting an
overview of the data over larger time spans; comparing the performance of
different songs and/or artists over time; or connecting the time and space
characteristics of the popularity of songs and/or artists.

This identified gap can be decomposed into the following two research
questions:

      1. How can we develop a tool that provides visual representations of data
         from music charts to support exploratory analyses of trends over time
         and space?

      2. Can the created visual representations provide valuable insights on the
         chart performances of songs and artists?

These research questions are henceforth referred to in this thesis as RQ1 and
RQ2.

1.4            Motivation
Contemporary music charts on a national level have been around for many
decades. In the U.S., Billboard published its first national music chart on July
27, 1940 [21], while in Sweden the first national music chart,
Sverigetopplistan, appeared in 1975 [7]. However, given the related work discussed in the
previous section, it seems that compiling these temporal music charts into
visualizations has not attracted much interest. This could be, in part, due to the
lack of access of the general public to the data which supports these music
charts, something which Spotify has changed since the company provides
music charts like these both in the Spotify client and through
SpotifyCharts.com [8].

This thesis aims to propose one possible solution for representing and
interacting with temporal music charts. Our main goal is to expand upon the
tools supplied today for monitoring and gathering information on the chart
performance of popular music, for example, Spotify’s “Discover” section where
users can find and browse songs and artists. We believe that visualization is an
interesting and under-explored way to analyze this type of data and, through
the development and evaluation of our prototype, we aim to provide initial
design ideas and a solution that could help raise interest in the area.

1.5           Results
We propose a new way of representing music charts over time, in the form of
visualizations realized through a tool developed during the process of creating
this thesis. We evaluated the tool by demonstrating it to test subjects,
asking them to perform certain tasks with it, and then asking them to fill
out a standard questionnaire (ICE-T)
as defined by Wall et al.’s Value-Driven Visualization Evaluation [22]. This
methodology is further discussed in Section 2.2.3.

1.6           Scope/Limitation
Scope. The music chart data collected is limited to the streaming service
Spotify, and in temporal terms only as far back as 1 January 2017. Therefore,
the examples showcased, and songs, albums, or artists selected are based on
which artists and songs were in the top 200 music charts for the period 1 January
2017 – 31 December 2020. In terms of national charts available, it is also
limited to the countries where Spotify is available. Furthermore, due to the
sheer number (70 countries) of national music charts available, not all countries
are included. The biggest music markets in terms of export and import are
prioritized. This is further discussed in Section 4.1.

Limitations. The tool proposed in this thesis is an initial prototype (or a proof
of concept) that uses simple and familiar visualizations. The user experience
and the visualizations could be improved with more development iterations,
including gathering requirements for the tool directly with the target users,
which was not done. The user study used to measure the value of the
visualization tool was a small-scale procedure, with only eight participants in
total, and no information on their background or music interests was gathered.
More participants would have allowed us to obtain more statistically
significant results. Additionally, the prototype was not compared to any other
existing similar tool, because we could not find a suitable candidate during our
literature review and search for related works. Such a comparison might have
brought to light more interesting results, weaknesses, and opportunities for
improvement.

1.7          Target group
In this thesis, we consider music consumers as our target group, i.e. anyone
who might be interested in using tools that provide insight into music charts
and their trends. Other potential target groups for spatiotemporal visualizations
of music charts could be music streaming services, such as Spotify¹; music
companies, such as Universal Music²; or music creators and/or artists (but they
are not considered in this initial work).

¹ https://www.spotify.com/
² https://www.universalmusic.com/

1.8          Outline
This report is organized as follows. In Chapter 2 we discuss the methodological
framework and research methods employed. Chapter 3 provides a theoretical
background for the project and the knowledge gap which it intends to reduce.
In Chapter 4 we discuss the implementation of the data-gathering script, the
implementation of the visualization tool, and the demonstration and survey
whose results we present in Chapter 5. Chapters 6 and 7 analyze and discuss
the results and knowledge gained from this thesis work, as well as
potential future work. In Chapter 8 we conclude the thesis work.

2       Method

2.1          Research Project
The catalyst of this thesis work was the knowledge gap identified between
historic music charts on a global and national level, and the limited existing
visual representations of these music charts over time. In the interest of
reducing this knowledge gap, a design science methodology was employed. In
design science (DS), there are six clearly defined steps [23]:

      1. Problem identification and motivation
             In this thesis, the problem, as mentioned, is the identified gap
             between the availability of data from music charts for popular
             music and the lack of available tools to visualize spatiotemporal
             trends in these music charts in new ways. The motivation for this
             was to expand upon the subject and to provide the groundwork for
             future research in the area.

      2. Definition of the objectives for a solution
             The objective of the solution to this problem was to provide a tool
             for creating visualizations from gathered data to supply the target
             users with new insights into the development of the gathered music
             charts over time and space. In a short summary, the requirements
             that guided the design of the visualization tool were to support
             users in: (1) Getting an overview of the chart data over large time
             spans; (2) Comparing the performance of different songs and/or
             artists over time; and (3) Connecting the time and space
             characteristics of the popularity of songs. These requirements were
             not gathered in any formal or structured way, and were derived
             mainly from the characteristics of the dataset itself and the
             experiences of the thesis author and supervisor.

      3. Design and development
             The artifact is the visualization tool that supports creating
             visualizations of these music charts, in accordance with user input
             of what artist, song, and time limitations may be of interest to the
             target user. Gathering the data necessary for this became an
             objective in itself to fit into the greater objective of producing this
             prototype.

      4. Demonstration
             After producing some versions of the prototype of the artifact to be
             created, the artifact was deployed to be evaluated by the target
             users specified in Section 1.7 (i.e. music consumers and the public
             in general). The artifact was hosted online, available via a web
             address, to act as a demonstration of the artifact.

      5. Evaluation
             To evaluate the artifact, the Value-Driven Visualization Evaluation
             methodology (visvalue, in short) was used [22], which is centered
             around a standard questionnaire for user feedback on visualizations
             (the ICE-T). Details are provided in Section 2.2.3.

      6. Communication
             The results of the aforementioned evaluation were communicated
             in Chapters 5 and 6 of this thesis.

2.2          Research methods
The methods used in this thesis can be subdivided into three main parts:

2.2.1 Gathering data
The website SpotifyCharts.com [8] provided a way to easily access temporal
data for historic daily or weekly Spotify charts. The lists available are: Daily
top 200, Weekly top 200, Daily viral 50, and Weekly viral 50. These charts are
available and updated daily in the Spotify client; however, the client itself does
not provide historical data, only the current, contemporary charts which are
updated on a daily/weekly basis. The charts available in the ordinary Spotify
client are also limited to the top 50 songs of the day.

The Daily top 200 chart lists the 200 most listened-to songs on Spotify over the
past day. Furthermore, these charts are divided by Spotify market.
There is a global chart, and national charts such as the US, Great Britain, South
Korea, Sweden, etc. (around 30-40 national charts in total).

To gather this data, a script written in the programming language
Python³ was used to download chart data from a selection of the available
markets on SpotifyCharts [8]. This data was then stored in a database.
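
As an illustration of this step, the following minimal sketch parses one day's chart, assumed here to be available as CSV text, and stores it in a SQLite database. The column names, the table schema, the choice of SQLite, and the function names are all assumptions made for illustration; this section does not specify the format of the downloaded files or the database that was used.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout for one daily chart; the real download format may differ.
SAMPLE_CSV = (
    "position,track,artist,streams\n"
    "1,Song A,Artist X,1500000\n"
    "2,Song B,Artist Y,1200000\n"
)


def create_schema(conn):
    """Create an illustrative table with one row per chart position, per day and region."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chart_entry (
               region     TEXT    NOT NULL,
               chart_date TEXT    NOT NULL,
               position   INTEGER NOT NULL,
               track      TEXT    NOT NULL,
               artist     TEXT    NOT NULL,
               streams    INTEGER NOT NULL,
               PRIMARY KEY (region, chart_date, position)
           )"""
    )


def store_chart(conn, region, chart_date, csv_text):
    """Parse one chart given as CSV text and store its rows; returns the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (region, chart_date, int(r["position"]), r["track"], r["artist"], int(r["streams"]))
        for r in reader
    ]
    # INSERT OR REPLACE keeps the table consistent if the same day is stored twice.
    conn.executemany("INSERT OR REPLACE INTO chart_entry VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
stored = store_chart(conn, "se", "2020-12-31", SAMPLE_CSV)
```

Using the region, date, and position as a composite key means that running the script again for a day that is already stored overwrites the existing rows instead of duplicating them.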

Since the visualizations created by the tool depend entirely on this data, this
part of the method is an important piece in the process of answering RQ1.

2.2.2 Analyzing and visualizing data
When all the data is gathered, it must be analyzed and visualized in ways
that tell a story that is not apparent from just looking at the raw data.
The large amount of data was pre-processed into a new representation of the
data. This was achieved through the development of a web application written
in JavaScript⁴. The implementation of this proposed visualization prototype is
a central part of answering RQ1 and is discussed in more detail in Chapter 4.

³ https://www.python.org/
⁴ https://www.javascript.com/

2.2.3 Validating
To assess the value of the prototype, this thesis employed Value-Driven
Visualization Evaluation (or visvalue, in short). This methodology was created
to help researchers, designers, and practitioners determine
the value of visualizations. It is centered around the ICE-T questionnaire,
which is composed of four different sections: Insight, Confidence, Essence,
and Time, which in turn are composed of questions that are answered on a
7-point Likert scale, where 1 represents “Strongly Disagree” and 7 represents
“Strongly Agree”. The template of the ICE-T questionnaire is supplied in
Appendix B of this thesis [22] [24].

In more detail, the ICE-T’s four sections are, according to Wall et al. [24]:
      •   I – “A visualization’s ability to spur and discover insights and/or
          insightful questions about the data.”
      •   C – “A visualization’s ability to generate confidence, knowledge, and
          trust about the data, its domain and context.”
      •   E – “A visualization’s ability to convey an overall essence or take-away
          sense of the data.”
      •   T – “A visualization’s ability to minimize the total time needed to
          answer a wide variety of questions about the data.”

To interpret the scores of the ICE-T questionnaire, Wall et al. [24] suggest that
an average score of 5 or above in any of the four ICE-T categories speaks for a
strength in the visualization, while an average score of 4 or lower represents a
weakness in the visualization. Based on their research, for visualizations to be
deemed as valuable and/or good, they should result in an overall collective
average score of 5 or higher, while visualizations that result in an overall
collective average score of 4 or lower should be reconsidered and have their
design revised. We revisit this score interpretation in Chapter 6 where the
questionnaire results are analyzed.
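
The score interpretation described above can be sketched in a few lines of Python. This is only an illustration: the ratings below are invented and are not results from this thesis, the label "inconclusive" for averages strictly between 4 and 5 is an assumption (the text only defines the two thresholds), and computing the overall score as the mean of the four component averages is likewise an assumption.

```python
def average(ratings):
    """Mean of a list of 7-point Likert ratings (1 = Strongly Disagree, 7 = Strongly Agree)."""
    if not ratings or any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 7")
    return sum(ratings) / len(ratings)


def interpret(avg):
    """Apply the thresholds from the text: 5 or above is a strength, 4 or lower a weakness."""
    if avg >= 5:
        return "strength"
    if avg <= 4:
        return "weakness"
    return "inconclusive"  # assumption: the text does not classify scores between 4 and 5


def summarize(responses):
    """Return per-component averages and labels, plus an overall average and label."""
    per_component = {name: average(ratings) for name, ratings in responses.items()}
    labels = {name: interpret(avg) for name, avg in per_component.items()}
    # Assumption: overall score is the mean of the component averages.
    overall = sum(per_component.values()) / len(per_component)
    return per_component, labels, overall, interpret(overall)


# Invented example ratings, for illustration only.
example = {
    "Insight": [6, 5, 7, 6],
    "Confidence": [4, 3, 4, 5],
    "Essence": [5, 5, 6, 4],
    "Time": [5, 4, 5, 4],
}
per_component, labels, overall, overall_label = summarize(example)
```

In this invented example, Insight and Essence would count as strengths, Confidence as a weakness, and both Time and the overall score would fall between the two thresholds.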

The overall procedure consisted of (1) gathering individual participants to
partake in the evaluation, (2) demonstrating the prototype, (3) suggesting some
specific tasks for participants to perform, and (4) having each participant fill
in the ICE-T questionnaire to evaluate their experience of using the
demonstration.

Participants were gathered via Linnaeus University’s workspace in Slack,
known as CoursePress. There, in the #general channel, which as of June 4,
2021, has 4,625 members, a message was posted inviting potential participants
and explaining this thesis work, its intent, and what was expected of each
participant, together with a screenshot of the application. Participants then
signed up either by reacting to the post with a “hand up” reaction, or by sending
a direct message. The message posted is featured in Appendix C of this thesis.

The experiment was conducted asynchronously and remotely, due to the
COVID-19 pandemic which was ongoing during the writing of this thesis.
Each volunteer was sent a Google Forms link containing a link to the web-
hosted prototype visualization tool, the instructions for the tasks to be
performed (creating certain visualizations to motivate showcasing particular
functionalities), and the ICE-T questionnaire formatted into Google Forms-
questions. Each participant was also allowed to play with the tool in any way
they wanted, in addition to these instructions. Eight participants in total
performed the instruction tasks, filled in all the ICE-T questions, and
completed the user study. A link to this form and a reproduction of its
components and structure are featured in Appendix D.

The validation part of the methodology used in this thesis is intended to support
answering RQ2.

2.3         Reliability and Validity
Concerning the reliability of this thesis work, it is important to mention some
points, mainly touching upon the gathering of data and the representation of
data in the visualization tool.

When gathering the data, there were some issues with gaps in the timelines. For
example, when downloading the daily music charts for the national chart of
Brazil, there appeared to be some sort of temporary outage on Spotify’s end,
where data for an extended period of time was missing (a few
months, in this case). To address this, the script had to be run again at a later
date to fill the gaps in the incomplete temporal data. Furthermore, there is no
guarantee that SpotifyCharts will continue its operations in the future, which
would undermine the entire data-gathering part of this thesis work, since it
completely relies on SpotifyCharts and its supply of temporal music charts.
However, we possess a local backup of the data gathered, so this issue
would only affect future data.
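
Filling gaps like the ones described above requires knowing which days are missing in the first place. The following is a minimal sketch of such a check, assuming the dates already collected for one market are available as a list; the function names and the example dates are hypothetical and do not describe the actual script.

```python
from datetime import date, timedelta


def missing_dates(collected, start, end):
    """Return every day in [start, end] that is not present in the collected dates."""
    have = set(collected)
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(total_days))
    return [day for day in all_days if day not in have]


def group_into_ranges(dates):
    """Collapse dates into (first, last) tuples of consecutive days."""
    ranges = []
    for day in sorted(dates):
        if ranges and day - ranges[-1][1] == timedelta(days=1):
            ranges[-1] = (ranges[-1][0], day)  # extend the current gap
        else:
            ranges.append((day, day))  # start a new gap
    return ranges


# Hypothetical example: three of six days were downloaded successfully.
collected = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 5)]
gaps = group_into_ranges(missing_dates(collected, date(2020, 1, 1), date(2020, 1, 6)))
```

Each resulting range could then be passed back to the download step on a later run, so that only the missing days are requested again.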

Moving on, the design process of the visualization tool was in its nature
exploratory. This exploratory process is an inherent part of Design Science
[23]. It is almost impossible for an exploratory process to yield the same
results, even if all the variables that were surrounding it were the same. In our
case, they include the author, the time during which the work is performed, and
the contemporary technology. This also presents us with some problems for
reliability, which must be taken into consideration.

Regarding the survey which was used to evaluate the value of the thesis work,
the reliability may also be negatively affected by the people who participated
in this survey. The number of people who participated, the individual
personality of each test subject, and something as simple as the mood the test
subject was currently in when filling out the survey questionnaire are all factors
that may have impacted the reliability of the results yielded.

When considering the validity of this thesis, the evaluation method of visvalue
[22] is of great assistance. The score of each of the sub-questions of the survey
questionnaire can be compiled and analyzed, and the value of the visualizations
determined. This way of evaluating visualizations was developed with
validity in mind. Using a standardized, well-supported
instrument is superior to developing a custom, specifically made questionnaire
for this thesis. Although the usage of a standardized questionnaire might not
completely represent the tool in every way, we employed a methodology that
has been validated and proposed by the scientific community as a qualitative
way to evaluate visualizations, so we eliminate the risk of making a
customized, but imprecise and invalid form.

2.4          Ethical Considerations
The ethical considerations of this thesis were related to the validating part of
the methodology. The confidentiality and anonymity of each of the test
subjects were secured through a Google Forms questionnaire, which offers
functionality to make answers to the questionnaire completely anonymous.
This utility was used to realize the form supplied through visvalue and
distribute it to participants.

3     Theoretical Background
The concept of using visualization and visual analytics to analyze and provide
a way of understanding time-dependent data has been shown to be successful
for a long time [25]. Visual analytics can be defined as a multifaceted research
area where scientists that specialize in information visualization, scientific
visualization, and geographic visualization intimately work together with
researchers from analytical backgrounds, for instance, statistical analysis and
modeling, geographical analysis and modeling, and machine learning and data
mining, in finding new solutions to complicated problems on a societal scale.
Geo-spatial visual analytics (or geovisual analytics, in short) aims to solve
problems involving geographical space and events, objects, and processes
populating this geographical space. Since the majority of the objects occupying
space either arise or change in time, geovisual analytics must pay appropriate
attention to time and the relationship between space and time [26].

Visual analytics intends to merge the strengths of electronic data processing,
such as the modern computer is capable of, and human processing [1], which
“can be characterized as an information-processing system, which encodes
input, operates on that information, stores and retrieves it from memory, and
produces output in terms of actions.” [2] Visualization, as a medium for
humans and computers to converse and cooperate through graphic
representations, is the manner through which this can be achieved. To analyze
spatio-temporal data (that is, data that refers to both space and time) and
produce solutions to spatio-temporal problems, seamless and sophisticated
synergies are essential [1].

Today, the analysis of data that changes through space and time is not
limited to professional analysts [1]. To give an example related to this thesis
work, Spotify offers a yearly analysis of its users' listening habits through its
“YYYY Wrapped” feature [27] (where YYYY is the current year, 2020 at the
time of writing of this thesis). Spotify itself defines it as “a special hub in the
app with some cool stats on the songs, artists, and podcasts you discovered
throughout 2020.” [27]. This supports the claim that many people would be
interested in taking part in spatio-temporal analysis [1].

From the researcher's point of view, the goal is to find techniques that
reduce the complexity of the data at hand and to make analytical tools
available and easy to use for the wide community of prospective users, in
order to encourage spatio-temporal thinking and contribute to solving a wide
array of problems [1].

User interfaces (or UIs, for short) give users a way to interact with and
explore geographical spatio-temporal information. Suitable user interfaces
are integral to unlocking the potential of spatio-temporal geovisual analytics
tools if these tools are to be used efficiently and effectively [28].

The visualization tool developed alongside this thesis, which is outlined in
Section 4.2, features a map of the world. In this map chart, each country for
which Daily Top 200 data was available is colored according to the value of
the data attribute for that country. Here, the data attribute is the position
of a particular song by an artist, on a scale from 1 to 200, on that country's
Daily Top 200 list on the date currently selected in the map chart.

The colormap chosen to represent this 1-200 interval was the Viridis colormap
[29]. It was chosen because it is a perceptually uniform colormap, i.e. a
colormap where equal steps in data are perceived by the human brain as equal
steps in the color space, which has been found to be the best choice of
colormap for the majority of applications [30]. This can be motivated by
research that has found that the human mind perceives changes in lightness as
changes in data much better than, for example, changes in hue. Therefore,
colormaps whose lightness increases uniformly over the scale of the colormap
are clearer for the viewer. Viridis is an example of such a colormap
[29] [30].
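
To make the mapping from chart position to color concrete, the following
minimal Python sketch normalizes a position in the 1-200 interval and
interpolates linearly between five anchor colors. It is an illustration only
and not the code used in the tool: the function names are hypothetical, the
anchor colors are approximations of the Viridis colormap, and the choice to
map position 1 to the bright (yellow) end of the scale is an assumption.

```python
# Approximate Viridis anchor colors at 0 %, 25 %, 50 %, 75 % and 100 %
# (dark purple to bright yellow); assumed values for illustration.
VIRIDIS_ANCHORS = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]


def position_to_unit(position, worst=200):
    """Map a chart position (1 = best) to [0, 1], with 1.0 for position 1."""
    if not 1 <= position <= worst:
        raise ValueError("position outside the chart")
    return (worst - position) / (worst - 1)


def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


def position_to_color(position):
    """Linearly interpolate between the anchor colors (hypothetical helper)."""
    scaled = position_to_unit(position) * (len(VIRIDIS_ANCHORS) - 1)
    lower = min(int(scaled), len(VIRIDIS_ANCHORS) - 2)
    fraction = scaled - lower
    start = hex_to_rgb(VIRIDIS_ANCHORS[lower])
    end = hex_to_rgb(VIRIDIS_ANCHORS[lower + 1])
    mixed = (round(a + (b - a) * fraction) for a, b in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(*mixed)
```

Interpolating in RGB between a handful of anchors only approximates the real
colormap; a charting or plotting library that ships Viridis would normally be
used instead.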

This “world map” visualization of geographical data was chosen because maps
offer a recognizable and familiar way to present data separated by regions,
such as continents or countries, which can be overlaid with coloring or heat
maps in order to relay information to the viewer [15]. The
usage of a world map, in particular, was motivated by the fact that the data
gathered spanned countries on almost every continent, and thus could not be
limited to only, for example, Europe.

Furthermore, since in our case the geo-spatial data also changes over time,
cartographic animation has been employed. Cartographic animation has emerged
as an effective visualization technique through its innate capability to show
interrelations between the locations, attributes, and time of geo-spatial
data. While these types of animations have been employed in communicating
geo-spatial temporal information, they have also been impeded by a lack of
interactivity for the user [31]. In light of this, we have provided the user
with tools to interact with the geo-spatial temporal data after it is first
displayed.

The other visualization in the tool outlined in Section 4.2 is a line chart.
The line chart was chosen because it is a simple and familiar visualization
technique for time series data. Such charts are intuitive by nature and make
it easy to discern key events and relate them to positions on the x- and
y-axes [32]. The simplicity and comprehensibility of a line chart are
important to our application of such a chart, in which the x-axis represents
time as discrete dates and the y-axis represents the charting position of a
song. However, this inherent simplicity is also a further motivation for
employing the aforementioned map chart, which complements this uncomplicated
way of visualizing data.

4     Research project – Implementation

4.1 Gathering Data
First, to implement the tool used to pursue the goal of this thesis, the raw
data from Spotify had to be gathered and compiled into a manageable
collection. To accomplish this, a script was developed in the programming
language Python [33]. This language was chosen because of the author's
experience with it and because of the existence of a very useful (and related)
Python package by the name of fycharts [34]. A Python package can be described
as a collection of pre-written code that can easily be integrated with newly
developed code.

What fycharts provided was a way to extract chart data from the Daily Top 200
music chart lists from SpotifyCharts.com. This package was developed to fill
the gap left by Spotify when the streaming service deprecated their official
Spotify charts API [34] (short for Application Programming Interface). One
could say that this, then, was the “unofficial Spotify Charts API”. The API
provides ways to easily target Daily Top 200-lists, delimited by country, start
of date range, and end of date range.

The application developed with the aid of fycharts downloaded national Daily
Top 200 music chart lists which were provided by SpotifyCharts.com and
compiled these lists into a database. The date range for the data stored in this
database was 2017-01-01 to 2020-12-31. This interval was chosen due to
SpotifyCharts.com’s temporal data only dating back to the beginning of 2017.
The end of the date range was chosen with regard to the time constraints
imposed on this thesis work, which ran from February through May 2021. Instead
of intermittently updating the chart data, for example monthly, the decision
was made to use the end of 2020 as the end of the date range. This is also in
part because four full years of data was sufficient and gave a more
well-defined interval to use in the visualization tool than one that also
featured a few months of 2021 data.

As mentioned above, the data is stored in a database. The decision to use a
database was made due to the ease of accessing data through SQL [35], a
well-tested language used across the industry to communicate with and fetch
data from a database. SQL provides many ways of filtering data quickly, in
our case by artist, song, start date, end date, and national region.
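
The kind of filtering described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the thesis implementation:
the thesis does not name the database engine or its schema, so the use of
SQLite, the table name `chart_entries`, its columns, and the function
`fetch_positions` are all hypothetical, and the inserted rows are placeholders
rather than real chart data.

```python
import sqlite3

# Hypothetical schema: one row per (date, region, position) chart entry.
connection = sqlite3.connect(":memory:")
connection.execute(
    """CREATE TABLE chart_entries (
           chart_date TEXT, region TEXT, position INTEGER,
           track TEXT, artist TEXT)"""
)
# Placeholder rows for illustration only; not real chart data.
connection.executemany(
    "INSERT INTO chart_entries VALUES (?, ?, ?, ?, ?)",
    [
        ("2019-01-01", "se", 12, "Example Song", "Example Artist"),
        ("2019-01-02", "se", 9, "Example Song", "Example Artist"),
        ("2019-01-02", "us", 150, "Example Song", "Example Artist"),
        ("2019-03-01", "se", 40, "Example Song", "Example Artist"),
        ("2019-01-02", "se", 1, "Other Song", "Other Artist"),
    ],
)


def fetch_positions(artist, track, start, end, region):
    """Return (date, position) rows matching all five filters."""
    query = """SELECT chart_date, position FROM chart_entries
               WHERE artist = ? AND track = ? AND region = ?
                 AND chart_date BETWEEN ? AND ?
               ORDER BY chart_date"""
    parameters = (artist, track, region, start, end)
    return connection.execute(query, parameters).fetchall()


rows = fetch_positions("Example Artist", "Example Song",
                       "2019-01-01", "2019-01-31", "se")
```

Dates are stored as ISO 8601 strings here so that `BETWEEN` compares them
chronologically, and the query uses placeholders rather than string
concatenation so that user input is never interpreted as SQL.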

However, there were some complications in relation to the development and
usage of this data-gathering application. Before all national Daily Top 200 lists
could be gathered, SpotifyCharts.com implemented some changes to how data
could be fetched from their website. A DDoS mitigation service (in
SpotifyCharts.com’s case, Cloudflare) [36] was employed by the website to
prevent malicious attacks intended to overwhelm the website with excessive
requests and, as a result, create a denial of service in which the website
becomes unavailable to its intended users.

Some efforts were made in this thesis work to find a way around this
protection service in order to gather more national Daily Top 200 lists for
the visualization tool. However, due to the time constraints of this thesis,
a decision was made against pursuing this further. Therefore, not all
national markets where Spotify is available are fully featured in the
visualization tool. Additionally, in a few select cases, such as the data from
the Brazilian market, the data is incomplete.

4.2 Visualizing data
The second part of the implementation of this thesis consisted of developing a
tool for visualizing the data gathered in the previous sub-chapter in order to
support the investigation of spatiotemporal trends.

This visualization tool was developed in the programming language JavaScript
[37], divided into a client-side application as the presentation layer accessible
through a web browser, and a server-side application as the data access layer.
These two parts are often referred to as the front-end and the back-end,
respectively, in the software development industry. The user interface of the
presentation layer is shown in Figure 1.

4.2.1 Presentation layer

            Figure 1. An overview of the complete user interface.

The front-end part of the application handles interpreting the data that is
requested and delivered from the back-end. The user defines these requests,
and exactly which data to request, through a small interface when accessing
the application through a web browser. The user can enter an Artist, a Song,
a Start date, and an End date. When the user then presses the button labeled
“search”, the application forwards these filters to the server side, which
fetches the requested data according to these parameters. This part of the
user interface is shown in Figure 2.

                Figure 2. The input part of the user interface.

The website then presents the data in two ways. The first is a line chart
showing each national music chart on which the requested song and/or artist
has charted during the specified period. The second is a geovisual
representation in which each national music chart is instead shown as its
respective country on a world map, with a colormap scale representing the
position of a specific song released by a specific artist, and a playable
“time slider” to visualize changes over time. These two visualizations are
shown in Figure 3 and Figure 5, respectively.

Figure 3: A snapshot of the line chart representation of the web application
 for visualizing temporal music charts. The song “Without You (feat. Sandro
Cavazza)” by the artist “Avicii” is showcased as an example, during its first
    month of charting between its release on 2017-08-10 and 2017-09-10.

The line chart features a y-axis defining the chart position on a scale of
1-200, representing the daily position of a song in the Spotify Daily Top 200
list. The x-axis represents the date, over the interval defined by the user as
discussed previously. One line is generated for each combination of song,
artist, and national music chart. Most of the time several lines are
generated, which are distinguished by their color. At the bottom of the line
chart, a legend explains which line pertains to which song by an artist in a
specific country. The line chart is interactive: the user can hover over lines
and specific data points to isolate them, and can exclude lines entirely from
the chart to improve visibility in the user interface. This functionality is
demonstrated in Figure 4.
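
The rule that each combination of song, artist, and national music chart
yields its own line can be sketched as a simple grouping step. The sketch
below is a hedged illustration in Python rather than the JavaScript used in
the tool; the function name `build_series`, the row layout, and the rows
themselves are hypothetical placeholders.

```python
from collections import defaultdict


def build_series(rows):
    """Group (artist, track, region, date, position) rows into one series per line."""
    series = defaultdict(list)
    for artist, track, region, chart_date, position in rows:
        series[(artist, track, region)].append((chart_date, position))
    for points in series.values():
        points.sort()  # ISO 8601 dates sort chronologically as strings
    return dict(series)


# Placeholder rows for illustration only; not real chart data.
rows = [
    ("Example Artist", "Example Song", "se", "2019-01-02", 9),
    ("Example Artist", "Example Song", "us", "2019-01-01", 150),
    ("Example Artist", "Example Song", "se", "2019-01-01", 12),
]
series = build_series(rows)
```

Each key of the resulting dictionary corresponds to one legend entry, and each
value to the ordered points of that line.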

Figure 4: The same showcased example as in Figure 3; however, most of the
  national markets have been excluded, and the mouse cursor hovering over the
“U.S.A – Aug 23, 2017” data point highlights that specific line and
prominently shows the chart position for that specific day.

The map chart representation of the search results features a map of the
world, in which each time series for an artist and song search over a
specified period is instead represented by coloring each respective country in
different intensities according to how high the song charted on a scale from
1 to 200. Additionally, the user is presented with a slider at the bottom of
the visualization with an associated “play/pause” button. The user can either
press this button to play an automatic animation in which the coloring of the
countries changes according to each respective charting position, or use the
mouse cursor to drag the slider left and right to control more precisely which
date is represented in the chart. This map chart is shown in Figure 5.

               Figure 5. The map chart part of the visualization tool.

This answers how we visualize temporal and geospatial data over time for a
specific song, which is part of RQ2. For artist searches, the visualization
focuses on the line chart part of the tool, due to its ability to display, for
example, single releases before an upcoming album. This is highlighted in
Figure 6.

Figure 6. The chart performance of the Swedish artist “Veronica Maggio” in
Sweden from 2019-03-22 to 2019-08-01. This showcases the performance of
singles released before the upcoming album, which came out on 2019-06-14.

Moreover, the map chart features interactive functionality beyond the time
slider. The user can hover the mouse cursor over each country to show its
current chart position more precisely, and buttons labeled “+” and “-” can be
used to zoom in on specific geographical regions. These capabilities are shown
in Figure 7.

Figure 7. The map chart is zoomed in on Europe, and the mouse cursor is
hovering over the Nordic country Sweden to show the specific chart position
                             for that current day.

The scale of 1-200 on the map projection is visualized using the Viridis
colormap. As discussed in Chapter 3, this colormap was chosen because it is
perceptually uniform, a property that has been found to make colormaps clearer
than many of the available alternatives [29] [30].

The visualizations presented in the web application use HighCharts [38], a
JavaScript library supporting a plethora of different charts and
visualizations for JavaScript developers to feature in their applications.
Without this library, the visualizations featured in this web application
would most likely not have been realized, due to the sheer amount of time that
would have been required to implement them. Given that time was a finite
resource in this thesis work, HighCharts was of great assistance.

A very large number of JavaScript visualization libraries were available
during the development of this application. HighCharts in particular was
chosen because of the library’s “Charts in Motion” functionality [39], which
was a considerable catalyst for realizing the “map chart over time” part of
the visualization tool.

4.2.2 Data access layer
The server-side part of the application interprets the user-defined requests
discussed in the previous sub-chapter and communicates with a database
containing all gathered, locally stored data from SpotifyCharts.com in order
to respond to the client with the requested data. To achieve this, SQL is used
to fetch data from the database, as previously discussed in this chapter.

5     Results
The results presented in this section were gathered from eight participants,
who evaluated the prototype according to the methodology described in
Section 2.2.3. The ICE-T form used in the study is composed of 21 questions,
all on a Likert scale from 1-7, where each number represents:

                  1 = Strongly Disagree
                  2 = Disagree
                  3 = Somewhat Disagree
                  4 = Neither Agree nor Disagree
                  5 = Somewhat Agree
                  6 = Agree
                  7 = Strongly Agree
                  Unanswered question = Not Applicable

The questions are divided into four categories: Insight (8 questions),
Confidence (4 questions), Essence (4 questions), and Time (5 questions).

5.2 Aggregated results

As suggested by the authors of the method [24], we begin by presenting the
aggregated results for each level of the hierarchy, with both mean and median
scores. These results are shown in Tables 1 to 4. The questions have been
numbered according to the first letter of each question’s respective category (I,
C, E, or T) and their order in the form. These abbreviations are also used in
Chapter 6, in the discussion of the results.

The overall mean score, considering all questions/categories, was 5.35, and
the overall median score was 5.0.
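
As an illustration of how such aggregated scores can be computed from Likert
answers, the minimal sketch below calculates a mean and a median while
skipping unanswered items. It assumes that “Not Applicable” answers are simply
excluded from the calculation; the function name `aggregate` and the example
answers are hypothetical and are not the survey data.

```python
from statistics import mean, median


def aggregate(responses):
    """Return (mean, median) of Likert answers, skipping unanswered (None) items."""
    answered = [score for score in responses if score is not None]
    if not answered:
        raise ValueError("no answered questions to aggregate")
    return mean(answered), median(answered)


# Hypothetical answers from eight participants to one question;
# None marks an unanswered ("Not Applicable") item.
example_answers = [7, 6, None, 5, 6, 4, 7, 7]
example_mean, example_median = aggregate(example_answers)
```

The same function can be applied to the answers of a single question, of a
whole category, or of the entire form, depending on which level of the
hierarchy is being summarized.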

Table 1: ICE-T scores for the Insight category.

 Question                                                             Mean    Median
                                                                      score   score
 I.1 The visualization exposes individual data cases and their         5.75      6.0
 attributes
 I.2 The visualization facilitates perceiving relationships in the     5.87      6.0
 data like patterns & distributions of the variables

 I.3 The visualization promotes exploring relationships between        6.00      6.5
 individual data cases as well as different groupings of data cases
 I.4 The visualization helps generate data-driven questions            4.75      5.0
 I.5 The visualization helps identify unusual or unexpected, yet       5.75      5.5
 valid, data characteristics or values
 I.6 The visualization provides useful interactive capabilities to     5.25      5.0
 help investigate the data in multiple ways
 I.7 The visualization shows multiple perspectives about the data      4.62      4.5
 I.8 The visualization uses an effective representation of the data    5.37      5.5
 that shows related and partially related data cases

 Aggregated scores for Insight category                                5.42      5.0

Table 2: ICE-T scores for the Confidence category.

 Question                                                             Mean    Median
                                                                      score   score
 C.1 The visualization uses meaningful and accurate visual             6.12      7.0
 encodings to represent the data
 C.2 The visualization avoids using misleading representations         5.75      6.5
 C.3 The visualization promotes understanding data domain              4.28      4.0
 characteristics beyond the individual data cases and attributes
 C.4 If there were data issues like unexpected, duplicate, missing,    2.71      2.0
 or invalid data, the visualization would highlight those issues
 Aggregated mean score for Confidence category                         4.71      5.0

Table 3: ICE-T scores for the Essence category.

 Question                                                            Mean    Median
                                                                     score   score
 E.1 The visualization provides a comprehensive and accessible        5.25      5.0
 overview of the data
 E.2 The visualization presents the data by providing a               6.12      6.0
 meaningful visual schema
 E.3 The visualization facilitates generalizations and                5.00      5.0
 extrapolations of patterns and conclusions
 E.4 The visualization helps understand how variables relate in       5.75      6.0
 order to accomplish different analytic tasks
 Aggregated mean score for Essence category                           5.53      6.0

Table 4: ICE-T scores for the Time category.

 Question                                                            Mean    Median
                                                                     score   score

 T.1 The visualization provides a meaningful spatial organization     6.00      6.0
 of the data
 T.2 The visualization shows key characteristics of the data at a     5.75      5.5
 glance
 T.3 The interface supports using different attributes of the data    4.87      5.0
 to reorganize the visualization's appearance
 T.4 The visualization supports smooth transitions between            4.87      4.5
 different levels of detail in viewing the data
 T.5 The visualization avoids complex commands and textual            6.00      6.5
 queries by providing direct interaction with the data
 representation
 Aggregated mean score for Time category                              5.50      6.0

The detailed distributions of the participant-given ICE-T scores per
question/category are presented in Figures 8 to 10. In more detail, the scores
for the Insight category are shown in Figure 8, the scores for the Confidence
and Essence categories are shown together in Figure 9, and the scores for the
Time category are shown in Figure 10.

      Figure 8: Distributions of ICE-T scores for the Insight category.

   Figure 9: Distributions of ICE-T scores for the Confidence and Essence
                                 categories.

Figure 10: Distributions of ICE-T scores for the Time category.

6     Analysis
This chapter analyzes the data gathered in this thesis, structured around the
research questions formulated in Section 1.3. Additionally, the results of the
ICE-T questionnaire answered by the survey participants, presented in
Chapter 5, are analyzed.

RQ1: How can we develop a tool that provides visual representations of data
from music charts to support exploratory analyses of trends over time and
space?

Collecting the data needed for the visualization was mostly successful. Large
amounts of spatiotemporal data were gathered and stored in a database, which
was queried using SQL to retrieve the gathered data for analysis. This
database was then successfully used as the basis for a web-based visualization
prototype.

The effort to answer this question resulted in a prototype of a visualization
tool featuring a time series (as a line chart) and a geographical chart as
coordinated views, both displaying changes over time in the data. The
motivation behind the types of charts employed was discussed in Chapter 3.
This, together with the implementation descriptions presented in Chapter 4,
constitutes our answer to the first research question of this thesis, RQ1.

However, since this is quite a broad question, the discussion on how valuable
these visualizations are (according to the value-driven methodology by Wall et
al. [24]) is probably a more important aspect of the thesis.

RQ2: Can the created visual representations provide valuable insights on the
chart performances of songs and artists?

To provide evidence for this question, we discuss here the results for each of
the questionnaire’s categories (as featured in Chapter 5), then conclude with a
discussion on the general aggregated results at the end of this chapter.

Insight:
In terms of insight into the data featured in the visualizations, the survey
results in this category showed answers generally trending more towards Agree
than Disagree. The median score of the answers was 5.0 (“Somewhat Agree”),
while the average over all questions in this category was 5.42 (between
“Somewhat Agree” and “Agree”). Following Wall et al.’s [24] interpretation
that a score of 5 or more represents a strength of the visualization, the data
suggests that the prototype provided good overall insight into the data.
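
Applying the threshold of 5 to the individual questions can be sketched as
follows. The mean scores are copied from Table 1; the function name
`split_by_threshold` is hypothetical, and applying the threshold per question
rather than per category is an assumption made for this illustration.

```python
# Mean scores per question, copied from Table 1.
insight_means = {
    "I.1": 5.75, "I.2": 5.87, "I.3": 6.00, "I.4": 4.75,
    "I.5": 5.75, "I.6": 5.25, "I.7": 4.62, "I.8": 5.37,
}
STRENGTH_THRESHOLD = 5.0  # a score of 5 or more is read as a strength


def split_by_threshold(means, threshold=STRENGTH_THRESHOLD):
    """Return (labels at or above the threshold, labels below it)."""
    strong = [label for label, score in means.items() if score >= threshold]
    weak = [label for label, score in means.items() if score < threshold]
    return strong, weak


strong, weak = split_by_threshold(insight_means)
```

With the Table 1 values, only I.4 and I.7 fall below the threshold.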

In terms of strengths, questions I.1, I.2, I.3, and I.5 all scored above the
category average of 5.42. This suggests that the visualization performed well
in isolating data cases, showing the relationships between the data in these
data cases, and identifying unexpected outcomes. This most likely has to do
with the ability to isolate single song and artist combinations in the tool,
and with users being surprised by what the raw data looks like when presented
as visualizations.

The weaknesses in providing insight into the data were related to promoting
data-driven questions (I.4) and showing multiple perspectives about the data
(I.7). This could be because, whilst a new way of viewing the data is
provided, it does not promote further thinking or give rise to new questions
in relation to the visualization. Furthermore, the perspectives shown in the
current iteration of the tool are quite one-dimensional in what they portray.
It should be mentioned, however, that these “weaknesses” still trend towards
Agree, above the midpoint of “Neither Agree nor Disagree”.

Confidence:
The confidence category had a median score of 5.0, but was the lowest scoring
category out of the four regarding the average score, with 4.71 (between
“Neither Agree nor Disagree” and “Somewhat Agree”). This suggests that
confidence is the category with the most potential for improvement.

The strongest aspects in terms of confidence, as indicated by questions C.1
and C.2, relate to meaningful and accurate representations of the data and to
avoiding the use of misleading representations. This could be because of the
linear scaling of the y-axis in the line chart and the usage of the Viridis
colormap in the geographical part of the visualization.

The question that clearly received the lowest score of the study was C.4,
which relates to data issues such as duplicate, missing, or invalid data, and
whether the visualization would reveal such issues. This could be related to
gaps in the timelines, especially in the case of Brazil, which was mentioned
and discussed in Section 2.3. In addition, if a song with a big gap in its
charting, such as a song that only charts during the Christmas weeks of the
year, is displayed in the visualizations, a line is still drawn over the
entire calendar year, even if there were no chart entries for that particular
song from, for example, January to November.
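
One possible way to make such gaps explicit in the data, sketched here as an
illustration and not as part of the prototype, is to emit one point per day
and mark the days without a chart entry as missing. The function name
`fill_missing_days` and the example entries are hypothetical; whether a
charting library then breaks the line at missing values depends on the library
and its configuration.

```python
from datetime import date, timedelta


def fill_missing_days(points, start, end):
    """Return one (day, position) pair per day; position is None when absent."""
    known = dict(points)
    filled = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        filled.append((day, known.get(day)))
    return filled


# Placeholder chart entries for illustration only; not real chart data.
points = [
    (date(2020, 12, 23), 40),
    (date(2020, 12, 24), 12),
    (date(2020, 12, 27), 90),
]
filled = fill_missing_days(points, date(2020, 12, 23), date(2020, 12, 27))
missing = [day for day, position in filled if position is None]
```

The days collected in `missing` are exactly those on which the song did not
chart, which is the information the current line chart hides by connecting the
surrounding points.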

Essence:
The essence category also produced favorable results, with a median score of
6.0 and an average of 5.53 (between “Somewhat Agree” and “Agree”).

The scores of the individual questions were quite close to each other in this
category, which suggests that in terms of essence, the tool was quite consistent
in a positive way. The strongest quality, supported by question E.2, seems to
relate to the tool providing a meaningful visual schema, which could be by
virtue of the type of charts chosen and especially how these charts complement
each other to provide value.

The weakest link in the essence category, based on question E.3, concerns
generalizations of patterns. Since similar song performance “curves” are not
grouped, but separated by time, this could be a contributing factor to this
weakness. If the line chart had a way of aggregating similar curves, this
would likely result in a higher average score for this particular question.

Time:
Finally, when it comes to the time category of the ICE-T form, the survey
results similarly showed answers generally trending more towards Agree than
Disagree. The median score was 6.0 and the average answer calculated from
all questions in this category resulted in 5.5 (between “Somewhat Agree” and
“Agree”). This suggests that the tool does a good job of portraying time.

The strengths here are supported by the results of questions T.1, T.2, and T.5.
The data indicates that spatial organization, showing key characteristics at a
glance, and avoiding text-based queries and complex commands when interacting
with the tool are all strengths. This could be due to the map chart
representation, the clear way of showing the user the actual data attributes,
and the “point-and-click” interaction functionalities available once a search
has been performed.

In terms of weaknesses, supported by the results of questions T.3 and T.4, the
visualizations scored lower when it came to using different attributes of the
data to reorganize the appearance of the visualization, and to supporting
smooth transitions between different levels of detail in the data. This could
be a result of the featured data attributes being quite limited, and of the
fact that, to view a different level of detail (for example, a wider or
narrower time span), the user has to perform a new search instead of, for
example, being able to zoom in on the line chart.

Overall:
To conclude this chapter, we analyze the overall results of the evaluation in
light of Wall et al.’s [24] recommendations on how to interpret ICE-T answers.
As mentioned before, the authors of the methodology point out that good
visualizations should strive for an overall average score of 5 or more. With a
mean overall score of 5.35 and a median of 5.0, we believe that the data from
the ICE-T questionnaire suggests that the developed prototype fulfilled its
goal of being an initial viable spatiotemporal visualization tool for music
charts. As such, this concludes our answer to RQ2 with a positive outcome.
