DEGREE PROJECT IN TECHNOLOGY,
FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2021

Recommend Songs With Data
From Spotify Using Spectral
Clustering

DANIEL BARREIRA

NAZAR MAKSYMCHUK NETTERSTRÖM

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES
Abstract
Spotify, one of the world's biggest music services, published a data set and an
open-ended challenge for music recommendation research. The goal of this study is to
recommend songs to playlists from the given Spotify data set using Spectral
clustering. While the given data set contains 1 000 000 playlists, Spectral clustering
was performed on a subset of 16 000 playlists due to limited computational
resources. With four different weighting methods describing the connection between
playlists, the study shows reasonable clusters where playlists of similar categories
were clustered together, although most of the results also contained one very large
cluster in which many different sorts of playlists were grouped. The conclusion drawn
from the results was that the data became overly connected as an effect of our
weighting methods. While the results show the possibility of recommending songs to
a limited number of playlists, hierarchical clustering could possibly help to
recommend songs to a larger number of playlists, but that is left to future
research to conclude.

Sammanfattning
Spotify, which is one of the world's largest music services, published data and an open
challenge for research in music recommendation. The goal of this study is to
recommend songs to a playlist from the given Spotify data using cluster analysis.
Although the published data set contains 1 000 000 playlists, the cluster analysis was
performed on 16 000 playlists due to a lack of computational capacity. With four
different weightings on the graph of playlists, the study shows reasonable clusters
where playlists of similar categories were clustered together. However, in most cases
the results contained one very large cluster in which many different types of playlists
were grouped. The conclusion was that the data used was overly connected as an
effect of the weightings. Although the results show that it is possible to recommend
songs to a limited number of playlists, hierarchical clustering could possibly help to
recommend songs to a larger number of playlists.

Acknowledgement
We would like to express our sincere gratitude to our supervisors Emil Ringh and
Parikshit Upadhyaya. Parik, without your help we would probably still have been
stuck on the difference between unnormalized and normalized Laplacian. Emil,
without you detecting some of our computational flaws and teaching us effective
ways to find them we would probably still be doing simulations. Without the
encouragement and continuous feedback from you two this project would have been
a lot harder, thank you.

Authors
Daniel Barreira, barreira@kth.se
Nazar Maksymchuk Netterström, nazarmn@kth.se
Degree Programme in Technology
KTH Royal Institute of Technology

Place for Project
Stockholm, Sweden

Examiner
Gunnar Tibert
Vehicle Technology and Solid Mechanics,
KTH Royal Institute of Technology

Supervisor
Emil Ringh
Parikshit Upadhyaya
Department of Mathematics, Numerical Analysis,
KTH Royal Institute of Technology
Contents
1 Introduction
  1.1 The data
  1.2 Clustering
  1.3 Limitations

2 Method
  2.1 Graph Theory
  2.2 Eigenvalues and Eigenvectors
  2.3 k-means algorithm
  2.4 Spectral Clustering
  2.5 Different approaches to weighting
  2.6 Recommending songs from clustered graph

3 Results
  3.1 Results from weighting 1: Percentage of similar songs
  3.2 Results from weighting 2: Percentage of similar artists
  3.3 Results from weighting 3: A constructed function
  3.4 Results from weighting 4: A constructed function
  3.5 Results from random samples
  3.6 Song recommendation to a playlist

4 Discussion
  4.1 Conclusion
  4.2 Future work

1     Introduction
We live in an age where there is an overflow of data. All this data presents us
humans with a spectrum of different problems as well as opportunities. Music is an
important cultural part of our society, and it is one of the fields that can take
advantage of the opportunities that arise with the data overflow. There are things to
gain both from the perspective of a musician and of a listener. As a
musician, you probably want your music to reach as many listeners as possible, and
as a listener you want a diverse pool of music that matches your interests. All of this
leads up to the purpose of our project, a challenge presented by AIcrowd called the
”Spotify Million Playlist Dataset Challenge” [3].
The challenge is to create an automatic playlist continuation by recommending 500
songs, ordered by decreasing relevance. In this paper the challenge is approached
using Spectral clustering.
Spectral clustering is a method for clustering in the eigenspace of a graph
Laplacian. Clustering is a data analysis method that is widely used in many fields
such as statistics and computer science. The aim of this project is to see if
one can find clusters of playlists and use these clusters not only to suggest music that
directly correlates with the user's own playlist but also to find songs through connections in the cluster.
Problem formulation
From a data set given by Spotify, our study will focus on analysing how effective
spectral clustering is when recommending songs.

1.1    The data
The data set is sampled from over 4 billion public playlists on Spotify and consists
of one million playlists. There are about 2 million unique tracks in the data, by
nearly 300 000 artists. The data set was collected from US Spotify users between the
years 2010 and 2017.

Figure 1: An illustrative extraction of a part of playlist 661 from the data set.

Furthermore, the data contains roughly 66 million tracks in total and 2.26 million unique
tracks. The given data has a few different attributes. An example is shown in Figure
1 for a playlist with 58 tracks; it shows what is given for every playlist and for
every track. The attributes mainly used in this paper are the ones marked with a red ring.
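
To make the format concrete, the sketch below reads one slice of the data set into
MATLAB. The slice file name and the field names (playlists, name, tracks, track_uri,
artist_uri) are assumptions about the published JSON format, so treat this as a
minimal sketch rather than a definitive loader.

    % Minimal sketch: read one JSON slice of the data set (file name and field
    % names are assumptions about the published format).
    raw  = fileread('mpd.slice.0-999.json');   % one slice of 1 000 playlists
    data = jsondecode(raw);

    playlists = data.playlists;                % struct array, or cell array if fields differ
    if iscell(playlists), first = playlists{1}; else, first = playlists(1); end
    fprintf('Playlist "%s" has %d tracks\n', first.name, numel(first.tracks));

    % Assuming uniform track fields, the tracks decode to a struct array:
    trackURIs  = {first.tracks.track_uri};     % track URIs, used by weighting 1
    artistURIs = {first.tracks.artist_uri};    % artist URIs, used by weighting 2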

1.2    Clustering
Clustering is a way of understanding information by dividing data into different
groups. The point is to define connections between data points through similarity, and
in doing so remove non-essential data, also known as noise. This makes it possible to
detect patterns and thus analyze the given data. The applications of clustering are
numerous, and it is a widely used first step when analyzing big sets of data with
machine learning [8], [7].
There is a variety of clustering algorithms, for example ε-neighborhood, k-means and
density-based clustering. In this paper the study revolves around Spectral clustering
and how effective it is when clustering Spotify playlists.

The reason there are many different clustering methods is the variety of data sets.
For example, consider the different sets of data points in Figures 2 and 3.

      Figure 2: data set one                             Figure 3: data set two

As seen in Figure 2 and Figure 3, the structure of the data points is different. This
means that some clustering methods will perform better than others. By using
density-based clustering or spectral clustering one can get ”correct” results on both
data sets shown in Figures 2 and 3, but plain k-means clustering will only work well
on the data shown in Figure 2. To understand why, a detailed understanding of the
different algorithms is needed, and before explaining the algorithms some theory is
required.

1.3    Limitations
The given data set consists of a lot of data. Due to lack of time and computational
resources, the entire data set could not be analyzed. Furthermore, the challenge itself
is not attempted; the study only shows how spectral clustering could work for the
challenge and tries to understand the general structure of the data.

2     Method
As mentioned before, the main method in this paper is Spectral clustering. Before
diving into the algorithm itself, some preliminary material is presented, starting with
graph theory.

2.1    Graph Theory
A graph G is defined as a collection of i nodes N = {n1 , . . . , ni } and k edges
E = {e1 , . . . , ek }. A node represents a data point and an edge represents the
connection/relationship between two nodes [5]. We write

                                      G = (N, E).                                    (1)

Furthermore there is such a thing as a directed and an undirected graph.

Undirected and directed graph

A graph is undirected when an edge between two arbitrary nodes ni and nj is the
same regardless of direction.

                                   eij ∈ E ⇒ eji ∈ E.                                (2)

An undirected graph is also called a symmetric graph. For a directed graph the
direction matters. The next step is to explain how nodes are connected to each other
via edges.

Connectivity of a graph

A graph is called connected when it is possible to walk from any node to every other
node. Otherwise the graph is not connected and consists of a number of subgraphs.
This is described by the multiplicity M , where M ≥ 1 [4]:

                    M (G) := multiplicity = number of subgraphs                      (3)

The connectivity of a graph can be represented by different sorts of matrices.

Adjacency matrix and weights

The graph can be represented by a matrix called the unweighted adjacency
matrix Auw, defined as follows:

           Auw (k, j) = 1   if there is an edge between nodes k and j
           Auw (k, j) = 0   if there is no edge between nodes k and j                 (4)

Figure 4: Example of an unweighted graph G and the associated adjacency matrix

A graph can either be unweighted (as seen in Figure 4) or weighted, meaning that
the edge values vary. The weighted adjacency matrix Aw is then defined as:

           Aw (k, j) = wkj   if there is an edge between nodes k and j (the weight of the edge)
           Aw (k, j) = 0     if there is no edge between nodes k and j                (5)

A weighted graph can now be written as follows

                                        G = (N, E, W )                                (6)

where W is the set of weights {w1 , . . . , wk }. One last definition from graph theory
is needed.

Degree matrix

A degree matrix D is a diagonal n × n matrix, where n is the number of nodes (the
size of the adjacency matrix), and each diagonal entry is the sum of the corresponding
row of the adjacency matrix A:

                 di = Σ_{j=1}^{n} wi,j ,        D = diag(d1 , . . . , dn )            (7)

where i, j = {1, . . . , n}.
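
As a small illustration, the sketch below builds the degree matrix of (7) in MATLAB
for a weighted adjacency matrix; the 3-node example graph is hypothetical.

    % Minimal sketch: degree matrix D from a sparse weighted adjacency matrix A
    % (the example graph here is made up).
    A = sparse([0 1 0; 1 0 2; 0 2 0]);            % weights: w12 = 1, w23 = 2
    d = full(sum(A, 2));                          % d_i = sum_j w_ij (row sums)
    D = spdiags(d, 0, size(A,1), size(A,1));      % diagonal degree matrix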

2.2     Eigenvalues and Eigenvectors
Definition: For a square matrix B, λ is an eigenvalue and ν ≠ 0 is a corresponding
eigenvector if
                                     Bν = λν.                                     (8)

Spectrum of eigenvalues

For a square matrix B, the spectrum is the set of its eigenvalues. If a symmetric
n × n matrix has n non-negative eigenvalues, then that matrix is called symmetric
positive semi-definite; if all eigenvalues are strictly positive, it is symmetric positive
definite [1].

Eigs in MATLAB

In this paper the MATLAB function eigs() is used to calculate eigenvalues and
eigenvectors of our matrices. eigs() computes only a requested number of eigenvalues
of a square (typically sparse) matrix, which makes the calculations significantly
faster than eig() for large sparse matrices.
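
A minimal sketch of how this is used is shown below; the Laplacian L and the number
of clusters k are placeholders for the matrices constructed later in this chapter, and
the option name 'smallestabs' assumes a reasonably recent MATLAB release (older
releases use 'sm' instead).

    % Minimal sketch: the k eigenvalues of smallest magnitude of a sparse symmetric
    % matrix L, together with the corresponding eigenvectors.
    k = 5;                                    % number of eigenpairs (placeholder value)
    [V, Lam] = eigs(L, k, 'smallestabs');     % V: n-by-k eigenvectors, Lam: k-by-k diagonal
    lambda = diag(Lam);                       % the k smallest eigenvalues as a vector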

2.3     k-means algorithm
k-means clustering is an iterative method commonly used to cluster a set of data
points. Given k center points (called centroids) in a space, the method assigns every
data point to the cluster defined by its nearest centroid. The method has its
challenges in some situations, but it is still a fundamental building block of Spectral
clustering. Figures 5 and 6 show the clusters it detects for two different sets of data.

Figure 5: k-means on graph 1                  Figure 6: k-means on graph 2

After initializing k centroids, the distance from each data point to each centroid is
calculated. Each data point is then assigned to the cluster defined by the centroid it
is closest to. Thereafter the mean position of all data points in each cluster is
calculated, which gives the centroids new positions for the next iteration. The same
process is repeated until the centroids are stationary, meaning that they have found
an equilibrium. The algorithm is presented in Table 1, and a minimal code sketch of
the iteration follows the table.

         Algorithm: k-means algorithm
   1.    Randomly initialize k centroids c1 , . . . , ck .
   2.    Calculate the distance from each data point to every centroid.
   3.    Assign each data point to its nearest centroid.
   4.    Calculate the mean value of the grouped data points in each cluster and
         make that the new position of the centroid.
   5.    Repeat steps 2 to 4 until the difference in position for each centroid
         is below a given tolerance.
                              Table 1: k-means algorithm
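
Below is a minimal MATLAB sketch of the iteration in Table 1, written without toolbox
dependencies; if a cluster becomes empty its centroid is simply left where it was, which
a more careful implementation (for example MATLAB's built-in kmeans) handles differently.

    % Minimal sketch of Table 1: X is an n-by-dim matrix of data points, k the number
    % of clusters, tol the stopping tolerance. Returns cluster labels idx and centroids C.
    function [idx, C] = simple_kmeans(X, k, tol)
        C = X(randperm(size(X, 1), k), :);            % step 1: random initial centroids
        while true
            dist = zeros(size(X, 1), k);
            for j = 1:k                               % step 2: squared distance to each centroid
                dist(:, j) = sum((X - C(j, :)).^2, 2);
            end
            [~, idx] = min(dist, [], 2);              % step 3: assign to nearest centroid
            Cnew = C;
            for j = 1:k                               % step 4: move centroids to cluster means
                if any(idx == j)
                    Cnew(j, :) = mean(X(idx == j, :), 1);
                end
            end
            if max(sqrt(sum((Cnew - C).^2, 2))) < tol % step 5: stop when centroids are stationary
                C = Cnew;
                return
            end
            C = Cnew;
        end
    end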

2.4    Spectral Clustering
Spectral clustering uses the spectrum of eigenvalues of the graph Laplacian matrix
to cluster the graph in question [5]. There are unique advantages in using the
spectrum of the eigenvalues to cluster a graph; these advantages are presented later.
The method involves calculating the eigenvalues and eigenvectors of a so-called
Laplacian matrix. Given a graph G = (N, E) with nodes {n1 , . . . , nm } ∈ N , the
unnormalized Laplacian matrix L of the graph G is an m × m matrix defined as

                                      L = D − A                                      (9)

where D and A are the degree and adjacency matrices defined in (7) and (5),
respectively. An important property of L is that the matrix is symmetric positive
semi-definite. The spectrum of the Laplacian matrix is calculated to reveal
underlying structure in the graph data. The number of connected components is
equal to the multiplicity of the eigenvalue 0. If k different eigenvectors have
eigenvalue λ1,...,k = 0, the graph has k connected components. This gives information
about the number of clusters. On the other hand, if the graph G is connected, λ2
gives information about the connectivity of the graph: the greater the value of λ2 ,
the stronger the connectivity.
In Figures 7, 8 and 9, examples illustrating these properties are shown:

   Figure 7: λ1,2,3 = 0             Figure 8: λ2 (G1 ) > 0    Figure 9: λ2 (G2 ) ≫ 0

The normalized Laplacians

To calibrate spectral clustering to different graphs there are strategies that involve
normalizing the Laplacian matrix. One way of normalizing the Laplacian matrix is
the following:
                             Lnorm = D^(−1/2) L D^(−1/2)                             (10)

where the degree matrix D is defined in (7). The properties of the graph Laplacians
have an impact on how the graph is partitioned, in other words, how the graph is
cut [5].

Partitioning a graph

Consider a graph G = (N, E, W ) with i nodes. Consider also the k partitions
{A1 , A2 , . . . , Ak }, and use the notation A^c for the complement of a partition A.
The problem in question is to find the partitions of the graph by minimizing the cut,
defined as:

                 Cut(A1 , . . . , Ak ) = (1/2) Σ_{i=1}^{k} W (Ai , Ai^c )             (11)

where
                                W (A, B) = Σ_{i∈A, j∈B} wi,j .

This takes into account the weight of the edges that are being cut: the aim is to cut
such that the total weight of the edges connecting different clusters is as small as
possible. However, this method takes into account neither the number of nodes nor
the volume of the different clusters. The risk is then that the clusters vary strongly
in size, and also that the graph is partitioned in a ”wrong way”. Figure 10 shows an
example of how the partitioning can be made.

Figure 10: The figure shows how the cut can be done on a graph with multiplicity
1. Cut 1 shows how the cut can be made if the number of nodes is not taken into
account, while cut 2 shows how we probably want to cut the graph.

As shown, this method does not necessarily give the partitioning that is sought after,
so other quantities are needed. To make the sizes of the clusters more similar, there
are two ways to measure the size of a cluster: either by the number of nodes in the
cluster, |Ai |, or by the volume of the cluster, vol(Ai ). This leads to minimizing either
the so-called RatioCut or NCut [5].


              RatioCut(A1 , . . . , Ak ) = (1/2) Σ_{i=1}^{k} W (Ai , Ai^c ) / |Ai |      (12)

where
                   |Ai | := the number of nodes in the partition Ai ,

and
                 NCut(A1 , . . . , Ak ) = (1/2) Σ_{i=1}^{k} W (Ai , Ai^c ) / vol(Ai )    (13)

where
                                  vol(Ai ) := Σ_{j∈Ai } dj .

To summarize, minimizing RatioCut encourages the clusters to have a similar number
of nodes, and minimizing NCut encourages the clusters to have similar volume.
Solving these minimization problems exactly is NP-hard, but with the graph
Laplacian it is possible to approximate a solution. Using the unnormalized Laplacian
L approximates the minimization of RatioCut, and the normalized Laplacian Lnorm
approximates the minimization of NCut. The theorem and proof connecting the
eigenvectors of the Laplacian with the graph cut functions are given by von Luxburg [5].
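
To make the cut quantities concrete, the sketch below evaluates Cut, RatioCut and
NCut for a given labelling of the nodes, following (11)-(13); the function name and the
label vector idx are hypothetical.

    % Minimal sketch: Cut, RatioCut and NCut of a partition. A is the weighted
    % adjacency matrix and idx(i) in {1,...,k} is the cluster label of node i.
    function [cutVal, ratioCut, nCut] = cut_measures(A, idx)
        k = max(idx);
        d = full(sum(A, 2));                         % node degrees d_i
        cutVal = 0; ratioCut = 0; nCut = 0;
        for i = 1:k
            in = (idx == i);
            Wi = full(sum(sum(A(in, ~in))));         % W(A_i, A_i^c): weight leaving cluster i
            cutVal   = cutVal   + Wi / 2;            % (11)
            ratioCut = ratioCut + Wi / (2 * nnz(in));      % (12): divide by |A_i|
            nCut     = nCut     + Wi / (2 * sum(d(in)));   % (13): divide by vol(A_i)
        end
    end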

Spectral Clustering Algorithms

In Tables 2 and 3 two different algorithms for Spectral clustering are presented.

        Algorithm with approximation of RatioCut
   1.   Create a graph G = (N, E).
   2.   Compute the unnormalized Laplacian L = D − A.
   3.   Find the eigenvectors ν1 , . . . , νk of the matrix L belonging to the k smallest
        eigenvalues and create a matrix U = [ν1 , . . . , νk ].
   4.   Treat each row in U as a data point x1 , . . . , xn and perform k-means
        clustering on the points into k partitions A1 , . . . , Ak .
              Table 2: Spectral clustering with the unnormalized Laplacian

        Algorithm with approximation of NCut
   1.   Create a graph G = (N, E).
   2.   Compute the normalized Laplacian Lnorm = D^(−1/2) L D^(−1/2).
   3.   Find the eigenvectors ν1 , . . . , νk of the matrix Lnorm belonging to the k
        smallest eigenvalues and create a matrix U = [ν1 , . . . , νk ].
   4.   Create T = (ti,j ) with ti,j = di^(−1/2) νi,j , i.e., let T contain the
        normalized rows of U .
   5.   Treat each row in T as a data point x1 , . . . , xn and perform k-means
        clustering on the points into k clusters A1 , . . . , Ak .
               Table 3: Spectral clustering with the normalized Laplacian
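
Below is a minimal MATLAB sketch of the two algorithms, assuming a sparse weighted
adjacency matrix A with no isolated nodes and a chosen number of clusters k; kmeans
here refers to MATLAB's built-in function from the Statistics and Machine Learning
Toolbox, which uses k-means++ seeding by default.

    % Minimal sketch of Tables 2 and 3 for a sparse weighted adjacency matrix A.
    n = size(A, 1);
    d = full(sum(A, 2));                         % degrees; assumed > 0 (no isolated nodes)
    D = spdiags(d, 0, n, n);
    L = D - A;                                   % unnormalized Laplacian, (9)

    % Table 2: approximation of RatioCut
    [U, ~] = eigs(L, k, 'smallestabs');          % eigenvectors of the k smallest eigenvalues
    idxRatio = kmeans(U, k);                     % cluster the rows of U

    % Table 3: approximation of NCut
    Dis = spdiags(1 ./ sqrt(d), 0, n, n);        % D^(-1/2)
    Lnorm = Dis * L * Dis;                       % normalized Laplacian, (10)
    [U, ~] = eigs(Lnorm, k, 'smallestabs');
    T = Dis * U;                                 % t_ij = d_i^(-1/2) * u_ij, as in Table 3
    idxNcut = kmeans(T, k);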

Spectral clustering and k-means (test results for a small dataset)

Figure 11 shows a faulty partitioning using k-means clustering; k-means sometimes
gives undesired results like this. Because of how the centroids are initialized, they
can find an equilibrium in an unwanted configuration. An easy and reasonable way to
get around the problem is to run the k-means clustering a few more times and use
the most common outcome (the undesired result is less common than the desired one).

Figure 11: An example of an undesirable convergence of k-means

Figure 12 presents an example of using k-means clustering, as done in Figure 6, but
instead of working directly on the graph G, k-means is done in the eigenspace of the
Laplacian L. The algorithm has partitioned the points into 3 clusters in the desired
manner. The set of points shown in Figure 3 is distributed in a far more complicated
way than the set of points in Figure 2. Partitioning this data with k-means alone is
not easy; one would have to involve transformations [7] and have a deeper
understanding of the structure of the data to accomplish the ”correct” clustering.
With Spectral clustering the partitioning instead becomes significantly more effective.

Figure 12: Illustration of how Spectral clustering has ”correctly” clustered the data
in Figure 3.

When clustering data consisting of three circles of different sizes with our spectral
clustering script, the results were correct approximately 60% of the time. This
means that k-means (which is the final step of spectral clustering) found an
equilibrium at the wrong spots in the eigenspace approximately 40% of the time.
An important factor that affects the precision of k-means is the initialization of the
centroids. If a centroid is initialized far away, relative to the data points, it risks
ending up without any data points associated with it. In another case, more
than one centroid may be initialized in the same cluster, which also causes problems.
By initializing the centroids in a more strategic way, instead of purely at random,
the precision increases drastically. One method of initializing starting points in a
strategic manner is called k-means++.

k-means++

This algorithm is a specific iterative method of assigning starting positions for the
centroids. By using this method the centroids are more evenly spread out over the
data [2]. Let P be the set of data points and define D(x) as the distance from a data
point x to its closest centroid. The algorithm is found in Table 4:

        Algorithm k-means++
   1.   Randomly initialize a centroid c1 from P .
   2.   For each data point x, compute D(x), the distance to the nearest centroid.
   3.   Choose the next centroid randomly from P , picking point x with probability P++ (x).
   4.   Repeat steps 2 and 3 until k centroids have been initialized.
                          Table 4: Algorithm for k-means++

where
                       P++ (x) = D(x)^2 / Σ_{x'∈P} D(x')^2 .                         (14)
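
A minimal MATLAB sketch of this seeding is shown below; it samples each new
centroid with probability proportional to D(x)^2, cf. (14). The function name is
hypothetical.

    % Minimal sketch of Table 4: k-means++ seeding. X is n-by-dim, k is the number
    % of centroids to initialize.
    function C = kmeanspp_init(X, k)
        n = size(X, 1);
        C = X(randi(n), :);                            % step 1: first centroid uniformly at random
        for j = 2:k
            D2 = inf(n, 1);
            for c = 1:size(C, 1)                       % step 2: squared distance to nearest centroid
                D2 = min(D2, sum((X - C(c, :)).^2, 2));
            end
            p = D2 / sum(D2);                          % (14): probability proportional to D(x)^2
            next = find(rand <= cumsum(p), 1);         % step 3: draw the next centroid
            if isempty(next), next = n; end            % numerical safety
            C = [C; X(next, :)];                       %#ok<AGROW>
        end
    end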

              39     61   200
              32     68   200
             100    100   100
             100    100   100
              39     61   200
             100    100   100
             100    100   100
             100    100   100
             100    100   100
             100    100   100
             100    100   100
               0    100   200
             100    100   100
             100    100   100
             100    100   100
             100    100   100
               0    100   200
              32     68   200

                      Table 5: Results with k-means

             100    100   100      (identical in all 18 simulations)

                      Table 6: Results with k-means++

Tables 5 and 6 represent solutions to Spectral clustering on predefined data points.
The data is modeled as 100 data points in each circular-shaped cluster, as in Figure
3. Each row of a table represents one simulation and shows the number of points
assigned to each cluster found. Table 5 contains results from spectral clustering
using plain k-means, and Table 6 contains results from spectral clustering using
k-means++. In Table 6, the advantage of k-means++ becomes obvious.

2.5    Different approaches to weighting
The weighting of the graph is a crucial part in this study. The weighting will
determine how the playlists are clustered and that will directly lead to a resulting
list of recommended songs.

Weighting 1: Percentage of similar songs between two playlists

This weighting method goes as follows: the number of songs that belong to both
playlists is found and divided by the length of the longer playlist. The similarity is
thus taken from the perspective of the bigger list, i.e. if list B has 100 songs and list
A has 50 songs, the number of common songs is divided by 100 and not by 50. With
this method the values lie between 0 and 1. The formula is seen below, where A and
B are arbitrary playlists represented by their track URIs, Length(Max(A, B)) is the
length of the longer list and similarity(A, B) is the number of common URIs.

                      wAB = similarity(A, B) / Length(Max(A, B))                     (15)

Furthermore, the weighting also included some constraints, split into two different
stages. The first constraint was that weights under a certain threshold were not
added to the adjacency matrix; for example, the filter could require the lists to have
more than 20% in common in order to be connected in the adjacency matrix. The
second constraint was applied in a second stage. First, every row and column
containing only 0's, meaning playlists without any connections, was removed. Second,
rows and columns that had fewer or more than a specific number of edges were
removed; for example, if a row only had 1 connection to another playlist it was
removed. This was done in the hope of reducing the number of small clusters and
outliers.
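
As an illustration, the sketch below computes the weights of (15) with the threshold
constraint for playlists stored as cell arrays of track URI strings; the variable and
function names are hypothetical, and the quadratic loop over all pairs is kept for
clarity rather than speed.

    % Minimal sketch of weighting 1: pairwise percentage of common track URIs,
    % keeping only weights above a threshold (constraint 1).
    function W = track_overlap_weights(playlistURIs, threshold)
        n = numel(playlistURIs);
        W = sparse(n, n);
        for a = 1:n
            for b = a+1:n
                common = numel(intersect(playlistURIs{a}, playlistURIs{b}));       % similarity(A,B)
                w = common / max(numel(playlistURIs{a}), numel(playlistURIs{b}));  % (15)
                if w > threshold
                    W(a, b) = w;                 % undirected graph: symmetric weights
                    W(b, a) = w;
                end
            end
        end
    end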

Weighting 2: Percentage of similar artists between two playlists

This weighting works in the same manner as weight function 1 and is based on the
same data set. The percentage similarity is determined using (15), but with A and B
containing artist URIs instead of track URIs. This function was also subject to the
same constraints as weight function 1.

Weighting 3 and 4: A constructed function

The third and fourth weight functions are given by (16); the curve of the function is
shown in Figure 13.

                               w3,4 = 3w − 6w^2 + 4w^3                            (16)

Figure 13: A plot of the function showing how a low percentage similarity gets
increased to a higher similarity and how a high percentage similarity gets decreased
to a lower.

This weight function is designed with the goal of creating more connections with
values closer to each other. As seen in Figure 13, the function amplifies low values
and reduces high values. The function is applied to both artists and tracks, hence it
counts as both the third and the fourth weight function.
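
In practice (16) can be applied directly to the nonzero weights produced by weighting
1 or 2; a minimal sketch, with W as the original sparse weight matrix, is:

    % Minimal sketch of weightings 3 and 4: apply the polynomial (16) to every
    % existing (nonzero) edge weight of the graph.
    [i, j, w] = find(W);                          % nonzero weights of the original graph
    w34 = 3*w - 6*w.^2 + 4*w.^3;                  % (16): amplify low weights, damp high ones
    W34 = sparse(i, j, w34, size(W, 1), size(W, 2));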

2.6    Recommending songs from clustered graph
Eigenvalues and eigenvectors were computed with the function eigs in MATLAB. As
mentioned before, we take every eigenvector corresponding to an eigenvalue close to
0, and cluster in that eigenspace. In the reduced space, each row still corresponds to
one playlist, in the same order. So when the clustering algorithm is done, one can
take the index of a given row, go back to the original JSON data and extract
essential information such as the song name corresponding to a track URI, the artist
name corresponding to an artist URI, and the name of the playlist.
When recommending a song from a cluster, a random number is generated based on
the size of the cluster. This number identifies a row in the cluster, and thus also a
playlist. If it is the same playlist as the one that is getting songs recommended to it,
a new number is randomized. If it is another playlist, another number is randomly
generated based on the number of songs in that playlist, to suggest a song. If the
song already exists in the playlist being recommended to, a new number is
randomized. This procedure is repeated as many times as there are songs left in the
cluster, or until the set number of recommended songs is reached.
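
A minimal sketch of this procedure is given below. The variable names (clusterRows,
tracks, target) are hypothetical, and the loop is stopped by a simple attempt cap
instead of tracking exactly how many songs remain in the cluster.

    % Minimal sketch: recommend up to nWanted songs to playlist `target` from the
    % other playlists in its cluster. tracks{p} is a cell array of track URIs.
    function recs = recommend_from_cluster(clusterRows, tracks, target, nWanted)
        recs = {};
        attempts = 0;
        while numel(recs) < nWanted && attempts < 100 * nWanted
            attempts = attempts + 1;
            p = clusterRows(randi(numel(clusterRows)));     % random playlist in the cluster
            if p == target, continue, end                   % never sample from the target itself
            cand = tracks{p}{randi(numel(tracks{p}))};      % random song from that playlist
            if ~any(strcmp(cand, tracks{target})) && ~any(strcmp(cand, recs))
                recs{end+1} = cand;                         %#ok<AGROW> keep new songs only
            end
        end
    end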

3     Results

3.1    Results from weighting 1: Percentage of similar songs
Results from unnormalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix
Nr. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%          4702      324      3578      One big cluster, one with size 145, a few with size 10 to 12 and the rest less than 7
20%                                       One big cluster, one with size 136
Results from normalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix
N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%          4702      324      3578      Clusters and cluster sizes identical to the unnormalized Laplacian
20%                                       Same as above
Closer look at a few selected clusters

The selected clusters are from a threshold of 30% and no edge filtering. They are
presented by the playlist names they have in the data. There are 259 clusters in total,
of which less than 8% are greater than 10 in size. The first cluster, presented in
Table 9, has a size of 19 and is one of the larger clusters obtained using this
particular weight function. The second one, in Table 10, is also quite large relative
to the other clusters, and it has size 7.

Latin Trap’      latino ’    Cecilia’         Lily’        Nueva’ Latin Vibes’
Spanish’         spanish’    Regeaton’        Spanish’     Party’ reggaeton’
Fiesta latina’   Mi Gente’   musica favorita’ Spanish Mix’ Gucci’
Latin Vibes’                 BEANS’

                         Table 9: Playlist names in a cluster

Yeezy’   yeezy ’   Kenye West ’   Kenye West’    Yeezy Taught Me
Kanye    Kanye

                        Table 10: Playlist names in a cluster

3.2    Results from weighting 2: Percentage of similar artists
Results from unnormalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix
N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%          14567     44       14463     One very large cluster, second largest is of size 6
20%
Results from normalized Laplacian

TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix
N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%          14567     44       14463     Same partitioning as with the unnormalized Laplacian
20%                                       Same partitioning as with
Closer look at a few selected clusters

The selected clusters are obtained using a threshold of 50% and no edges filtered.
The clusters we take a closer look at are number 127, with a size of 68, and number
70, with a size of 14. Cluster 127 is the third largest one, and cluster 70 is among the
top 10% largest ones, relative to the other clusters.

 Classical’ Movie Soundtracks’ movie scores’ movie themes’            Soundtrack’
 Movies’    Harry Potter’      Symphonic’    Scores’                  Chill’
 homework’ Classical’          Orchestra’    movie music’

                         Table 13: Playlist names in a cluster

Disney’               Disney Jams’       Disney’             Tangled’       Disney’
Disney’               Disney’            Disney’             Disney’        Disney’
Princess’             disney’            Disney’             Disney’        Disney’
Disney Music!!!!!!’   Disney’            Disney’             disney’        DISNEY ’
Disney/Pixar’         Disney’            disney’             Disney         Disney’
Disney                disney’            disney playlist.’   Disney Music’  Disney :) ’
DISNEY’               disney’            Disney’             Disney ’       Disney’
Disney ’              Disney Classics’   disney’             Disney’        babies ’
Disney’               Disney Favs’       Disney’             Disney’        disney’
Disney’               Disney’            Disney Princess’    Disney         Best of Disney’
disney’               Disney’            Disney’             DISNEY JAMS ’ DISNEY’
Disney’               Disney Jams’       Disney’             Disney’        Disney!’
disney songs’         Disney’            Disney’             hakuna matata’ Disney’
Disney’               disney’            Olivia’

                         Table 14: Playlist names in a cluster

3.3    Results from weighting 3: A constructed function
TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix
N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%          13689     120      13416     The second largest cluster is size 5
20%   >9     2739      569      212       A big variation in the cluster sizes; top 11 clusters: {212, 103, 99, 80, 58, 58, 46, 42, 31, 36, 24}
20%   >14    4091      394      2905      Cluster sizes after the biggest: {28, 25, 25, 20}; 360 clusters less than 5
20%   >6     1802      561      33        Biggest cluster is size 33; 543 clusters are less than 10
30%          9984      225      9419      One very large, three clusters bigger than 10; the rest of the clusters are less than 6
30%   >9     3710      624      1461      One very large, a few with sizes between 10 and 69; 480 clusters less than 4
30%   >6     2743      733      135       Clusters have sizes {75, 34, 30, 26, 25}; 694 clusters less than 10 and 443 clusters less than 3
30%   >14    4943      429      3621      Clusters have sizes {49, 43, 20, 20}; 289 clusters less than 4
40%          5249      308      4125      One very large cluster, thereafter {164, 65, 26}; 252 clusters are less than 4
40%   >9     3091      511      965       One very large cluster, second biggest 125, a few between 20 and 90; 305 clusters with size 2
40%   >6     2438      624      77        Largest cluster is size 77, thereafter {60, 34, 28, . . . }; 343 clusters with size 2
40%   >14    3768      388      2412      One very large cluster; 353 clusters that are less than 3

                   Table 15: Results from 16 000 Spotify playlists.

Closer look at a few selected clusters

Table 16 and Table 17 show clusters that are obtained using a threshold of 30% and
filtering out playlists that have more than 6 edges.

Awesome Playlist’ Country’   Zoned’            greek’      Dark Side’ summer country’
electro’          Rock’      Lindsey Stirling’ smiles :)’  Pool’       Black’
woo’              Relaxing ’ Spring 2017’      90s Rock’   pump up’ Chill’
Gaming Songs’     jjj’       energy’           cool beans’ Perfection’ 80s’

                       Table 16: Playlist names in a cluster

This is what you came for   Party playlist   Me      Eurodance Gaming    Supernatural
Lit                         Sunshine         Drive   Ay        ALT       Rock

                       Table 17: Playlist names in a cluster

3.4    Results from weighting 4: A constructed function
TH = Threshold, E.f = Edge filtering, Size(A) = size of the adj. matrix
N.r. cl = number of clusters, max(cl) = maximal cluster size

TH    E.f.   Size(A)   Nr. cl   max(cl)   Comments
20%          15837     5        15829     One very large cluster; the other 4 clusters have size 2
20%   >9     356       115      12        Sizes of biggest clusters {12, 11, 9, 8, . . . }; 99 clusters less than 5
20%   >6     173       65       7         Sizes of biggest clusters {7, 5, 5, 4, . . . }; 41 clusters with size 2
20%   >14    692       142      47        Sizes of biggest clusters {47, 36, 31, . . . }; 13 clusters with size less than 5
30%          15621     11       15601     One very large cluster, the rest are size 2
30%   >9     913       217      101       Sizes of biggest clusters {101, 30, 25, 20, . . . }; 186 clusters with size less than 5, 120 clusters with size 2
30%   >6     528       181      9         All clusters have almost similar size
30%   >14    1550      208      826       One very large cluster, second biggest 22; 120 clusters less than 2
40%          14817     44       14721     One very large cluster, second biggest of size 4
40%   >9     1810      388      194       One large cluster; 221 clusters less than 3
40%   >6     1086      345      33        Sizes of biggest clusters {33, 22, 15, 14, 13, . . . }; 271 clusters that are less than 4
40%   >14    2906      323      1770      Sizes of biggest clusters {87, 45, 31, 30, 19, . . . }; 254 clusters with size 3 or less

                   Table 18: Results from 16 000 Spotify playlists.

Closer look at a few selected clusters

Tables 19 and 20 show clusters that are obtained using a threshold of 40% and
filtering out nodes that have more than 9 edges.

JAMS’          Love Music’              basic’      RUNNIN’            ”emoji music note”
electronic’    Litty ’                  Cruisin’    modern rock’       vibes’
pregame’       Happy Happy Happy’       Blues’      PARTY ’            classic’
4th of july’   2016’                    english’    Classical’         Summer 15’
Beach Music’   rock’                    90s Rock’   Random!’           childhood’
skrt skrt’     dance’                   broadway’   sad song’          Way Back When’
lift’          In the Name of Love’     TX Country’ Bruno Mars         Summertime
TX Country     RECENT                   Swing

                        Table 19: Playlist names in a cluster

Solitude’ Spanish’ randoms’        Julion alvarez’              *** good stuff’   june’
Workout’ Relax’    Piano Guys’     Brown Eyed Girl’             wedding playlist’ Country’
MVP ’     Fall’    ThrowBack Pop ’ Hawaii ’                     gabrielle ’

                        Table 20: Playlist names in a cluster

3.5    Results from random samples
The extraction of playlists from the data set has been done on consecutive data,
which means that for the analysis of 16 000 playlists, the first 16 000 playlists in the
data set were extracted. A test has been done where three different sets of 16 000
random playlists were extracted from the data set and our algorithm with weightings
w1 and w2 was applied. The results show the same kind of cluster-size distribution,
with one very big cluster and the rest of smaller size. These tests give us no reason
to believe that the given data set is in some way ordered by the publisher or that
our extraction is an outlier.

3.6    Song recommendation to a playlist
In Table 21 a full playlist with 39 Disney songs is shown. This playlist is randomly
chosen from the smaller clusters to give a practical example of the song
recommendation procedure. The algorithm had the potential to recommend 920
songs to the playlist, of which 48 are suggested, as seen in Table 22.

 A Disney playlist
 Roger Bart’              Lillias White’                          Bruce Adler’
 Go the Distance          I Won't Say                             Arabian Nights'
 Brad Kane’               Lea Salonga’                            Jonathan Freeman’
 One Jump Ahead’          A Whole New World’                      Prince Ali (Reprise)’
 Lea Salonga’             Donny Osmond’                           Harvey Fierstein’
 Reflection               I'll Make a Man Out of You              A Girl Worth Fighting For
 Jason Weaver’            Carmen Twillie’                         Jeremy Irons’
 I Just Can't Wait t...   Circle Of Life                          Be Prepared
 Nathan Lane’             Jodi Benson’                            Samuel E. Wright’
 Hakuna Matata’           Part of Your World                      Under the Sea
 Chorus                   Angela Lansbury’                        Robby Benson’
 Belle’                   Be Our Guest                            Something There’
 Angela Lansbury’         Mandy Moore’                            Donna Murphy’
 Beauty and the Beast’    When Will My Life Begin                 Mother Knows Best
 Mandy Moore’             Mandy Moore’                            Judy Kuhn’
 I've Got a Dream         I See the Light                         Just Around The Riverbend'
 Phil Collins’            Phil Collins’                           Phil Collins’
 Two Worlds'              You'll Be In My Heart'                  Son Of Man'
 Rosie O'Donnell'         Phil Collins'                           Heidi Mollenhauer'
 Trashin' The Camp'       Strangers Like Me'                      God Help The Outcasts'
 Tony Jay’                Kristen Bell’                           Kristen Bell’
 Heaven's Light           Do You Want to Build a Snowman?'        For the First Time in Forever'
 Kristen Bell’            Idina Menzel’                           Kristen Bell’
 Love Is an Open Door’    Let It Go                               For the First Time in Forever
 Ne-Yo’                   Phil Collins’
 Friend Like Me           You'll Be In My Heart'

Table 21: The songs and associated artists of a random playlist to which songs are
recommended.

48 SUGGESTED SONGS
 Maia Wilson'                    Cheryl Freeman'                    Opetaia Foa'i'
 Fixer Upper’                    The Gospel Truth I                 We Know The Way
 Jesse McCartney’                Fess Parker’                       Phil Collins’
 When You Wish Up...’            The Ballad Of Davy Crockett’       On My Way’
 Miley Cyrus’                    Tony Jay’                          Sarah McLachlan’
 Butterfly Fly Away’             The Bells Of Notre Dame’           When She Loved Me’
 Adriana Caselotti'              Auli'i Cravalho'                   Bryan Adams'
 Whistle While You Work'         How Far I'll Go'                   You Can't Take Me
 Ken Page’                       Angela Lansbury’                   Alessia Cara’
 Oogie Boogie's Song'            Human Again'                       How Far I'll Go
 Beth Fowler’                    Keith David’                       Jenifer Lewis’
 Honor To Us All                 Friends on the Other Side          Dig A Little Deeper
 Judy Kuhn'                      The Cast of                        M. Keali'i Ho'omalu'
 Colors Of The Wind’             One of Us’                         He Mele No Lilo’
 Jump5’                          Rhoda Williams’                    Louis Prima’
 Aloha, E Komo Mai               The Music Lesson                   I Wan'Na Be Like You
 Mary Costa’                     Jeremy Jordan’                     Adam Mitchell’
 An Unusual Prince               The World Will Know’               Days In The Sun’
 Bruce Reitherman’               Anna Kendrick’                     Pocahontas’
 The Bare Necessities’           No One Is Alone’                   Where Do I Go From Here’
 Auli'i Cravalho'                Bobby Driscoll'                    Shakira'
 Know Who You Are’               Following The Leader’              Try Everything - From
 Cedar Lane Orchestra’           Jemaine Clement’                   Dr. John’
 The Lion King’                  Shiny’                             Down in New Orleans’
 Adriana Caselotti’              Elvis Presley with Orchestra’      Tony Jay’
 Some Day My Prince ...’         Suspicious Minds’                  Out There’
 Samuel E. Wright’               98’                                *NSYNC’
 Kiss the Girl                   True To Your Heart'                Trashin' The Camp
 Mark Mancina’                   Ferb’                              Richard White’
 Village Crazy Lady              Backyard Beach’                    Gaston
 Jim Cummings’                   Elton John                         Rachel House’
 Gonna Take You There            Can You Feel The Love Tonight      I Am Moana

Table 22: The 48 recommended songs, with the associated artists, for the playlist in Table 21.

4    Discussion
As seen in almost every table in the results, there is a general theme of one big
cluster and then a large number of small ones. This pattern is especially visible
when data points with fewer than 2 and 5 edges are removed for the first and
second weight functions. Sometimes there is even only one cluster present after the
filtering is done. From this the conclusion was made that the problem is not that
the data is too weakly connected. Rather, the analysis is that the data might be
overly connected, and thus the decision was made that for the third and fourth
weightings we instead try to filter out data with too many connections.

The code is written in such a way that from the start it removes 0-rows and
0-columns, which also shows in a clear way how many playlists have 0 matches after
the threshold filter. It is obvious from Table 7 that it is quite rare for a playlist to
have 20% in common with another playlist, and looking at the 50% threshold data,
we see that the matrix is reduced from 16000 × 16000 to 192 × 192, which gives the
undesired result of filtering out 98.8% of the data.
What is important to note is that the big cluster is a partition of all the data that
has a high connectivity, meaning that suggesting a song to another playlist within
the cluster is almost the same as picking a song at random. When trying the
normalized Laplacian an identical result is obtained, and thus the conclusion is made
that for w1 there is no difference between approximating the minimization of NCut
and of RatioCut.

The second weight function gives matrices that are larger in size, which is no
surprise, because as stated in Section 1.1 there is a big difference between the number
of unique artists and unique tracks in the data set. Even considering this, it is worse
in some aspects. Consider the point made earlier, that the big cluster is almost equal
to randomizing a song from the data set, and assume that the second largest
cluster is reliable. It is then possible to put a number on how many playlists in a
given cluster are able to receive a song recommendation. Below is a summation
of how many playlists are eligible for a song recommendation.
 Filter   20%
Weighting with artists seemed better because of more connectivity, but Table 23 shows that w1
is better for almost every threshold. In the results part we present a closer look at a
few clusters to get a grasp of what the clusters look like. It then becomes visible
that the clusters that our algorithm does find are clusters that have
things in common, and therefore we consider them good candidates to be the
source of a recommendation for a list within the cluster.
As expected, when using w3 and w4 the dimensions of the matrices became larger.
The weight function itself amplifies edges with a low value and reduces values
on the high end, as seen in Figure 13. From the results of w3 and w4 , one
can see that there is a large number of clusters when the filtering is done for data
points with more than 6, 9 and 14 edges. Looking at w4 with a 20% threshold
and no edge filtering, the matrix size is 15 829. Increasing the threshold
another 10% decreases the size to 15 621. Considering that the optimum would be
to be able to suggest song recommendations to every playlist, these sizes are
desirable. The problem comes when looking at the cluster sizes, and in particular
at the largest cluster, which shows the same tendencies as for w1 and w2 . To resolve
these tendencies we filtered out playlists that we defined as overly connected, with
the intent of possibly breaking up the larger clusters.

 Filter   20%   >6     >9     >14    30%        >6     >9     >14    40%    >6     >9     >14
 w3       273   2739   1802   1186   565        2249   2126   1322   1124   2438   2126   1356
 w4       8     173    356    692    20         528    913    724    96     1086   1616   1136

Table 24: A table of how many playlists are able to receive a song recommendation

This table is made in the same manner as for w1 and w2 ; the difference is that
sometimes the largest cluster is not too big, and is thus not subtracted when
calculating how many playlists can receive a suggestion. It is interesting to note that
the same pattern is seen as above: tracks behave in a more desirable way
than artists. Furthermore, as seen in Table 18, the intent of filtering over-connectivity
also led to more clusters, and because of this it is also easier to suggest songs to a
given playlist. This is illustrated by the fact that the algorithm went from being
able to suggest songs to at best 1 206 playlists up to 2 739. But after taking a closer
look at the clusters, as seen in Tables 16, 17, 19 and 20, the clusters seem more random and
are not as coherent as they were for w1 and w2 . The conclusion drawn from this is
that artificially modifying the weighting function does not bring a desirable result.
Lastly, we have recommended songs to a playlist, as seen in Table 22. The playlist
we recommended songs to is a playlist with Disney songs. The recommended songs
that the algorithm gave are also Disney-related songs; for example, the first
suggestion is a soundtrack from the movie Frozen.

4.1    Conclusion
One of the most important conclusions is that trying to manipulate the data
structure by filtering out different numbers of edges is not the way to go. First of all,
every time edge filtering is done, playlists are removed, and
because the ultimate goal is to be able to suggest songs to every playlist, this is a
bad solution. Secondly, when the filtering is done for playlists that we defined as
overly connected, the clusters become more random and thus also a worse source for
song recommendations. When applying the simple weightings w1 and w2 without
filtering edges, and ignoring the large cluster, we get results which are desirable.
Looking at the largest cluster, the conclusion is that some playlists are too
connected and need to be dealt with in an alternative way. How well the presented
method would fare in the challenge is unknown, but after taking a glance at the
recommended songs in Section 3.6 it would seem that the recommendations are
reasonable. To conclude, this study has shown that recommending songs using
Spectral clustering might be a viable option, but further research has to be done.

4.2    Future work
One of the most important things to note is the fact that every simulation presented
in this paper is done on a 16 000 × 16 000 matrix. There is no way to deduce whether the
result would be different using the intended 1 000 000 × 1 000 000 matrix, so here
there is naturally room for further testing. As stated in the limitations section, we
decided not to pursue larger sets of data due to limitations in computational
resources. Furthermore, there is an unlimited number of ways to design the weight
functions. In our case we picked one that is simple, natural and logical, and
then used another one that we deduced might address the problems we had detected
in the first one. That weight function did do better in the sense of being able to
suggest songs to more playlists, but there were still a lot of small clusters and thus a
limitation on how many songs one could recommend to some of the playlists. There
was also the problem of the clusters being less structured than they had been with
the original weight functions. A weight function that does not make the data
overly connected but still finds meaningful clusters would be interesting. Ultimately,
we note that there are methods to address the problem of one large cluster. One of
them is called hierarchical clustering [6], which in a sense means that further
clustering is done on the biggest cluster.

References
[1] Howard. Anton and Robert C. Busby. Contemporary Linear Algebra. Hoboken,
    NJ : Wiley, 2003. isbn: 0471163627.
[2] David Arthur and Sergei Vassilvitskii. “K-Means++: The Advantages of
    Careful Seeding”. In: SODA ’07 (2007), pp. 1027–1035.
[3] Ching-Wei Chen et al. “Recsys Challenge 2018: Automatic Music Playlist
    Continuation”. In: RecSys ’18 (2018). doi: 10.1145/3240323.3240342.
[4] Elias Jarlebring. “Numerics for graphs and clustering”. In: Lecture notes
    numerical algorithms for data science (SF2526) (2019), pp. 8–9.
[5] Ulrike von Luxburg. “A Tutorial on Spectral Clustering”. In: Statistics and
    Computing 17.4 (2007). url: https://arxiv.org/abs/0711.0189.
[6] Frank Nielsen. “Hierarchical Clustering”. In: Feb. 2016. isbn:
    978-3-319-21902-8. doi: 10.1007/978-3-319-21903-5_8.
[7] Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working
    with Data. eng. Sebastopol: O’Reilly Media, Incorporated, 2016. isbn:
    1491912057.
[8] “What is Clustering”. In: Machine Learning Crash Course (2020). url:
    https://developers.google.com/machine-learning/clustering/overview.
