Methodology for identifying homogeneous consumer groups based on qualitative data

IOP Conference Series: Earth and Environmental Science

PAPER • OPEN ACCESS

Methodology for identifying homogeneous consumer groups based on
qualitative data
To cite this article: G S Gabidinova 2021 IOP Conf. Ser.: Earth Environ. Sci. 677 022012

AGRITECH-IV-2020                                                                                               IOP Publishing
IOP Conf. Series: Earth and Environmental Science 677 (2021) 022012                        doi:10.1088/1755-1315/677/2/022012

Methodology for identifying homogeneous consumer groups
based on qualitative data

                     G S Gabidinova
                     Naberezhnye Chelny Institute (branch) of Kazan Federal University, 68/19, Mira
                     prospect, Naberezhnye Chelny, Republic of Tatarstan, 423812, Russia

                     E-mail: gab-gul@yandex.ru

                     Abstract. This article proposes a method for identifying homogeneous consumer groups
                     based on qualitative data. When researching the end-user market, information is often
                     available in qualitative rather than quantitative form, whereas the random variables that
                     mathematical statistics deals with are usually assumed to be numeric. For this reason, many
                     researchers hold that achieving at least an interval level of measurement is always
                     desirable, since it allows the traditional methods of mathematical and statistical data
                     analysis to be applied. Sociologists, on the other hand, emphasize the enormous role of
                     qualitative data in the study of respondents. The presented methodology is based on cluster
                     analysis. It differs from commonly applied market segmentation methods in that it uses
                     cluster analysis algorithms developed for qualitative indicators and a proximity measure
                     that assigns natural weights to the clustering variables. The technique also determines the
                     optimal partition from a graph of the average internal connection as a function of the
                     number of clusters. Among a set of partitions in which each subsequent partition has one
                     more cluster than the previous one, the optimal partition is the one at which the average
                     internal connection increases sharply compared with the previous partition. The methodology
                     thus makes it possible to identify the existing market structure.

1. Introduction
Our task was to segment the smoked sausage market in Naberezhnye Chelny (Tatarstan).
   An analysis of secondary data showed that the behaviour of smoked sausage consumers is influenced
by many factors: cultural, socio-economic, personal, psychological, and organizational.
   To take these factors into account, the following market segmentation variables were selected: age,
gender, occupation, education level, nationality, marital status, income level, reason for purchase,
consumption intensity, degree of brand loyalty, description of the purchased product, price of the
purchased product, place of purchase, and consumer status.
   In choosing the most suitable method, we considered many market segmentation methods: the
one-parameter method, the segmentation grid, the AID method, methods with a multi-stage approach
(the Hayley Russell model, the Peter Dixon model), the a priori method, flexible segmentation,
component-wise segmentation, the cluster method, and self-organizing Kohonen maps.
   To carry out the segmentation, we chose a cluster analysis algorithm that finds natural market
segments based on a variety of variables measured on qualitative scales.
              Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
              of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd

2. Methodology
The proposed method performs cluster analysis using the Mirkin proximity measure. This measure
belongs to the category of association coefficients. Other categories of similarity measures are
correlation coefficients, distance measures, and probabilistic similarity coefficients.
    Mirkin's proximity measure differs from other coefficients in that it is not merely justified by a
meaningful interpretation but follows from certain theoretical premises about the classification process
as a whole [4, 5]. It is determined by the formula:
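$$k_{ij} = \sum_{l=1}^{m} p_{ij}^{l}, \qquad (1)$$

where

$$p_{ij}^{l} = \begin{cases} 1/n_{s}^{l}, & \text{if } x^{l}(i) = x^{l}(j) = s, \\ 0, & \text{if } x^{l}(i) \neq x^{l}(j), \end{cases}$$

   $n_{s}^{l}$ is the number of objects for which the l-th characteristic took the value s.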
  The value of this indicator depends on two factors:

    •   the number of matching answers given by the two respondents;
    •   the number of respondents who chose the same answer option.

   The more answers two respondents have in common, and the fewer respondents chose those same
answers, the greater the value of the proximity measure between them.
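
   The behaviour of formula (1) can be illustrated with a minimal Python sketch. It assumes that each respondent is described by a tuple of categorical answers, one per feature; the function name `mirkin_proximity` and the sample data are hypothetical and are not taken from the author's MATLAB program.

```python
from collections import Counter


def mirkin_proximity(objects):
    """Return the symmetric matrix k[i][j] of formula (1).

    objects: list of equal-length tuples of categorical values.
    The diagonal is left at zero, as in the matrix K of the paper.
    """
    n, m = len(objects), len(objects[0])
    # counts[l][s] = number of objects whose l-th feature equals s
    counts = [Counter(obj[l] for obj in objects) for l in range(m)]
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for l in range(m):
                if objects[i][l] == objects[j][l]:
                    # a shared answer weighs more when few objects chose it
                    total += 1.0 / counts[l][objects[i][l]]
            k[i][j] = k[j][i] = total
    return k


# Hypothetical respondents: (gender, occupation)
respondents = [
    ("female", "employed"),
    ("female", "retired"),
    ("male", "employed"),
    ("female", "employed"),
    ("male", "retired"),
]
k = mirkin_proximity(respondents)
```

   In this made-up sample, respondents 2 and 5 share only the answer "retired", chosen by two people, while respondents 1 and 2 share only "female", chosen by three people, so the first pair receives the larger proximity value (1/2 against 1/3).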
   To identify homogeneous consumer groups, we chose Mirkin's "unification" algorithm [5]. This
choice was based on the following considerations:

    •   the population to be clustered contains more than 1,000 objects;
    •   each object is described by many features (more than three);
    •   the number of clusters has to be determined;
    •   there is no formal definition of a cluster;
    •   a cluster may contain any number of objects;
    •   the ideal representatives of the clusters have to be determined;
    •   no clustering criterion is specified.

    The "unification" algorithm belongs to the group of agglomerative hierarchical algorithms.
Besides hierarchical algorithms, there are also iterative algorithms, matrix-ordering algorithms, and
graph-cutting algorithms.
    The unification algorithm starts with the trivial partition, i.e. the partition in which each object is
treated as a separate cluster. The algorithm maximizes the following partition quality index:
$$F(R, a) = \sum_{c=1}^{z} \sum_{i, j \in R_c} (k_{ij} - a), \qquad (2)$$

 where R is the original partition;
   $k_{ij}$ is the coefficient measuring the proximity of objects i and j;
   c is the cluster index;
   z is the number of clusters;
   $R_c$ is the set of objects included in the c-th cluster;
   a is the threshold, i.e. the number with which the proximity measure is compared to decide
whether two items (two objects, an object and a cluster, or two clusters) can be assigned to one
common cluster.
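
   As a small numerical illustration of index (2), the sketch below evaluates F(R, a) for two partitions of three objects. The proximity values are those of objects 1, 8 and 10 in the matrix K reported in the results section; the function name `partition_quality` is hypothetical, and the sum is assumed to run over ordered pairs with i ≠ j, which is consistent with the convention that the diagonal of the connection matrix is zero.

```python
def partition_quality(k, partition, a):
    """Quality index F(R, a): sum of (k_ij - a) over ordered pairs inside each cluster.

    Assumption: pairs with i == j are skipped.
    """
    total = 0.0
    for cluster in partition:
        for i in cluster:
            for j in cluster:
                if i != j:
                    total += k[i][j] - a
    return total


# Proximities of objects 1, 8 and 10 (indices 0, 1, 2 here)
k = [
    [0.0, 0.040, 0.012],
    [0.040, 0.0, 0.012],
    [0.012, 0.012, 0.0],
]
a = 0.023
f_pair = partition_quality(k, [[0, 1], [2]], a)   # objects 1 and 8 together
f_all = partition_quality(k, [[0, 1, 2]], a)      # all three objects together
```

   Joining objects 1 and 8 gives a positive index (about 0.034), whereas also adding object 10, whose proximities lie below the threshold, lowers it to about -0.010.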


   This index expresses the total internal connection in the partition R, taking the threshold a into
account. We propose selecting the threshold values so that the number of homogeneous groups obtained
varies from 2 to 10.
   Then we carry out the following steps:

    •   Build the connection matrix $A = \|a_{ij}\|$, where $a_{ij} = k_{ij} - a$ and $a_{ii} = 0$.
    •   Find the maximum positive element of the matrix, $a_{\alpha\beta} = \max a_{ij}$, and combine
        the classes $R_\alpha$ and $R_\beta$.
    •   Sum the rows and the columns numbered α and β.
    •   Repeat steps 2-3 until all elements of the matrix, except the diagonal ones, are negative.
    •   After the last union, we obtain the final partition, which is optimal for the given threshold.
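
   The steps above can be sketched in Python as follows. This is only an illustrative reading of the procedure, not the author's MATLAB program: the function name `unify` is hypothetical, merging stops as soon as no off-diagonal element is strictly positive, and ties are broken arbitrarily. The sample proximities are those of objects 1, 6, 8 and 10 in the matrix K reported in the results section.

```python
def unify(k, a):
    """Agglomerative 'unification' sketch for a proximity matrix k and threshold a.

    Returns (clusters, A): the final clusters as lists of object indices and the
    final connection matrix between them.
    """
    n = len(k)
    # Step 1: connection matrix a_ij = k_ij - a with a zero diagonal
    A = [[(k[i][j] - a) if i != j else 0.0 for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        size = len(A)
        # Step 2: largest off-diagonal element and its position (p < q)
        best, p, q = max(
            (A[c][t], c, t) for c in range(size) for t in range(c + 1, size)
        )
        if best <= 0:
            break  # no positive connection left between classes
        # Step 3: add row q to row p, then column q to column p
        for t in range(size):
            A[p][t] += A[q][t]
        for c in range(size):
            A[c][p] += A[c][q]
        # drop row and column q, which are now absorbed into p
        del A[q]
        for row in A:
            del row[q]
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters, A


# Proximities of objects 1, 6, 8 and 10 (indices 0..3 here)
k = [
    [0.0, 0.025, 0.040, 0.012],
    [0.025, 0.0, 0.019, 0.036],
    [0.040, 0.019, 0.0, 0.012],
    [0.012, 0.036, 0.012, 0.0],
]
clusters, A = unify(k, a=0.023)
```

   With the threshold 0.023, the sketch first joins objects 1 and 8 (connection 0.017), then objects 6 and 10 (connection 0.013), and stops because the remaining connection between the two classes is negative.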

   To assess partition quality, the statement of a cluster analysis problem often introduces a partition
quality indicator defined on the set of all possible partitions. The best partition is then the one at
which the selected quality indicator reaches its extremum.
   Following [4, 5], we propose taking the average internal connection in the partition as the criterion
of partition quality:

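$$\bar{k} = \frac{1}{z} \sum_{c=1}^{z} \left[ \frac{1}{n_c (n_c - 1)} \sum_{i, j \in R_c,\ i \neq j} k_{ij} \right], \qquad (3)$$

where $n_c$ is the number of objects in the c-th cluster.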

   This indicator is computed for each partition, and its value is plotted against the number of
segments. We look for the partition at which the average internal connection increases sharply
compared with the previous partition. This is the desired optimal partition.
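
   A minimal Python sketch of this selection rule is given below. Several points are assumptions made for illustration: single-object clusters are taken to contribute zero to the average (formula (3) is undefined for them), a "sharp increase" is read as the largest rise over the partition with one cluster fewer, and the function names and the values in `curve` are hypothetical rather than the figures obtained in the study.

```python
def average_internal_connection(k, partition):
    """Average internal connection of a partition, following formula (3)."""
    total = 0.0
    for cluster in partition:
        n_c = len(cluster)
        if n_c < 2:
            continue  # assumption: a single-object cluster contributes zero
        pair_sum = sum(k[i][j] for i in cluster for j in cluster if i != j)
        total += pair_sum / (n_c * (n_c - 1))
    return total / len(partition)


def sharpest_increase(values_by_z):
    """Return the number of clusters z with the largest rise over z - 1."""
    best_z, best_jump = None, float("-inf")
    for z in sorted(values_by_z):
        if z - 1 in values_by_z:
            jump = values_by_z[z] - values_by_z[z - 1]
            if jump > best_jump:
                best_z, best_jump = z, jump
    return best_z


# Proximities of objects 1, 6, 8 and 10 from the matrix K (indices 0..3 here)
k = [
    [0.0, 0.025, 0.040, 0.012],
    [0.025, 0.0, 0.019, 0.036],
    [0.040, 0.019, 0.0, 0.012],
    [0.012, 0.036, 0.012, 0.0],
]
avg = average_internal_connection(k, [[0, 2], [1, 3]])  # (0.040 + 0.036) / 2

# Made-up curve: number of clusters -> average internal connection
curve = {2: 0.010, 3: 0.011, 4: 0.020, 5: 0.021}
best = sharpest_increase(curve)
```

   For the made-up curve, the largest rise occurs when moving from three to four clusters, so the sketch selects the four-cluster partition.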

3. Results
The segmentation was carried out for the smoked sausage market of Naberezhnye Chelny (Tatarstan),
namely for the totality of end consumers of smoked sausages, including both actual and potential
consumers.
   For this purpose, a structured questionnaire was drawn up, with the questions arranged in a specific
order. Answer options were offered for most of the questions, and, depending on the question,
respondents could choose one or several options.
   All variables except "age" are measured on qualitative scales: nominal, dichotomous, and ordinal.
During data processing, the values of "age" were also converted to an ordinal scale by defining age
intervals.
   After the survey, the collected data were encoded as follows. A matrix X was compiled in which each
question was assigned as many columns as needed to represent all possible answers that carry useful
information. Options such as "don't know" or "doesn't matter" were not considered. This coding was
chosen because some questions allowed several answers.
   Thus, all respondent data were brought together in a single matrix of Boolean columns, i.e. columns
with the values "zero" and "one":

$$X = \|x_{il}\|, \qquad (4)$$

where i is the respondent number;
  l is the feature number.
  The part of the matrix for the first ten objects is shown in table 1.


                     Table 1. Part of the matrix X (rows: respondents i, columns: features l).
  i \ l     1     2     3     4     5     6     7     8     9    10    11    12    13    14
    1       0     0     1     0     0     0     1     0     0     0     0     0     1     0
    2       0     1     0     0     0     0     0     1     0     0     0     0     1     0
    3       0     0     1     0     0     0     1     0     0     0     0     0     1     0
    4       0     1     0     0     0     0     0     1     0     0     0     0     1     0
    5       0     0     1     0     0     0     0     1     0     0     0     0     1     0
    6       0     1     0     0     0     0     0     0     0     0     0     0     1     0
    7       0     0     1     0     0     0     0     1     0     0     0     1     0     0
    8       0     0     1     0     0     0     1     0     0     0     0     0     1     0
    9       0     0     0     1     0     0     0     1     0     0     0     0     1     0
   10       0     0     1     0     0     0     0     1     0     0     0     0     0     1
    …       …     …     …     …     …     …     …     …     …     …     …     …     …     …
   n_l    208   473   634   169    26    21   533   512    62    10     8   205   736   143
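
   The coding scheme described in this section can be sketched in Python as follows. The questions, answer options and function name `encode_answers` are hypothetical and serve only to show how multiple-choice answers become Boolean columns and how column totals such as the row n_l are obtained.

```python
def encode_answers(responses, options, ignored=("don't know", "doesn't matter")):
    """Encode questionnaire answers as a Boolean matrix.

    responses: list of dicts mapping question -> set of chosen options
    options:   dict mapping question -> list of all answer options
    Returns (columns, X), where columns[l] is a (question, option) pair.
    """
    # one column per informative answer option, in questionnaire order
    columns = [
        (question, option)
        for question, answer_options in options.items()
        for option in answer_options
        if option not in ignored
    ]
    X = [
        [1 if option in response.get(question, set()) else 0
         for question, option in columns]
        for response in responses
    ]
    return columns, X


# Hypothetical questionnaire fragment (not the author's actual questions)
options = {
    "place of purchase": ["supermarket", "market", "branded kiosk", "doesn't matter"],
    "gender": ["male", "female"],
}
responses = [
    {"place of purchase": {"supermarket", "market"}, "gender": {"female"}},
    {"place of purchase": {"doesn't matter"}, "gender": {"male"}},
    {"place of purchase": {"market"}, "gender": {"female"}},
]
columns, X = encode_answers(responses, options)
# number of respondents who selected each option
column_totals = [sum(row[l] for row in X) for l in range(len(columns))]
```

   A respondent who selects two options of the same question simply receives two ones in that question's columns, which is why this coding suits questions that allow several answers.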

   Next, homogeneous consumer groups were identified. The chosen cluster analysis algorithm uses the
Mirkin proximity measure defined by formula (1).
   Because there are more than a thousand objects and the volume of calculations makes manual
computation practically impossible, a program was developed in MATLAB (figure 1), which was used
for all the calculations in this research.

                                Figure 1. Program in MATLAB system.


   We calculate the proximity measure between each pair of distinct objects. As a result, we obtain the
matrix $K = \|k_{ij}\|$:

                                   Table 2. Matrix $K = \|k_{ij}\|$.
  i \ j      1       2       3       4       5       6       7       8       9      10      …
    1        0     0.020   0.020   0.015   0.022   0.025   0.022   0.040   0.020   0.012    …
    2      0.020     0     0.011   0.014   0.023   0.027   0.019   0.027   0.017   0.013    …
    3      0.020   0.011     0     0.006   0.029   0.013   0.013   0.015   0.011   0.014    …
    4      0.015   0.014   0.006     0     0.010   0.017   0.015   0.018   0.020   0.023    …
    5      0.022   0.023   0.029   0.010     0     0.028   0.019   0.017   0.012   0.020    …
    6      0.025   0.027   0.013   0.017   0.028     0     0.022   0.019   0.018   0.036    …
    7      0.022   0.019   0.013   0.015   0.019   0.022     0     0.019   0.031   0.021    …
    8      0.040   0.027   0.015   0.018   0.017   0.019   0.019     0     0.024   0.012    …
    9      0.020   0.017   0.011   0.020   0.012   0.018   0.031   0.024     0     0.016    …
   10      0.012   0.013   0.014   0.023   0.020   0.036   0.021   0.012   0.016     0      …
    …        …       …       …       …       …       …       …       …       …       …      …

   The "unification" algorithm starts with the trivial partition, i.e. the partition in which each object
is a separate class. We build the connection matrix $A = \|a_{ij}\|$ between classes. For the trivial
partition, $a_{ij} = k_{ij} - a$. For a = 0.023 we therefore have:

                                   Table 3. Matrix A for a = 0.023.
  c \ t            1        2        3        4        5        6        7        8        9       10      …
               R1={1}   R2={2}   R3={3}   R4={4}   R5={5}   R6={6}   R7={7}   R8={8}   R9={9}  R10={10}    …
   1   R1         0     -0.003   -0.003   -0.008   -0.001    0.002   -0.001    0.017   -0.003   -0.011     …
   2   R2      -0.003      0     -0.012   -0.009    0.000    0.004   -0.004    0.004   -0.006   -0.010     …
   3   R3      -0.003   -0.012      0     -0.017    0.006   -0.010   -0.010   -0.008   -0.012   -0.009     …
   4   R4      -0.008   -0.009   -0.017      0     -0.013   -0.006   -0.008   -0.005   -0.003    0.000     …
   5   R5      -0.001    0.000    0.006   -0.013      0      0.005   -0.004   -0.006   -0.011   -0.003     …
   6   R6       0.002    0.004   -0.010   -0.006    0.005      0     -0.001   -0.004   -0.005    0.013     …
   7   R7      -0.001   -0.004   -0.010   -0.008   -0.004   -0.001      0     -0.004    0.008   -0.002     …
   8   R8       0.017    0.004   -0.008   -0.005   -0.006   -0.004   -0.004      0      0.001   -0.011     …
   9   R9      -0.003   -0.006   -0.012   -0.003   -0.011   -0.005    0.008    0.001      0     -0.007     …
  10   R10     -0.011   -0.010   -0.009    0.000   -0.003    0.013   -0.002   -0.011   -0.007      0       …
   …   …          …        …        …        …        …        …        …        …        …        …       …

   In the matrix A, we select the maximum element $\max a_{ct}$. In our matrix,
$\max a_{ct} = a_{18} = 0.017$. This maximum element is positive, so we combine objects 1 and 8 into
one class. To do this, we sum rows 1 and 8 and also columns 1 and 8, obtaining a new matrix A:
                                   Table 4. Matrix A after combining objects 1 and 8.
  c \ t            1         2        3        4        5        6        7        8        9      …
              R1={1,8}   R2={2}   R3={3}   R4={4}   R5={5}   R6={6}   R7={7}   R8={9}  R9={10}     …
   1   R1       0.034     0.001   -0.011   -0.013   -0.007   -0.002   -0.005   -0.002   -0.022     …
   2   R2       0.001       0     -0.012   -0.009    0.000    0.004   -0.004   -0.006   -0.010     …
   3   R3      -0.011    -0.012      0     -0.017    0.006   -0.010   -0.010   -0.012   -0.009     …
   4   R4      -0.013    -0.009   -0.017      0     -0.013   -0.006   -0.008   -0.003    0.000     …
   5   R5      -0.007     0.000    0.006   -0.013      0      0.005   -0.004   -0.011   -0.003     …
   6   R6      -0.002     0.004   -0.010   -0.006    0.005      0     -0.001   -0.005    0.013     …
   7   R7      -0.005    -0.004   -0.010   -0.008   -0.004   -0.001      0      0.008   -0.002     …
   8   R8      -0.002    -0.006   -0.012   -0.003   -0.011   -0.005    0.008      0     -0.007     …
   9   R9      -0.022    -0.010   -0.009    0.000   -0.003    0.013   -0.002   -0.007      0       …
   …   …          …         …        …        …        …        …        …        …        …       …

   We find the maximum positive element of the new matrix A: $\max a_{ct} = a_{69} = 0.013$.
Therefore, we combine classes 6 and 9 into one group. We continue the calculations until all $a_{ct}$
($c \neq t$) are negative. The resulting matrix gives the summed connections between the classes, and
the classes themselves are determined by the sequence of summing operations.
    This clustering was carried out for different values of the threshold a, yielding partitions in which
the number of clusters varied from 2 to 10. For each partition, the average internal connection was
determined using formula (3).
    The change in the average internal connection depending on the number of clusters is shown in the
graph (figure 2).
    As the graph shows, the average internal connection increased sharply, compared with the previous
partition, when the objects were divided into seven clusters. This is the desired optimal partition.
    As a result of clustering the 1,534 objects, the algorithm identified seven natural clusters: cluster 1
with 149 objects, cluster 2 with 226 objects, cluster 3 with 285 objects, cluster 4 with 343 objects,
cluster 5 with 195 objects, cluster 6 with 200 objects, and cluster 7 with 136 objects.

             [Figure 2: plot of the average internal connection in the partition (vertical axis, 0 to 0.04)
             against the number of clusters in the partition (horizontal axis, 1 to 11).]

             Figure 2. Graph of changes in the average internal connection depending on the
             number of allocated clusters.

4. Discussion
Based on an analysis of each of the obtained segments, it was concluded that the market segmentation
was successful. The following market segments were identified:

    •   Ritualists. Men and women over 25, with families and an average income level. They consume
        traditional semi-smoked pork and beef sausages in natural packaging and have established
        tastes. They consume sausages on average at least once a week and buy them mainly in the
        manufacturer's branded kiosks and at markets.


    •   Pensioners. Elderly people with low income. They consume inexpensive semi-smoked and
        boiled-smoked pork and beef sausages on average once or twice a month. They buy for
        holidays and when they have money, mainly at markets and in nearby stores.
    •   Amateurs. People of all ages with middle and high income levels. They consume sausages
        often, almost every day, and prefer quality products. They are not attached to any particular
        type of smoked sausage and are ready to try new non-traditional varieties. Price does not
        matter to them. They prefer to buy in supermarkets.
    •   Young people. Single people, usually young. They consume sausages infrequently, two to
        three times a month, and have no pronounced preferences. They buy spontaneously, usually
        the highest grade semi-smoked sausages, as a rule in supermarkets, nearby stores, and
        branded kiosks.
    •   Forced consumers. Working townspeople with lower middle income, mostly women. They
        consume sausages rarely, once or twice a month, when a quick snack is needed at work, on
        the road, or at home. They prefer semi-smoked sausages made from pork.
    •   Elite consumers. Citizens over 35 years old, with families and a high level of income. They
        consume high-quality, proven products at least once a week. They prefer semi-smoked,
        boiled-smoked, and uncooked smoked sausages and buy in supermarkets and branded kiosks.
    •   Muslims. People adhering to Muslim traditions. They consume sausages very rarely or not at
        all. They are ready to consume more often, provided that smoked sausages are made
        according to Muslim traditions and sold in specialized Muslim stores. They prefer
        semi-smoked and cooked-smoked beef and horse meat sausages without lard in natural
        packaging.

    For each of these segments, a separate marketing mix needs to be developed. With a successful
marketing mix for each of the selected segments, a company can increase sales and strengthen its
position within each market segment.

5. Conclusion
Thus, to conduct market segmentation, we have developed a cluster analysis method that makes it
possible to: conduct research based on many variables; treat the variables as equivalent, without
requiring a hierarchy of relations to be established between them; proceed from the assumption that
the market structure is unknown and that the profiles of potential consumers cannot be set in advance;
and find natural market segments based on a variety of variables measured on qualitative scales.

References
[1] Istomin P O 2016 Market segmentation Modern science theory and practice 6-1(12) 533
[2] Karasev A P 2014 Consumer markets segmentation (Yaroslavl, Russia: Avers Plus)
[3] Kutserubov A E 2017 Modern approaches to market segmentation Proc. Sayapin readings. The
        round table materials collection (Tambov, Russia: Tambov State University) p 165
[4] Mandel I D 1988 Cluster analysis (Moscow, Russia: Finance and Statistics)
[5] Mirkin B G 1980 Qualitative features and structures analysis (Moscow, USSR: Finance and
        Statistics)
[6] Rolbina E S 2013 The consumer market preliminary segmentation VEPS 1 93
