Perceptual Font Manifold from Generative Model


                                           Yuki Fujita, Haoran Xie, and Kazunori Miyata
                                        Japan Advanced Institute of Science and Technology
                                                          Ishikawa, Japan

Figure 1: The perceptual manifolds of fonts. From left to right, we show the combined manifold of the casual, formal, and POP styles, followed by the separated manifold of each style. In this research, the perceptual manifolds are obtained from the results of a perceptual study of the feature space of fonts; the feature space is learned from the latent space of a generative model for font generation.

Abstract—Though various fonts are now available online for public use, it is difficult to generate, explore, and edit fonts that meet the preferences of every user. To address these issues, we propose a font manifold interface that visualizes perceptual adjustment in the latent space of a generative model of fonts. We adopt a variational autoencoder network for font generation and conduct a perceptual study on the fonts generated from the multi-dimensional latent space of the model. After obtaining the distribution data of specific preferences, we apply a manifold learning approach to visualize the font distribution. As a case study of the proposed method, we developed a user interface for exploring generated fonts under a designated user preference using a heat-map representation.

Keywords-Character Fonts; Generative Model; Manifold Learning; Perceptual Study;

I. INTRODUCTION

Character fonts play a crucial role in the representation of information. A user may choose preferred types and shapes of fonts for special purposes; the fonts used in festival posters and business documents, for example, may be entirely different. The choice of fonts usually requires the skill and experience of a professional designer, because it is challenging to pick the most suitable fonts from online sources or from the font lists in document files to meet the user's perceptual needs. Font libraries such as Google Fonts [1] and Font Squirrel [2] are now widely used.

Exploring and editing fonts from these libraries to obtain the desired ones is usually challenging for casual users. Figure 2 shows the search interface of Google Fonts. To search for the desired fonts, the user is asked to select different image features of a font, including thickness, slant, and width. This exploration approach is inefficient because human font perception cannot easily be separated into specific features. For example, it is difficult to clarify which font features are the most appropriate for creating an appealing festival poster. The same issues arise in Font Squirrel, where tags are used to describe font features; moreover, understanding the meaning of these tags can be challenging.

Figure 2: Search results from Google Fonts.

Recent research presents a font manifold in two dimensions (2D) by learning a nonlinear mapping from existing fonts [3]. With this approach, a casual user can easily explore and edit fonts on the manifold with smooth interpolation. However, it is still challenging to explore fonts with a specific perception in mind, because the font features are only displayed geometrically. To address this issue, in this paper we propose a perceptual font manifold for different perception purposes.
Figure 3: The framework of our proposed perceptual font manifolds. In the offline process, we employ the VAE generative model to obtain the latent space of font generation. A perceptual study of the generated results then explores the latent space for three types of perception, and the perceptual manifolds are obtained with a manifold learning approach. Finally, a casual user can explore the desired fonts on the user interface proposed in this paper.

Moreover, it is worth noting that the authors of [3] adopted font matching with an energy-based optimization process, which is time-consuming and difficult to implement. Thanks to the rapid development of deep generative models, we can instead apply the latent space of a generative model directly to construct the font manifold. In contrast to the previous work [3], the approach proposed in this paper is more straightforward and easier to implement.

In this paper, we propose perceptual font manifolds built from a deep generative model of capital letter "A" fonts. We utilize the variational autoencoder (VAE) network for font generation and visualize the results of perceptual studies conducted in the latent space of the model. An advantage of the VAE model is that its latent space is continuous, so similar images lie close to each other; this enables the exploration of the perceptual distribution of fonts.

Based on the generation results of the VAE model, we also propose a user interface for the perceptual study. When the user changes the latent variables in the multi-dimensional space obtained from training, the interface displays the corresponding font image as output. To investigate perceptions of font images, we conducted a survey using three perception categories: casual, formal, and POP. Finally, based on the resulting perceptual font manifolds, we propose a user interface that is shown to be efficient and easy to use in contrast to other widely used font-searching interfaces.

The main contributions of this research are as follows.

•   We propose a perceptual font manifold to meet specific perceptual requirements in font exploration and editing.
•   We provide a font-exploring user interface based on our proposed perceptual font manifolds, which is efficient and user-friendly.

II. RELATED WORKS

Generative Model and Character Fonts. Upchurch et al. [4] proposed a neural network architecture that creates a character font dataset from a single input font image. This dataset contains 62-letter fonts (from "A" to "Z", "a" to "z", and "0" to "9"). Azadi et al. [5] proposed an end-to-end system, called Multi-Content GAN, which generates a set of font images using two networks, a "Glyph Network" and an "Ornamentation Network". This system can also transfer typographic stylization as well as textual stylization (color gradients and effects). Mehta et al. [6] proposed a character recognition system for scanned images using an artificial neural network and a nearest-neighbour approach. We observe that research combining generative models and character fonts focuses mainly on the generation of character font images and does not take the perception of the font images into consideration. In this paper, we bridge this gap by considering the perception distribution in the latent space of the generative model.

Character Fonts and Perception. Choi et al. [7] proposed a system that shows the character font corresponding to an input image. A large number of character fonts are mapped onto a semantic space in advance, and the system examines which character font corresponds to the input image in that space. The research carried out in [7] thus dealt with a combination of character fonts and perception.
Figure 4: The architecture of our VAE network.
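At the heart of the VAE architecture in Figure 4 is the sampling ("lambda") layer of Section IV-A, which draws the latent variables from a mean and a deviation. As a rough, framework-free sketch (not the paper's Keras code; the function and variable names are our own), the underlying reparameterization trick can be written as:

```python
import math
import random

def sample_latent(mu, log_var, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).

    mu and log_var are sequences of equal length (one entry per latent
    dimension). Writing z this way keeps the sampling differentiable with
    respect to mu and log_var in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

# Five latent dimensions, matching the latent size used in the paper.
mu = [0.0, 0.5, -0.5, 1.0, 0.0]
log_var = [0.0] * 5              # sigma = 1 for every dimension
z = sample_latent(mu, log_var)   # one random 5-D latent vector
print(z)
```

In the Keras model of Figure 4, this computation is what the lambda layer performs between the encoder's flatten layer and the decoder.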

Regarding the mapping onto the semantic space, however, the authors did not consider acquiring the image feature values. In this paper, we utilize the generative model to map fonts into a latent space and consider the feature values of the character images.

Distribution of Character Fonts. Campbell et al. [3] visualized the distribution of character fonts using a manifold. The user can move over this manifold with a mouse and transform the font image continuously. Guo et al. [8] visualized a manifold of Chinese fonts. Their system is learned from shape and skeleton features and can therefore generate new fonts rather than only existing ones. Both systems display the distribution of character fonts so that the user can generate and search for fonts using that distribution. However, they do not consider any perception of the font images either. In this paper, besides displaying the distribution, we consider the perception.

III. SYSTEM OVERVIEW

Figure 3 illustrates the framework of the proposed system. In contrast to conventional generative models based on deep learning, our system involves human effort in the font generation process. The framework therefore includes two main components: machine computation and human involvement.

At the machine computation stage, we adopt a generative network to learn the latent space of various fonts; at the human involvement stage, we ask participants to report their font perceptions using the perceptual font user interface proposed in this paper. In this interface, the user can adjust the latent variables to achieve the desired perception. In our case study, we recorded all latent variables related to casual, formal, and POP feelings. From this perceptual study, the perceptual manifold of fonts is obtained via manifold learning. Finally, we propose a user interface for font exploration based on this approach.

IV. FONT GENERATION

This paper adopts a generative model for font generation. The learning network and the details of our dataset are described below.

A. Generative Model

In this paper, we utilize the VAE, one of the most representative generative models in deep learning (see [9] and [10]). Figure 4 shows the network architecture of the VAE model employed in this paper; the number under each layer indicates the size of each array. The encoder has four convolutional layers and a flatten layer, while the decoder has a lambda layer and two convolutional layers. The lambda layer samples the latent variables from a mean and a deviation. For the implementation of the VAE model, we used the Keras library [11].

B. Training Dataset

To construct the training dataset for the VAE network, we downloaded 2244 TrueType font files from Google Fonts. Of these, 2169 could be converted into valid PNG images; the remaining 75 files could not. We then performed data cleansing on the font images to prepare them for machine learning.

Figure 5 illustrates the data cleansing process. Its goal is to create 28 × 28 images for our learning network, the same image size used by the MNIST dataset. A raw training image usually has a white margin around the font body; we erased this margin while preserving as much of the font body as possible in all images.

In the data cleansing process, we first converted the downloaded TrueType font files into grayscale images at a sufficiently large resolution of 256 × 256 pixels. Then, the bounding box of the font was calculated. To make the rectangular image square, we added white rows (or columns) to the shorter side of the rectangle, alternating between the two ends. Finally, we scaled the images to 28 × 28 pixels using bilinear interpolation.
Figure 5: Data Cleansing Process
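The cleansing steps of Figure 5 (crop the white margin to the glyph's bounding box, then pad the shorter side with white rows or columns alternately until the image is square) can be sketched in pure Python on a grayscale pixel grid. This is an illustrative sketch with our own function names, not the paper's implementation; the final 28 × 28 bilinear resize would be delegated to an image library such as Pillow.

```python
WHITE = 255  # background value in an 8-bit grayscale image

def crop_margin(img):
    """Crop a 2-D list of pixels to the bounding box of non-white pixels."""
    rows = [r for r, row in enumerate(img) if any(p < WHITE for p in row)]
    cols = [c for c in range(len(img[0]))
            if any(row[c] < WHITE for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def pad_to_square(img):
    """Add white rows (or columns) to the shorter side, alternating
    between the two ends so the glyph stays roughly centred."""
    h, w = len(img), len(img[0])
    at_end = True
    while h != w:
        if h < w:   # add a white row
            img = img + [[WHITE] * w] if at_end else [[WHITE] * w] + img
            h += 1
        else:       # add a white column
            img = ([row + [WHITE] for row in img] if at_end
                   else [[WHITE] + row for row in img])
            w += 1
        at_end = not at_end
    return img

# A tiny 3x5 'glyph' with an all-white margin column on the right:
glyph = [
    [0, 0, 0, 0, WHITE],
    [0, WHITE, WHITE, 0, WHITE],
    [0, 0, 0, 0, WHITE],
]
square = pad_to_square(crop_margin(glyph))
assert len(square) == len(square[0]) == 4  # cropped to 3x4, padded to 4x4
```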

Figure 6 shows examples of font images in our training dataset.

Figure 6: Font examples in our training dataset.

V. PERCEPTUAL FONT MANIFOLD

In this section, we propose the perceptual font manifold, which takes the user's perception into account in font generation and exploration. To analyze the user perception of font images effectively, we conducted a perceptual study of font styles based on the generated results from the VAE network. In this study, participants choose latent variables while observing the font image generated from them. A higher-dimensional latent space carries more information; however, it is difficult to obtain data from a perceptual study in a high-dimensional latent space. Considering the trade-off between the computation cost and the perceptual evaluation load, we chose a five-dimensional latent space.

A. Perceptual Study

Figure 7 (a) shows the user interface for the perceptual study. The participants were asked to select their favorite fonts using the five sliders to change the values of the latent variables, as shown in Figure 7 (b). Each latent variable can be set from 0 to 99. However, because the latent space is based on a Gaussian distribution, we apply the percent point function and divide the probability range from 5% to 95% into one hundred equal parts. In the proposed interface, the output font image corresponding to the selected latent variables is displayed in real time.

Figure 7: The proposed user interface (a) used in our perceptual study (b).

To classify the fonts into different user perceptions, we adopted three styles of font perception: POP, formal, and casual. Whenever the user felt the corresponding perception from the font image, the user was asked to click the matching perception button on the interface, and the system saved the selected latent variables. To obtain a good starting point, the user can also click the "Changing a Starting Font" button, which randomizes all latent variables.

The purpose of this perceptual study is to classify the font images into the three designated user perceptions. At the beginning of the study, we showed example fonts for the three perceptions, as shown in Figure 8. We invited 17 graduate students to participate. The survey time was limited to five minutes, because participants may become exhausted by repeating the same task for a long period. In total, 884 latent variable sets were collected: 273 POP, 311 formal, and 298 casual.
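The slider-to-latent mapping described above (the percent point function applied to one hundred equal parts of the 5%–95% probability range of a standard Gaussian) can be sketched with the Python standard library; the function and parameter names here are our own:

```python
from statistics import NormalDist

def slider_to_latent(step, low=0.05, high=0.95, steps=100):
    """Map an integer slider position in [0, steps-1] to a latent value.

    The probability range [low, high] is divided into equal parts and
    passed through the percent point function (inverse CDF) of the
    standard normal, so equal slider movements correspond to equal
    probability mass rather than equal latent distance.
    """
    p = low + (high - low) * step / (steps - 1)
    return NormalDist().inv_cdf(p)

print(round(slider_to_latent(0), 3))    # -1.645 (5% quantile)
print(round(slider_to_latent(99), 3))   #  1.645 (95% quantile)
```

This keeps the extreme slider positions away from the tails of the Gaussian, where the VAE decoder would receive latent values it rarely saw during training.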
Figure 8: Feature examples of the three perceptions: (1) POP, (2) Formal, and (3) Casual.

B. Manifold Learning

The perceptual study of font images yields distribution data in a five-dimensional latent space. To reduce the distribution data to two dimensions, we adopted a manifold learning approach; in this paper, we utilized the t-distributed Stochastic Neighbor Embedding (t-SNE) method [12]. Figure 9 shows the result of the dimensionality reduction of the distribution data from our perceptual study.

Figure 9: Results of dimensionality reduction from our perceptual study. Blue points indicate the POP style, green points the formal style, and yellow points the casual style. tSNE-1 and tSNE-2 are the dimensions of the reduced distribution data.

To visualize the two-dimensional distribution data from the manifold learning, we adopted the kernel density estimation method to obtain a heatmap representation of the perceptual manifold of fonts [13]. Consequently, we obtained the heatmaps of all perception styles (POP, formal, and casual), as shown in Figure 1. High-density areas are displayed in red, and low-density areas in blue.

C. User Interface

Using the perceptual manifold of fonts introduced in this paper, we propose a user interface for exploring the fonts generated by the VAE network. Figure 10 shows the user interface in our case study, which is implemented as a Web application for easy usage.

Figure 10: Our proposed user interface using the perceptual font manifold.

The proposed user interface is simple and easy to use. It continuously shows the corresponding font image in the upper-left window while the control point is moved on the heatmap. To help choose fonts under different user perceptions, the user can select the perception buttons on the right side of the interface (All, POP, Formal, and Casual); the corresponding heatmap (Figure 1) is then presented. When the user finds the desired font image, clicking on it opens a modal window for further confirmation.

VI. CASE STUDY

A. Comparison Study

In our case study, we compared the user interface proposed here with the traditional user interface for font searching used in online font libraries such as Font Squirrel. Note that we focused only on the "A" font in this research for simplicity. Figure 11 shows the traditional user interface for font exploration, which displays 1592 font images from the generation results of the VAE network, arranged in 10 columns and 160 rows with a scrollbar in a web application.

For the case study, we randomly selected 10 target font images from the generated results, as shown in Figure 12. In both the traditional and the proposed user interfaces, the target font images are shown on the upper-left side of the interface window. The participants were asked to look for the target font images in both user interfaces and to click on the explored position, after which a confirmation window appeared. Figure 12 also shows the distribution of the target images on the proposed user interface.

We conducted the comparison study with 20 graduate students, randomly divided into two groups: group 1 and group 2.
Figure 11: Traditional user interface for font exploration in the comparison study.

The members of group 1 were asked to use the traditional user interface first and then the proposed interface (Figure 10), while the members of group 2 used the two interfaces in the reverse order.

Figure 12: Test dataset used in our case study and their distributions in the manifold of fonts.

B. Discussion

In the case study, we compared the two user interfaces in terms of exploration accuracy and time cost. For the exploration accuracy, we used the structural similarity (SSIM) index to quantify the similarity between two images [14]. In our research, SSIM scores are calculated between the target font images and the images explored by user operation. The SSIM score is formulated as follows:

    SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}    (1)

where x and y are the two font images being compared, \mu_x and \mu_y denote their mean pixel values, \sigma_x and \sigma_y their standard deviations, and \sigma_{xy} the covariance between them. C_1 = 6.5 and C_2 = 58.5 are constants. The SSIM score lies between 0 and 1; the closer to 1, the higher the similarity.

Figure 13 shows the comparison of exploration accuracy (SSIM score) between the traditional and the proposed user interfaces. There is no apparent difference between the two; in detail, the proposed interface achieved slightly higher median and maximum SSIM scores than the traditional one.

Figure 13: Comparison of SSIM scores between the traditional (red) and our proposed user interface (green).

Figure 14 shows the comparison of time cost in seconds between the two interfaces. On average, participants operated 1.7 times slower in the traditional interface (97 seconds) than in the proposed interface (54 seconds). To confirm the difference between the two interfaces, we applied Student's t-test to the time costs. The test yielded a p-value of 0.71%, below the 5% significance level, which indicates a significant difference between the two interfaces. Hence, the user interface proposed in this paper is more efficient than the traditional one.

Figure 14: Comparison of time usage between the traditional (red) and our proposed user interface (green).

VII. RESULTS

The proposed system was implemented on a desktop computer with Ubuntu 16.04 LTS, an Intel Core i7-7700 CPU @ 3.60 GHz × 8, and a GeForce GTX 1060 3GB GPU, and was programmed in Python 3.5 with CUDA 8.0, cuDNN 6.0, and Keras-gpu 2.1.6. The VAE model was trained for 50 epochs.

Figure 15 shows the exploration results of fonts in the three perceptual font manifolds. For the POP perception, the explored fonts are bold and slanted; for the formal perception, they are all slim without slant; and for the casual perception, circular curves are an apparent feature of the font styles. All these exploration results agree well with the common understanding of these user perceptions shown in Figure 8.

Figure 15: Explorations of font manifolds with three perception styles, demonstrating our matching results.

Though the perceptual study in this research handles only "A" fonts for simplicity, we can map the explored fonts to all character fonts. Figure 16 shows the text application of our results from "A" to "Z", obtained by matching the explored "A" fonts to the existing fonts with the closest SSIM scores. The star points in the left figures of the font manifolds denote the font positions. We observed that the perception of the three font styles appears correct to common users.

Figure 16: Application of our results with mapping from A to Z of test words.

VIII. CONCLUSION

In this paper, we proposed perceptual font manifolds using the latent space of a generative model and a perceptual study. For the generative model, we employed a VAE network with five latent dimensions trained on an "A" font dataset; in the perceptual study, we obtained the distribution data of the POP, formal, and casual styles in the latent space. We reduced the data to two dimensions using manifold learning and made heat maps of each distribution using the kernel density estimation method. Finally, we proposed a user interface for font exploration using our perceptual font manifolds.

In the case study, the participants were asked to explore target font images using both a traditional font exploration interface and the user interface proposed in this paper. We verified that our proposed interface requires less time while maintaining good exploration accuracy. Furthermore, the exploration results for the different perception styles showed that the proposed system can achieve the desired style not only for the "A" font but for all character fonts.

As a limitation, the system proposed in this paper handles only "A" fonts. Though we matched all the fonts by finding the closest position in the "A" font manifold, accuracy for other character fonts may be lost. To address this issue, the supervised font style transfer employed in [4] could be utilized. Besides, many fonts are defined as outline fonts, as represented by TrueType; an outline font expresses the font data as curve parameters and responds appropriately to scaling. We also consider the exploration of a perceptual manifold of outline fonts as future work.

As further future research, the concept of a perceptual study of generative models may be applied to other targets such as human faces and foods.

REFERENCES

[1] Google Fonts (online), https://fonts.google.com/

[2] Font Squirrel (online), https://www.fontsquirrel.com/

[3] Neill D. F. Campbell and Jan Kautz: Learning a Manifold of Fonts, ACM Transactions on Graphics, Vol. 33, No. 4 (2014).

[4] Paul Upchurch, Noah Snavely, and Kavita Bala: From A to Z: Supervised Transfer of Style and Content Using Deep Neural Network Generators, arXiv:1603.02003 (2016).

[5] Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell: Multi-Content GAN for Few-Shot Font Style Transfer, arXiv:1712.00516 (2017).

[6] H. Mehta, S. Singla, and A. Mahajan: Optical Character Recognition (OCR) System for Roman Script English Language Using an Artificial Neural Network (ANN) Classifier, International Conference on Research Advances in Integrated Navigation Systems (RAINS), Bangalore (2016).

[7] Saemi Choi, Kiyoharu Aizawa, and Nicu Sebe: FontMatcher: Font Image Pairing for Harmonious Digital Graphic Design, ACM, pp. 37–41 (2018).

[8] Yuan Guo, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao: Creating New Chinese Fonts Based on Manifold Learning and Adversarial Networks, EUROGRAPHICS (2018).

[9] Diederik P. Kingma and Max Welling: Auto-Encoding Variational Bayes, In Proceedings of the International Conference on Learning Representations (ICLR) (2014).

[10] Danilo J. Rezende, Shakir Mohamed, and Daan Wierstra: Stochastic Backpropagation and Approximate Inference in Deep Generative Models, Technical report, arXiv:1401.4082 (2014).

[11] Keras: The Python Deep Learning Library (online), https://keras.io/

[12] Laurens van der Maaten and Geoffrey Hinton: Visualizing Data Using t-SNE, Journal of Machine Learning Research 9, pp. 2579–2605 (2008).

[13] B. W. Silverman: Density Estimation for Statistics and Data Analysis (1986).

[14] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P. Simoncelli: Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, Vol. 13, No. 4 (2004).