CHARACTER DETECTION IN BNI INTERNET BANKING CAPTCHA IMAGE USING TEMPLATE MATCHING CORRELATION - ESQ Business School

IJTB | International Journal of Technology And Business

       CHARACTER DETECTION IN BNI INTERNET BANKING CAPTCHA
           IMAGE USING TEMPLATE MATCHING CORRELATION

                                                            Deni Sutaji
                              Universitas Muhammadiyah Gresik/Informatika, Gresik, 61121, Indonesia
                                                Email: sutaji.deni@umg.ac.id

                                                              Abstract
          Captcha images are used in Internet Banking as a security measure during authentication to
          determine whether the active user is a human or a robot. BNI Bank is one of the banks that uses a
          captcha image, which contains four numeric characters. In this study, we propose a way to read the
          characters in the captcha image using the Template Matching Correlation method. In a later study, it
          can be used in an automatic login system for Micro, Small and Medium Enterprise (MSME) players who
          accept transfer payments to BNI Bank accounts. Recognition starts with pre-processing, continues with
          segmentation and labeling of each character, and ends with matching against a reference template
          dataset using Template Matching Correlation. On 100 test images, the system reaches an accuracy of
          100%, so the method is suitable for identifying the characters of the captcha image on the BNI
          Internet Banking login page.

          Keywords : captcha, internet banking, template matching correlation

INTRODUCTION

       In Indonesia, the economy is dominated by Micro, Small and Medium Enterprises (MSMEs). MSMEs have a vital and strategic role in national economic development, and the employment they provide also benefits Indonesia's economic growth. Up to 2015, the number of entrepreneurs recorded by the Directorate General of Taxes was 56,539,560, of which 99.9% were MSMEs [14]. According to the World Bank, MSMEs are grouped into three categories: Micro Enterprises (10 employees), Small Businesses (30 employees), and Medium Businesses (up to 300 employees) [14].
       In running their business, many MSMEs use transfer facilities, both ATMs and Internet Banking, one of which is BNI Bank Internet Banking. MSME entrepreneurs regularly check the account history of their business transactions to see whether the payments made by customers have been credited to the account or not. To use BNI Internet Banking, they must enter a username, a password and the characters of a captcha image. This becomes a routine activity carried out repeatedly by the entrepreneur or by administrative staff. It takes time and concentration, and carries the risk of input errors that result in a blocked Internet Banking account, so it is better to have a system that checks account mutations automatically for MSME entrepreneurs. One sub-system of such an automatic system is the automatic recognition of the characters in the captcha image, because the characters are randomly generated by BNI Bank Internet Banking every time the page is reloaded (for login or logout).
       The captcha on the BNI Internet Banking login page is a security code consisting of a row of numeric characters, generated randomly and rendered on the login page as an image so that it cannot be copied and pasted into the form. Internet Banking users are required to type the number sequence into the field before they are allowed to log in.
       A popular method for recognizing characters in various types of letters and numbers is Optical Character Recognition (OCR) with Template Matching. Kurniawan (2016) recognized the characters of vehicle number plate images using the Template Matching method; with 30 sample images the method recognized as many as 238 characters with an accuracy of 80.25% [1]. Sutaji (2018), using the same method, recognized the captcha images of the BRI Internet Banking login with an accuracy of 93.5% [11]. With an artificial intelligence approach, Ye Wang and Mi Lu used a novel adaptive algorithm to recognize characters in captcha images with an average accuracy of

    Deni Sutaji                                                                                  ©2019 IJTB All rights reserved.
    E-mail address: sutaji.deni@umg.ac.id

70.78% [13]. A sliding window based on a neural network was also used by Hussain et al., with a success rate of 95.5% in character recognition [4].
       In this article, we discuss how to recognize the captcha image characters on the Bank BNI Internet Banking login page using Template Matching Correlation.

THEORY/CALCULATION

a. Captcha Image
       Captcha stands for Completely Automated Public Turing test to tell Computers and Humans Apart, a fully automated public test to identify whether a user is a computer or a human [3] [4]. Its purpose is to ensure that the sender of the data is a human and not a script, program or robot that automatically sends data continuously.
       A captcha generally takes the form of an image containing a code that humans can read easily but computers have difficulty reading (a code in text form would be easier for them). For a computer, an image is a collection of color intensity values, one per pixel, so a complex process is needed to recognize objects in the image, let alone to know the meaning of the image [12].
       For humans, however, it is very easy to read the code in the image and type it into a text input as a condition for sending data, so that only humans are expected to be able to continue sending data while a computer or robot cannot [6].

b. Internet Banking
       Internet Banking, according to its constituent words, combines two terms, namely internet and bank. The internet is a network system that connects computers around the world [2]. According to Bank Indonesia, Internet Banking is a service that lets customers obtain information, communicate and carry out banking transactions over the internet. There are three types of Internet Banking services [2]:

a) Informational Internet Banking, which is a service provided by the Bank to customers in the form of information delivered over the internet, with no transactions being carried out.

b) Communicative Internet Banking, which is a service provided by the Bank to customers in the form of communication, characterized by interaction with a Bank that provides Internet Banking services in a limited scope, with no transactions being carried out.

c) Transaction Internet Banking, which is a service provided by the Bank to customers to execute transactions over the internet.

In the third type, the captcha image is embedded in the login authentication system on the BNI Internet Banking login page, which is the object of this research.

c. Optical Character Recognition
       Optical Character Recognition (OCR) is a system that recognizes letter and number characters so that they can be converted into text files. Such a recognition system can be used to increase the flexibility, ability and intelligence of computer systems [8][13]. A smart character recognition system can help with activities currently carried out by many parties, namely the digitization of information and knowledge, for example building digital library collections, digital collections of ancient literature, automated screening of notes, etc. [8]. The OCR algorithm is shown in Figure 1.

       [Figure 1 flowchart: Start -> File input -> Pre-Processing -> Segmentation -> Normalisation -> Feature Extraction -> Recognition -> Finish]

                       Fig. 1. OCR Flowchart

d. Template Matching Correlation
         Template Matching is a simple process for recognizing characters in a captcha image. The algorithm starts with an input image that contains letters or numbers. The image is compared with the template images stored in the database: a template is placed at the center of the part of the image to be compared, and a calculation determines how many pixel points match the template image [9].
         These steps are repeated for the entire input image. The template with the highest match value between the input image and the template image is the template that best fits the input image [4]. The algorithm is illustrated in Figure 2.
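
The matching step described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes that "correlation" means the 2-D Pearson correlation coefficient between a character image and a template of the same size, which the paper does not spell out. The names `correlation`, `best_template` and `toy_templates` are hypothetical, and the 3x3 arrays are toy data rather than real captcha digits.

```python
import numpy as np

def correlation(a, b):
    """2-D Pearson correlation coefficient between two same-sized images."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    da = a - a.mean()
    db = b - b.mean()
    denom = np.sqrt((da * da).sum() * (db * db).sum())
    # A constant image has zero variance; treat it as uncorrelated.
    return float((da * db).sum() / denom) if denom > 0 else 0.0

def best_template(char_img, templates):
    """Return (label, score) of the template with the highest correlation.

    `templates` is a hypothetical dict mapping a label ("0".."9") to a
    binary array with the same shape as `char_img`.
    """
    scores = {label: correlation(char_img, t) for label, t in templates.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]

# Toy 3x3 "templates" for illustration only, not real captcha digits.
toy_templates = {
    "1": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "7": np.array([[1, 1, 1], [0, 0, 1], [0, 0, 1]]),
}
noisy_seven = np.array([[1, 1, 1], [0, 0, 1], [0, 1, 1]])
label, score = best_template(noisy_seven, toy_templates)
print(label, round(score, 3))
```

In this toy case the noisy input is assigned the label "7" because its correlation with the "7" template is higher than with the "1" template. In a real system each segmented character would first be resized to the template size.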

                Fig. 2. Template Matching Illustration

e. Connected Component Labeling
       Connected Component Labeling is an image segmentation technique that can also be used to classify regions or objects in a digital image. The technique relies on the theory of pixel connectivity: pixels that belong to the same region are called connected because they satisfy adjacency rules (rules about the proximity of pixels) [9]. The adjacency rule uses the neighborhood property of pixels, so connected pixels are adjacent to one another because they still have a neighbor relationship. Suppose the symbol V denotes the pixel intensity value, taken from {0, 1}; keep in mind that this method works on binary images. Neighboring pixels must be at a distance of 1 unit, that is, directly next to each other with no pixel in between [4] [5].

EXPERIMENTAL METHOD

       The problem addressed is the absence of a system that can recognize captcha number characters. In the e-banking application, the user is required to enter the captcha number by typing it on the keyboard; in doing so the user can make an error, and the worst risk is that the Internet Banking account will be blocked. A system is therefore needed that can read and recognize captcha number characters.
       In this study, the template image dataset was collected by downloading captcha images from the ibank.bni.co.id test page. Each image was then cropped and resized per character and converted into a binary image, yielding character templates for the digits 0 to 9 that are later used as the reference template dataset.
       The system is divided into 2 main stages: the first is pre-processing of the initial data; the second is character segmentation and labeling followed by character recognition using the Template Matching algorithm to recognize the pattern of the captcha character images. The system design is shown in Figure 3.

The following describes each stage:
1. Pre-Processing
       Pre-processing consists of three steps. The first step converts the RGB captcha image from the internet banking page to a grayscale image. The second step is image adjustment, in which the contrast and brightness of the grayscale image are improved. The third step converts the adjusted image to a binary image using Otsu thresholding; the threshold value chosen in this step is important to the success of this study. After the binary image is obtained, the process goes to the second stage, namely recognition of each numeric character contained in the image. The pre-processing flowchart is shown in Figure 4 and the result of the pre-processing step is shown in Figure 5.
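
The three pre-processing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the grayscale weights are the common ITU-R BT.601 luma weights, the adjustment is a simple percentile-based contrast stretch, and binarization uses a fixed normalized level (default 0.4, the value the paper reports as best) instead of computing an Otsu threshold. The function names and the `dark_text` flag (characters darker than the background) are hypothetical.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array (0..255) to grayscale in 0..1."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    # ITU-R BT.601 luma weights, a common choice for RGB-to-gray conversion.
    return rgb @ np.array([0.299, 0.587, 0.114])

def adjust(gray, low_pct=1.0, high_pct=99.0):
    """Stretch contrast so the chosen percentiles map to 0 and 1."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros_like(gray)
    return np.clip((gray - lo) / (hi - lo), 0.0, 1.0)

def binarize(gray, level=0.4, dark_text=True):
    """Threshold at a normalized level; True marks character pixels."""
    return gray < level if dark_text else gray >= level

def preprocess(rgb, level=0.4, dark_text=True):
    """Grayscale -> contrast adjustment -> binary image."""
    return binarize(adjust(to_grayscale(rgb)), level, dark_text)

# Synthetic 4x6 image: light background with one dark vertical stroke.
img = np.full((4, 6, 3), 230, dtype=np.uint8)
img[:, 2, :] = 20
binary = preprocess(img)
print(binary.astype(int))
```

Keeping `level` as a parameter makes it easy to repeat the kind of trial-and-error comparison across threshold values that the paper describes.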

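The binary image then feeds the labeling step that opens the detection stage. Connected component labeling as described in section e can be sketched with a breadth-first flood fill. This is an illustrative sketch only: the function names are hypothetical, the default of 4-connectivity is an assumption based on the "distance of 1 unit" rule in the text, and 8-connectivity is offered as an alternative.

```python
from collections import deque

import numpy as np

def label_components(binary, connectivity=4):
    """Label connected foreground regions; returns (labels, count)."""
    binary = np.asarray(binary, dtype=bool)
    labels = np.zeros(binary.shape, dtype=int)
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:  # 8-connectivity also treats diagonal pixels as neighbors
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]
    rows, cols = binary.shape
    count = 0
    for r, c in zip(*np.nonzero(binary)):
        if labels[r, c]:
            continue  # pixel already belongs to a labeled region
        count += 1
        labels[r, c] = count
        queue = deque([(r, c)])
        while queue:
            y, x = queue.popleft()
            for dy, dx in steps:
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and binary[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count

def crop_characters(labels, count):
    """Crop each labeled region to its bounding box, ordered left to right."""
    boxes = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        crop = labels[ys.min():ys.max() + 1, xs.min():xs.max() + 1] == k
        boxes.append((int(xs.min()), crop))
    return [crop for _, crop in sorted(boxes, key=lambda b: b[0])]

demo = np.array([
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
])
labels, count = label_components(demo)
chars = crop_characters(labels, count)
print(count, [c.shape for c in chars])
```

Sorting the crops by their leftmost column keeps the characters in reading order, so the recognized digits can be concatenated directly into the captcha string.
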
       [Figure 3 flowchart: Start -> Captcha Images -> Pre-Processing -> Character Detection using Template Matching -> Finish]

  Fig. 3. Flowchart of the character recognition system with Template Matching.

       [Figure 4 flowchart: Start -> Captcha Images -> Convert to Grayscale -> Image Adjustment -> Convert to Binary Image (Thresholding) -> Binary Images -> Finish]

  Fig. 4. Flowchart of the Pre-Processing Process

2. The detection process
       At this stage, the processed image generally contains only character pixels. This is the final and main stage of the system; the previous stage can be regarded as an auxiliary or initial stage. The stage begins with the labeling process, followed by cropping the pixels of each label and matching the pixel patterns

with available datasets. The segmentation result for each character is shown in Figure 6.
       Next, in the matching process with the Template Matching Correlation algorithm, each separated character image is compared with the patterns available in the dataset to obtain the character information. The process is repeated for each of the 4 character label indices.

            Fig. 5. Pre-Processing Process

      Fig. 6. Labeled Segmentation of Each Character.

RESULTS AND DISCUSSION

       This section explains the testing of the digital image processing application that detects captcha image characters on internet banking using the Template Matching Correlation method. Testing looks for the closest, or even identical, match between the pixel values of the test image and the pixel values of the previously prepared template images using the Template Matching Correlation algorithm.
       In the pre-processing step, the conversion of grayscale images into binary images with the Otsu method was tuned by trial and error to get the best character result. If the characters are too thick, they cannot match the templates, and the same holds if they are too thin. From the trials, the best threshold value obtained is 0.4. The comparison with the other threshold values, from 0.1 to 0.8, is shown in Figure 7.
       In the character detection step, the character segmentation process is an important part, because each segmented character is matched against the template database of digits 0 to 9. An advantage of the captcha images from BNI Bank is that the number characters in them are separated from one another, so the segmentation process has no difficulty extracting the 4 number characters in the captcha image.
       The test used 100 captcha images downloaded from the ibank.bni.co.id website. The purpose of the test is to determine the success and failure rate of the system that was built and, most importantly, to measure its accuracy, so that conclusions can be drawn from the test observations. Examples of test results are shown in Table 1.

       [Figure 7 panels: (a) Threshold 0.1, (b) Threshold 0.2, (c) Threshold 0.3, (d) Threshold 0.4, (e) Threshold 0.5, (f) Threshold 0.6, (g) Threshold 0.7, (h) Threshold 0.8]

  Fig. 7. Otsu threshold values from 0.1 to 0.8; 0.4 is the best threshold value.

       Figure 7 shows that the 0.1 threshold value produces a poor binary image. Each character has missing pixels, so the number formed is not complete, which affects both segmentation and character detection. The threshold values 0.2 and 0.3 are a different case: the characters look thinner than the original, which does not affect segmentation but does affect character detection.
       For the threshold values 0.5 and 0.6, the characters look thicker than in the original image; this does not affect the segmentation process but will cause errors in the

detection process. For the last threshold values, 0.7 and 0.8, there are additional pixels in the character area, which results in errors in both segmentation and character detection. The most appropriate threshold value in this study is therefore 0.4.

     Table 1. Character Detection Results for 10 Captcha Images

No    Image      Read    Correct    Failed
 1    [image]    1002       4          0
 2    [image]    5772       4          0
 3    [image]    1437       4          0
 4    [image]    7418       4          0
 5    [image]    2349       4          0
 6    [image]    1626       4          0
 7    [image]    2689       4          0
 8    [image]    1234       4          0
 9    [image]    3331       4          0
10    [image]    5069       4          0

     Table 2. Confusion Matrix of the System Evaluation

                              Detection Result
Original Class                Correct Char    Failed
Read as Number                TP = 100        FN = 0
Failed Read as Number         FP = 0          TN = 0

       From the 10 example results, the test was continued for all test data. Across the whole test, the system recognizes the numbers in the captcha images correctly. This is because the number characters in each captcha image do not have overlapping pixel areas, so each number character in the image can be identified correctly, as shown in Table 2.

Evaluation of the accuracy is stated in the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\% = \frac{100}{100} \times 100\% = 100\%$$

Based on the results of the accuracy evaluation above, we conclude that the level of accuracy on the 100 test captcha images is 100%.

CONCLUSION
       Based on the research that has been done, it can be concluded that the Template Matching Correlation method can be used to detect the number characters in the captcha image from the Internet Banking of BNI Bank by comparing the number of matching pixels between the reference dataset template and the input template.
       Future work should process character images not only from the ibank.bni.co.id test page but also from the test pages of other banks such as BRI and BCA. This research can be developed further by combining it with a grabbing method for data collection, so that it can serve as an automatic system for reading account mutations for MSME players in Indonesia.

ACKNOWLEDGMENT
       The author thanks the Ministry of Research, Technology and Higher Education of Indonesia for the opportunity and trust given to research this topic under the Penelitian Dosen Pemula scheme.

REFERENCES

[1] Bayu S., Kurniawan. 2016. Aplikasi Pengenalan Citra Nomor Kendaraan Bermotor Menggunakan Metode Template Matching. Tugas Akhir Teknik Informatika, Universitas Sam Ratulangi Manado.
[2] Budi Agus R. 2005. Aspek Hukum Internet Banking. Jakarta: PT. Raja Grafindo Persada.


[3] Dani Rohpandi. 2010. Aplikasi Pengenalan Citra Dalam Huruf Ngalena Menggunakan Matlab. STMIK Tasik Malaya.
[4] Hussain R., Gao Hui, Ahmed S. R., Parveen S. S. 2016. "Recognition Based Segmentation of Connected Characters in Text Based," 8th IEEE International Conference on Communication Software and Networks.
[5] Raden, S.B., Irfan M. 2012. Perbandingan Algoritma Template Matching dan Feature Extraction Pada Optical Character Recognition. Fakultas Teknik dan Ilmu Komputer Indonesia, Jln. Dipati Ukur No. 112-116 Bandung.
[6] Kusuma, W.A., Sutaji, D. 2017. "Segmentasi Pembuluh Darah Pada Citra Retina Menggunakan Multi-Scale Line Detector (MSLD) dan Adaptive Morphology," Jurnal Register, vol. 3, pp. 49–56.
[7] Kusumanto R.D. 2011. Pengolahan Citra Digital Untuk Deteksi Obyek Menggunakan Pengolahan Warna Model RGB. Jurusan Teknik Komputer, Politeknik Negeri Sriwijaya Palembang.
[8] Prasetyo, E. 2011. Pengolahan Citra Digital dan Aplikasinya Menggunakan Matlab. Yogyakarta: Andi Publisher.
[9] Hartanto, S., Sugianto, A., dan Endah, S.N. 2014. Optical Character Recognition Menggunakan Algoritma Template Matching Correlation. Jurnal Masyarakat Informatika, Vol. 5 No. 9, pp. 1-12.
[10] Sutaji D., Fatichah C., dan Adni, N.D. 2016. Segmentasi Pembuluh Darah Retina Pada Citra Fundus Menggunakan Gradient Based Adaptive Thresholding Dan Region Growing. Jurnal Register, vol. 2, pp. 105–116.
[11] Sutaji D., Husenti N. 2018. Deteksi Karakter Pada Citra Captcha Login Internet Banking Menggunakan Template Matching. Prosiding SNTE 2018, vol. 4, pp. 37–40.
[12] Louis V.A., Manual B., and John L. 2004. Telling Humans and Computers Apart Automatically. Comm. of the ACM, 47(2):57-60.
[13] Ye Wang and Mi Lu. 2016. "A self-adaptive algorithm to defeat text-based CAPTCHA," IEEE International Conference on Industrial Technology (ICIT).
[14] Kerjasama LPPI dengan Bank Indonesia. 2015. Profil Bisnis Usaha Mikro, Kecil dan Menengah (UMKM).
[15] Sakkatos P, Theerayut W, Nuttapol V and Surapong P. 2014. Analysis of text-based CAPTCHA images using template matching correlation technique. JICTEE 2014 - 4th Jt. Int. Conf. Inf. Commun. Technol. Electron. Electr. Eng., 5–9.
[16] Zou H, Zhang B, Tao Z and Wang X. 2016. A Finger Vein Identification Method Based on Template Matching. J. Phys. Conf. Ser. 680.
[17] Gonzalez, R.C.; Woods, R.E.; Eddins, S.L. 2004. Digital Image Processing Using MATLAB. Pearson LPE.
[18] Gonzalez, R.C.; Woods, R.E. 2002. Digital Image Processing. Prentice Hall.
[19] Jahne, B. 2002. Digital Image Processing. Berlin: Springer-Verlag.

AUTHOR'S PROFILE

Deni Sutaji was born in Gresik on October 11th, 1984. He earned his Master's degree in Informatics from Institut Teknologi Sepuluh November Surabaya in October 2016. He currently works as a lecturer at Muhammadiyah Gresik University.
