PLAGIARISM DETECTION TECHNIQUES AND LITERATURE SURVEY

International Journal of Computer Engineering and Applications,
              Volume XV, Issue V, May 2021, www.ijcea.com ISSN 2321-3469

      PLAGIARISM DETECTION TECHNIQUES AND LITERATURE SURVEY
                      Kapil Vilasrao Gawande¹, Dr. Piyush Pratap Singh²
    ¹ Center of Informatics and Language Engineering, Mahatma Gandhi Antarrashtriya Hindi
      VishwaVidyalaya, Wardha, Maharashtra, India.
    ² Associate Professor, School of Computer & System Science, Jawaharlal Nehru University,
      New Delhi, India.

ABSTRACT:
          Literature is intellectual knowledge, and new questions are being raised about the theft of
          that literature. New technical tools are being created to address this problem. It is said that
          “Money can be stolen, goods can be stolen, but knowledge cannot be stolen”. Yet knowledge in
          the form of literature can be stolen, and such theft has existed since texts were first written
          on paper. In recent years, many online tools have become able to identify potential plagiarism
          in research. The main topics of this paper are the dimensions and techniques of plagiarism, the
          NLP problems faced by plagiarism detectors, and problems at the sentence level.

Keywords: Plagiarism detection, curevin, Plagiarism of Code, NLP Methodology, Text Similarity

     INTRODUCTION
           Plagiarism is a challenging problem for publishers, researchers, universities and educational
    institutions. Presenting another person's words or thoughts as one's own is plagiarism, whether
    the work is written text, audio, video, music, or a picture. Plagiarism has been defined in various
    dictionaries and guidelines. [13]The University of Oxford defines it as follows: “Plagiarism is
    presenting someone else’s work or ideas as your own, with or without their consent, by incorporating
    it into your work without full acknowledgement. All published and unpublished material, whether in

 manuscript, printed or electronic form, is covered under this definition.” Using other people's
 texts, pictures, audio, or video without permission is plagiarism. Anyone who uses another
 author's work must acknowledge it through citations and references to the source from which it
 was quoted or taken. Translating material from another language and including it in one's own
 text, images, audio, or video without acknowledgement is also considered plagiarism. In the past,
 text was usually copied as it was; today, plagiarists either copy a text or paraphrase it.
        There are many types of plagiarism; the most basic and common is text-similarity
 plagiarism, which includes copy-paste similarity, code similarity, and translation similarity.
 For example, a student may translate another person's English text into Hindi and submit it as
 an assignment to earn marks without effort.

REVIEW
  1. Plagiarism in Document
  2. Plagiarism in Code
  3. Citation and References
  4. Plagiarism Methodology
  5. Survey of Papers

           1. PLAGIARISM IN DOCUMENT
   There are two types of system for checking plagiarism in documents:
   1. Web embedded System
   2. Stand-alone System

       1.1 Web embedded System
       [2]Web-enabled systems are more commonly used because they make it easier to search the
       World Wide Web for plagiarized sources, and they are more reliable. There are two types of
       such system:

       * curvein

       An intelligent identification system in which the submitted document is compared with the
       work of previous students and with other international databases to determine whether it
       contains plagiarized material.

       *Secure identity
       The submitted paper is checked against the following databases:
               i)      Internet Database
               ii)     Document already published
               iii)    Data Warehouse or Global Database

       Examples: Plagiarism Checker by Grammarly, Quetext Plagiarism Checker, Copyscape,
       ProWritingAid, Copyleaks, etc.

       1.2 Stand-alone Systems:
       A stand-alone system must be installed on the user's computer. There are two types:

  *Verification System:
  [2]This system works only when connected to the Internet. It searches the Internet to match
  the sentences in the query document against suspicious websites.

  *CopyFind:
  [2]This system detects plagiarism among two or more documents by comparing them with one another.
  Examples: Plagiarism Checker X, Turnitin, AntiPlagiarism, etc.
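  The pairwise comparison such a stand-alone system performs can be sketched as follows. This is a
  minimal illustration only, assuming that documents are compared by the word 3-grams they share;
  the function names `word_ngrams` and `pairwise_overlap`, the n-gram length and the sample texts
  are hypothetical and do not describe how CopyFind or any other named tool works internally.

```python
import re
from itertools import combinations


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams (tuples of n words) in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def pairwise_overlap(docs, n=3):
    """Compare every pair of documents by the number of word n-grams they share.

    docs maps a document name to its text (hypothetical local collection).
    Returns {(name_a, name_b): shared_ngram_count} for every unordered pair.
    """
    grams = {name: word_ngrams(text, n) for name, text in docs.items()}
    return {
        (a, b): len(grams[a] & grams[b])
        for a, b in combinations(sorted(docs), 2)
    }


docs = {
    "a.txt": "Plagiarism is presenting the work of another person as your own",
    "b.txt": "Copying is presenting the work of another person without credit",
    "c.txt": "Stemming reduces words to their root form",
}
print(pairwise_overlap(docs))
# {('a.txt', 'b.txt'): 5, ('a.txt', 'c.txt'): 0, ('b.txt', 'c.txt'): 0}
```

  A high count for a pair, as for a.txt and b.txt here, marks the pair for manual review; longer
  n-grams reduce accidental matches on common phrases.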

  1.3 Plagiarism text Similarity
      1.3.1 Lexical text Similarity:
  Lexical similarity measures how far two texts share the same words, regardless of meaning.
  For example, the following two sentences use almost the same words and are therefore lexically
  similar, although they mean different things:
  [14]Eg. The cat ate the mouse.
       The mouse ate the cat food.
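  Lexical similarity is commonly scored with the Jaccard coefficient: the number of distinct words
  two sentences share divided by the number of distinct words in either. The sketch below assumes
  simple lowercase regex tokenization, and the helper names are illustrative. For the two example
  sentences it finds 4 shared words (the, cat, ate, mouse) out of 5 distinct words, a score of 0.8,
  even though the meanings differ.

```python
import re


def word_set(sentence):
    """Lowercase word tokens of a sentence, as a set."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(s1, s2):
    """Jaccard similarity: shared distinct words / all distinct words."""
    a, b = word_set(s1), word_set(s2)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


print(jaccard("The cat ate the mouse.", "The mouse ate the cat food."))  # 0.8
```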

      1.3.2 Semantic text Similarity:
  Semantic similarity measures how close two texts are in meaning, even when they use different
  words. For example, the following sentences state the same fact in different words:
  Eg. Modiji declared the lockdown on 22 March.
      The Prime Minister of India declared Lockdown on 22 March.
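  Word overlap alone underrates semantically similar sentences. One simple way to see this,
  sketched here under loud assumptions, is to map known aliases to a single canonical term before
  comparing word sets. The `ALIASES` table is hypothetical and hand-written; practical systems draw
  this knowledge from thesauri, knowledge bases or embedding models. With plain word overlap the
  two example sentences score 6/11, and after resolving the alias they score 6/7.

```python
import re

# Hypothetical, hand-written alias table (phrase -> canonical term).
ALIASES = {"the prime minister of india": "modiji"}


def word_set(sentence, aliases):
    """Lowercase the sentence, replace alias phrases, and return its word set."""
    text = sentence.lower()
    for phrase, canonical in aliases.items():
        text = text.replace(phrase, canonical)
    return set(re.findall(r"\w+", text))


def similarity(s1, s2, aliases=None):
    """Jaccard overlap of the two word sets after optional alias replacement."""
    aliases = aliases or {}
    a, b = word_set(s1, aliases), word_set(s2, aliases)
    return len(a & b) / len(a | b)


s1 = "Modiji declared the lockdown on 22 March."
s2 = "The Prime Minister of India declared Lockdown on 22 March."
print(round(similarity(s1, s2), 3))           # 0.545 (words only)
print(round(similarity(s1, s2, ALIASES), 3))  # 0.857 (alias resolved)
```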

      1.3.3 Monolingual:
  [3]In the context of plagiarism, monolingual plagiarism is copying text from a document in the
  same language, for example from Hindi text into Hindi text, without a reference or with an
  incorrect reference.

      1.3.4 Crosslingual:
  [3]In the context of plagiarism, cross-lingual plagiarism is copying text from a document in a
  different language, for example from Hindi text into English text, without a reference or with
  an incorrect reference.
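  Cross-lingual comparison needs a step that brings both texts into one language before matching.
  The sketch below assumes a tiny, hypothetical Hindi-to-English word dictionary (`HI_EN`) and
  word-by-word translation; a real detector would use a bilingual lexicon or a machine translation
  system. Tokens are split on whitespace rather than with a regex, because Python's `\w` may split
  Devanagari words at combining vowel signs.

```python
# Hypothetical toy dictionary; None means "drop this word" (here the
# postposition ने, which has no English counterpart in this sentence).
HI_EN = {"बिल्ली": "cat", "ने": None, "चूहा": "mouse", "खाया": "ate"}


def translate_tokens(hindi_sentence):
    """Word-by-word translation to English; unknown words are kept unchanged."""
    words = hindi_sentence.replace("।", " ").split()  # remove the danda, split on spaces
    out = set()
    for w in words:
        t = HI_EN.get(w, w)
        if t is not None:
            out.add(t)
    return out


def english_tokens(sentence):
    """Lowercase English word set with full stops removed."""
    return set(sentence.lower().replace(".", " ").split())


def crosslingual_similarity(hindi, english):
    """Jaccard overlap between the translated Hindi words and the English words."""
    a, b = translate_tokens(hindi), english_tokens(english)
    return len(a & b) / len(a | b)


print(crosslingual_similarity("बिल्ली ने चूहा खाया।", "The cat ate the mouse."))  # 0.75
```

  The only unmatched word is the English article "the", which has no Hindi counterpart; removing
  stop words before comparison would raise the score further.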

2 PLAGIARISM OF CODE:
  [15]C#, Java, Python, HTML, XML, Go, C, C++, JavaScript, Swift, Ruby, PHP, Perl, Scala and
  many more programming languages are available to learn, and a variety of approaches have been
  introduced to detect shared logic and code in source code written in C, C++, Java, C#, or .NET.
  Programming is the language of the future, so it attracts more students every year. With more
  and more students learning to code, a growing number find themselves facing plagiarism
  allegations.
  [3]Code plagiarism can be classified by the level of modification made to the original program:

  Level 0 - The original program, without modifications
  Level 1 - Only the comments are changed
  Level 2 - Identifier names are replaced
  Level 3 - The positions of variables are changed
  Level 4 - Constants and functions are changed
  Level 5 - Loops are replaced
  Level 6 - Control structures are rewritten using different but equivalent control structures
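  Levels 1 and 2 can be neutralised by comparing token streams instead of raw text. The sketch
  below, written for Python source using the standard `tokenize` module, drops comments and
  replaces every non-keyword name with the placeholder `ID`, so a copy that only changes comments
  or renames identifiers yields the same sequence as the original. The function name and the
  placeholder are illustrative; higher levels (reordered variables, replaced loops or control
  structures) would need deeper analysis, such as comparing syntax trees, which this sketch does
  not attempt.

```python
import io
import keyword
import tokenize

# Tokens that carry no program logic and are dropped entirely.
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Layout tokens kept only by type name, so indentation width does not matter.
LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def normalise_code(source):
    """Token sequence with comments removed and identifiers renamed to 'ID'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue  # level 1: comment changes disappear
        if tok.type in LAYOUT:
            out.append(tokenize.tok_name[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")  # level 2: any renamed identifier looks the same
        else:
            out.append(tok.string)
    return out


original = """
def total(values):
    s = 0
    for v in values:
        s += v
    return s
"""

copied = """
# sum up the list
def add_all(nums):
    acc = 0  # running total
    for n in nums:
        acc += n
    return acc
"""

print(normalise_code(original) == normalise_code(copied))  # True
```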

3 Citation and References

 It is important to check the citations and references given in a paper. They show whether the
 author has obtained the information accurately and where it has been obtained from.
 Citations and references can be checked against the Internet and against a corpus:
 1. Most documents are available on the Internet and in e-libraries, which can be used for
 checking citations and references.
 2. Books can be checked against the corpus data warehouse of a stand-alone system.
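 Checking a citation against a corpus can be as simple as confirming that a quoted passage really
 occurs in the source it is attributed to. The sketch below assumes a hypothetical local corpus
 that maps reference numbers to source texts; the function names and the three status strings are
 illustrative, and real checks would also need to handle paraphrased or partially quoted material.

```python
def normalise(text):
    """Lowercase text and collapse whitespace so line breaks do not matter."""
    return " ".join(text.lower().split())


def verify_quotation(quote, cited_id, corpus):
    """Check whether `quote` occurs in the source that `cited_id` points to."""
    source = corpus.get(cited_id)
    if source is None:
        return "source not in corpus"
    if normalise(quote) in normalise(source):
        return "verified"
    return "not found in cited source"


# Hypothetical corpus: reference number -> full text of the source.
corpus = {
    13: "Plagiarism is presenting someone else's work or ideas as your own, "
        "with or without their consent, by incorporating it into your work "
        "without full acknowledgement.",
}

print(verify_quotation("presenting someone else's work\nor ideas as your own", 13, corpus))
# verified
print(verify_quotation("plagiarism is always accidental", 13, corpus))
# not found in cited source
print(verify_quotation("any quotation", 14, corpus))
# source not in corpus
```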

4 PLAGIARISM DETECTION METHODOLOGY:
 Many plagiarism detection tools have been built and many techniques used, but text-based
 detection still relies on NLP pre-processing methods.
           4.1.1 Pre-Processing and NLP methodology.
               i. Tokenization
               A document is split into tokens, usually words, where a token is the unit of
               the document used in later processing.
               ii. Stop word remover
               Stop words, such as "the", "is" and "of", carry little meaning on their own;
               languages use them to give a sentence its structure. They can be removed from
               the token list without affecting the accuracy of the comparison.
               iii. Lemmatization
               Words can take different forms, created by adding suffixes and prefixes to
               their original forms. Lemmatization removes these affixes, so that the
               different forms of a word are reduced to the same base form.
               iv. Stemming
               New words are formed from original words by adding a prefix or suffix.
               Stemming is the process of finding the root word of such a derived word.
               Example: रेलगाड़ी (train) = रेल (rail) + गाड़ी (vehicle), where रेल = prefix
               and गाड़ी = root word
               v. Synonym Replacement
               A plagiarist does not want to be caught, so they may insert or delete parts
               of a sentence or simply paraphrase it. In this step, each word and all its
               synonyms are identified, so that the algorithm can detect paraphrasing.
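               The five pre-processing steps above can be chained into one pipeline. The
               sketch below is a toy, English-only illustration: the stop-word list, lemma
               table, suffix list and synonym table are small hypothetical resources, and
               the suffix stripper is far cruder than real stemmers. For Hindi text each
               resource would have to be replaced by a Hindi equivalent. It shows how two
               paraphrased sentences reduce to the same token list.

```python
import re

# Small hypothetical resources, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "and", "on"}
LEMMAS = {"mice": "mouse", "ate": "eat", "eaten": "eat", "better": "good"}
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order
SYNONYMS = {"feline": "cat", "rodent": "mouse", "devour": "eat"}


def tokenize_text(text):
    """i. Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """ii. Drop words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """iii. Map irregular forms to a base form via a lookup table."""
    return [LEMMAS.get(t, t) for t in tokens]


def stem(token):
    """iv. Strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def replace_synonyms(tokens):
    """v. Replace each word with a canonical synonym so paraphrases match."""
    return [SYNONYMS.get(t, t) for t in tokens]


def preprocess(text):
    tokens = lemmatize(remove_stop_words(tokenize_text(text)))
    return replace_synonyms([stem(t) for t in tokens])


print(preprocess("The cat ate the mice."))           # ['cat', 'eat', 'mouse']
print(preprocess("A feline devoured the rodents."))  # ['cat', 'eat', 'mouse']
```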

 4.1.2 Document Frequency Comparison Method
     [1]The vector space model is a generic model, often applied to information retrieval,
     translation, and other text-processing tasks. For plagiarism detection, the vector space
     model can be viewed as a global similarity measure. Sentences extracted from the suspicious
     and source documents are treated as mutually independent units. Each text is represented as
     a vector of term frequencies, and these vectors are compared with one another; the resulting
     similarity score lies between 0 and 1.
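     A minimal sketch of this comparison, assuming raw term-frequency vectors and cosine
     similarity as the matching function; the function names are illustrative, and real systems
     usually add weighting such as TF-IDF. Because term frequencies are never negative, the cosine
     lies between 0 and 1.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a text: word -> number of occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text1, text2):
    """Cosine of the angle between two term-frequency vectors (0 to 1)."""
    v1, v2 = tf_vector(text1), tf_vector(text2)
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty text is not similar to anything
    return dot / (norm1 * norm2)


source = "The cat ate the mouse."
suspect = "The mouse ate the cat food."
print(round(cosine_similarity(source, suspect), 3))  # 0.935
```

     Here the dot product is 7 and the vector lengths are the square roots of 7 and 8, so the
     score is 7 divided by the square root of 56.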

 4.1.3 Multinomial Naive Bayes
     [1]The Naive Bayes classifier is suited to pattern recognition and can be used to detect
     plagiarism. Let S be a sentence (class) and let t1, t2, t3, ..., tn be the features given
     by its words, which are assumed to be conditionally independent given S.
     Applying Bayes' theorem:
     $$P(S \mid t_1, t_2, \dots, t_n) = \frac{P(S)\,P(t_1, t_2, \dots, t_n \mid S)}{P(t_1, t_2, \dots, t_n)}$$
     Applying the conditional independence assumption:
     $$P(S \mid t_1, t_2, \dots, t_n) \propto P(S) \prod_{i=1}^{n} P(t_i \mid S)$$
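     The Naive Bayes rule described in this section can be turned into a small classifier. This
     sketch assumes two hypothetical labels, `copied` and `original`, a handful of made-up
     training sentences, and add-one (Laplace) smoothing; it sums logarithms instead of
     multiplying probabilities, which selects the same S while avoiding numeric underflow. It
     illustrates the multinomial Naive Bayes decision rule, not the specific system described
     in [1].

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (words, label) pairs. Returns the counts the model needs."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)  # label -> Counter of word occurrences
    for words, label in samples:
        word_counts[label].update(words)
    vocab = {w for words, _ in samples for w in words}
    return label_counts, word_counts, vocab


def predict(words, label_counts, word_counts, vocab):
    """Return the label S maximising log P(S) + sum of log P(t_i | S)."""
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior P(S)
        n_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing so an unseen word never gives log(0).
            p = (word_counts[label][w] + 1) / (n_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled training sentences, for illustration only.
samples = [
    ("presenting someone else work as your own".split(), "copied"),
    ("work or ideas as your own without acknowledgement".split(), "copied"),
    ("stemming finds the root of a word".split(), "original"),
    ("tokens are units of a document".split(), "original"),
]
model = train(samples)
print(predict("someone else ideas as own".split(), *model))  # copied
print(predict("the root of a word".split(), *model))         # original
```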

5    SURVEY OF PAPERS:

         SR.NO                                  DESCRIPTION OF PAPER
           1         [2]The research paper by Prasanth.S, Rajshree.R and Saravana Balaji.B,
                     "A Survey on Plagiarism Detection", describes the types and techniques of
                     text similarity in plagiarism.
           2         [4]The research paper by Vítor T. Martins, Daniela Fonte, Pedro Rangel
                     Henriques and Daniela da Cruz, "Plagiarism Detection: A Tool Survey and
                     Comparison", presents plagiarism detection tools and compares them,
                     including their accuracy.
           3         [5]The research paper by Ali Bukar Maina, Mahmoud Bukar Maina and Suleiman
                     Salihu Jauro, "Plagiarism: A Perspective from a Case of a Northern Nigerian
                     University", surveys plagiarism at a Northern Nigerian university.
           4         [6]The 2012 research paper by A. S. Bin-Habtoor and M. A. Zaher, "A Survey on
                     Plagiarism Detection Systems", surveys plagiarism detection tools and
                     techniques.
           5         [3]The paper by Hussain A. Chowdhury and Dhruba K. Bhattacharyya, "Plagiarism:
                     Taxonomy, Tools and Detection Techniques", discusses the types of plagiarism,
                     plagiarism detection methods, NLP-related problems, techniques and tools.
           6         [7]The research paper by Yuehong (Helen) Zhang and Xiaoyan Jia, "A Survey on
                     the Use of CrossCheck for Detecting Plagiarism in Journal Articles", examines
                     the use of CrossCheck for detecting plagiarism and reports its results.
           7         [8]The research paper by P. Rubini and S. Leela, "A Survey on Plagiarism
                     Detection in Text Mining", illustrates plagiarism techniques and their
                     identification.
           8         [1]The research paper by Harshall Lamba and Sharvari Govilkar, "A Survey on
                     Plagiarism Detection Techniques for Indian Regional Languages", introduces
                     plagiarism and explains detection techniques: candidate document retrieval,
                     document comparison techniques, multinomial Naive Bayes, semantic role
                     labeling, fingerprinting-based plagiarism detection, latent semantic analysis
                     (LSA) and fuzzy semantic similarity.
           9         [12]The research paper by Jens Lykkesfeldt, "Strategies for Using Plagiarism
                     Software in the Screening of Incoming Journal Manuscripts: Recommendations
                     Based on a Recent Literature Survey", surveys the use of plagiarism software
                     in manuscript screening and presents its findings.
          10         [11]The research paper by Hermann Maurer, Frank Kappe and Bilal Zaka,
                     "Plagiarism - A Survey", introduces plagiarism, text-similarity tools and
                     their techniques.

    6 CONCLUSION
A survey of plagiarism detection techniques and text-related difficulties has been presented, and
the NLP problems that arise at the word and sentence level have been described. Preventing
plagiarism requires new algorithms, so that new knowledge and research can be protected and theft
can be curbed.

REFERENCES
    [1]    Harshall Lamba, Sharvari Govilkar, “A Survey on Plagiarism Detection Techniques for Indian
           Regional Languages”, International Journal of Computer Applications, Vol. 164, pp. 44-50,
           4 April 2017.

    [2]    Prasanth.S, Rajshree.R, Saravana Balaji.B, “A Survey on Plagiarism Detection”, International
           Journal of Computer Applications, Vol. 86, pp. 21-23, 19 January 2014.

    [3]    Hussain A. Chowdhury, Dhruba K. Bhattacharyya, “Plagiarism: Taxonomy, Tools and Detection
           Techniques”, arxiv.org (5 February 2021).

    [4]    Vítor T. Martins, Daniela Fonte, Pedro Rangel Henriques, Daniela da Cruz, “Plagiarism
           Detection: A Tool Survey and Comparison”, OASIcs, Dagstuhl Publishing, Germany,
           pp. 143-158.

    [5]    Ali Bukar Maina, Mahmoud Bukar Maina, Suleiman Salihu Jauro, “Plagiarism: A Perspective
           from a Case of a Northern Nigerian University”, IJIRR, Vol. 1, Issue 12, pp. 225-230,
           December 2014.

    [6]    A. S. Bin-Habtoor, M. A. Zaher, “A Survey on Plagiarism Detection Systems”, International
           Journal of Computer Theory and Engineering, Vol. 4, No. 2, pp. 185-188, April 2012.

    [7]    Yuehong (Helen) Zhang, Xiaoyan Jia, “A Survey on the Use of CrossCheck for Detecting
           Plagiarism in Journal Articles”, Learned Publishing, Vol. 25, No. 4, pp. 292-307,
           October 2012.

    [8]    P. Rubini, S. Leela, “A Survey on Plagiarism Detection in Text Mining”, International
           Journal of Research in Computer Applications and Robotics, Vol. 1, Issue 9, pp. 117-119,
           December 2013.

    [9]    Martin Potthast, Andreas Eiselt, Alberto Barrón-Cedeño, “Overview of the 3rd International
           Competition on Plagiarism Detection”, PAN (webis.de) (18 April 2021).

    [10]   25th Annual Conference of the Spanish Society for Natural Language Processing (SEPLN 2009),
           3rd PAN Workshop: Uncovering Plagiarism, Authorship and Social Software Misuse.

    [11]   Hermann Maurer, Frank Kappe, Bilal Zaka, “Plagiarism - A Survey”, Journal of Universal
           Computer Science, Vol. 12, No. 8, pp. 1050-1084, 25 August 2006.

    [12]   Jens Lykkesfeldt, “Strategies for Using Plagiarism Software in the Screening of Incoming
           Journal Manuscripts: Recommendations Based on a Recent Literature Survey”, BCPT,
           pp. 161-164, February 2016.

    [13]   https://www.ox.ac.uk/students/academic/guidance/skills/plagiarism#:~:text=Plagiarism%20is%20presenting%20someone%20else's,is%20covered%20under%20this%20definition
           (6 April 2021).

    [14]   https://kavita-ganesan.com/what-is-text-similarity/#.YHrCTa8zZPY (17 April 2021).

    [15]   https://copyleaks.com/code-plagiarism-checker (20 April 2021).
