International Journal of Pure and Applied Mathematics
Volume 119 No. 15 2018, 1953-1967
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/
Special Issue

                  SIMILARITY CHECK BY CONCEPT RELEVANCE (SCCR):
                     PLAGIARISM DETECTION IN TEXT DOCUMENTS
                  1 DURGA BHAVANI DASARI, 2 Dr. VENU GOPALA RAO. K

          1 Assistant Professor, Dept. of CSE, Koneru Lakshmaiah Education Foundation,
            Vaddeswaram, Guntur, AP, India. Email: bhavani.dd@gmail.com
          2 Professor, Dept. of CSE, G. Narayanamma Institute of Technology and Science,
            Hyderabad, India. Email: Kvgrao1234@gmail.com
         Abstract: Numerous models of plagiarism detection exist, and the majority of them are based
         on sophisticated text analysis methods such as fingerprinting, style comparison, or string
         matching. In this paper, a contemporary method called Similarity Check by Concept Relevance
         (SCCR) is proposed to detect plagiarism in text data; it identifies substantial amounts of
         plagiarism through the analysis of conceptual relevance. The solution aims to identify similar
         and plagiarized sets of documents using the conceptual relevance of their text. An experimental
         study reveals that the proposal is an effective solution for handling similar and plagiarized
         manuscripts by concept relevance.
         Keywords: Concept Relevance, Paraphrasing, Plagiarism detection, Similarity check

                                                 1   Introduction
         The foray of the internet, and in particular mobile broadband services, has brought in a
      paradigm shift in how information is obtained, with websites being the most prominent
      information sources [1], [2]. On the other hand, the rise in internet accessibility has resulted in
      lowered academic integrity, specifically with respect to plagiarized content [3], often described
      as using someone else's content and branding it as self-produced content. In the context of text,
      copying another's work without providing an adequate citation reference is termed
      'text-plagiarism'. In the academic context it is termed 'scholar-plagiarism': plagiarism committed
      by students or scholars attending colleges or universities [4], and it forms a major portion of
      text-plagiarism. Since a huge amount of readily accessible content is available, students tend to
      rely increasingly on plagiarism, and over the recent past the trend has continued to grow steadily.
      International scholars in particular often exhibit such unauthentic copying, with most reported
      cases observed to depend on the web for copied content [5]. The study in [6] estimates that
      around 33 percent of publications made by students in schools and colleges involve unauthentic
      copying to some extent. The same conditions are observed in Chile: a survey conducted during
      2010 [7] reported that around 55 percent of mid-school and 42 percent of higher-education
      scholars resorted to some sort of plagiarism by ignoring the original author reference. Amid the
      huge quantity of data sources and manuscripts currently available, assessing the true and genuine
      nature of published works is becoming highly complicated. Different search-engine tools are
      deployed for assessing the originality of a work against internet sources. However, the procedure
      is highly complex and involves large labor and costs [4]. Human assessment has also turned out
      to be a herculean task, involving several days of time. Given the current academic conditions,
      instructors seldom possess adequate time for proper assessment. Further, despite multiple
      strenuous efforts by instructors to restrict students from unauthentic copying, some of them
      continue to do so [8]. In Chile, the lack of a proper identification mechanism for Spanish further
      worsens the condition, resulting in more plagiarism incidents being observed.
           Avoiding such copying is vital in the academic context at all stages, as it impacts the
      scholar's skill-acquisition procedure [9]. Both tutors and institutions detest such copying due to
      its contradiction of educational goals. Accordingly, several instructors have expressed a strong
      desire for tools that tackle the plagiarism issue and enable them to identify copied portions of a
      work with relative ease [3]. Given the seriousness of the issue, the authors in [10] proposed that
      institutions must be equipped with adequate tools and mechanisms to automatically identify
      copied content. Such mechanisms are referred to in contemporary literature mostly as plagiarism
      identification engines. These tools are programs designed to judge one document against other
      documents or potential sources to check for content similarity and thereby detect plagiarized
      research works [11]. This ensures that instructors can detect plagiarized content with ease and
      within limited time frames.
           Analysis of different studies in the contemporary literature on unauthorized content
      copying in academics shows that several researchers regard it as a group of different improper
      practices rather than a single issue. To address the complexity of detecting plagiarism, a few
      researchers distinguished multiple kinds of copying, producing sub-problems that are relatively
      easier to handle.
           Based on our observation, plagiarism identification engines in the academic context are
      expected to equip instructors with a group of tools to analyze the submitted text manuscripts,
      instead of merely flagging plagiarism, thereby handling the issue from the different aspects
      discussed in the literature. Accordingly, our manuscript provides a mechanism that executes
      automatic plagiarism identification for academic entities through a multi-layer viewpoint. The
      mechanism, termed DOCODE 3.0, assists instructors by providing them a complete interface
      with visual tools to identify, comprehend, and manage diverse copying stages and scenarios. The
      proposed mechanism is a full-featured scheme developed on the basis of a sound and scalable
      framework. It deploys multiple identification programs that have proven highly efficient and
      have been found to outperform benchmark schemes in existing studies. The outcomes have been
      assessed in different earlier works and also in multi-national plagiarism identification platforms.
      DOCODE has been developed largely independent of language and, accordingly, can be
      implemented in different scenarios, even though the development of this research work was
      restricted to Chile and Spanish.
           The subsequent sections of this research work are organized as follows. Section 2 provides
      a detailed overview of related studies in the contemporary literature, along with benchmark
      models and architectures. Section 3 details the functioning of the DOCODE mechanism and the
      services offered by the model, together with the prominent programs included in DOCODE and
      their functioning. Section 4 depicts the organization of the mechanism, focusing on its
      architecture. The next section details the user interfaces. The last section presents the conclusion
      and the scope for additional study in this area.

                                           2   Related Research
           This section presents a brief overview of plagiarism, focusing on the prominent definitions
      presented by different studies and the benchmark models for automatic identification. Further,
      the section presents a quick overview of a few prominent copy-identification models.
      2.1 Classification of the Problem of Plagiarism
             Several researchers have put forward different descriptions of the term 'plagiarism'. For
      ease of understanding, it can be considered the incorporation of concepts, paragraphs, text
      chunks, etc., belonging to a different researcher or study [12]. Analyzing the different
      descriptions of the word proposed by different researchers, we can understand that plagiarism is
      not a single issue but a group of diverse improper practices. In-depth analysis of these
      descriptions shows that, knowingly or unknowingly, researchers have attempted to address the
      task of defining plagiarism by classifying it into multiple groups or types. A prominent early
      research work aiming to describe plagiarism categories can be observed in the 1990s and is
      referred to in [13]. That work suggests that plagiarism can assume six different types, and the
      study in [14] also focused on these types. The work in [13] also mentioned that in education,
      scholars are likely to copy content with an aim to score better grades, while academic
      researchers tend to opt for plagiarism to gain recognition and status. Nevertheless, in either
      scenario, when one document is plagiarized from another, both documents show a certain level
      of intertextuality that cannot be recognized if the copied document is assessed separately.
            The study in [15] attempted to describe the challenges of student copying and put forward
      certain descriptions classifying the kinds of cheating characteristics associated with plagiarism,
      including copying, examinations, cheating, and collusion. In [16], researchers observed different
      approaches to concealing plagiarism in output publications, irrespective of the type of plagiarism
      practiced. According to our research work, categorization of plagiarism into different types is
      essential for analyzing the potential issues confronting automated identification models. This
      report relies on the concepts put forward by the researchers in [17] and built upon further in [4],
      and utilizes these classification concepts as the architecture to assess the functioning of
      DOCODE 3.0 with respect to these challenges. In summary, the following scenarios of
      plagiarism have been taken into consideration in this research work:
         1) 'Word for word' replication: copy-paste reproduction from an e-publication, also including
            such replications as authorship plagiarism (adopting somebody's document and merely
            changing the author information, and nothing else in the document).
         2) Rewording: partly editing the research work by appending a few characters or words,
            substituting some words with others, or even completely removing specific words.
            Willfully introducing grammatical and spelling faults, substituting certain words with
            context-related or irrelevant synonyms, and reframing the sequence of sentences are also
            categorized under this segment. In addition, translated copying is grouped in this segment.
         3) Relying on technical behaviors to exploit inherent system flaws: predominantly, including
            invisible text (having the same color as the background) so that it appears as a blank area,
            and including scanned documents as pictures within a research work, so that the scanned
            picture cannot be assessed by systems (the system considers it non-textual and ignores it).

         4) Willful and wrong citation of author information: students present citations to the work,
            but often these citations cannot be found because they are virtual or wrong citations
            (citations that exist but are irrelevant to the context) or expired citations (sources that are
            no longer active).
         We presume that the above four classifications are adequate for handling plagiarism through
      the proposed automated identification tool, because all the categories mentioned above pertain
      to different specific problems. Though these classifications are considered in this manuscript, it
      cannot be stated that this is the only accurate or the most complete classification available in the
      literature.
         The grouping is considered only for the purpose of analyzing the functioning of our
      automated detection model, because the model should be capable of handling most of the
      proposed classifications, even though these classifications are unequally challenging [16].
      2.2 Automated Identification of Copied Content
            Several studies have been put forward on accurately identifying plagiarism in an
      automated procedure, to save the time and effort involved in manual detection. One of the recent
      studies, depicted in [18], regards plagiarism as merely reusing another's research and acting as if
      it were one's own authentic work. With regard to this scenario, the study in [19] notes that
      studies on plagiarism often equate it with the detection of largely similar paragraphs in text
      documents.
            These researchers also presumed that the current analysis of plagiarism is not capable of
      depicting the entire scenario and, accordingly, attempted to categorize the identification problem
      into two main subgroups: external and internal identification. In the case of external
      identification, the authors assume that the original work for a copied work can be found in a
      folder or storage. In the case of intrinsic identification, the identification model aims to detect
      copied paragraphs only on the basis of data obtained from the copied work itself [20]. Our
      research work also finds the classification of plagiarism into internal or external most helpful,
      because the two approaches involve different sub-tasks, which are utilized for describing the
      services offered by the automated identification mechanism. In addition, a few of these
      challenges can be associated with different stages of plagiarism, as presented in the following
      section.
      2.3 Mechanisms for Plagiarism Identification
            The focus on automatic plagiarism identification is not merely on the educational
      environment; it extends to different commercial applications, and multiple paid mechanisms are
      offered in the market. All these approaches vary from each other, but at a top level they can be
      classified as hermetic and web-based plagiarism detection. While web-based identification
      approaches aim to detect similarities for duplicated text over a wide range of internet sources,
      hermetic approaches aim to detect plagiarism by comparing against a set of works stored
      locally [16]. Further, a few of the prevailing tools, like Turnitin, are accessible online, while
      others are downloadable programs that run on local systems. A brief overview of some
      prominent tools is provided below:
         •   Turnitin: a paid application for identifying plagiarism in an uploaded document. The
             tool constantly updates its source-document datasets and compares the uploaded
             document against these locally available documents. The application holds around 100
             million research publications and 12 million website pages, along with journals and papers.

         •   EVE2: a commercial application which browses the internet for potential original
             publications of the uploaded research paper. It outputs the related website links and
             presents a complete report.
         •   PlagiarismDetect.com: this application also functions like EVE2 and browses the
             internet for potential matches to the uploaded document.
         •   Glatt Services: this application comprises three segments. Segment one is an
             introduction/training session to teach scholars about the plagiarism types and possible
             ways to avoid them. The next segment is a screening segment to identify copied content
             in the uploaded files, and the last is also a screening segment, for identifying
             unintentional scenarios of text copying.
         •   Ephorus: this is also a paid application for identifying plagiarism, which includes
             assistance for assessing the accuracy of citations. It also enables students or instructors
             to verify the cited references as presented in academic works. This application has been
             merged with Turnitin.
         •   WCopyfind: a free application built for the Windows operating system that compares
             different files and presents similarities in terms of copied sentences and phrases.
         To date, multiple research works [21], [9], [22], [4], [8] have evaluated these models and
      compared them on diverse metrics. However, despite such comparisons, most of the
      aforementioned paid approaches hide their programs and functioning methodology.
      Accordingly, the student or instructor does not gain these details and faces a key barrier in
      understanding how the outcome is generated and presented, making it virtually impossible to get
      actual insights on the uploaded files. It also complicates the process of assessing the efficiency
      of these models. In this context, the DOCODE approach proposed in this paper differs from the
      aforementioned tools in that its inbuilt programs and their functioning are widely known by
      scientists and clients. This proposed manuscript is made freely available for communities
      interested in the research.

                            3   Similarity Check by Concept Relevance
      The proposed model compares the target document with one or more given source documents.
      The proposal is an unsupervised learning model; hence the features and their optimality are
      defined from the source documents. The overall process of the proposal (see Table 1) is outlined
      below:
      Table 1:       Main Process

      Main Process
      Inputs:
                TDC (source documents set)
                wst (word sequence length threshold)
         Begin
             Let dwv be the two-dimensional word vector in which each row holds the vector of
             words obtained from one of the documents in the given document set TDC.
                pdwv ← preprocess(dwv)
                fas ← findFAS(wst, pdwv)    // fas is the set of feature attributes (word
                                            // sequences of size wst)
                cofs ← findCOFS(fas, pdwv)  // cofs is the set of co-occurrence feature sets
                // The frequent-itemset mining performed in this step can be done through any of
                // the contemporary models, such as Eclat [23] or FP-growth [24].
                Find the similarity between the suspect document and the source documents
         End

      Let TDC be the set of source documents used to perform the similarity check. Initially, a data
      preprocessing step (see Table 2) is applied to the source documents to obtain the processed
      documents as the word-vector matrix pdwv.

      Table 2           Data Preprocessing

      Preprocessing
         preprocess(dwv) Begin
               Set pdwv ← ∅
               For each row dr of dwv Begin
                  Set pdr ← ∅
                  Remove non-English characters from dr
                  Trim leading and trailing spaces of each word of dr
                  For each word w of dr Begin
                     if (w ∈ sws) then remove w from dr   // sws is the stop-words set
                     else apply the stemming process on w and add w to pdr
                  End
                  Add pdr to pdwv   // the processed row is added once all its words are handled
               End
               Return pdwv
         End
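
      For concreteness, a minimal Python sketch of this preprocessing step is given below. It assumes
      NLTK's English stop-word list and Porter stemmer as stand-ins for the unspecified stop-words
      set sws and stemming process; the names mirror Table 2 but are otherwise illustrative.

          # Sketch of the Table 2 preprocessing step (illustrative, not the authors' code).
          # Assumes NLTK is installed and its 'stopwords' corpus has been downloaded.
          import re
          from nltk.corpus import stopwords
          from nltk.stem import PorterStemmer

          stemmer = PorterStemmer()
          sws = set(stopwords.words("english"))    # stand-in for the stop-words set sws

          def preprocess(dwv):
              """dwv: a list of word lists, one row per source document. Returns pdwv."""
              pdwv = []
              for dr in dwv:
                  pdr = []
                  for w in dr:
                      w = re.sub(r"[^A-Za-z]", "", w).lower()   # drop non-English characters
                      if not w or w in sws:                     # skip empty tokens and stop words
                          continue
                      pdr.append(stemmer.stem(w))               # stem the surviving word
                  pdwv.append(pdr)                              # add the processed row
              return pdwv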

               The preprocessing phase extracts words from each document to form a row in the
      two-dimensional word vector pdwv, then removes stop words and noise (special characters) and
      performs stemming on the remaining words of each vector in pdwv. Next, the word sequences of
      size wst are taken as feature attributes from each row of the two-dimensional vector pdwv,
      forming the set of feature attributes fas with no duplicate elements.
               A word sequence is a set of wst words that appear consecutively in some row of pdwv.
      Then the co-occurrence feature sets cofs (see Table 3) are formed, such that each feature of
      every feature set { fs | fs ∈ cofs } belongs to fas; the size of each set in cofs can vary.

      Table 3          Finding Concepts

      Finding concepts
                findFAS(wst, pdwv) Begin
                     Set fas ← ∅
                     For each row dr of pdwv Begin
                        For each word w of dr Begin
                          if (index_of(w) + wst ≤ size_of(dr)) Begin
                             fas ← fas ∪ { word sequence of size wst beginning at index_of(w) }
                          End
                        End
                     End
                     Return fas
              End
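
      A compact Python sketch of this sliding-window feature extraction, under the same illustrative
      naming as above, could read as follows; the loop bound mirrors the index check in the pseudocode.

          def find_fas(wst, pdwv):
              """Collect all word sequences (tuples) of length wst from every row of pdwv."""
              fas = set()
              for dr in pdwv:
                  for i in range(len(dr) - wst + 1):   # stop while a full window of wst words remains
                      fas.add(tuple(dr[i:i + wst]))    # tuples are hashable, so duplicates collapse
              return fas

          # Example with wst = 2 over one processed row:
          # find_fas(2, [["concept", "relev", "check", "concept", "relev"]])
          # -> {('concept', 'relev'), ('relev', 'check'), ('check', 'concept')}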

          Each co-occurrence feature set will be referred to further as a concept. Concepts are framed
      as follows (a minimal sketch of this mining step follows below):
         a. Initially, feature sets of size one are formed and moved to cofs.
         b. Then co-occurrence feature sets of sizes two up to the maximum possible size are formed
              and moved to cofs.
         c. The co-occurrence feature sets are then pruned as follows:
           i.  If fsi ∈ cofs, fsi ⊂ fsj for some fsj ∈ cofs, and the co-occurrence frequency of fsi is
               identical to the co-occurrence frequency of fsj, then fsi is pruned from cofs.
          The co-occurrence feature sets of cofs are further sorted in descending order of length; if the
      lengths of two feature sets are equal, they are sorted in descending order of their frequency.
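
      The pruning rule in step (c) is the closed-itemset condition from frequent-itemset mining, which
      is why Eclat [23] or FP-growth [24] can be substituted here. The brute-force Python sketch below
      (assumed names, practical only for small fas) illustrates the idea: enumerate candidate feature
      sets, count the documents in which all members co-occur, and discard any set whose frequency
      equals that of a strict superset.

          from itertools import combinations

          def find_cofs(fas, pdwv, wst=2, max_size=3):
              """Mine co-occurrence feature sets (concepts); exponential brute force, for illustration."""
              # The set of feature attributes that each document row contains.
              row_feats = [{tuple(dr[i:i + wst]) for i in range(len(dr) - wst + 1)} & fas
                           for dr in pdwv]

              def freq(fs):
                  # Number of document rows in which every feature of fs co-occurs.
                  return sum(1 for feats in row_feats if fs <= feats)

              # Steps (a) and (b): candidate feature sets of sizes 1 .. max_size.
              candidates = {frozenset(c)
                            for k in range(1, max_size + 1)
                            for c in combinations(sorted(fas), k)}
              cofs = {fs: freq(fs) for fs in candidates}
              cofs = {fs: f for fs, f in cofs.items() if f > 0}

              # Step (c): prune sets whose frequency equals that of a strict superset.
              closed = {fs: f for fs, f in cofs.items()
                        if not any(fs < gs and f == cofs[gs] for gs in cofs)}

              # Sort by descending length, then descending frequency.
              return sorted(closed.items(), key=lambda kv: (-len(kv[0]), -kv[1]))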
      Further, the model performs a similarity check (see Table 4) of the input document against the
      source documents as follows:
         a. For each document { d | d ∈ TDC }:
            i.  Choose the row wvd from pdwv that represents document d.
           ii.  For each concept c from cofs:
          iii.  Find the similarity score ss(wvd, c) between the suspect document and the source
                document as follows:
               a) Find the support of each concept in the suspect document and the source document,
                   respectively, and compute the ratio of the concepts common to both documents to
                   the concepts that exist in the suspect document.
      Table 4          Finding Similarity

      Finding Similarity
                 Begin
                     For each source document d in TDC Begin
                        Select the row dr from pdwv that represents d
                        Prepare the word vector tr from the target document t

                        sf ← { c ∈ cofs | c occurs in dr }    // the concepts found in the source document
                        sfc ← |sf|                            // the count of those concepts
                        tf ← { c ∈ cofs | c occurs in tr }    // the concepts found in the target document
                        tfc ← |tf|                            // the count of those concepts

                        ss(dr, tr) ← |sf ∩ tf| / |tf|         // the similarity score of source document d
                                                              // and target document t

                        if (ss(dr, tr) ≥ mst) Begin           // mst is the minimum similarity threshold
                           Move document d into cluster kcs[i]
                        End
                        Else Move document d into ncg         // ncg collects documents below the threshold
                     End
                 End
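
      Under the definitions above (a concept 'occurs' in a document when all of its word sequences
      do), the scoring step might be sketched in Python as follows; cofs is assumed to be the
      (feature set, frequency) list produced by the mining sketch, and mst is the hypothetical
      minimum similarity threshold from Table 4.

          def concept_support(doc_row, cofs, wst=2):
              """Return the subset of concepts whose word sequences all occur in doc_row."""
              feats = {tuple(doc_row[i:i + wst]) for i in range(len(doc_row) - wst + 1)}
              return {fs for fs, _ in cofs if fs <= feats}

          def similarity(dr, tr, cofs, wst=2):
              """ss(dr, tr): concepts common to source and target over concepts in the target."""
              sf = concept_support(dr, cofs, wst)   # concepts in the source document
              tf = concept_support(tr, cofs, wst)   # concepts in the target (suspect) document
              return len(sf & tf) / len(tf) if tf else 0.0

          # Clustering step, with an assumed threshold mst = 0.5:
          # for d, dr in zip(TDC, pdwv):
          #     (kcs if similarity(dr, tr, cofs) >= mst else ncg).append(d)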

                                                    4     Empirical Study
          The evaluation procedure comprises four key steps: a) input data-set preparation, b) corpus
      preprocessing, c) similarity verification between source and target documents using the
      proposed model, and d) performance analysis through a comparative study of the proposed
      model and other contemporary models.
      4.1 The Training Set of Documents
          Preprocessing of the corpus: the input corpus comprised 234 documents prior to
      preprocessing. Among these, only 185 documents were suitable for assessing the proposed
      model. The rest were unsuitable for multiple reasons, such as duplicates, no readable content,
      blank pages, and missing or incomplete references.
      4.2 The Test Set of Documents
      To estimate the performance advantages of the proposed model over other contemporary
      models, namely WriteCheck [25], WCopyfind [26], and DocDiff [27], varying ratios of each test
      document's content were rephrased (see Table 5), such that each resultant document contains
      content from a single source test document. Toward this, 25% of these documents were
      manually rephrased, and the remaining 75% were rephrased with divergent computer-aided
      tools, namely ChimpRewriter [28] and WordAI [29].
      4.3 Performance Analysis
           To verify the similarity of the test documents against the training documents, the test
      documents were submitted to the proposed concept relevance-based similarity detection model
      and to the other contemporary models stated above.
      The obtained similarity-detection ratios are exhibited in Table 5.
      Table 5        Results of similarity detection (in %) with a single source: sentences from a single
      source document rephrased with word jumbling and comprehensive writing

      Ratio of sentences
      rephrased (in %)  Concept Relevance               Writecheck           Wcopyfinder      Docdiff
      3                           96.16                  87.08                95.18            89.35
      6                           93.93                  80.06                87.66            89.67
      9                           90.39                  66.94                83.01            86.93
      12                          87.82                  60.01                80.47            81.82
      15                          84.43                  43.02                82.71            79.49
      18                          81.33                  19.91                73.47            71.49
      21                          78.4                   16.98                69.59            69.35
      24                          75.45                  10.07                68.5             73.51
      27                          72.8                   9.01                 72.39            63.6
      30                          69.02                  7.05                 68.38            60.05
      33                          66.22                  4                    65.78            63.96

      The results exhibited in Table 5 are visualized in Figure 1. According to these results, it is
      notable that the performance of all these tools is considerably good when the content is
      rephrased at low ratios. However, the existing models are less able to detect the similarity ratio
      once the rephrasing exceeds 6%. In contrast, the proposed model is the most accurate and stable
      at detecting similarity across divergent ratios of rephrasing.

      Figure 1      Similarity ratios observed for sentences from a single source document rephrased
      with word jumbling and comprehensive writing

      Table 6       Results of similarity detection (in %) with multiple sources: sentences from
      multiple source documents rephrased with word jumbling and comprehensive writing

      Ratio of sentences Concept
      rephrased (in %)   relevance              Writecheck        Wcopyfinder          Docdiff
      3                        96.26            87.17             48.23                37.87
      6                        93.41            80.08             46.26                30.48
      9                        90.61            66.97             36.5                 19.95
      12                       86.8             60.05             22.18                17.11
      15                       84.24            43.08             10.87                11.1
      18                       81.29            20                10.02                10.27
      21                       78.97            17.02             9.48                 9.52
      24                       75.69            10.08             8.86                 9.1
      27                       72.51            9.07              8.21                 9.08
      30                       69.31            7.13              7.31                 8.43
      33                       65.9             4.07              6.98                 7.99

      Experiments were also carried out on documents formed by combining content from multiple
      sources. These documents contain diversified ratios of rephrased content; 25% of them were
      rephrased manually, and the remaining 75% were rephrased using ChimpRewriter and WordAI.
      The results obtained from the proposed model and the other models considered are presented in
      Table 6 and visualized in Figure 2. The similarity scores obtained from the contemporary models
      for documents rephrased by more than 6% were considerably low. In contrast, the proposed
      model detects similarity at much higher ratios. This is because the contemporary models rely on
      word sequences of fixed sizes (minimum 3), so if every third word of a document is replaced by
      a synonym, these tools report the lowest percentage of similarity. The proposed model, however,
      does not rely on the word sequence alone; it also considers the concept projected by the
      sequence of words, hence its similarity detection performs best. A small illustration of this
      fragility follows.
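
      To see why fixed-length word sequences are fragile, consider the following self-contained
      Python illustration (the example sentences are assumed, not drawn from the evaluation corpus):
      replacing every third word with a synonym destroys every shared trigram even though the
      meaning is unchanged.

          def trigrams(words):
              """All consecutive 3-word sequences in a sentence."""
              return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

          original = "the student copied the entire section from the online article".split()
          # every third word swapped for a synonym
          reworded = "the pupil copied the whole section from an online article".split()

          shared = trigrams(original) & trigrams(reworded)
          print(len(shared), "of", len(trigrams(original)), "trigrams survive")   # 0 of 8

      A concept-level comparison over stemmed, co-occurring terms would still align these two
      sentences, which is the behavior the results in Table 6 reflect.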

      Figure 2     Similarity ratios observed for sentences from multiple source documents rephrased
      with word jumbling and comprehensive writing

                                              5    Conclusion
          The manuscript shows that the concept relevance-based copied-content identification
      approach achieves higher performance than earlier text-driven approaches in detecting
      paraphrasing, language conversion, and certain idea plagiarism. The depicted model reflects that
      concept-oriented plagiarism detection is considerably more effective than text-based plagiarism
      detection. Text-based models are limited to detecting local types of content copying, such as
      small paragraphs of copied words, and fail to take a comprehensive view of the presentation.
      The proposed concept relevance, depicted from reference-oriented plagiarism detection, is
      robust and delivers optimal accuracy for detecting paraphrased and translated kinds of
      plagiarism. Applying the concept relevance-oriented approach supports identifying 82% of the
      plagiarized fragments, whereas the performance of semantic-oriented models is very low in
      detecting paraphrased or comprehensive sets of plagiarism.
          The experimental study revealed that detecting plagiarism through concept-relevance
      assessment delivers better performance than word sequence-based approaches, which remain
      limited to detecting locally copied or paraphrased text. The proposed model, in contrast, remains
      robust and accurate for paraphrased and translated plagiarism across divergent ratios of
      rephrasing.

                                                 References
        [1] Velásquez, J. D. (2010). Advanced Techniques in Web Intelligence-1.
        [2] Velásquez, J. D. (2012). Advanced Techniques in Web Intelligence-2. Web User Browsing
            Behaviour and Preference Analysis. Springer Publishing Company, Incorporated.
        [3] Scanlon, P. M. (2002). Internet plagiarism among college students. Journal of College
            Student Development, 374-385.
        [4] Kakkonen, T., et al. (2010). Hermetic and web plagiarism detection systems for student
            essays—an evaluation of the state-of-the-art. Journal of Educational Computing Research,
            135-159.
        [5] McCabe, D. (2005). Cheating: Why students do it and how we can help them stop. Guiding
            students from cheating and plagiarism to honesty and integrity: Strategies for change, 237-
            246.
        [6] Posner, R. A. (2007). The little book of plagiarism. Pantheon.
        [7] Molina, F., et al. (2011). The digital document plagiarism phenomenon: An analysis of the
            current citation in the Chilean educational system. Revista Ingeniería de Sistemas, 5-28.
        [8] Lancaster, T., et al. (2005). Classifications of plagiarism detection engines. Innovation in
            Teaching and Learning in Information and Computer Sciences, 1-16.
        [9] Maurer, H., et al. (2007). Coping with the copy-paste-syndrome. In E-Learn: World
            Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education.
            Association for the Advancement of Computing in Education (AACE).
        [10] Fialkoff, F. (1993). There's no excuse for plagiarism. Library Journal, 56-56.
        [11] Clough, P. (2000). Plagiarism in natural and programming languages: an overview of
            current tools and technologies.
        [12] Hanks, P. (Ed.). (1986). Collins Dictionary of the English Language (2nd ed.). London:
            Collins.
        [13] Martin, B. (1994). Plagiarism: a misplaced emphasis. Journal of Information Ethics, 3(2),
            36.
        [14] Clough, P., et al. (2003). Old and new challenges in automatic plagiarism detection.
            National Plagiarism Advisory Service. Retrieved from
            http://ir.shef.ac.uk/cloughie/index.html.
        [15] Ashworth, P., et al. (1997). Guilty in whose eyes? University students' perceptions of
            cheating and plagiarism in academic work and assessment. Studies in Higher Education,
            187-203.
        [16] Mozgovoy, M., et al. (2010). Automatic student plagiarism detection: future perspectives.
            Journal of Educational Computing Research, 511-531.
        [17] Maurer, H., et al. (2006). Plagiarism—a survey. 1050-1084.
        [18] Potthast, M., et al. (2012). Overview of the 4th International Competition on Plagiarism
            Detection. CLEF.
        [19] Potthast, M., et al. (2009). Overview of the 1st International Competition on Plagiarism
            Detection.
        [20] Zu Eissen, S. M., et al. (2006). Intrinsic plagiarism detection. ECIR.
        [21] Lukashenko, R., et al. (2007). Computer-based plagiarism detection methods and tools: an
            overview. In Proceedings of the 2007 International Conference on Computer Systems and
            Technologies. ACM.
        [22] Bull, J., et al. (2000). Technical review of plagiarism detection software report.
        [23] Zaki, M., et al. (1997). New algorithms for fast discovery of association rules.

        [24] Liu, Y., et al. (2008). FP-Growth algorithm for application in research of market basket
            analysis. In IEEE International Conference on Computational Cybernetics.
        [25] WriteCheck. (n.d.). Retrieved from http://en.writecheck.com/.
        [26] WCopyfind. (n.d.). Retrieved from
            http://plagiarism.bloomfieldmedia.com/wordpress/software/wcopyfind/.
        [27] DocDiff. (n.d.). Retrieved from https://www.diffchecker.com/.
        [28] ChimpRewriter. (n.d.). Retrieved from https://chimprewriter.com/.
        [29] WordAI. (n.d.). Retrieved from https://wordai.com/.
