A Steganography on Synonym Frequency Distribution

Page created by Marshall Curry

Uncategorized

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

A Steganography on Synonym Frequency Distribution
Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

A Steganography on Synonym Frequency Distribution
1
Xiaoxi Hu, 2 Gang Luo, 3Yongjing Lu, 4Lingyun Xiang
College of Information Science and Engineering, Hunan University,
410082Changsha, China
E-mail: yezi_xiaoxi@yahoo.cn
College of Information Science and Engineering, Hunan University,
410082Changsha, China
E-mail: luog@yahoo.cn
College of Information Science and Engineering, Hunan University,
410082Changsha, China
E-mail: syhd142@hnu.edu.cn
Changsha University of Science and Technology, 410082Changsha, China
E-mail: suhong210@yahoo.com.cn

Abstract
Linguistic steganography is concerned with hiding secret information in natural language text
imperceptibly. One of the major methods for doing so is synonym substitution. Existing schemes hardly
consider the influence on word frequency so that they can’t resist the text steganalysis which bases on
synonym frequency. This study takes the features of word frequency into account, and proposes a novel
method using a combination of synonym substitution and multiple-base. This method mainly consists of
three steps. Firstly, synonym database is to be preprocessed according to synonym frequency. Then,
the secret data will be transformed into the expression of multiple-base expressions. Finally, the
chosen synonym will be used to hide the secret message according to its continuous dimension and the
proposed algorithm. In this paper, continuous dimension is a new concept to be proposed, and it is
important to the embedding and extracting processes. The proposed method possesses superiority to
other related methods in terms of security and robustness. Moreover, the most important goal in this
method is trying to remain the synonym frequency distribution and achieve good imperceptibility. This
paper describes in detail the steganography algorithm that involves the embedding and extracting
processes, and presents its advantages and limitations.

Keywords: Hiding Information, Synonym Frequency, Continuous Dimension

1. Introduction
Steganography is concerned with hiding information using the redundancy of information in cover
medium and keeps the secret information undetectable without destroying the integrity of cover
medium (Gutub and Fattani, 2007). Therefore, many previous information hiding techniques make use
of cover medium with a lot of redundancy, such as images (Chandramouli and Memon, 2001), sound
signals (Gopalan, 2003), video clips (Doerr and Dugelay, 2003), or texts (Bennett, 2004). In recent
years, since the text is the main form of the exchange of knowledge, the text has become one of the
most popular digital media used to transmitting secret messages.
Text linguistic steganography, as one of steganography based on natural language, plays a more and
more significant role in the area of information security today. Researching text linguistic
steganography is an important task. First of all, if the dissemination carrier for the secret information
becomes more common and ordinary, and the probability of discovered by censors will become smaller.
So text is a good choice. Secondly, in the military area, the secret information is condensed, thus the
requirements for the load capacity of the carrier is low, but for security and robustness are very high
and strict. Therefor improving the text linguistic steganography is a significant thing.
The implementations of Linguistic steganography can mainly be classified into two categories. The
first is to directly generate a new natural-like text using the mimicking technique by certain processes
(Chapman, 1997; Chapman et al., 2001; Maher, 1995). For example, NICETEXT (Chapman, 1997;
Chapman et al., 2001), parses a cover text and extracts syntactic patterns, and then uses sentence
models and large dictionaries of words to produce sentence frames. A lexicon of words classified by

Advances in information Sciences and Service Sciences(AISS) 206
Volume5, Number10, May 2013
doi:10.4156/AISS.vol5.issue10.24

A Steganography on Synonym Frequency Distribution
Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

part of speech tags is collected from the cover text and each word has a given unique binary code. The
embedding process is accomplished by following procedures, first, randomly select a sentence frame,
and then based on the secret message bit string populate each part of speech tag with an appropriate
word from lexicon. The output produced by NICETEXT is syntactically correct but suffers from
serious drawbacks, such as poor readability, hard understandability and weak resistance to the
statistical analysis, when considering its semantic and grammatical correctness. The second category is
to embed messages by linguistic modifications, including synonym substitution, syntactic
transformation, etc. The syntactic transformation-based method requires an ideal syntactic parser to
analyze the sentence structure, and its embedding capacity is limited. In practical, as the lack of enough
natural language knowledge and matured tools, it is not easy to implement. Synonym substitution
steganography is a relatively straightforward method, and it has rich source of information carries as
well as excellent security, so it is widely studied.
Bender(1996) described the data hiding method based on synonymous substitution roughly. First
they defined a synonym table by a synonym dictionary such as WordNet, for example, use 0 to
represent word ‘big’, 1 to represent ‘large’, then select suitable synonyms to replace words in the cover
text according to the secret bit.
A simplified example of this embedding is given in Figure 1. The secret bit string ‘011’ is to be
embedded, which can be divided into two code words ‘01’ and ‘1’, and the information carriers in the
cover text are words ‘like’ and ‘big’. According to the encoding dictionary, “love” represents ‘01’, and
“large” represents ‘1’, then these words are chosen to replace the original words in the cover sentence.
Finally, the stego sentence “I love this large house” will be sent to the receiver. The receiver can get
the secret message ‘011’ by an encoding dictionary and the decoding algorithm.

Figure 1. An example of synonym substitution

The synonymous substitution approach exploits the attribute of synonyms in natural language. A
synonym is a group of words which have near or same meaning in natural language. If a particular
word is replaced by its synonym in a sentence, the meaning should be preserved (or closely
approximated). The secret message embedded by this approach is difficult to be detected, but there are
still also some detection methods because of the weaknesses of synonymous substitution approach.
There are two limitations in the application of synonyms. Firstly, lots of words have more than one
sense and it is rare for two words to be synonymous in all senses, so some substitution is unsuitable.
(Taskiran et al., 2006) constructed a detector based on characteristics of the evaluated suitability of
synonyms for their context in a text. In the meanwhile, in order to make the substitution suitable and
right, lots of methods were proposed. For example, (Bolshakov, 2004) tested relative synonyms
previously for semantic compatibility with collocations, and then determined whether synonym
substitutions were correct, and at last replaced absolute synonyms directly. (Liu YL et al., 2007) took
the contextual window of sentence into account which containing the substituted synonym and its
surrounding words, then determined which substitution was right. (Muhammad HZ et al., 2006) only
adopted two absolute synonyms of each synset for embedding messages. Secondly, synonyms are
likely to have different syntactic distribution and word frequencies in different texts or particular
contexts. However, there are few scholars researching how to keep the word frequency distribution of
the original text unchanged and how to resist text steganalysis based on the second limitation. This
paper gives a feasible scheme to solve these problems.

207

A Steganography on Synonym Frequency Distribution
Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

In the secret bit string, “0” and “1” almost appear randomly, so each word has the similar
probability in stego-text. However, in natural text each word has its corresponding different frequency.
It provides an important clue for the synonym detection method. (Zhili Chen et al., 2011) proposed a
straight forward detection method based on Relative Frequency Analysis, which makes use of
the frequency characteristics of the testing texts, to detect substitution-based linguistic
steganography. If there is a hiding method that avoid causing grammar or semantic mistakes after
replaced, and keep the stego-text has the same word frequency as the original text, this method will
result in high imperceptibility and security, and the suspicion of an observer will be reduced and
approach zero. In fact, on one hand, synonyms have the characteristic of similar meaning, so the
ambiguity and grammar mistakes will be reduced, On the other hand, many scholars there have done
the researches on how to avoid grammar mistakes and how to judge whether it is the right replacement
of synonyms. So, in this paper, the point of proposed algorithm is how to keep the word frequency
distribution of the original text unchanged, not to avoid the ambiguity and grammar mistakes.
A good steganography does not focus on limiting or controlling the access to information, but
protecting the hidden information not to be detected or destroyed. There are three main factors that
influence steganography. They are capacity, security and robustness which are restricting each other. It
should be seeking for the appropriate balance between the three aspects according to the specific
requirement. In terms of security, the algorithm in this paper keep the stego-text has the same word
frequency as the original text as far as possible, and in terms of capacity, making concessions is
necessary. High imperceptibility and security are guaranteed.

2. Related work
As we said in previous section, a few works have been done on synonym substitution. There have
been several scholars researched how to avoid grammar mistakes after replaced and how to judge
whether it is the right replacement of synonyms, etc. Hence, it is necessary to make a review of some
works done on synonym substitution.

2.1. Winstein et al.’s data hiding method

Winstein(1998) gave the concept of synonym substitution for steganography and described a
proof-of-concept application Tyrannosaurus Lex that made synonym replacements based on the
WordNet selecting synonyms with correct senses. In the WordNet, Nouns, verbs, adjectives and
adverbs are grouped into sets of cognitive synonyms.

2.2. Chapman et al.’s data hiding method

Chapman and Davida(2001) provide an overview of the basic transformation processes, demonstrate
the quality of the generated text, and improve NICETEXT through embedding secret data by replacing
several special words with other words having not exactly the same but similar meaning.

2.3. Bolshakov et al.’s data hiding method

Bolshakov (2004) previously test relative synonyms for semantic compatibility with collocations to
judge whether it is the right replacement of synonyms, and replace absolute synonyms directly to avoid
grammar mistakes.

2.4. Topkara et al.’s data hiding method

There are close relationship between one word and another. In order to decrease the distortion,
Topkara (2006) just chose one alternative for every synonym to be replaced according to a certain
criterion to hide secret message.

208

A Steganography on Synonym Frequency Distribution
Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

2.5. Liu et al.’s data hiding method

So as to decrease the interaction between the surrounding words and the substituted word, Liu(2007)
take into account the contextual window of the sentence by using the disambiguation function of
Chinese lexical analysis to determine which substitution was right.

In order to obtain good imperceptibility, there some other methods limited replaceable synonyms in
a narrow scope. For example, (Muhammad HZ, 2009) described a Malay linguistic text steganography
which only adopted two absolute synonyms of each synset and hided a secret message into Bahasa
Melayu cover text. (Shirali-Shahreza MH, 2008) proposed a method for steganography in English text
by substituting the words which have different terms in British and American English. In addition,
synonym substitution cooperated with some auxiliary techniques to design text watermarking schemes,
for example, (Yang JL, 2007) used Spread Spectrum Technique to spread the watermarks over the
entire text via various synonym substitutions. These methods also didn’t consider the influence on
word frequency, and changed the word frequency.

3. Implementation
In this section, a steganography scheme based on synonym substitution and word frequency is
presented. The proposed steganography scheme is described below and comprises of embedding and
extraction process.

3.1. Embedding process

The proposed hiding algorithm composes of four procedures. In the first step, the synonym library is
transformed into the particular form based on words’ frequency. In the second step, the encoder
encrypts the secret information using the encrypt methods such as DES, RC4, RC2, etc. In the third
step, the whole article is scanned and judged firstly whether the currently processing text is
embeddable, and then the secret data will be transformed into the particular expressions. The
expression will be presented in next part. In the last step, the algorithm which retains synonym
frequency distribution is used for embedding the encrypted secret data. Figure 2 delineates the
proposed embedding process. The basic intention in the fourth step keeping the synonym distribution is
to ensure that the censor cannot detect word’s frequency unusual.

Figure 2. The embedding process

In the first step, the synonym library is transformed into a particular form based on words’
frequency, as the following description.
The synonym list L refers to a list of substitution sets used in certain text when applying
substitution-based linguistic steganography, and it is denoted as  = { ,  , ⋯ ,  }; The synonym set
 , the set of substitution elements that can be replaced each other, is denoted as
 = , , , , ⋯ , ,  and it satisfies the conditions: ,  ≥ , , (, ) ≈ (, ) ,

209

A Steganography on Synonym Frequency Distribution
                               Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

where  = 0, ⋯ ,  and  = 0, ⋯ ,  − 2. ; ,  denotes the frequency of synonym , ;
(, )represents a lexical meaning of , ; , is a substitution element, e.g. a word, a phrase, a
sentence and so on. In this paper, a substitution element is a word.
   In the second step, the encoder encrypts the secret information by the encrypt method. The secret
data is an integer , and any integer numbers can be expressed into the particular expressions as
follows.

                             = (  ⋯  )⋯ , 1 ≤  <  ( = 1, ⋯ , ).                        (1)

    The values of sequence  ,  , … ,  indicate different bases corresponding to the symbols  , ,
… ,  . For example,  =  mod  and   =  div  . And then,  =   mod  ,   ′ = ′   .
So, the decimal value of  can be computed as follows.

                                                        =                                             (2)
                                        
                               = ∏          −  − ∑        
                                                             ∙ ∏      ,  > 1.              (3)
                                        

                                                  =  + ∑ ∙ ∏
                                                                                                        (4)

   Equations above-mentioned can be used to generate a series of multiple-base notations with a set of
given bases; for example, assume that the secret data is an integer x = 96, and b , b , b , b … , b
respectively is 7, 5, 4, 2, …, n. According to the equation (2),  =  mod  = 96 mod 7 = 5
                                                                                          
and   =  div  = 96 div 7 = 13. And then,  =   mod  = 13   5 = 3,   =     =
13 div 5 = 2. So  = 2. According to the equation (1), 96=(235)457.
   According to the transformed synonym library, the encoder first scans the whole article, then judges
whether the currently processing text is embeddable. For clear description in third step, the related data
structures are explained as follows.

3.1.1. Text linked list

   The text linked list is generated by scanning the whole article from start to end word by word. A
simplified example of the text linked list is given in Figure 3. It is comprised of four fields: Serial
number, Content, Shifting bit, and Next pointer. The Serial number stores the serial number of the node,
and the default value of Shifting bit is zero. The Serial number is the appearing position of Word n in
the text. And the content is the single word extracted from the text when scanning the whole article
from start to end.

                                            Figure 3. Text linked list

                                      Table 1. Frequency counted table
                                                                                             Occurrence
        i                 {Si} in L               {Si’} in text      Serial numbers
                                                                                             frequency
                            {S1}                       S10            {3, 13, 55, 69}
                                                       S11                                  f1 (f1=8=4+3+1)
        1                                                                {4, 45,108}
                                                       S12                  {80}
                            {S2}                       S20               {6, 20, 40}
        2                                              S21                {19, 77}          f2 (f2=6=3+2+1)
                                                       S22                  {14}
      ……                    ……                        ……                    ……                     ……

                                                                                                                    210

A Steganography on Synonym Frequency Distribution
Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

3.1.2. Frequency counted table

The frequency counted table is generated by calculating the repeated words and their synonyms
from the text linked list. It is significant for judging the hiding capacity. It concludes serial numbers,
synonym set  from the synonym list, synonym  ′ appearing from the text and occurrence
number  . Serial numbers contain all the appearing positions of a word in the text.  is the
occurrence sum of  ′ which appear in the text and belong to {Si} in L.  is equivalent to  and
 satisfies the condition:  <  , where 1 <  < .
For example, in Table 1, synonym set {S1’} in text belongs to {S1} in L, and {S1’} conclude three
synonym {S10, S11, S12}. The serial numbers of a synonym S10 is {3, 13, 55, 69}, representing that
synonym S10 appears in 3, 13, 55, 69 nodes in the text linked list. From the table, S10 appears four times,
S11 appears three times, S12 appears once, so the total occurrence number of {S1 } is  =4+3+1=8.
The cover-text hiding capacity must be guaranteed before embedding. Assume that the cover-text has
∑   synonyms, and it represents the hiding capacity from 1 to  + ∑  ∙ ∏
   . If the

hiding capacity is smaller than the secret data, the embedding progress will be stopped.

3.1.3. Continuous dimension.

Continuous dimension is defined as the number of neighboring synonym words according to a
specific direction and a particular relationship. There is an assumption that the particular relationship
refers to whether the neighboring positions repeat the same word, and the specific direction is
clockwise. Continuous dimension is equivalent to the fixed weight of each word. It is important to the
embedding and extracting processes. As shown in Figure 4, synonyms which belong to the same {Si} in
L are extracted from the text in accordance with the appearance order, and each word has its
corresponding continuous dimension. And their continuous dimension is shown in Table 2.
Suppose that the synonym set { S 1 } appear in the article with the highest occurrence frequency, so
in the frequency counted table, its  =  . From the frequency counted table 1, Synonyms in
 = { ,  ,  } appear eight times totally in the text, namely f1 = 8, so b1 = 8, where S10 appears
four times, S11 appears three times and S12 appears once, then the maximum possible dimensions of S10,
S11, S12 are 4, 3, 1 respectively. In fact, the order of {S10, S11, S12} appearing in the text is shown in the
figure 4, and the relevant continuous dimensions are shown in the Table 2. As  =  mod b , b1 = 8,
if d1 = 5, then it is required that the largest dimension value ought to come from the fifth synonym. As
the synonyms S10, S11, S12 can be replaced by each other, it is very easy to meet the requirements.

Figure 4. Appearance order in text linked list

Table 2. Continuous dimensions of { }
Continuous
i Serial number Synonym Max dimension
dimensions
1 3 S10 1 4
2 4 S11 1 3
3 13 S10 1 4
4 45 S11 1 3
5 55 S10 2 4
6 69 S10 1 4
7 80 S12 1 1
8 108 S11 1 3

211

A Steganography on Synonym Frequency Distribution
Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

3.2. The embedding and extraction algorithm

3.2.1. The embedding algorithm:

Input: a cover-text C; an encrypted message whose decimal value is D.
Output: a stego-text S.
Steps:
a) Scan the cover-text, Construct the Text linked list.
b) Construct the Frequency counted table and judge whether the currently hiding capacity is
embeddable. If not, the progress will be shut down.
c) Get bn = fn. Compute  = (  ⋯  ) ⋯ and the value of dn.
d) Calculate the continuous dimension of corresponding synonyms which belong to synonym
set  , exchange synonyms between each other and assign the continuous dimension value
of the special word the maximum, minimum or particular value to mark the value of dn.
e) Output the stego-text S.

3.2.1. The extraction algorithm:

a) Scan the stego-text, Construct the Text linked list.
b) Construct the Frequency counted table.
c) Compute the value of bn according to the Frequency counted table, and get the value of dn
by computing the Continuous dimensions of each synonym.
d) Compute an integer  =  + ∑ ∙ ∏    and then derive the secret data D using
binary transformation.

4. Discussion and limitation
A synonym is a word having the same or nearly meaning as another in the language. The meaning
of a sentence should be closely approximated after a particular word is replaced by a synonym. But this
application of synonyms has its advantages and limitations. Firstly, synonyms have the characteristics
of similar meaning so that the ambiguity and grammar mistakes are rarely, but there are also some
words have more than one sense and it is rare for two words to be synonymous in all senses, so some
substitution is unsuitable. According to the above, there have been many scholars researched how to
avoid grammar mistakes after replaced and how to judge whether it is the right replacement of
synonyms. It’s not the point in this paper. Secondly, synonyms are likely to have different word
frequencies in a given text or a particular context. And previous schemes hardly consider the influence
on word frequency. So, in this paper, the point of the algorithm is to keep the word frequency
distribution unchanged, reaching high imperceptibility and security.
A good steganography does not focus on limiting or controlling the access to information, but
protecting the hidden information not to be detected or destroyed. Capacity, Security and Robustness
are three main factors that influence steganography, which are restricting each other. It should be
seeking for the appropriate balance between the three aspects according to the specific requirement.
In terms of capacity, in original synonym methods, a synonym usually contains 1 bit information,
"0" or "1". But owing to “0” and “1” almost appear randomly in the secret bit string, original
steganography method results in almost the same probability of synonyms appearing in related article,
and dangers in resisting text steganalysis based on synonym frequency. In our algorithm, according to
the expression (1) mentioned earlier, if there are 8 synonyms in article, so bn =8, and they represent the
three bit valid information together, including "000", "001", "010", "011", "100", "101", "110", "111".
In terms of the capacity, it makes some concessions.
The capacity of our method is related to the following aspects: First is the number of synonyms
included in the selected articles. The exact proportion of synonyms in cover text is not fixed, so
different text might have different capacity. Second is the construction of synonyms library. Synonyms
library construction is not an easy task that needs a lot of efforts and time. In our method, the proper
cover text is very important factor. This method is fit for translating the secret message which is small
and need very high requirement for security and robustness.

212

A Steganography on Synonym Frequency Distribution
Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

In general, this method keeps the semantics unchanged on the one hand, and maintains the synonym
frequency on the other hand. The capacity of single synonym containing secret information reduces the
large part, but the security and robustness are very well. However the capacity related to the body of
text and the size itself. What must be note is the situation this method is used for, in some situations we
need to hide little data, but other steganography methods are not suitable and can be broken. For
example, in the military field, secret information is condensed and sensitive, thus the requirements for
loading capacity of the carrier is low, but for security and robustness are very high and strict. Our
suggested method is advanced and suitable for these situations, and high imperceptibility and security
are guaranteed.

5. Future work and conclusion

Reviewing all the points, considering the word frequency makes the synonym substitution more
security and robust than the previous effort of word replacement which changes the frequency of word
itself. In the future work, how to improve the capacity is a pivotal and important task. In order to
improve the capacity of algorithm and hold the word frequency constant, the algorithm should be
improved in following directions.
1） Screening cover texts based on the length and the type, and exploring more and more suitable
texts for spreading secret message.
2） Improving and enhancing the coding scheme. A better coding would be achieved, and less
hidden bits are needed to hide same secret information.
3） Hidden text contains secondary or multiple times repeated embedding process.
4） Combine with the previous and original replace method to improve the hiding capacity. And
what must be guaranteed is after some words replaced by original method, the frequency of
words keeping constant.
How to reach the goal will be the next essential point in future work. Of course, there should be more
ways to increase the capacity, so we need study and research more in this regard. In short, this method is
suitable for the situations which demand very high security and transmit a small amount of the secret
information.

6. Acknowledgements
This work was partly supported by National Natural Science Foundation of China (No. 61070196),
National Natural Science Foundation of China (61103215), and National Natural Science Foundation of
China (61202439).

7. References
[1] Walter R. Bender, Daniel Gruhl, Norishige Morimoto, “Techniques for data hiding”, IBM
Systems Journal, vol.35, no.3.4, pp.313-336, 1996.
[1] Adnan Abdul-Aziz Gutub, Manal Mohammad Fattani, “A novel Arabic text steganography
method using letter points and extensions”, In Proceedings of the WASET International
Conference on Computer, Information and Systems Science and Engineering, May 25-27, Vienna,
Austria, pp.38-31, 2007.
[2] Ramaswamy Chandramouli, Nasir Memon, “Analysis of LSB based image steganography
techniques”, In Proceedings of the International Conference on Images Processing, Oct. 7-10,
IEEE Computer Society, Washington DC., USA., pp.1019-1022, 2001.
[3] Kaliappan Gopalan, “Audio steganography using bit modification”, In Proceedings of the
International Conference on Acoustics, Speech and Signal Processing, April 6-10, IEEE
Computer Society, Washington, DC. USA, pp.412-424, 2003.
[4] Gwenael Doerr, Jean-Luc Dugelay, “A guide tour of video watermarking”, In Proceedings of the
International Conference on Signal Processing and Image Communication, pp. 263-282, 2003.
[5] Mazdak Zamani, Azizah Bt Abdul Manaf, Shahidan M. Abdullah, "An Overview on Audio
Steganography Techniques", JDCTA: International Journal of Digital Content Technology and its
Applications, vol. 6, no. 13, pp. 107-122, 2012

213

A Steganography on Synonym Frequency Distribution
                              Xiaoxi Hu, Gang Luo, Yongjing Lu, Lingyun Xiang

[6] Chapman, Mark, George Davida, “Hiding the hidden: A software system for concealing
     ciphertext as innocuous text”, In Proceedings of the International Conference on Information and
     Communications Security, Lecture Notes in Computer Sciences, Springer, Berlin, vol. 1334,
     pp.333-345,1997.
[7] Liu Yuling, Sun Xingming, Gan Can, Wang Hong “An efficient linguistic steganography for
     Chinese text”, In Proceedings of the 2007 IEEE International Conference on Multimedia and
     Expo, pp. 2094-2097, 2007.
[8] Shirali-Shahreza Hassan, Shirali-Shahreza Mohammad, “A new synonym text steganography”, In
     Proceedings of the 4th International Conference on Intelligent Information Hiding and
     Multimedia Signal Processing, pp. 1524-1526, 2008.
[9] Umut Topkara, Mercan Topkara, Mikhail Atallah, “The hiding virtues of ambiguity: quantifiably
     resilient watermarking of natural language text through synonym substitutions”, In Proceedings of
     the 8th Workshop on Multimedia and security, ACM press, pp.164-174, 2006.
[10] Yang Jianlong, Wang Jianmin, Wang Chaokun, Li Deyi, “A novel scheme for watermarking
     natural language text”, In Proceedings of the 3rd International Conference on Intelligent
     Information Hiding and Multimedia Signal Processing, vol. 2, pp.481-484, 2007.
[11] Muhammad Hasanah Zulcefli, Rahman Sharifah Mumtazah Syed Ahmad Abdul, Shakil Asma,
     “Synonym based Malay linguistic text steganography”, In Proceedings of the 2007 IEEE
     International Conference on on Innovative Technologies in Intelligent Systems and Industrial
     Applications, pp. 423-427, 2009.
[12] Bolshakov Igor A. , “A method of linguistic steganography based on collocationally-verified
     synonymy”, In Proceedings of 6th International Workshop Information Hiding, Lecture Notes in
     Computer Sciences, Springer, Berlin, vol. 3200, pp. 180-191, 2004.
[13] Atallah Mikhail J., Raskin Victor, Crogan Michael, Hempelmann Christian, Kerschbaum Florian,
     Mohamed Dina, Naik Sanket, “Natural language watermarking: Design, analysis, and a
     proof-of-concept implementation”, In Proceedings of 4th International Workshop Information
     Hiding, Lecture Notes in Computer Science, Springer, Berlin, vol. 2137, pp. 185-199, 2001.
[14] Chiang Yuei-Lin, Chang Lu-Ping, Hsieh Wen-Tai, Chen WC, “Natural language watermarking
     using semantic substitution for Chinese text”, In Proceedings of 2nd International Workshop
     Digital Watermarking, Lecture Notes in Computer Sciences, Springer, Berlin, vol.2939, pp.
     129-140, 2003.
[15] Bennett Krista, “Linguistic Steganography: Survey, analysis and robustness concerns for hiding
     information in text”, Purdue University, CERIAS Technical Report 2004-13, 2004.
[16] Luo Gang, Sun Xingming, Xiang LingYun, Liu Yuling, Gan Can, “Steganalysis on synonym
     substitution steganography”, Journal of Computer Research and Development (Chinese), vol. 45,
     no. 10, pp.1696-1703, 2008.
[17] Chen Zhili, Liusheng Huang, Wei Yang, “Detection of substitution-based linguistic
     steganography by relative frequency analysis”, Journal of Digital Investigation, vol. 8, no. 1,
     pp.68-77, 2011.
[18] Tian Zuwei, Sun Xingming; Yang Hengfu, “ Information hiding algorithm based on PE file icon
     resource”, Journal of Convergence Information Technology, vol. 6, no. 6, pp.469-475, June 2011

                                                                                                         214

You can also read