IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010                          47

                       Hands-Free Searching Using Google Voice

                   Hamzeh Al_bool, Maytham Ali, Hamdi Ibrahim, and Adib M. Monzer Habbal

                              InterNetWorks Research Group, UUM College of Arts and Sciences,
                                 University Utara Malaysia, 06010 UUM, Sintok, MALAYSIA

Manuscript received August 5, 2010
Manuscript revised August 20, 2010

Abstract
Google is one of the most popular information retrieval systems, and its number of users is
growing fast. It can provide a foundation for improving learning and teaching environments.
However, not all potential users have the capabilities that allow them to use Google without
impediments. This is especially true for motor-disabled users. For this category of users, a
special approach to interacting with the Google interface is therefore necessary: the
capabilities these users lack should be substituted by capabilities they have. This study
entails the development of a Hands-Free voice recognition Google Search Engine that gives
physically disabled users the opportunity to operate Google and browse the search results
without using a keyboard or mouse. The implementation is carried out in the Java programming
language, and the usability testing shows that 93% of the participants found the system usable.

Key words:
Voice Recognition, motor-disabled users, Google, Java Programming Language

1. Introduction

Recent advances in computer technology have enabled users of voice recognition products to
achieve results that were previously impossible even with the largest mainframe computers or
workstations. One of the most important reasons behind the production of a large number of
voice input systems is their ability to remove many literacy barriers in communicating
instructions to the computer.

Besides, technological innovations point to an emerging knowledge society, in which all our
engagements in life will be focused on having knowledge of something. Such knowledge can be
acquired in many ways and in different forms. On the other hand, many search engines have been
developed to facilitate information search across the globe via the World Wide Web (WWW).
Google is one of the most popular search engines and information retrieval systems. It is a
huge, global web-based application with a search engine. Furthermore, it is very easy to use
[17] and has a very simple interface. On top of that, it is linked with hundreds of millions of
web pages involving a similar number of different conditions. It has been recorded as handling
over ten million queries daily [3].

Unfortunately, not all potential users have the capabilities that allow them to use the Google
information retrieval system without impediments. Many people have injuries and disabilities
that prevent them from dealing with computer applications in a way that suits their conditions.
It has been reported that more than 50% of newly hired computer users experienced
musculoskeletal symptoms such as eyestrain, neck and shoulder pain, low back pain, elbow pain
(tendonitis), forearm pain (muscles) and nerve entrapments [6]. Additionally, disabilities
constrain the use of the traditional keyboard input system. Hence, another form of input is
necessary, such as voice in recognition systems.

The development of information retrieval systems with voice recognition provides a foundation
for improving learning and teaching environments through technology. Voice recognition might
help disabled users to explore information through a set of voice commands that provide easy
ways and control mechanisms for searching information. It is expected that voice will replace
physical manipulation as the dominant input modality. This shift will dramatically alter input
needs and the way users interact with computers.

This study introduces a new service which is not currently available in Google search engine
(GSE) [18]. It proposes the implementation of voice recognition input in GSE. The voice
recognition input entails the development of Hands-Free GSE Using Voice Recognition to provide
physically challenged individuals
especially the visually impaired with the opportunity to operate the GSE and browse the search
results without using a keyboard or mouse. Furthermore, it will assist people who normally
suffer from musculoskeletal disorders when working with computers by typing instructions.

This paper is organized into six main sections. It starts with this section, which introduces
the study and the paper. It is followed by a section that discusses related works. Section 3
discusses the concept of hands-free voice recognition GSE. The implementation and evaluation
are provided in Sections 4 and 5, which are followed by a concluding section.

2. Related Works

In the quest to assist disabled individuals in their mode of interaction with
computer-enhanced technologies, [5] proposed an assistive system utilizing voice recognition
input technology. In that application, voice commands are generated to help disabled people
operate basic electronic equipment without necessarily touching it. In addition, the authors
used mouse emulators and keyboard emulators for interaction with the computer for different
computing purposes.

Later, so that physically disabled people could enjoy part of the dividends of 21st century
technological innovations, [1] proposed a framework for controlling the computer desktop by
issuing voice commands.

In fact, efforts to utilize voice recognition for searching purposes are not new. In this
regard, a voice recognition system has also been designed for visually impaired people [7].
Although the system is specific to people with vision problems, the voice recognition system
developed in their work is more general. Without the provision of an alternative input system,
these categories of challenged individuals might not have the opportunity to survive in this
knowledge world.

On the other hand, voice recognition has also been employed as security information in place
of the conventional knowledge-based authentication approach [4]. This implies that voice is a
promising stress-free approach to replace the conventional typing-based input system.

Beyond the robustness of a computer implementation, a friendly usage experience is considered
more important. A voice recognition system is no exception in this regard. The usability of
any voice recognition system is very important, as shown by [5]. They also suggest that human
factors should surround the use of a voice recognition system.

However, it was also revealed that using voice recognition to replace the traditional keyboard
input system requires additional support [8]. This implies that the sensitivity of the
interaction mode needs to be taken care of, for example in terms of noise management.

The related works are reviewed to help understand in depth the impediments faced by
motor-disabled users and the missing capabilities that should be substituted by capabilities
these users have in order to deal with Google. Mainly, the studies in the literature provide
important information for becoming more aware of each type of end user of Hands-Free searching
using voice recognition GSE. The importance of developing such a system is to help
motor-disabled users explore information by means of a set of voice commands that provide an
easy way and a control mechanism for searching information over the Google information
retrieval system.

Unlike other types of computer system, a voice recognition system is better developed using a
dynamic programming approach [9]. This serves as the justification for choosing the Java
programming language for system development in this study.

3. Hands-Free Searching Using Google Voice Concepts

The concept behind the proposed voice recognition system over GSE is based on a native method,
the Robot class, a grammar file, Sphinx-4, and a Wiener filter, as discussed in the following
paragraphs.

Native method (a.k.a. Exec Method): it is primarily designed to enhance the interaction
between the program and the operating system. It is a way of combining the power of C or C++
programming with Java [13]. In this study, this advantage is used to run the internet browser
application and to display website addresses easily.

Robot Class in Java: this class is used to generate native operating system input events for
the purposes of test automation, self-running demos, and other applications where control of
the mouse and keyboard is needed [2]. The primary purpose of Robot is to facilitate automated
testing of Java platform implementations. It is employed to
deal with voice commands by taking over the control of keyboard and mouse commands. The type
of command method in this case is the voice command.

Grammar File: for the grammar file, a context-free grammar is specified as the input. It is
capable of accepting the voice input and produces Java language functions that recognize the
appropriate instances of the grammar. The implication of this is that instances are created
for all possible grammars.

Sphinx-4: it is a state-of-the-art voice recognition system written entirely in Java, and is
considered a suitable framework for voice recognition to adopt. The main blocks are the
FrontEnd, the Decoder, and the Linguist. Supporting blocks include the Configuration Manager
and the Tools blocks. The communication between the blocks, as well as communication with an
application, is described in [16]. Sphinx-4 is used to analyze the uses of voice commands as
shown in Figure 1.

          Fig. 1 Sphinx-4 Decoder Framework.

Wiener filter: it is based on tracking the a priori SNR using the Decision-Directed method,
proposed by [19]. The two-step noise reduction (TSNR) technique removes the annoying
reverberation effect while maintaining the benefits of the decision-directed approach.
However, classic short-time noise reduction techniques, including TSNR, introduce harmonic
distortion in the enhanced speech. To overcome this problem, a method called harmonic
regeneration noise reduction (HRNR) is implemented in order to refine the a priori SNR used to
compute a spectral gain able to preserve the speech harmonics.

4. Implementation

A system has been implemented with the techniques discussed in the previous section using a
Java-based interface. Command and control grammar-based recognition is implemented to make the
system robust and speaker independent. At present, the system includes two types of functions.
Firstly, through a microphone connected to a computer, the user can search for information
with the Google search engine through the write function voice commands. The write function is
a method built to allow users to enter by voice any sentence they want to search for in the
Google search text box. Secondly, the user can access all Google search engine functions, such
as Web, Images, Maps and Books, and deal with all the sub-functions Google provides by using
mouse voice commands and keyboard voice commands. Table 1 shows all the voice commands and the
events they trigger.

Table 1: Voice Command Description
Voice Command Name          Command Event
Search                      To put the mouse pointer in the Google text box.
Go                          To run the Google process.
Up, Down, Right, and Left   To move the mouse in one of four directions.
Open                        To click on a button or link, depending on the mouse position.
Tab                         To move to the next section of a Google page.
Enter                       To begin the desired process, usually as an alternative to
                            pressing an OK button.
Back                        To return to the previous page.
Home                        To return to the Google Home Page.

Having specified the commands and their desired actions as listed in Table 1, this study
proceeds with the steps for implementing the system. Figure 2 shows the steps involved,
starting from opening the Google page. The diagram explains that users can utter any sentence
they want to search for. If the sentence is not included in the grammar file, the system shows
a message stating that the sentence is not included in the grammar file. If the sentence is in
the grammar file, it is displayed in the Google search text box.
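As an illustration only, the two kinds of voice input described above can be sketched in Java as follows. The sketch first checks whether an utterance is one of the Table 1 commands and otherwise applies the grammar-file check described for Figure 2. The class name VoiceRouter, the Command enum, the 20-pixel pointer step, and the returned strings are assumptions made for this example and are not taken from the implemented system; the grammar file is modelled here as a plain set of phrases rather than a Sphinx-4 grammar. Only the pointer, click, Tab and Enter commands are mapped to java.awt.Robot calls, because the remaining commands depend on the browser.

```java
import java.awt.MouseInfo;
import java.awt.Point;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical router: Table 1 commands first, then grammar-checked dictation. */
final class VoiceRouter {
    /** One constant per Table 1 command; the constant names are illustrative. */
    public enum Command { SEARCH, GO, UP, DOWN, RIGHT, LEFT, OPEN, TAB, ENTER, BACK, HOME }

    private static final int STEP = 20; // assumed pointer step in pixels

    // Stand-in for the grammar file: the set of phrases that may be dictated.
    private final Set<String> grammar = new HashSet<>();

    VoiceRouter(String... grammarPhrases) {
        for (String phrase : grammarPhrases) {
            grammar.add(normalize(phrase));
        }
    }

    /** Trims, lower-cases, and collapses runs of whitespace. */
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Returns the Table 1 command for an utterance, or null if it is not a command. */
    static Command asCommand(String utterance) {
        try {
            return Command.valueOf(normalize(utterance).toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException notACommand) {
            return null;
        }
    }

    /** Grammar check: dictated text is accepted only if the phrase set contains it. */
    boolean inGrammar(String utterance) {
        return grammar.contains(normalize(utterance));
    }

    /** Describes what would be done with one recognized utterance. */
    String route(String utterance) {
        Command command = asCommand(utterance);
        if (command != null) {
            return "COMMAND " + command;
        }
        if (inGrammar(utterance)) {
            return "WRITE " + normalize(utterance);
        }
        return "REJECT not in grammar file";
    }

    /** Sends some commands through java.awt.Robot; this needs a desktop session. */
    static void perform(Robot robot, Command command) {
        switch (command) {
            case UP:    move(robot, 0, -STEP); break;
            case DOWN:  move(robot, 0, STEP);  break;
            case LEFT:  move(robot, -STEP, 0); break;
            case RIGHT: move(robot, STEP, 0);  break;
            case OPEN:  // left click at the current pointer position
                robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
                robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
                break;
            case TAB:   tap(robot, KeyEvent.VK_TAB);   break;
            case ENTER: tap(robot, KeyEvent.VK_ENTER); break;
            default:    break; // SEARCH, GO, BACK, HOME depend on the browser; omitted
        }
    }

    private static void move(Robot robot, int dx, int dy) {
        Point p = MouseInfo.getPointerInfo().getLocation();
        robot.mouseMove(p.x + dx, p.y + dy);
    }

    private static void tap(Robot robot, int keyCode) {
        robot.keyPress(keyCode);
        robot.keyRelease(keyCode);
    }
}
```

For example, new VoiceRouter("java programming").route("Go") yields "COMMAND GO", while an utterance outside the phrase set is rejected. In this sketch a word that is both a command name and a grammar entry is treated as a command, which is one possible design choice rather than a documented behaviour of the system.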
[Flowchart: Start -> Voice Input -> Voice Recognition -> "Is word in a grammar file?"
 ([No] / [Yes]) -> on [Yes]: "The word will be written in Google search text box" -> End]

              Fig. 2 Opening Google page.

Further, this study also emphasizes using voice to control the input and navigate in a web
page. Figure 3 illustrates the algorithm for this function. The diagram explains that users
can use the voice commands shown in Table 1 to control the search process and explore the
search results. In addition, users can control mouse movements and keyboard commands by voice
to move to any place in the Google web page and to access any link or function.

[Flowchart: Start -> Voice Input -> Voice Recognition -> "Is it command?" ([No] / [Yes])
 -> on [Yes]: "Apply the action of command" -> End]

 Fig. 3 Using voice commands to control search process.

5. Test and Evaluation

     A. User test
A user test has been carried out. It was heavily based on the users' own words. This study
chose not to collect any quantitative data or to define goals that the tests had to reach in
order to count as successful. The reason is the same one that led us to a qualitative
investigation in the first place. The users had informative opinions of the program after the
tests. The evaluation included direct questions on whether each user thinks he or she will be
using voice recognition input-based interaction in GSE situations in the future, and whether
the tested program is a real alternative in this sense.

In terms of procedure, this study invited users personally. In the invitation, users were
provided with the system and a suggestion of the details to be tested with each user. This
study intended to test different aspects of the reliability and ease-of-use concepts with
different users. The reason is that it is more effective to have the test users inspired and
performing the tasks spontaneously than to make them all try every detail. The invitation also
involved educating the users about dictation software as a way of preparing them for the
actual tests that followed. Along with the invitation process, the program was customized to
fit each task and user.

Once the participants in the test and evaluation process had been selected, it was decided
that the investigation would cover the reliability, ease of use, and satisfaction of
Hands-Free searching using voice recognition GSE. This study therefore concluded that the
users should all be familiar with using GSE. Furthermore, this study thought it would be
rewarding to have at least a couple of users already familiar with normal speech recognition.
Finally, there were thirty participants in total, divided as follows:

Table 2: The participants of user test
Number of Users   Users description                                      Age average   Gender
Two               Programmers/engineers working with speech              42.5          M
                  recognition development
Five              Students already familiar with normal speech           36            M/F
                  recognition
Fifteen           Students who use Google search a lot but have no       27.2          M/F
                  experience of speech recognition
Two               Users who have motor disabilities                      23.8          M
Six               Newly hired computer users                             26.6          M/F

All participants were gathered from University Utara Malaysia, except the users who have motor
disabilities, who came from outside the university. They volunteered to perform the user test
sessions.

Besides, the vocabulary stored in the grammar file consists of more than 3000 words. These
words were collected manually from different sources and are related to the learning and
education area. This study drew the participants' special attention to the domain of words
that are available to be searched for, which are related only to the English education and
learning area.

The user test focused on three aspects: (1) how would users describe good reliability of our
techniques? (2) how would users describe good usability of our techniques? and (3) are users
satisfied?

Therefore, the evaluation of the program in the context scenarios proceeded as follows in
order to address these three main aspects of the study:

Firstly, we introduced the Hands-Free GSE Using Voice Recognition program to all participants
and explained the main functions of our system, to increase the participants' awareness of
what the system can do and how they can deal with it.

Secondly, we installed our prototype software on thirteen systems and divided the participants
randomly into groups, except the users who have motor disabilities, whom we put in an
independent group so that we could focus on them. Also, the programmers/engineers working with
speech recognition development were not assigned to any group, as they played the oversight
role over all groups within the test and evaluation process.

Thirdly, before we started, we distributed a set of documents to the participants, containing
a list of one hundred words and keyphrases organized for the purposes of the search processes.
These documents also contain the voice commands and their descriptions. The main purpose of
providing these voice commands is to make the participants more familiar with the voice
commands that are available and clearer about the functions of these commands for browsing
Google interfaces.

Finally, we asked the participants to use the system and browse the search results without
using a keyboard or mouse.

The results of this laboratory experiment have been analyzed and are presented in the next
section.

     B. Result
Two obvious problems were found in the developed system. Firstly, there are technical
problems, such as voice recognition limitations in the current voice input systems. Secondly,
the grammar file size is limited: it can contain only 120,000 words. These problems reduce the
reliability of the gathered results.

In the study, the observations reveal that users deal directly with the Google interface. This
study interprets this to mean that the developed system is easy to use. This is partly because
of the interface of GSE itself, which is very simple and minimal.

In addition, this study quantifies the data from the qualitative observation for a better view
of the results. As stated earlier, this study investigates ease of use, reliability, and
satisfaction among users in the user test. Hence, the results were gathered accordingly and
are presented in Figure 4.

[Bar chart titled "Hands-free searching using Google voice"; vertical axis from 0% to 100%;
 bar values 93%, 86.60%, and 76.60%; legend: "Total participants …"]

  Fig. 4 The result of the participants satisfaction on the Voice-based Google search
         Instruction
From the graph in Figure 4, this study interprets that users found the system easy to use. In
addition, the results of the search activities were found to be highly reliable. With the high
reliability and ease of use, the users were found to be highly satisfied with the system.

Two different noise suppression methods were tested in Matlab [19]: noise suppression using
spectral enhancement and noise suppression using a Wiener filter. A Wiener filter following
the method proposed by Plapous was tested. Furthermore, a better approach would be to add a
Java implementation of the Wiener filter to the front end of Sphinx-4, as the front end of
Sphinx-4 is flexible and pluggable.

This study has introduced a new service which is not currently available in Google search
engine (GSE) [18]. Furthermore, to the best of our knowledge, there is no similar or closely
related study against which we could compare our approach or our results.

6. Conclusion

Previous studies have shown that there is a need to provide an alternative to the conventional
keyboard and mouse input system, not only because of disability on the part of some users but
also because of the stress associated with these modes of interaction among users without
disabilities. This has led to several techniques in the area of voice recognition. In this
age, the world is being ruled by knowledge through the potential of technological innovations.
Voice input systems will remove barriers imposed by the traditional input system and will give
physically challenged individuals access to necessary information as long as they can
communicate with an audible voice. In this study, the developed voice-based GSE is found to be
usable. It is therefore recommended that future research look at the integration of language
translation, which could make such a system work with a variety of world languages. However,
there are some limitations in the existing voice input systems. One of them is the recognition
accuracy. Sphinx-4 is written in java and it does not have any noise

7. References

[1] Abdeen, M., Mohammad, H. & Yagoub, M. C. E. An Architecture for Multi-lingual Hands-free
    Desktop Control System for PC Windows. IEEE Xplore, 1747-1750. (2008).
[2] Baldwin, R. Introduction to the Java Robot Class in Java, retrieved August 10, 2010, from
    http://www.dickbaldwin.com/java/Java1472.htm. (2003).
[3] Brin S., Page L. The anatomy of a large-scale hypertextual Web search engine. Computer
    Networks and ISDN Systems 30, 107-117. (1998).
[4] Cui, B. & Xue, T. Design and Realization of an Intelligent Access Control System Based on
    Voice Recognition, International Conference on computing, communication, Control, and
    Management, ISECS, IEEE, 229-232. (2009).
[5] Gao, X. T., Ong, S. K., Yuan, M. L. & Nee, A. Y. C. Assist Disabled to Control Electronic
    Devices and Access Computer Functions by Voice Commands, Communication of the ACM, 37-42.
    (2007).
[6] Gerr F., Marcus M., Ensor C., Kleinbaum D., Cohen S., Edwards A., Gentry E., Ortiz D.,
    Monteilh C. A prospective study of computer users: I. Study design and incidence of
    musculoskeletal symptoms and disorders. American Journal of Industrial Medicine 41,
    221-235. (2002).
[7] Halimah, B. Z., Azlina, A., Behrang, P. & Choo, W. O. Voice Recognition System for the
    Visually Impaired: Virtual Cognitive Approach, IEEE Xplore. (2008).
[8] Mills, S., Saadat, S. & Whiting, D. Is Voice Recognition the solution to keyboard-Based
    RIS?, IEEE Xplore, 1-6. (2006).
[9] Ney, H. & Ortmanns, S. Dynamic Programming Search for Continuous Voice recognition, Signal
    Processing Magazine, IEEE Xplore, 16(5), 64-83. (1999).
[10] Rashid, R. A., Mahalin, N. H., Sarijari, M. A. & Abdul Azizi, A. A. Security System Using
    Biometric Technology: Design and Implementation of Voice Recognition System (VRS),
    International Conference on Computer and Communication Engineering, IEEE, 898-902. (2008).
[11] Sergey B., Lawrence P. The anatomy of a large-scale hypertextual web search engine.
    Computer Networks and ISDN Systems 30, 107-117. (1998).
[12] Shrawankar U., Thakare V. Feature Extraction for a Voice Recognition System in Noisy
    Environment: A Study, Second International Conference on Computer Engineering and
    Applications (ICCEA), 2010, 1, 358-361. (2010).
[13] Vairale V., Honwadkar K. Wrapper Generator using Java Native Interface. International
    Journal of Computer Science, 2(2), 126-139. (2010).
[14] Yuan, K., Hou, W. & Zhao, Y. Human Factor Research of Voice Search on Mobile Internet,
    IEEE Xplore. (2008).
suppression module. It is very important to have one to        [15] Vaishnavi, V., & Kuechler, W. Design research in
improve its performance in the presence of noise. It will          information systems, retrieved, July 19, 2010, from
                                                                   http://www.isworld.org/Researchdesign/drisISworld.htm.
be interesting to implement the recommended Wiener
                                                                   (2005).
filter implementation in java. It can be easily added to the   [16] Walker W., Lamere P., Kwok P., Raj B., Singh R., Gouvea
Front end block of Sphinx-4.                                       E., Wolf P., Woelfel J. Sphinx-4: A flexible open source
                                                                   framework for voice recognition. Sun Microsystems, Inc.
                                                                   Mountain View, CA, USA:18. (2004).
[17] SeniorNet. Lesson 4. Retrieved August 12, 2010, from
    http://www.seniornet.org/php/default.php?ClassOrgID=5337&PageID=5920 (2004).
[18] Google. Annual Report Pursuant to Section 13 or 15(d) of the Securities
    Exchange Act of 1934, for the fiscal year ended December 31, 2008.
    Retrieved July 19, 2010, from http://investor.google.com/ (2009).
[19] Plapous, C., Marro, C. & Scalart, P. Improved Signal-to-Noise Ratio
    Estimation for Speech Enhancement, IEEE Transactions on Audio, Speech, and
    Language Processing, 14. 2098-2108. (2006).

Adib M.Monzer Habbal received his degree in computer engineering and his
postgraduate Diploma in Informatics engineering from Aleppo University, Syria,
in 2003 and 2005 respectively. In March 2007, he received the Master's degree
in Information Technology from University Utara Malaysia, Malaysia. Currently,
he is a lecturer at UUM, Malaysia. His main research interests include Internet
performance engineering, network security, network application, TCP and
congestion control in Mobile Ad-hoc Networks. He serves on the Technical
program committee of many international conferences, such as NETAPPS 2010,
ICCAIE 2010, and ISCI 2011. He is also a technical reviewer for IEEE TENCON
2009 and IEEE WiMob 2010.

Hamzeh Mohammad Al_Abool received the B.S. degree in Software engineering from
Al-Balqa’ Applied University, Jordan, in 2008. Currently, he is pursuing
Master's studies in Information Technology (IT) at University Utara Malaysia
and expects to graduate this year. He was a presenter at the National
Conference on Rural ICT Development (RICTD), held in 2009, and also
participated in the ITU-UUM Asia Pacific Centre of Excellence Training Workshop
on "Wireless Communication System: Wireless Network Fundamentals", held in
2009.

Maytham Abdulhussein Ali received the B.S. degree in Computer Science from The
University of Mustansiriya of Iraq in 2007. Currently, he is pursuing Master's
studies in Information Technology (IT) at University Utara Malaysia and expects
to graduate this year. He participated in the National Conference on Rural ICT
Development (RICTD), held in 2009, and also in the ITU-UUM Asia Pacific Centre
of Excellence Training Workshop on "Wireless Communication System: Wireless
Network Fundamentals", held in 2009.

Hamdi Khalifa Ibrahim received the B.S. degree in Computer Science from Sebha
University, Libya, in 2004. Currently, he is pursuing Master's studies in
Information Technology (IT) at University Utara Malaysia and expects to
graduate this year. He obtained the International Computer Driving Licence
(ICDL) certificate in 2006. He has worked for two years as a teacher in
computer institutes and for four years as a programmer at the General Company
of Electricity. He has participated in the 2010 International Conference on
Education Technology and Computer.