Googlesearch Documentation - Mario Vilas - Jul 22, 2019 - Read the Docs

Page created by Dwight Haynes
 
CONTINUE READING
googlesearch Documentation

                   Mario Vilas

                     Jul 22, 2019
Contents

1   Indices and tables          1

2   Reference                   3

Python Module Index            11

Index                          13

                                i
ii
CHAPTER    1

             Indices and tables

• genindex
• modindex
• search

                              1
googlesearch Documentation

2                            Chapter 1. Indices and tables
CHAPTER             2

                                                                                                    Reference

googlesearch.search(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None,
                           domains=None, pause=2.0, only_standard=False, extra_params={}, tpe=”,
                           user_agent=None)
    Search the given query string using Google.
          Parameters
                • query (str) – Query string. Must NOT be url-encoded.
                • tld (str) – Top level domain.
                • lang (str) – Language.
                • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                  last month).
                • safe (str) – Safe search.
                • num (int) – Number of results per page.
                • start (int) – First result to retrieve.
                • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
                • of str or None domains (list) – A list of web domains to constrain the search.
                • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                  search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                  vary!
                • only_standard (bool) – If True, only returns the standard results from each page. If
                  False, it returns every possible link from each page, except for those that point back to
                  Google itself. Defaults to False for backwards compatibility with older versions of this
                  module.
                • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                  eters, which must be URL encoded. For example if you don’t want Google to filter similar
                  results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                  query.

                                                                                                                   3
googlesearch Documentation

               • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                 ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                 applications: ‘app’}
               • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                 default.
         Return type generator of str
         Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will
             loop forever.
googlesearch.search_images(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0,
                               stop=None, pause=2.0, domains=None, only_standard=False, ex-
                               tra_params={})
    Shortcut to search images.
         Note Beware, this does not return the image link.
         Parameters
               • query (str) – Query string. Must NOT be url-encoded.
               • tld (str) – Top level domain.
               • lang (str) – Language.
               • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                 last month).
               • safe (str) – Safe search.
               • num (int) – Number of results per page.
               • start (int) – First result to retrieve.
               • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
               • of str or None domains (list) – A list of web domains to constrain the search.
               • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                 search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                 vary!
               • only_standard (bool) – If True, only returns the standard results from each page. If
                 False, it returns every possible link from each page, except for those that point back to
                 Google itself. Defaults to False for backwards compatibility with older versions of this
                 module.
               • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                 eters, which must be URL encoded. For example if you don’t want Google to filter similar
                 results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                 query.
               • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                 ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                 applications: ‘app’}
               • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                 default.
         Return type generator of str
         Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will
             loop forever.

4                                                                                         Chapter 2. Reference
googlesearch Documentation

googlesearch.search_news(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0,
                             stop=None, domains=None, pause=2.0, only_standard=False, ex-
                             tra_params={})
    Shortcut to search news.
         Parameters
               • query (str) – Query string. Must NOT be url-encoded.
               • tld (str) – Top level domain.
               • lang (str) – Language.
               • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                 last month).
               • safe (str) – Safe search.
               • num (int) – Number of results per page.
               • start (int) – First result to retrieve.
               • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
               • of str or None domains (list) – A list of web domains to constrain the search.
               • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                 search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                 vary!
               • only_standard (bool) – If True, only returns the standard results from each page. If
                 False, it returns every possible link from each page, except for those that point back to
                 Google itself. Defaults to False for backwards compatibility with older versions of this
                 module.
               • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                 eters, which must be URL encoded. For example if you don’t want Google to filter similar
                 results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                 query.
               • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                 ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                 applications: ‘app’}
               • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                 default.
         Return type generator of str
         Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will
             loop forever.
googlesearch.search_videos(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0,
                               stop=None, domains=None, pause=2.0, only_standard=False, ex-
                               tra_params={})
    Shortcut to search videos.
         Parameters
               • query (str) – Query string. Must NOT be url-encoded.
               • tld (str) – Top level domain.
               • lang (str) – Language.

                                                                                                                  5
googlesearch Documentation

               • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                 last month).
               • safe (str) – Safe search.
               • num (int) – Number of results per page.
               • start (int) – First result to retrieve.
               • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
               • of str or None domains (list) – A list of web domains to constrain the search.
               • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                 search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                 vary!
               • only_standard (bool) – If True, only returns the standard results from each page. If
                 False, it returns every possible link from each page, except for those that point back to
                 Google itself. Defaults to False for backwards compatibility with older versions of this
                 module.
               • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                 eters, which must be URL encoded. For example if you don’t want Google to filter similar
                 results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                 query.
               • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                 ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                 applications: ‘app’}
               • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                 default.
         Return type generator of str
         Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will
             loop forever.
googlesearch.search_shop(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0,
                             stop=None, domains=None, pause=2.0, only_standard=False, ex-
                             tra_params={})
    Shortcut to search shop.
         Parameters
               • query (str) – Query string. Must NOT be url-encoded.
               • tld (str) – Top level domain.
               • lang (str) – Language.
               • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                 last month).
               • safe (str) – Safe search.
               • num (int) – Number of results per page.
               • start (int) – First result to retrieve.
               • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
               • of str or None domains (list) – A list of web domains to constrain the search.

6                                                                                         Chapter 2. Reference
googlesearch Documentation

               • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                 search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                 vary!
               • only_standard (bool) – If True, only returns the standard results from each page. If
                 False, it returns every possible link from each page, except for those that point back to
                 Google itself. Defaults to False for backwards compatibility with older versions of this
                 module.
               • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                 eters, which must be URL encoded. For example if you don’t want Google to filter similar
                 results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                 query.
               • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                 ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                 applications: ‘app’}
               • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                 default.
         Return type generator of str
         Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will
             loop forever.
googlesearch.search_books(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0,
                              stop=None, domains=None, pause=2.0, only_standard=False, ex-
                              tra_params={})
    Shortcut to search books.
         Parameters
               • query (str) – Query string. Must NOT be url-encoded.
               • tld (str) – Top level domain.
               • lang (str) – Language.
               • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                 last month).
               • safe (str) – Safe search.
               • num (int) – Number of results per page.
               • start (int) – First result to retrieve.
               • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
               • of str or None domains (list) – A list of web domains to constrain the search.
               • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                 search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                 vary!
               • only_standard (bool) – If True, only returns the standard results from each page. If
                 False, it returns every possible link from each page, except for those that point back to
                 Google itself. Defaults to False for backwards compatibility with older versions of this
                 module.
               • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                 eters, which must be URL encoded. For example if you don’t want Google to filter similar

                                                                                                                  7
googlesearch Documentation

                 results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                 query.
               • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                 ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                 applications: ‘app’}
               • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                 default.
         Return type generator of str
         Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will
             loop forever.
googlesearch.search_apps(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0,
                             stop=None, domains=None, pause=2.0, only_standard=False, ex-
                             tra_params={})
    Shortcut to search apps.
         Parameters
               • query (str) – Query string. Must NOT be url-encoded.
               • tld (str) – Top level domain.
               • lang (str) – Language.
               • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                 last month).
               • safe (str) – Safe search.
               • num (int) – Number of results per page.
               • start (int) – First result to retrieve.
               • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
               • of str or None domains (list) – A list of web domains to constrain the search.
               • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                 search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                 vary!
               • only_standard (bool) – If True, only returns the standard results from each page. If
                 False, it returns every possible link from each page, except for those that point back to
                 Google itself. Defaults to False for backwards compatibility with older versions of this
                 module.
               • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                 eters, which must be URL encoded. For example if you don’t want Google to filter similar
                 results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                 query.
               • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                 ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                 applications: ‘app’}
               • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                 default.
         Return type generator of str

8                                                                                         Chapter 2. Reference
googlesearch Documentation

          Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will
              loop forever.
googlesearch.lucky(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, only_standard=False, ex-
                           tra_params={}, tpe=”)
    Shortcut to single-item search.
          Parameters
                • query (str) – Query string. Must NOT be url-encoded.
                • tld (str) – Top level domain.
                • lang (str) – Language.
                • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                  last month).
                • safe (str) – Safe search.
                • num (int) – Number of results per page.
                • start (int) – First result to retrieve.
                • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
                • of str or None domains (list) – A list of web domains to constrain the search.
                • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                  search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                  vary!
                • only_standard (bool) – If True, only returns the standard results from each page. If
                  False, it returns every possible link from each page, except for those that point back to
                  Google itself. Defaults to False for backwards compatibility with older versions of this
                  module.
                • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                  eters, which must be URL encoded. For example if you don’t want Google to filter similar
                  results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                  query.
                • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                  ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                  applications: ‘app’}
                • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                  default.
          Return type str
          Returns URL found by Google.
googlesearch.hits(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, domains=None, extra_params={},
                        tpe=”, user_agent=None)
    Search the given query string using Google and return the number of hits.
          Note This is the number reported by Google itself, NOT by scraping.
          Parameters
                • query (str) – Query string. Must NOT be url-encoded.
                • tld (str) – Top level domain.
                • lang (str) – Language.

                                                                                                                   9
googlesearch Documentation

                • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” =>
                  last month).
                • safe (str) – Safe search.
                • num (int) – Number of results per page.
                • start (int) – First result to retrieve.
                • or None stop (int) – Last result to retrieve. Use None to keep searching forever.
                • of str or None domains (list) – A list of web domains to constrain the search.
                • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the
                  search slow, but a lapse too short may cause Google to block your IP. Your mileage may
                  vary!
                • only_standard (bool) – If True, only returns the standard results from each page. If
                  False, it returns every possible link from each page, except for those that point back to
                  Google itself. Defaults to False for backwards compatibility with older versions of this
                  module.
                • of str to str extra_params (dict) – A dictionary of extra HTTP GET param-
                  eters, which must be URL encoded. For example if you don’t want Google to filter similar
                  results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every
                  query.
                • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow-
                  ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’,
                  applications: ‘app’}
                • or None user_agent (str) – User agent for the HTTP requests. Use None for the
                  default.
          Return type int
          Returns Number of Google hits for the given search query.
googlesearch.ngd(term1, term2)
    Return the Normalized Google distance between words.
     For more info, refer to: https://en.wikipedia.org/wiki/Normalized_Google_distance
          Parameters
                • term1 (str) – First term to compare.
                • term2 (str) – Second term to compare.
          Return type float
          Returns Normalized Google distance between words.
googlesearch.get_random_user_agent()
    Get a random user agent string.
          Return type str
          Returns Random user agent string.

10                                                                                         Chapter 2. Reference
Python Module Index

g
googlesearch, 3

                                   11
googlesearch Documentation

12                           Python Module Index
Index

G
get_random_user_agent() (in module google-
       search), 10
googlesearch (module), 3

H
hits() (in module googlesearch), 9

L
lucky() (in module googlesearch), 9

N
ngd() (in module googlesearch), 10

S
search() (in module googlesearch), 3
search_apps() (in module googlesearch), 8
search_books() (in module googlesearch), 7
search_images() (in module googlesearch), 4
search_news() (in module googlesearch), 4
search_shop() (in module googlesearch), 6
search_videos() (in module googlesearch), 5

                                                 13
You can also read