TAGS :Viewed: 8 - Published at: a few seconds ago

[ Server error with using urllib2 on Google AppEngine ]

I am unsure why hosting this simple code on Google AppEngine returns a server error when any query is submitted to the form. The problem seems to be with the line html = urllib2.urlopen("http://google.com/search?q=" + q).read() as the code works fine without it.

import webapp2
import urllib2


form="""
<form action="/process">
    <input name="q">
    <input type="submit">
</form>
"""


class MainHandler(webapp2.RequestHandler):
    def get(self):
        self.response.out.write(form)


class ProcessHandler(webapp2.RequestHandler):
    def get(self):
        q = self.request.get("q")
        html = urllib2.urlopen("http://google.com/search?q=" + q).read()
        self.response.out.write(html)


app = webapp2.WSGIApplication([('/', MainHandler),
                               ('/process', ProcessHandler)],
                               debug=True)

This is the error returned:

Error: Server Error
The server encountered an error and could not complete your request.

If the problem persists, please report your problem and mention this error message and the query that caused it.

Answer 1


Probably www.google.com doesn't accept this kind of direct connections, canceling connections from a particular user agent. In a simple python environment, you could change the user-agent string, but I think it's not possible to do that through google app engine.

Answer 2


Google is returning a 403 to your search string

>>> import urllib2
>>> html = urllib2.urlopen("http://google.com/search?q=Test").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

This works however:

html = urllib2.urlopen("http://google.com").read()

So it looks like google are trying to stop this kind of searching. As the other poster suggested, changing the User Agent string might stop the 403. Pick something common!

I've just tested with a Mozilla user agent set and I can get the results I think you are looking for

import urllib2
headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://google.com/search?q=Test', None, headers)
html = urllib2.urlopen(req).read()
print html