solydxk.nl 'forbidden' according to Python urllib2 [solved]

iain1940
Posts: 23
Joined: 09 Mar 2013 08:22
Location: UK


Postby iain1940 » 23 Dec 2013 00:13

This is very strange!

I have written a Python (2.7) program that generates JavaScript for an HTML page which displays a series of web pages, each for a given amount of time. It is driven by an .xls file containing:
- url to display
- time to display in seconds
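The original program isn't shown in the thread. As a rough illustration only, here is a minimal Python 3 sketch of the idea, assuming the spreadsheet rows have already been read into (url, seconds) pairs and that each page is shown in an iframe. `ROWS`, `PAGE_TEMPLATE` and `make_page()` are hypothetical names.

```python
import json

# Hypothetical rows standing in for what the original program reads from the .xls:
# (url to display, time to display in seconds)
ROWS = [
    ("http://solydxk.com/", 30),
    ("http://www.python.org/", 20),
]

# Doubled braces {{ }} are literal JavaScript braces; {pages} is filled by str.format
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><title>Slideshow</title></head>
<body style="margin:0">
<iframe id="view" style="border:0;width:100%;height:100vh"></iframe>
<script>
var pages = {pages};
var i = 0;
function show() {{
  document.getElementById("view").src = pages[i].url;
  setTimeout(function () {{ i = (i + 1) % pages.length; show(); }}, pages[i].seconds * 1000);
}}
show();
</script>
</body>
</html>
"""


def make_page(rows):
    """Return an HTML page whose JavaScript cycles through (url, seconds) rows."""
    pages = [{"url": url, "seconds": seconds} for url, seconds in rows]
    # json.dumps produces a valid JavaScript array literal
    return PAGE_TEMPLATE.format(pages=json.dumps(pages))


if __name__ == "__main__":
    print(make_page(ROWS))
```

Saving the output as an .html file and opening it in a browser would cycle through the pages. The iframe is an assumption: some sites refuse to be displayed inside a frame (X-Frame-Options), so the real program may well do it differently.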

To test it, I loaded a set of pages including 'www.solydxk.com', and it worked fine.

Today I decided to fine-tune it by using the Python 'urllib2' module to check that the web pages were valid.
When I processed the same list, "solydxk.nl" was rejected with the error '403 Forbidden'.

The problem originally occurred on Windows XP (running in VirtualBox), but I reproduced it on SolydX as well.
The following code demonstrates the problem:

import urllib2

url='http://www.solydxk.com/'

ob=urllib2.urlopen(url)

print 'done'


# on windows XP 
#Traceback (most recent call last):
#  File "//VBOXSVR/window_share/websites/url_prob1.py", line 5, in <module>
#    ob=urllib2.urlopen(url)
#  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
#    return _opener.open(url, data, timeout)
#  File "C:\Python27\lib\urllib2.py", line 410, in open
#    response = meth(req, response)
#  File "C:\Python27\lib\urllib2.py", line 523, in http_response
#    'http', request, response, code, msg, hdrs)
#  File "C:\Python27\lib\urllib2.py", line 448, in error
#    return self._call_chain(*args)
#  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
#    result = func(*args)
#  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
#    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
# HTTPError: HTTP Error 403: Forbidden 

# on Linux
#urllib2.HTTPError: HTTP Error 403: Forbidden
#  File "/home/iain/Data/iain/Desktop/window_share/websites/url_prob1.py", line 5, in <module>
#    ob=urllib2.urlopen(url)
#  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
#    return _opener.open(url, data, timeout)
#  File "/usr/lib/python2.7/urllib2.py", line 410, in open
#    response = meth(req, response)
#  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
#    'http', request, response, code, msg, hdrs)
#  File "/usr/lib/python2.7/urllib2.py", line 448, in error
#    return self._call_chain(*args)
#  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
#    result = func(*args)
#  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
#    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
 
The problem seems to occur with all the usual URL variations (without 'http:', 'www.', '.com', ...).
NB: I have noticed that when the SolydXK site comes up in a browser, the 'www.' is dropped.

I have never had any problems accessing the SolydXK site on either system: clicking the line in the spreadsheet brings up the site as normal.
Other websites seem to respond OK, but I haven't done much testing.
Any ideas?

snoewchen
Posts: 10
Joined: 29 Oct 2013 10:16
Location: Vienna, Austria

Re: solydxk.nl 'forbidden' according to Python urllib2

Postby snoewchen » 23 Dec 2013 15:11

A 403 error basically means that the server is reachable and understood the request, but refuses to fulfil it.

In your case, the reason might be the User-Agent header that urllib2 sends.
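To see which User-Agent urllib sends by default, here is a short Python 3 sketch (urllib2 became urllib.request in Python 3; on 2.7 the same value is 'Python-urllib/2.7'). It only inspects the opener and makes no network request.

```python
import sys
import urllib.request

# build_opener() returns an OpenerDirector whose .addheaders list is added to
# every request it opens; by default it holds just the User-Agent header.
opener = urllib.request.build_opener()
user_agent = dict(opener.addheaders)["User-agent"]

print(user_agent)  # "Python-urllib/" plus the interpreter's major.minor version
print(user_agent == "Python-urllib/%d.%d" % sys.version_info[:2])  # True
```

A server that filters on this header sees "Python-urllib/..." rather than a browser string, which is what the wget experiments below show.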

If you try, for example,

wget http://www.solydxk.com

you will get the 403 error.

But if you try

wget http://www.solydxk.com --user-agent="Mozilla/5.0"

the server answers "301 Moved Permanently" and wget follows the redirect, succeeding on the second attempt.

You get the same result (without the 301 redirect) if you try

wget http://solydxk.com --user-agent="Mozilla/5.0"

If you try

wget http://solydxk.com

without the --user-agent option, you get the 403 error again.
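To see the same mechanism without hitting the real site, here is a Python 3 sketch against a local server. `BrowserOnlyHandler` and `fetch_status()` are hypothetical: the handler simply answers 403 unless the User-Agent starts with "Mozilla/", which is an assumption for illustration, not the real site's rule.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError


class BrowserOnlyHandler(BaseHTTPRequestHandler):
    """Hypothetical server rule: answer 403 unless the client looks like a browser."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        status = 200 if agent.startswith("Mozilla/") else 403
        body = b"welcome\n" if status == 200 else b"forbidden\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


# ProxyHandler({}) ignores any http_proxy from the environment for this local test
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, user_agent=None):
    """Return the HTTP status for url, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the status is in .code
        return err.code


server = HTTPServer(("127.0.0.1", 0), BrowserOnlyHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

default_status = fetch_status(url)                 # sends "Python-urllib/3.x"
browser_status = fetch_status(url, "Mozilla/5.0")  # like wget --user-agent="Mozilla/5.0"
print(default_status, browser_status)              # 403 200

server.shutdown()
server.server_close()
```

The only difference between the two requests is the User-Agent header, which mirrors the two wget runs above.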

I guess there is some reason why the SolydXK web server doesn't want to serve clients other than web browsers. Because of that, I would suggest treating the 403 error as a valid response instead of spoofing the User-Agent...
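One way to follow that suggestion, sketched in Python 3: keep the status code even when urllib raises HTTPError, and count 403 as "the page exists". `check_status()` and `page_is_usable()` are hypothetical helper names, and which statuses count as OK is a judgment call.

```python
import urllib.request
from urllib.error import HTTPError


def check_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no HTTP answer was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status  # redirects (e.g. 301) have already been followed
    except HTTPError as err:
        return err.code  # 4xx/5xx arrive as exceptions; keep the code
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts are OSError subclasses
        return None


def page_is_usable(status):
    """Treat 2xx/3xx, and also 403 (assumption: browser-only site), as OK."""
    if status is None:
        return False
    return 200 <= status < 400 or status == 403
```

For each spreadsheet row, `page_is_usable(check_status(url))` would then decide whether to keep the URL, without sending a spoofed User-Agent.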

iain1940
Posts: 23
Joined: 09 Mar 2013 08:22
Location: UK

Re: solydxk.nl 'forbidden' according to Python urllib2

Postby iain1940 » 23 Dec 2013 17:34

Thanks for that - I think I understand it.
[So we are fooling the server into believing that we are a Mozilla 5 browser?]
By default, urllib2 identifies itself as Python-urllib/2.7, as in this access-log line:
GET / HTTP/1.1" 200 151 "-" "Python-urllib/2.7 python for beginners "


I'm only checking that the page exists, so that another browser (google-chrome) can open it later.

After consulting http://stackoverflow.com/questions/8021 ... b2-urlopen I changed the code to:

import urllib2

url='http://www.solydxk.com/'
opener = urllib2.build_opener()
# Replace urllib2's default 'Python-urllib/2.7' User-Agent header
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open(url)

print 'done'
It works. Thanks again for the quick response.
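For Python 3, a rough equivalent of the urllib2 snippet above might look like this. The install_opener() call is an optional addition not used in the thread: it makes later plain urllib.request.urlopen() calls send the same header.

```python
import urllib.request

# Same idea as the urllib2 code: replace the opener's default
# ("Python-urllib/3.x") User-Agent with a browser-like one.
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "Mozilla/5.0")]

# Optional: make every later urllib.request.urlopen() call use this opener.
urllib.request.install_opener(opener)

# response = opener.open("http://www.solydxk.com/")  # needs network access
```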

Arjen Balfoort
Site Admin
Posts: 9223
Joined: 26 Jan 2013 19:36
Location: Netherlands

Re: solydxk.nl 'forbidden' according to Python urllib2 [solved]

Postby Arjen Balfoort » 23 Dec 2013 22:28

Moved to a more appropriate forum.

I bet you'll get that with any WordPress site.
You won't have that issue if you use WebKit.WebView.


SolydXK needs you!
Development | Testing | Translations

Arjen Balfoort
Site Admin
Posts: 9223
Joined: 26 Jan 2013 19:36
Location: Netherlands

Re: solydxk.nl 'forbidden' according to Python urllib2 [solved]

Postby Arjen Balfoort » 26 Dec 2013 09:08

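For Python 3:

from urllib.error import HTTPError
from urllib.request import Request, build_opener

url = 'http://solydxk.com'
# Browser-like User-Agent; the server answered 403 to the default "Python-urllib/..."
req_headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0'}
request = Request(url, headers=req_headers)
opener = build_opener()
try:
    response = opener.open(request)
    if response.code != 200:
        print("not good!")
except HTTPError as err:
    # 4xx/5xx statuses raise HTTPError instead of returning a response
    print("not good!", err.code)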

