I have been doing a lot of http retrieval lately and the most efficient way to do that is with gzip compression enabled. Fortunately Python makes that really easy. All you have to do is derive from urllib2.HTTPHandler and override http_open().
import httplib, urllib, urllib2 class GzipHandler(urllib2.HTTPHandler): def http_open(self, req): req.add_header('Accept-encoding', 'gzip') r = self.do_open(httplib.HTTPConnection, req) if 'Content-Encoding'in r.headers and \ r.headers['Content-Encoding'] == 'gzip': fp = gzip.GzipFile(fileobj=StringIO(r.read())) else: fp = r resp = urllib.addinfourl(fp, r.headers, r.url, r.code) resp.msg = r.msg return resp
The Accept-encoding header tells the server that this client supports gzip compression and if the Content-Encoding header is set to gzip the server returned an compressed response. Now you just need to build your opener.
def retrieve(url): request = urllib2.Request(url) opener = urllib2.build_opener(GzipHandler) return opener.open(request)
For more information see: