Compress the mobile web even further - both HTTP and HTTPS

As already stated in a previous post, there are a couple of reasons why you may want to compress the data you receive from the Internet. The main reason is the cost associated with mobile data transfers.

Unfortunately, the previous post only dealt with HTTP, whereas more and more web sites force HTTPS on us, starting with Google. This post presents another solution that is suitable for both protocols.

Introduction

The problem with HTTPS is that it was created so that no third party could alter the data (it is secure). Yet we want to alter the data!

Achieving this goal is done through a technique known as a “Man-In-The-Middle attack”: the proxy talks to the web browser as if it were the server, and talks to the real server while pretending to be the web browser. The “man-in-the-middle” (mitm, i.e. the proxy) thus has to decrypt the data from the server, and encrypt it again before sending it to the web browser. Since the mitm does not know the real server’s private encryption key, it has to use its own, which does not go unnoticed (HTTPS is secure, after all); more on this later.

Beware, decrypting HTTPS data before the web browser gets it is an attack of sorts, since such data might be very sensitive (bank balance, private emails…). This is why you should only run such a proxy on a server which you completely trust (your own)! You’ve been warned. On the other hand, with HTTPS being overused these days, it could be acceptable not to follow this advice: after all, nobody cares, for example, if you look for cupcake recipes on Google…

The tool for the job is Mitmproxy; as the name implies, it is a proxy that acts as a “man-in-the-middle”. The way Mitmproxy deals with HTTPS is well explained. In particular, it is explained that cryptographic certificates get generated, one of which you have to install in any browser you intend to configure for your proxy.

So, the first step is to install Mitmproxy. Although you may have Mitmproxy available in your distribution’s repository, I urge you to check that the version is at least 0.9, so that you can restrict access to your proxy using a password. In particular, Debian Stable’s version is too old. So I’ll build on the recommended installation method instead (the method from the authors’ web site).

Preparation

From here on, I will use the # prompt to indicate a command run as user root (or with sudo), and the $ prompt to indicate a command run as user mitmproxy (see below).

Mitmproxy does not have to be run as root, so it should not be. So first create a user named mitmproxy:

# useradd -c mitmproxy -d /opt/mitmproxy -g 65534 -m -k /dev/null -N -u 10000 -s /bin/bash mitmproxy

I chose gid 65534 for the group, which is group nogroup on Debian. As for the uid, pick any value that is free on your system. Bash as a shell is needed for later, if your Linux distribution is Debian or Ubuntu.
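
To check whether a uid is free before running useradd, a small sketch such as the following may help. It assumes a Unix system where Python's standard pwd module is available; the helper name uid_is_free is made up for this example.

```python
import pwd


def uid_is_free(uid):
    """Return True when no account in the password database uses this uid."""
    try:
        pwd.getpwuid(uid)
    except KeyError:
        # getpwuid raises KeyError when the uid is unknown
        return True
    return False


if __name__ == "__main__":
    # 10000 is the uid used in the useradd command above
    print("uid 10000 is free:", uid_is_free(10000))
```

If the answer is False, pick another value for the -u option.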

As a prerequisite, you also have to install screen and PIP; Debian example:

# apt-get install python-pip
# apt-get install screen

Installation

Now comes the time to install Mitmproxy. You have to choose if you want to install it system-wide or just for the mitmproxy user. I chose the latter, since I don’t like “random files” to land inside /usr, which is managed by the distribution. Thus, the installation command I recommend would be:

$ pip install --user mitmproxy

Notice that the command is run by the mitmproxy user. Files will get installed under /opt/mitmproxy (this user’s home directory); expect about 20MB.

However, with Debian or Ubuntu, it seems that PIL is unable to find the JPEG and ZLIB software libraries. So, if you are using one of those Linux distributions, and PIL is not already installed on your system, then instead of the above single command, the following steps have to be followed:

  1. Find where the files are:
    $ find /usr/lib \( -name libz.so -o -name libjpeg.so \) -print
    /usr/lib/arm-linux-gnueabi/libjpeg.so
    /usr/lib/arm-linux-gnueabi/libz.so
    In my case, the files are located in /usr/lib/arm-linux-gnueabi.
  2. Download PIL without installing:
    $ pip install --user --no-install PIL
  3. Fix the installation file with the location noted at step one:
    $ sed -i $'/^[[:blank:]]*add_directory(library_dirs,[[:blank:]]*"\\/usr\\/lib")/a\\\n\tadd_directory(library_dirs, "/usr/lib/arm-linux-gnueabi")' /opt/mitmproxy/build/PIL/setup.py
  4. Install PIL without downloading:
    $ pip install --user --no-download PIL
  5. Install Mitmproxy:
    $ pip install --user mitmproxy
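
Step 1 above can also be done from Python, which may be handy if you want to script the whole procedure. This is only a sketch of the same search as the find command; the function name find_libraries is hypothetical.

```python
import os


def find_libraries(root, names):
    """Walk root and return the sorted paths of files whose name is in names."""
    wanted = set(names)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename in wanted:
                found.append(os.path.join(dirpath, filename))
    return sorted(found)


if __name__ == "__main__":
    # Same search as: find /usr/lib \( -name libz.so -o -name libjpeg.so \) -print
    for path in find_libraries("/usr/lib", ["libz.so", "libjpeg.so"]):
        print(path)
```

The directory that contains the printed files is the one to add to PIL's setup.py in step 3.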

You may get a number of errors due to missing development files. If this happens, just install the packages that provide the missing files in your distribution, and resume the process at the step where the error occurred. In my case, I had to run these commands:

# apt-get install python2.7-dev
# apt-get install libxml2-dev
# apt-get install libxslt1-dev

For some reason, PIP did not look for the libxml include file deep enough in the filesystem hierarchy, so I had to run this additional command:

# ln -s /usr/include/libxml2/libxml /usr/include/

When the installation is done, you should have a mitmproxy executable file inside /opt/mitmproxy/.local/bin, which you can try with this command:

$ /opt/mitmproxy/.local/bin/mitmproxy -h

Configuration

For the configuration, I thought it would be better to do things the standard way, so I looked at the way Ziproxy files were organized, and I did the same. Run these commands:

# mkdir /etc/mitmproxy
# chown mitmproxy:nogroup /etc/mitmproxy
# echo 'DAEMON_OPTS="-p 8081 -q -z --singleuser myuser:mypassword"' >/etc/default/mitmproxy

In the /etc/default/mitmproxy file above, you should change the user and password to your liking; their intent is to make sure that only you can use your proxy. Now create an executable file named /etc/init.d/mitmproxy with this content:

#!/bin/sh
### BEGIN INIT INFO
# Provides: mitmproxy
# Required-Start: $remote_fs $network
# Required-Stop: $remote_fs $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Init script for mitmproxy
# Description: This is the init script for mitmproxy.
### END INIT INFO

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/opt/mitmproxy/.local/bin/mitmproxy
RUNAS=mitmproxy
NAME=mitmproxy
DESC='Man-In-The-Middle Proxy with data compression'

test -x $DAEMON || exit 0

# Include mitmproxy defaults if available
if [ -f /etc/default/mitmproxy ] ; then
    . /etc/default/mitmproxy
fi
DAEMON_OPTS="$DAEMON_OPTS --confdir=/etc/mitmproxy -s /etc/mitmproxy/zyp.py"

set -e

# Succeeds if at least one process belongs to $RUNAS
running() {
    ps --no-headers -u $RUNAS | grep -q .
}

# $1: if non-empty, do not print progress messages
do_start() {
    off_echo=$1

    [ $off_echo ] || echo -n "Starting $DESC: "
    if running; then
        echo "$NAME is already running."
        exit 0
    fi

    su $RUNAS -c "screen -d -m -h 0 -S $NAME $DAEMON $DAEMON_OPTS"
    if running; then
        [ $off_echo ] || echo "$NAME."
    else
        echo " ERROR."
    fi
}

# $1: if non-empty, do not print progress messages
# $2: optional signal option passed to killall (for example -9)
do_stop() {
    off_echo=$1
    signal=$2

    [ $off_echo ] || echo -n "Stopping $DESC: "
    # Without "|| true", set -e would abort the script when nothing is running
    killall $signal -u $RUNAS || true
    [ $off_echo ] || echo "$NAME."
}

case "$1" in
    start)
        do_start
        ;;
    stop)
        do_stop
        ;;
    force-stop)
        echo -n "Forcefully stopping $DESC: "
        do_stop 1
        sleep 5s
        if running; then do_stop 1 -9; fi
        echo "$NAME."
        ;;
    restart|force-reload)
        echo -n "Restarting $DESC: "
        do_stop 1
        sleep 5s
        do_start 1
        echo "$NAME."
        ;;
    status)
        echo -n "$NAME is "
        if running ; then
            echo "running."
        else
            echo "not running."
            exit 1
        fi
        ;;
    *)
        N=/etc/init.d/$NAME
        echo "Usage: $N {start|stop|restart|force-reload|status|force-stop}" >&2
        exit 1
        ;;
esac
exit 0

As you can see, this is a standard init script, so you’ll be able to manage Mitmproxy like any other service :-)

What about data compression?

So far, nothing at all deals with data compression. That is Ziproxy’s job. So, how do you make Ziproxy and Mitmproxy work together?

You don’t.

Unfortunately, it is not possible to use Ziproxy’s features while using Mitmproxy. Thankfully though, Mitmproxy is easily scriptable; I have to thank mhils on IRC (#mitmproxy @ irc.oftc.net) for pointing this out to me. So I wrote a script that would do most of what Ziproxy does: compress images that are big enough to expect some gain, and compress data with gzip where applicable. Create a file named /etc/mitmproxy/zyp.py, with this content:

import Image, cStringIO, gzip

def response(context, flow):
    # Header values come as a list; it is empty when the header is missing
    ct = flow.response.headers["content-type"]
    if len(ct) > 0:
        if ct[0][:6] == "image/" and len(flow.response.content) > 400:
            # Re-encode images as low-quality greyscale JPEG
            s = cStringIO.StringIO(flow.response.content)
            img = Image.open(s).convert("L")
            s2 = cStringIO.StringIO()
            img.save(s2, "jpeg", quality=10)
            flow.response.content = s2.getvalue()
            flow.response.headers["content-type"] = ["image/jpeg"]
            flow.response.headers["content-length"] = [len(flow.response.content)]
        elif ct[0][:5] == "text/" or ct[0][:12] == "application/":
            # Gzip textual content in memory
            s2 = cStringIO.StringIO()
            gz = gzip.GzipFile(fileobj=s2, mode='w')
            gz.write(flow.response.content)
            gz.close()
            flow.response.content = s2.getvalue()
            flow.response.headers["content-encoding"] = ["gzip"]
            flow.response.headers["content-length"] = [len(flow.response.content)]

I chose to put this file in the configuration directory, as I see it as something you may want to tweak to your liking. In particular, you may want to improve the elif condition based on the precise list of mime types that are deemed compressible by Ziproxy…
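
As a starting point for such a tweak, here is a minimal sketch of a helper that decides whether a MIME type is worth gzipping. The list of types below is my own guess, not Ziproxy's actual list, and the names is_compressible, COMPRESSIBLE_PREFIXES and COMPRESSIBLE_TYPES are hypothetical.

```python
# Hypothetical list: adjust it to the types you actually want to gzip.
COMPRESSIBLE_PREFIXES = ("text/",)
COMPRESSIBLE_TYPES = frozenset([
    "application/javascript",
    "application/json",
    "application/xml",
    "application/xhtml+xml",
])


def is_compressible(content_type):
    """Return True if a Content-Type header value names a type worth gzipping."""
    # Drop parameters such as "; charset=utf-8" and normalise the case
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith(COMPRESSIBLE_PREFIXES) or mime in COMPRESSIBLE_TYPES
```

In zyp.py, the elif condition could then become a call such as is_compressible(ct[0]), so that already-compressed types like application/zip are left alone.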

Summing up…

With all of the above done, the proxy should be ready to be started.

As with the HTTP-only solution, you have to configure the devices that will use the proxy; on Android, I use ProxyDroid. Moreover, you also have to tell your browsers that they can trust your proxy as a Certificate Authority (CA). Once the proxy has been started, you'll find that these files were created:

/etc/mitmproxy/mitmproxy-ca-cert.cer
/etc/mitmproxy/mitmproxy-ca-cert.p12
/etc/mitmproxy/mitmproxy-ca-cert.pem
/etc/mitmproxy/mitmproxy-ca.pem

Pick one of the first three and import it on your device, alongside the other Certificate Authorities.
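
To try the proxy from a desktop before configuring a phone, a sketch along these lines may be useful. It assumes that the --singleuser option is enforced through standard HTTP Basic proxy authentication, and it reuses the placeholder credentials and port from /etc/default/mitmproxy; the host proxy.example.net and both helper names are hypothetical.

```python
import base64


def proxy_auth_header(user, password):
    """Build the value of a Proxy-Authorization header for Basic authentication."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def proxy_settings(host, port, user, password):
    """Return a scheme-to-URL mapping in the form many HTTP clients accept."""
    url = "http://{}:{}@{}:{}".format(user, password, host, port)
    return {"http": url, "https": url}


if __name__ == "__main__":
    print(proxy_auth_header("myuser", "mypassword"))
    print(proxy_settings("proxy.example.net", 8081, "myuser", "mypassword"))
```

Whatever client you feed these settings to must also be told to trust the proxy's CA certificate, otherwise HTTPS requests will fail certificate verification.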

The end result? As a test, I loaded the start page of LinuxFR: images get compressed by an average of 85%, and the text files (CSS, JS, HTML) get smaller than their default already-compressed size!
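
If you want to measure the gain on your own content, here is a small sketch that gzips a payload in memory and reports the percentage saved. The sample data is made up, and the function names gzip_bytes and savings_percent are mine.

```python
import gzip
import io


def gzip_bytes(data):
    """Compress data with gzip in memory and return the compressed bytes."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(data)
    return buffer.getvalue()


def savings_percent(original_size, new_size):
    """Percentage of bytes saved; negative if the new body is larger."""
    if original_size == 0:
        return 0.0
    return 100.0 * (original_size - new_size) / original_size


if __name__ == "__main__":
    # Made-up sample: repetitive markup compresses very well
    sample = b"<li class='item'>cupcake recipe</li>\n" * 200
    packed = gzip_bytes(sample)
    print("%d -> %d bytes, %.1f%% saved" % (
        len(sample), len(packed), savings_percent(len(sample), len(packed))))
```

Real pages are less repetitive than this sample, so expect smaller savings on them.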

As far as I am concerned, that is at least 10€ spared from my monthly bill :-P

I want more!

If you have come this far, you have probably noticed that the proxy is a bit slow, although reasonably so. If this bothers you, and you get better performance with Ziproxy, you may want to use both: Ziproxy for HTTP, and Mitmproxy for HTTPS ;-)

To achieve this, you only need to configure the two proxies on different ports, for example 8001 for Ziproxy, and 8002 for Mitmproxy. Then configure your client device to use a different proxy for HTTP and for HTTPS.

If your client device does not allow such detailed settings, but allows the use of a “PAC” file (Proxy Automatic Configuration), then use a file such as this one, assuming your proxy lives at proxy.example.net:

function FindProxyForURL(url, host) {
if (shExpMatch(url, "https:*")) return "PROXY proxy.example.net:8002";
return "PROXY proxy.example.net:8001";
}
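
The routing rule of this PAC file can be mirrored in Python to sanity-check a few URLs before deploying it. This is only a restatement of the logic above, with the same example host and ports; browsers run the JavaScript version, and the function name find_proxy_for_url is hypothetical.

```python
def find_proxy_for_url(url):
    """Mirror the PAC rule: HTTPS goes to Mitmproxy, everything else to Ziproxy."""
    if url.startswith("https:"):
        return "PROXY proxy.example.net:8002"
    return "PROXY proxy.example.net:8001"


if __name__ == "__main__":
    for url in ("http://example.org/", "https://example.org/"):
        print(url, "->", find_proxy_for_url(url))
```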

Changelog:

  • 2014-01-24 — Update “content-length” in the Python script.

Comments

1. On Tuesday 5 June 2018, 15:47, by phisik

Thank you for the guide, it helped a lot! But the exact script given above did not work for me. For some reason "flow.response.content = s2.getvalue()" is not understood by the browser as gzip content. An updated script for Python 3.6.5 and mitmproxy v4.0 that worked for me is below:

from mitmproxy import http
from PIL import Image
import gzip, io

def response(flow: http.HTTPFlow):

    if "Content-Type" in flow.response.headers:
        ct = flow.response.headers["Content-Type"]

        # compress raw text data
        if ct[0:5] == "text/" or ct[0:12] == "application/" or ct == "image/svg":
            if "Content-Encoding" in flow.response.headers:
                # if already compressed with gzip/deflate/LZW/Brotli etc. - skip it
                print("Skipping compressed response")
                return

            print("Processing text: " + flow.request.url)
            print("Compressing content...")
            content = gzip.compress(flow.response.content)
            size_difference = len(flow.response.content)-len(content)
            if size_difference < 0:
                print("Original content was smaller than compressed. Skipping conversion...")
                return
            print("Saved " + str(size_difference) + " bytes")

            headers = flow.response.headers
            status_code = flow.response.status_code

            # this works
            flow.response = http.HTTPResponse.make(
                200, content,
            )

            # this does not work, can anyone say why?
            #flow.response.content = content

            flow.response.headers = headers
            flow.response.status_code = status_code
            flow.response.headers["Content-Type"] = "text/html"
            flow.response.headers["Content-Encoding"] = "gzip"
            flow.response.headers["Content-Length"] = str(len(content))
            flow.response.headers["Modified-By-Mitmproxy"] = "true"
        # compress images
        elif ct[0:6] == ("image/") and len(flow.response.content) > 400:
            print("Processing image:" + flow.request.url)

            # convert to BW
            s = io.BytesIO(flow.response.content)
            img = Image.open(s).convert("L")

            width, height = img.size
            if max([width, height]) > 300:
                img = img.resize((round(width/2), round(height/2)))

            # save as jpeg
            s2 = io.BytesIO()
            img.save(s2, "jpeg", quality=25)

            size_difference = len(flow.response.content)-len(s2.getvalue())
            if size_difference < 0:
                print("Original image was smaller. Skipping conversion...")
                return

            print("Saved " + str(size_difference) + " bytes")

            flow.response.content = s2.getvalue()
            flow.response.headers["content-type"] = "image/jpeg"
            flow.response.headers["content-length"] = str(len(s2.getvalue()))
            flow.response.headers["Modified-By-Mitmproxy"] = "true"

2. On Friday 15 June 2018, 23:09, by theYinYeti

Thank you phisik for your contribution! To be honest, I had forgotten about this article :-D
I am now running Archlinux, and you make me want to try this again in Archlinux, using better tech (systemd…).
Cheers
