Two REDoS vulns in cpython

2019年11月
security,
research,
redos,
regex,
python

I ran my top-secret REDoS-finding engine over the python code in cpython and found two remotely-exploitable vulnerabilities. Making a request to a malicious web server leads to denial of service (approximately infinite CPU time).

http.cookiejar #

This issue (bpo38804) was serious because the vulnerable code could be as simple as:

import requests
requests.get("http://malicious/")

To exploit this, simply cause your target to hit a malicious server running the proof-of-concept code in my fix PR which sends massive Set-Cookie response headers.

The underlying issue was the regular expression http.cookiejar.LOOSE_HTTP_DATE_RE used to parse the Expires field from Set-Cookie response headers. Ignoring the ?-optional capture groups, the original regex can be simplified to

\d+-\w+-\d+(\s*\s*\s*)$

Therefore, a long sequence of spaces can trigger bad performance. LOOSE_HTTP_DATE_RE backtracked if last character didn't match \s or (?![APap][Mm]\b)[A-Za-z]+.

Matching a malicious string such as

LOOSE_HTTP_DATE_RE.match("1-1-1" + (" " * 2000) + "!")

caused catastrophic backtracking. Timing on my computer when doubling the number of spaces:

 n_spaces |  seconds
      512       .383
     1024      3.02
     2048     23.4
     4096    184
     8192   1700

As expected, it's approx O(n³). The maximum n_spaces to fit in a Set-Cookie header is 65506 which will take days.

‌

I fixed this bug with my first contribution to python/cpython.

urllib.request #

Less code is vulnerable to this bug (bpo38826/bpo39503) as it requires that you are using an auth handler. Vulnerable client:

import urllib.request
opener = urllib.request.build_opener(
    urllib.request.HTTPBasicAuthHandler()
)
opener.open("http://malicious/")

A malicious server just has to send back a 401 with a crafted WWW-Authenticate header such as my proof-of-concept code.

The vulnerable regular expression is urllib.request.AbstractBasicAuthHandler.rx:

rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
                'realm=(["\']?)([^"\']*)\\2', re.I)

The first line can act like:

(,*,)*(,+)[ \t]

showing that there are many different ways to match a long sequence of commas.

Input from the WWW-Authenticate or Proxy-Authenticate headers of HTTP responses will reach the regex via the http_error_auth_reqed method as long as the header value starts with "basic ".

We can craft a malicious input:

urllib.request.AbstractBasicAuthHandler.rx.search(
    "basic " + ("," * 100) + "A"
)

Which causes catastrophic backtracking and takes a large amount of CPU time to process. I tested the length of time (seconds) to complete for different numbers of commas in the string:

n_commas | seconds
      18     0.289
      19     0.57
      20     1.14
      21     2.29
      22     4.55
      23     9.17
      24    18.3
      25    36.5
      26    75.1
      27   167

Showing an exponential relationship O(2^x) !

The maximum length of comma string that can fit in a response header is 65509, which would take my computer just 6×10¹⁹⁷⁰⁶ years to complete. Compare this to the worst case of the cubic http.cookiejar vulnerability above being measured in days.

Another researcher later requested CVE-2020-8492 for this vulnerability. It is in the process of being fixed.

Previous: Big Data Lake, Big Data Leak
Next: Exploit Grafana (CVE‑2019‑15043)