Detection of Vulnerabilities in Web Applications – Server-Side Request Forgery

Abhishek Singh, Ramesh Mani, Chih-Wei Chao, Anjan Venkatramani

In 2019, the server-side request forgery (SSRF) exploitation technique [1] was used to retrieve AWS (Amazon Web Services) credentials, which were subsequently used to steal the personal information of over 100 million Capital One customers.

In a traditional network, localhost, web-based services, and internal networks sit behind a firewall. SSRF allows a threat actor to exploit a vulnerability in a web application to make HTTP requests to localhost, to web-based services, or to hosts in the internal network. Figure 1 shows the vulnerable code of the Google Forms WordPress plug-in [2], which is prone to SSRF.

Figure 1: Vulnerable check in the Google Forms plug-in, leading to SSRF.

If a threat actor sends "http://docs.google.com@internalip.com", the request passes the regular expression check in the code shown in Figure 1 and is sent to the internal host denoted by "internalip.com". This prompts a response from the services hosted in the internal network. According to RFC 3986 [3], a URI has the structure shown in Figure 2.

Figure 2: Structure of URI as per RFC 3986.

As per RFC 3986, the authority component is preceded by a double slash (“//”) and is terminated by the next slash (“/”), question mark (“?”) or number sign (“#”) character, or by the end of the URI.

RFC 3986 [3] also specifies the format of authority, as shown in Figure 3.

Figure 3: Format of authority as per RFC 3986.
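The authority rules above can be sketched in a few lines of Python. The regular expression is the one given in Appendix B of RFC 3986; the helper name split_authority is hypothetical, and the sketch does not validate the individual parts (for example, it does not check that the port is numeric).

```python
import re

# Regular expression from RFC 3986, Appendix B. Group 4 is the authority:
# it follows "//" and ends at the next "/", "?", "#", or the end of the URI.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_authority(uri):
    """Return (userinfo, host, port) per authority = [userinfo "@"] host [":" port].

    Hypothetical helper for illustration only; the parts are not validated.
    """
    authority = URI_RE.match(uri).group(4) or ""
    # Everything before the last "@" is userinfo, not the host.
    userinfo, at, hostport = authority.rpartition("@")
    userinfo = userinfo if at else None
    if hostport.startswith("["):
        # IP-literal: the host runs up to and including the closing bracket.
        host, _, port = hostport.partition("]")
        host += "]"
        port = port.lstrip(":")
    else:
        host, _, port = hostport.partition(":")
    return userinfo, host, port or None


print(split_authority("http://docs.google.com@internalip.com"))
# ('docs.google.com', 'internalip.com', None)
```

In the exploit string, "docs.google.com" lands in the userinfo position, and the host that receives the request is "internalip.com".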

So if the exploit "http://legitimatewebsite.com@internalip.com" is sent to a validation routine that does not separate the userinfo from the host, the routine takes the host to be legitimatewebsite.com, while the function that opens the connection, such as urllib.urlopen(), connects to internalip.com. (A parser that follows RFC 3986, such as urllib.parse in current Python 3 releases, reports legitimatewebsite.com as userinfo and internalip.com as the host.) This mismatch in the value of the host allows a threat actor to bypass the checks in web applications. Besides the mismatch in the value of the host, RFC 3986 also allows the host to be given as an IP-literal, an IPv4address, or a reg-name.
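To make the host mismatch concrete, the sketch below compares a prefix-based check with the host that Python 3's urllib.parse.urlsplit reports. The regular expression is an assumption made for illustration and is not the plug-in's code; the function names naive_is_allowed and strict_is_allowed are likewise hypothetical.

```python
import re
from urllib.parse import urlsplit

# Hypothetical prefix check, standing in for a validation that only looks
# at how the URL starts. It is not the plug-in's actual code.
NAIVE_CHECK = re.compile(r"^https?://legitimatewebsite\.com")


def naive_is_allowed(url):
    return NAIVE_CHECK.match(url) is not None


def strict_is_allowed(url, allowed_host="legitimatewebsite.com"):
    parts = urlsplit(url)
    # Reject any userinfo outright and compare the parsed host exactly.
    return parts.username is None and parts.hostname == allowed_host


exploit = "http://legitimatewebsite.com@internalip.com"
print(naive_is_allowed(exploit))    # True
print(urlsplit(exploit).username)   # legitimatewebsite.com
print(urlsplit(exploit).hostname)   # internalip.com
print(strict_is_allowed(exploit))   # False
```

The prefix check accepts the exploit because the string begins with the trusted name, while the parsed host, which is where the connection goes, is internalip.com. Comparing the parsed hostname and refusing userinfo is one possible way to close that gap.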

Figure 4: Options for IP address.

This means that any IP address check performed by a web application must handle every format in which the host can be expressed.
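As a rough illustration of handling each host form, the sketch below classifies the host of a URL as one of the three RFC 3986 options and normalizes it to an address object where possible. The helper name classify_host is hypothetical, and the sketch assumes Python's standard ipaddress and urllib.parse modules.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_host(url):
    """Return (form, ip), where form is one of the RFC 3986 host options.

    ip is an ipaddress object, or None for a reg-name, which still needs
    DNS resolution before any range check. Hypothetical helper.
    """
    host = urlsplit(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "reg-name", None
    if ip.version == 6:
        # An IP-literal such as [::ffff:127.0.0.1] may wrap an IPv4 address.
        return "IP-literal", ip.ipv4_mapped or ip
    return "IPv4address", ip


for url in ("http://127.0.0.1/", "http://[::ffff:127.0.0.1]/", "http://localhost/"):
    print(url, classify_host(url))
```

Strings such as "2130706433" are not valid IPv4address syntax and are returned as reg-name here, yet some resolver libraries interpret them as 127.0.0.1. For that reason a reg-name should be resolved first, and the range check applied to the resulting address rather than to the text of the URL.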

Detection of SSRF

The algorithm to detect SSRF instruments APIs such as urllib.urlopen(), urllib.request.urlopen(), ldap.initialize(), and dictclient.Connection(), which take a URI as an input parameter and open the network object denoted by that URI in order to read it. Methods that accept user input, such as GET and POST, are also instrumented. A program dependency graph is then used to identify the APIs that make network connections and receive their inputs from methods that accept user input, such as GET() and POST(). For every invocation of an API that opens a URI, a check determines whether the IP address to which the connection is being made is a private (internal) address, a loopback address, or a link-local address. If it is, the data flow graph is used to check whether the parameters passed to that API come from a method that accepts external input. If they do, an alert for SSRF is raised. The internal IP address ranges defined in RFC 1918 are shown in Figure 5.

Figure 5: Internal IP address as per RFC 1918.
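The address check can be sketched with Python's ipaddress module. The three RFC 1918 blocks are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16; the loopback and link-local blocks are added because the detection step tests for them as well. The function name is_internal_ipv4 is hypothetical, and the sketch covers IPv4 only.

```python
import ipaddress

# Ranges named in the text: RFC 1918 private blocks, loopback, link-local.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),    # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),   # RFC 1918
    ipaddress.ip_network("127.0.0.0/8"),      # loopback
    ipaddress.ip_network("169.254.0.0/16"),   # link-local
]


def is_internal_ipv4(address):
    """Return True if the dotted-decimal address falls in one of the ranges above."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in network for network in INTERNAL_NETWORKS)


print(is_internal_ipv4("172.31.255.255"))   # True
print(is_internal_ipv4("172.32.0.1"))       # False
print(is_internal_ipv4("169.254.169.254"))  # True
```

The last address is the link-local endpoint that AWS uses for its instance metadata service, which is why link-local targets belong in the check alongside the private ranges.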

On most operating systems, the loopback addresses are 127.0.0.1 through 127.255.255.254. If the URI refers to a file, the data flow graph is used to check whether the parameters passed to the APIs that open files come from methods that accept external input. If they do, an alert for SSRF is raised.
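The decision logic described in this section can be approximated as a hook that runs before an instrumented API opens a URI. This is only a sketch under stated assumptions: the Tainted class is a hypothetical stand-in for the data flow graph, the names SSRFAlert, target_is_internal, and check_before_open are invented for illustration, and ipaddress's is_private property is broader than the RFC 1918 blocks alone.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


class Tainted(str):
    """A string that came from external input such as a GET or POST parameter.

    Hypothetical stand-in for the data flow graph; real taint tracking
    would follow the value through every transformation.
    """


class SSRFAlert(Exception):
    """Raised when an externally supplied URI targets an internal resource."""


def target_is_internal(host, resolve):
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # reg-name: resolve it first, then classify the resulting address.
        ip = ipaddress.ip_address(resolve(host))
    # is_private covers the RFC 1918 blocks and other reserved ranges.
    return ip.is_private or ip.is_loopback or ip.is_link_local


def check_before_open(uri, resolve=socket.gethostbyname):
    """Hook to run before an instrumented API (e.g. urlopen) opens `uri`."""
    parts = urlsplit(uri)
    from_external_input = isinstance(uri, Tainted)
    if parts.scheme == "file":
        # File URIs: alert whenever the value came from external input.
        if from_external_input:
            raise SSRFAlert("external input opens a file: " + uri)
        return
    if parts.hostname and target_is_internal(parts.hostname, resolve):
        # Internal target: alert only if the URI came from external input.
        if from_external_input:
            raise SSRFAlert("external input reaches internal host: " + uri)
```

With this hook, check_before_open(Tainted("http://docs.google.com@127.0.0.1/")) raises SSRFAlert, while check_before_open("http://127.0.0.1/health") returns quietly because the URI did not originate from user input. The resolve parameter is injectable so that the check can be exercised without real DNS lookups.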

Test results for detecting variations of SSRF exploits are available here.

Conclusion

The algorithm to detect injection-based exploitation has the following inherent advantages:

  • The algorithm identifies SSRF in the code at the point where the functions that open a URL are invoked. With each exploitation attempt detected, the vulnerable code path is identified automatically. This automatic identification of the vulnerable part of the code helps in patching it and preventing further exploitation.

  • The algorithm relies only on binary instrumentation of the application to detect injection-based exploitation. Detection is therefore independent of how the application is deployed and how it accepts external input. The application can be deployed as back-end microservices and can accept batched requests that a middle layer breaks down and forwards to the back-end microservices. In this scenario, too, the algorithm detects injection exploits.

The algorithm to detect SSRF described in this blog follows the principle of detect, respond, and remediate. The algorithm not only detects exploitation, so that responsive measures can be applied to stop it, but also provides remedial action by identifying the vulnerable code path. If the vulnerable code is patched after each detected exploitation attempt, detection alerts will decrease and exploitation will become more complex for a threat actor.

References

[1] Krebs, B. What We Can Learn from the Capital One Hack. https://krebsonsecurity.com/tag/capital-one-breach/.

[2] Google Forms <= 0.91 - Unauthenticated Server-Side Request Forgery (SSRF). https://wpvulndb.com/vulnerabilities/9013.

[3] Uniform Resource Identifier (URI): Generic Syntax RFC 3986. https://tools.ietf.org/html/rfc3986.
