Friday, May 30, 2008

What's the Score of the Game - Part 2: Web Security Metrics

In my earlier post entitled "What's the Score of the Game?" I presented the concept that what ultimately matters with web application security is how the application performs during a "Real Game" - meaning when it is deployed live on the production network. Everything else that happens before that time is preparation. It is important preparation; however, no one is given the Lombardi Trophy based on how hard they practiced! No, you actually have to play in and win the Super Bowl in order to obtain that hardware :) So, referencing the title of this post, if the production network is "where" the webappsec game is actually played, then the next logical question is "How do we keep score?"

This is where web security metrics come into play.

When I say web security metrics, I am referring to data that is relevant to answering the question - are we winning or losing the game? In football, you win the game by having more points than your opponent at the end of it. In webappsec, if you are in charge of defending a web application, you win if an attacker tries to compromise your site and is unable to. This seems like a "Duh" statement, but it actually isn't when you consider how many web security vendors try to tell you the "Score of the Game". Most vendors will present customers with colorful pie charts stating that a web site has X number of vulnerabilities or that they blocked X number of attacks. This is like asking someone who won the Giants vs. Patriots game and getting the response - the Patriots completed 35 passes while the Giants intercepted them 3 times. Not really answering the question, is it? While some customers may be distracted by eye-catching graphical displays of this information, the savvy ones will ask the key question - were there any successful attacks? The answer to this question will tell you the score of the game - did the opponent score any touchdowns??? All other data is corollary.

Sidebar - realistically you are dealing with two different types of attacks: the automated kind, where an attacker is targeting a specific vuln and searching for sites that have it, and the manual type, where the attacker is targeting your site specifically and must then find a vuln to exploit. The former is like a football opponent throwing one desperate Hail-Mary pass down the field as time expires. If you bat the football down, you win. If you don't, you lose. The latter is like a methodical 99-yard drive where your opponent is running 2-yard dive plays left and right and slowly marching down the field. If you give them enough time, they will most likely score. In this case, it is critical that your webappsec defenses force the attacker to manually search for complex logical attacks or develop evasions; if you can slow them down to that degree, you may be able to implement countermeasures that deny them access before they are successful.

With this concept as a backdrop, let me present the web security metrics that I feel are most important on the production network for gauging how the web application's security mechanisms are performing -

  1. Web Transactions per Day - This should be represented as a number (#); it establishes a baseline of web traffic and provides perspective for the other metrics. Some WAFs will be able to keep track of this data on their own. If this is not possible, then you would need to correlate web server log data.
  2. Attacks Detected/True Positives - This data should be represented as both a number (#) and as a percentage (%) of item 1 - total web transactions. This data is a general indicator of malicious web traffic. These numbers are presented in all WAF reporting functions.
  3. Missed Attacks/False Negatives - This data should be represented as both a number (#) and as a percentage (%) of item 1 - total web transactions. This data is a general indicator of the web application firewall's detection accuracy. This is the key metric that is missing when people try to determine their webappsec success. If the WAF is configured for "alert-centric" logging and therefore only logs blocked attacks, then you will not be able to report on this data automatically using the WAF's built-in reporting facilities. If, on the other hand, the WAF is audit logging all relevant traffic (requests for non-static resources, etc.), then an analyst has raw data to conduct batch analysis and identify missed attacks. The fall-back data source would be whatever incident response processes exist for the organization. There may be other sources of data (web logs, IDS, etc.) where a web attack may be confirmed.
  4. Blocked Traffic/False Positives - This data should be represented as both a number (#) and as a percentage (%) of item 1 - total web transactions. This data is a general indicator of the web application firewall's detection accuracy. It is very important data for many organizations because blocking legitimate customer traffic can mean missed revenue. False positives can usually be identified by evaluating any Exceptions that had to be implemented on the WAF in order to allow "legitimate" traffic that was otherwise triggering a negative security rule or signature. Besides Exceptions, this data can be identified in reports if the WAF has an alert ticketing interface where an analyst can categorize each alert.
  5. Attack Detection Failure Rate - This data should be represented as a percentage (%) and is derived by adding items 3 (False Negatives) and 4 (False Positives) and then dividing by item 2 (True Positives). This percentage gives the overall effectiveness of the web application firewall's detection accuracy - meaning the Score of the Game. A small calculation sketch follows this list.
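To make these definitions concrete, here is a minimal sketch (in Python, with made-up counts) of how the five metrics could be rolled up once the raw counts have been pulled from web server logs, WAF reports, and incident response data. The variable names and the numbers are hypothetical - the point is the arithmetic, not the collection mechanism.

    # Minimal sketch of the five metrics described above.
    # The raw counts are placeholders - in practice they would come from
    # web server logs, WAF reports, and incident response/ticketing data.
    web_transactions = 1_250_000   # 1. total web transactions for the day
    true_positives   = 4_200       # 2. attacks detected (true positives)
    false_negatives  = 35          # 3. missed attacks (false negatives)
    false_positives  = 150         # 4. blocked legitimate traffic (false positives)

    def pct(part, whole):
        """Express 'part' as a percentage of 'whole'."""
        return 100.0 * part / whole if whole else 0.0

    print(f"1. Web transactions/day:       {web_transactions}")
    print(f"2. Attacks detected:           {true_positives} "
          f"({pct(true_positives, web_transactions):.4f}%)")
    print(f"3. Missed attacks:             {false_negatives} "
          f"({pct(false_negatives, web_transactions):.4f}%)")
    print(f"4. Blocked legitimate traffic: {false_positives} "
          f"({pct(false_positives, web_transactions):.4f}%)")

    # 5. Attack Detection Failure Rate, as defined in this post:
    #    (false negatives + false positives) / true positives
    failure_rate = pct(false_negatives + false_positives, true_positives)
    print(f"5. Attack detection failure rate: {failure_rate:.2f}%")

With these example numbers the failure rate works out to roughly 4.4%, and trending that single percentage over time is what tells you whether your defenses are improving or degrading.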

Sidebar - I am referencing web application firewalls in the metrics because they are specifically designed to report on this type of data. You could substitute any security filtering mechanism built directly into the webapp code; however, many implementations do not adequately provide logging and reporting for this data. Such code may present users with a friendly error message, but from the back-end logging perspective it is essentially performing a "silent drop" of the request.
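To illustrate what "not a silent drop" looks like, here is a small, hypothetical Python sketch of an in-application input filter. The handler name, the blacklist regex, and the log format are all made up for illustration; the point is simply that the rejection is written to a log that analysts can later mine for the true/false positive counts above, instead of vanishing behind a friendly error page.

    import logging
    import re

    # Hypothetical in-application security filter. Without the logging call,
    # the application performs a "silent drop" and the back end has no record
    # to feed the metrics discussed above.
    security_log = logging.getLogger("webapp.security")
    logging.basicConfig(filename="webapp-security.log",
                        format="%(asctime)s %(levelname)s %(message)s",
                        level=logging.INFO)

    # Toy blacklist - far from complete, used only for illustration.
    SUSPICIOUS = re.compile(r"<script|union\s+select|\.\./", re.IGNORECASE)

    def handle_search(client_ip, query):
        if SUSPICIOUS.search(query):
            # Record what was rejected and why, so an analyst can later
            # classify it as a true or false positive.
            security_log.warning("blocked request from %s, parameter=%r",
                                 client_ip, query)
            return "Sorry, we could not process your request."  # friendly error
        return "Results for " + query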

As stated earlier, many webappsec vendors will only provide statistics related to item #2 (detected/blocked attacks). While this data does factor into the equation, it does not provide the best indicator of overall web application security on its own. So, if you really want to know what the score of the game is for your web applications, I suggest you start tracking the metrics provided in this post.

Tuesday, May 20, 2008

What's the Score of the Game?

We, as the webappsec community, should try to move away from "Holy Wars" debating whether there is only one right way to address web application vulnerabilities - source code reviews, vulnerability scanning, or web application firewalls - and instead focus on the end results. Specifically, instead of obsessing over Inputs (using application X to scan) we should turn our attention toward Outputs (web application hackability). This concept has been skillfully promoted by Richard Bejtlich of TaoSecurity and is called Control-Compliant vs. Field-Assessed security. Here is a short introductory paragraph:

In brief, too many organizations, regulators, and government agencies waste precious time and resources devising and auditing "controls," regardless of the effect these controls have or do not have on security. They are far too input-centric; they should become more output-aware. They obsess over recording conditions they believe may be helpful while remaining ignorant of the "score of the game." They practice management by belief and disregard management by fact.

While the impetus for Richard's soapbox rant was government auditing mindsets, we can apply this same "input-centric" critique to our current state of webappsec. Due to regulations such as PCI, we are unfortunately framing web security through an input-centric lens and forcing users to check a box stating that they are utilizing process X, rather than formulating a strategy to conduct field assessments and obtain proper metrics on how difficult it is to hack into the site. We are focusing too much on whether a web application's code was manually or automatically reviewed, or whether it was scanned with vendor X's scanner, rather than focusing on what is really important - did these activities actually prevent someone from breaking into the web application? If the answer is no, then who really cares what process you followed? More specifically, the fact that your site was PCI compliant at the time of the hack is going to be of little consequence.

Let's take a look at each of these input-centric processes through another great analogy by Richard:

Imagine a football (American-style) team that wants to measure their success during a particular season. Team management decides to measure the height and weight of each player. They time how fast the player runs the 40 yard dash. They note the college from which each player graduated. They collect many other statistics as well, then spend time debating which ones best indicate how successful the football team is. Should the center weigh over 300 pounds? Should the wide receivers have a shoe size of 11 or greater? Should players from the north-west be on the starting line-up? All of this seems perfectly rational to this team. An outsider looks at the situation and says: "Check the scoreboard! You're down 42-7 and you have a 1-6 record. You guys are losers!"

This is an analogy that I have been using more and more recently when discussing source code reviews, as they are somewhat like the NFL Scouting Combine. Does measuring each player's physical abilities guarantee a player's or a team's success? Of course not. Does it play a factor in the outcome of an actual game? Usually, but a team's Draft Grade does not always project to actual wins the following season. Similarly, is using an input validation security framework a good idea? Absolutely, however the important point is to look at the web application holistically in a "real game environment" - meaning in production - to see how it performs.

Sticking with the analogy, vulnerability scanning in dev environments is akin to running an intra-squad scrimmage. It is much closer to actual game conditions - there is an offense and a defense, players are wearing pads, there is a time clock, etc. - but one key element is missing: the opponent. Vulnerability scanners do not act in exactly the same way that attackers do. Attackers are unpredictable. This is why, even though a team reviews film of their opponent to identify the opponent's tendencies (and to protect their own), it is absolutely critical that the team is able to make adjustments on the fly during a game. It is for this reason that running vulnerability scans in production is critical - you need to test the live system.

Running actual zero-knowledge penetration tests is like playing pre-season games in the NFL. The opponent in this case acts much more like a real attacker would and actually exploits vulnerabilities rather than probing and making inferences about them. It is as close as you can get to the real thing, except that the outcome of the game doesn't matter :)

Web application firewalls that are running in Detection Only mode are like trying to have a real football game while only playing two-hand touch. If you don't really try to tackle an opponent to the ground (meaning implement blocking capabilities), then you will never truly prevent an attack. Also, as most of you have seen with premiere running backs in the NFL, they have tremendous "evasion" capabilities - spin moves and stiff-arms - that make it difficult for defenders to tackle them. The same goes for web application layer attacks: WAFs need to be running in blocking mode and have proper anti-evasion normalization features in order to properly prevent attacks.
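As a toy illustration of why that normalization matters (this is a deliberately simplified Python sketch, not how any particular WAF is implemented), consider a signature that only inspects the raw request. URL encoding and mixed case are the webappsec equivalent of a spin move; decoding and case-folding before matching takes the move away.

    from urllib.parse import unquote

    SIGNATURE = "union select"

    def naive_match(raw):
        # Looks only at the raw request - trivially evaded.
        return SIGNATURE in raw

    def normalized_match(raw):
        # Decode (twice, to catch double URL encoding), lowercase, and
        # collapse whitespace before applying the same signature.
        decoded = unquote(unquote(raw))
        collapsed = " ".join(decoded.lower().split())
        return SIGNATURE in collapsed

    evasive = "id=1%20UNION%20%20SeLeCt%20password%20FROM%20users"
    print(naive_match(evasive))        # False - encoding and case evade the raw check
    print(normalized_match(evasive))   # True  - normalization defeats the evasion

Real WAFs apply a much longer list of transformations (path normalization, comment stripping, null-byte removal, and so on), but the principle is the same: match against a canonical form of the request, not the attacker's chosen spelling of it.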

It is on the live production network where all of your security preparations will pay off, or, on the other hand, where your web application security will crash and burn. I very seldom see development and staging areas that adequately mimic the production environment, which means that you will not truly know how your web application security will fare until the application can be accessed by un-trusted clients. When your web application goes live, it is critical that your entire "team" (development, assessment, and operations) is focused and able to quickly respond to the unexpected behavior of clients. The problem is that these groups do not always communicate effectively and coordinate their efforts. Specifically, each of these three groups should be sharing their output with the other two:

Conduct code reviews on all web applications and fix the identified issues. The code reviews should be conducted when applications are initially being developed and placed into production, and also when there are code changes. Any issues that cannot be fixed immediately should be identified and passed on to the vulnerability scanning and WAF teams for monitoring and remediation.

Conduct vulnerability scans and penetration tests on all web applications. These should be conducted prior to an application going online, then at regularly scheduled intervals, and on an on-demand basis when code changes are made. Any issues identified should be passed on to the Development and WAF teams for remediation.

Deploy a Web Application Firewall in front of all web servers. A WAF will provide protection in production. When the WAF identifies issues with the web application, it can provide reports back to both the Development and Vulnerability Scanning teams for remediation and monitoring. It is this on-the-fly, in-game agility where a WAF shines.

Are game-time adjustments always the absolute best option when they are reviewed in film sessions the following day, or on Monday by arm-chair quarterbacks? Nope, but that is OK. They can be adjusted. This film review will also allow for the identification of root-cause issues so that they can be fixed (source code changes) in preparation for the next game.