CVSSv3: New System, Next Problem (Exploit Reliability)

Last week in our CVSSv3 blog series, we discussed one of the bigger problems introduced by CVSSv3: how it handles file-based attack vectors. This week, we discuss another concern introduced with the new version of CVSS.

Attack Complexity – Exploit Reliability / Ease of Exploitation

When studying the part of the CVSSv3 specification that describes how to assess the ‘Attack Complexity (AC)’ metric, one thing specifically stood out to us: a bullet point states the following criterion for when to treat a vulnerability as High (AC:H):

“The attacker must prepare the target environment to improve exploit reliability. For example, repeated exploitation to win a race condition, or overcoming advanced exploit mitigation techniques.”

We previously warned the CVSS SIG about this particular criterion when providing private feedback on CVSSv3 Preview 2. While they removed some of the most problematic phrasing, they apparently decided to leave the criterion in place.

This appears to be a poor attempt at creating a score along the lines of the Microsoft “Exploitability Index”, which takes “Exploit Reliability” into account. We do not dispute that knowing this information would provide value, but considering exploit reliability and which advanced exploit mitigation techniques need to be defeated seems to fall outside the scope of CVSS. It is especially concerning, as we believe this criterion should not be considered in the base score.

There are a few reasons why we do not support this:

  • This is more appropriate to consider in the temporal metrics. In fact, it is already there to a quite fitting degree: the temporal ‘Exploit Code Maturity (E)’ metric covers not only the availability of PoCs and exploits, but also how reliable these are. The ‘Functional (F)’ value describes an exploit that “works in most situations”, while the ‘High (H)’ value describes an exploit that “works in every situation”. We find this to adequately cover exploitation for the purposes of CVSS without adding too much complexity or impacting the base score (the sketch following this list shows the effect of the ‘E’ multiplier).
  • Thinking about how this impacts scoring helps clarify the concern. If someone did manage to create a reliable, fully working exploit for a given vulnerability, why should the “Attack Complexity (AC)” base metric reduce the overall score? No one cares whether the vulnerability was difficult or easy to exploit once an exploit is out and works reliably. With the current criterion, if scoring follows the guidelines as written, the base score is lowered and the severity downplayed, which may ultimately result in improper prioritization when addressing the issue. Only the reliability of the exploit should have relevance to the risk rating and score, and that is already factored into the temporal ‘Exploit Code Maturity (E)’ metric, as described above.

Consider for a moment a real-world example: your standard remote code execution vulnerability with a score of 9.8, i.e. a “Critical” issue as deemed by CVSS. Based on the current guidelines, if exploitation requires “overcoming advanced exploit mitigation techniques”, the same vulnerability suddenly drops to a score of 8.1, i.e. a “High” severity, even if a fully functional and reliable exploit is available in both cases. The sketch after this list walks through the arithmetic.

  • We believe that it becomes too complex to factor in whether, and which, advanced exploit mitigation techniques, e.g. ASLR, DEP, or SafeSEH, must be overcome to qualify. For starters, it would be too time-consuming to assess, and many people responsible for scoring CVSS may not have sufficient technical insight or qualifications to properly determine it. That is, unless it were simply assumed that e.g. all “memory corruption” type vulnerabilities should automatically be assigned High (AC:H).
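
To make the scoring effect concrete, below is a minimal Python sketch of the CVSSv3.0 base score equations for a scope-unchanged vulnerability, using the constants published in the specification. The vector CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H is our assumed “standard remote code execution” example; only the Attack Complexity value varies, with the temporal ‘Exploit Code Maturity (E)’ adjustment shown last for comparison.

    from math import ceil

    def roundup(x):
        # CVSSv3.0 "Roundup": smallest value with one decimal place >= x
        # (the inner round() guards against floating point noise)
        return ceil(round(x * 10, 6)) / 10.0

    # Specification constants for AV:N, PR:N, UI:N and C/I/A all High
    AV_N, PR_N, UI_N, CIA_H = 0.85, 0.85, 0.85, 0.56
    AC = {"L": 0.77, "H": 0.44}

    def base_score(ac):
        isc_base = 1 - (1 - CIA_H) ** 3                     # impact sub-score
        impact = 6.42 * isc_base                            # Scope Unchanged
        exploitability = 8.22 * AV_N * AC[ac] * PR_N * UI_N
        return roundup(min(impact + exploitability, 10))

    print(base_score("L"))                  # 9.8 -> "Critical"
    print(base_score("H"))                  # 8.1 -> "High"

    # Temporal score = Roundup(Base * E * RL * RC); RL and RC left at 1.0
    print(roundup(base_score("L") * 1.00))  # 9.8 with E:H ("works in every situation")
    print(roundup(base_score("L") * 0.97))  # 9.6 with E:F ("works in most situations")

Note the asymmetry: the temporal ‘E’ metric adjusts for reliability by a modest 0.2 points, while the AC:H criterion removes a full 1.7 points from the base score whether or not a reliable exploit exists.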

In fact, amusingly, the CVSS SIG itself seems to be struggling with this criterion. In the provided documentation, example 16 scores their own buffer overflow example for Adobe Reader, CVE-2009-0658, as Low (AC:L). That type of vulnerability would today very likely require bypassing one or more “advanced exploit mitigation techniques”, which just adds to the confusion and muddles when and how this criterion should be considered.

Even if we completely removed the requirement to assess the bypass of advanced exploit mitigation techniques and looked at reliability on a more general scale, there are still a lot of unanswered questions about how to assess it, such as:

  1. What if an exploit is reliable 95% of the time? 80%? 50%? How do we reasonably and consistently assess this metric, and where should the line be drawn for when a vulnerability qualifies for High (AC:H)? If everything that isn’t 100% reliable were treated as AC:H, quite a few issues would end up being scored lower, and their impact to an organization improperly downplayed (see the sketch following this list).
  2. What if an exploit works reliably against e.g. Linux Kernel version 4.9.4, but only half the time against version 4.4.43? Does that justify Low (AC:L) or High (AC:H), i.e. should we factor in the best or the worst case?
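
The first question is harder than it looks: as soon as repeated attempts are possible, and the specification’s own example even presupposes “repeated exploitation to win a race condition”, mediocre per-attempt reliability converges on near-certain success. A minimal illustration, assuming independent attempts:

    # Attempts needed for >99% overall success at per-attempt reliability p,
    # assuming each attempt is independent: overall = 1 - (1 - p) ** n
    for p in (0.95, 0.80, 0.50):
        n = next(n for n in range(1, 1000) if 1 - (1 - p) ** n > 0.99)
        print(f"{p:.0%} reliable exploit: >99% overall success after {n} attempts")

Under that (admittedly simplified) assumption, the practical difference between a 95% and a 50% reliable exploit is two versus seven attempts, which makes any fixed reliability threshold for High (AC:H) look arbitrary.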

Ultimately, while it would add value to consider such a criterion, in our view it has no place in the CVSSv3 base score. Even if moved into the temporal metrics, it would still end up being too complex and problematic, both for the reasons described above and due to the additional concerns discussed in our paper on exploitability / priority index rating systems.

What should be done?

We recommend that this whole bullet be removed from the CVSSv3 specification immediately. It is an unreasonable criterion to include. If left in, it will most likely either not be honored properly during scoring or serve as a back door for anyone who wants to reduce the base score, which defeats the point of having it in the first place. The ‘Attack Complexity (AC)’ metric should, in our opinion, focus solely on how easy or complex it would be to launch an attack, e.g. factoring in typical configurations and other conditions that may have relevance. Exploit reliability should not be part of the base score; if it is still desired despite the issues outlined, it should be moved to the temporal metrics.

Until next time, when we will be discussing our concerns with the newly introduced “Scope (S)” metric.