Anyway, having an idea is one thing; realizing it is another thing entirely. In this paper, the authors did both very well! In short, it is an excellent paper with a lot of information to digest, so I strongly recommend that anyone in the security field study it carefully. I'll put here some notes on what I found novel and/or interesting while reading it. For someone else, other parts of the paper may be more interesting or novel, so this post is definitely not a replacement for reading the paper yourself. Also, if you search a bit on the Internet you'll find that others have covered this paper as well.
Contributions
The contributions of this paper are:
- Analysis of the dynamics and characteristics of zero-day attacks, i.e. how long it takes before zero-day attacks are discovered, how many hosts are targeted, etc.
- A method to detect zero-day attacks by correlating anti-virus signatures of malicious code that exploits certain vulnerabilities with a database of binary file downloads across 11 million hosts on the Internet.
- Analysis of the impact of vulnerability disclosure on the number of attacks and their variations, i.e. what happens when a new vulnerability is disclosed and how exactly that impacts the number and variations of attacks.
The key finding of this research is that zero-day attacks are discovered, on average, 312 days after they first appear; in one case it took 30 months to discover the vulnerability that was being exploited. The next finding is that zero-day attacks, by themselves, are quite targeted. There are of course exceptions, but the majority of them hit only a few hosts. Finally, after a vulnerability is disclosed, there is a surge in both new exploit variants and the number of attacks; the number of attacks can be five orders of magnitude higher after disclosure than before.
During their study, the authors found 11 previously unknown zero-day attacks. But be careful with the wording: this does not mean they found previously unknown vulnerabilities. The vulnerabilities themselves were known, but until this research it wasn't known that they had been used in zero-day attacks.
So, here is my interpretation of the implications of these findings. It means that right now there are at least a dozen exploits in the wild that no one is aware of. If you are a high-profile company, that puts you in serious trouble; as usual, whether you are, or will be, attacked depends on many factors. And when a vulnerability is disclosed while no patch is available, you have to be very careful, because at that point there is a surge of attacks.
Data and Code used for research
In experimental research, data is of utmost importance. If you have data, you can experiment on it; otherwise, you can only theorize. Furthermore, the more data you have, the better the experiment that can be done. Personally, I believe that access to data is one of the main differentiators of productive researchers. So, it is interesting to see what sources of data the authors used, and whether this data is available to others.
Reading the paper, the authors state that they used the following sources:
- Worldwide Intelligence Network Environment (WINE).
- Open Source Vulnerability Database (OSVDB) - public database with information about vulnerabilities going back to 1998. The authors use it to obtain discovery, disclosure and exploit release dates.
- Symantec Threat Explorer - public site with data about the latest threats, risks and vulnerabilities. It also has historical data available, which was the primary reason the authors used it in this research.
- Symantec data set with dynamic analysis results for malware samples
The authors restricted their research to Windows vulnerabilities, which, I believe, is caused by the fact that WINE probably collects data mainly from Windows hosts. For Linux, or some other operating system, there is probably a lot less data available. Also, as the authors themselves state, Windows is the primary platform for attacks, so, again, much more data will be available.
Method
The idea underlying the authors' method is simple: look at known vulnerabilities and find out when each one was exploited for the first time. If that first exploit time precedes the disclosure date, it's a zero-day exploit.
To realize this idea, the authors go through several steps:
- First, they search for the vulnerabilities they will analyze. They do that by querying OSVDB and the Microsoft and Adobe security bulletins. The key information the authors get from those databases is the CVE identifier of each vulnerability and its disclosure date. Here we get the following set:
cve_id_i = {T_discovery, T_disclosure, T_exploit_release, T_patch_release}
- Then, the authors query Symantec's Threat Explorer using the CVE numbers. Threat Explorer provides Symantec's ID of the threat (i.e. malicious code) that exploited a given vulnerability. This ID will be used to connect vulnerabilities with other Symantec databases. Note: I was unable to find the virus id in Threat Explorer! Did I miss something? Anyway, after this step we have the following set:
Z_i = {virus_id_i, cve_id_i}
- The authors now use anti-virus telemetry data to link each virus_id with exploits, that is, with the binary files that contain the exploits. Whenever the anti-virus product detects a virus, it records which virus was detected (virus_id) and also a hash of the file in which the virus was detected. This hash is important and is used in the fifth step.
E_ij = {virus_id_i, file_hash_id_j}
- This is an optional step. Namely, the problem the authors face is that lately more and more attacks don't rely on executable files but are embedded in various data files. There are signatures for those files in the anti-virus telemetry database, but there is no data in the binary reputation database to track when they first appeared. The assumption is that after the non-binary file is downloaded and compromises a machine, some binary file is downloaded next. In this step, the authors try to identify that downloaded binary file, which can then be tracked in the binary reputation database. To implement this idea, the authors looked at dynamic analysis data (part of Symantec's Threat Explorer database), which provides information about what was downloaded after a successful attack. The authors note that what was downloaded isn't necessarily a consequence of a successful attack, which is why this step is optional! Note: I was unable to find this data in Threat Explorer!
- Now that the authors have the hash of a malicious file, they go to the binary reputation database and look up the earliest time this file was seen on the Internet. This time is important because if it precedes the disclosure time, a zero-day attack has been detected!
Because the authors are also interested in attack intensity, they collect additional information about each occurrence of a malicious file in the binary reputation data. This is used to analyze what happens after disclosure, i.e. how many attacks there are and how many variants.
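The steps above can be sketched in code. This is my own minimal illustration, not the authors' implementation: the table layouts, field names, and dates below are all invented for the example (only the final criterion, first sighting before disclosure, comes from the paper).

```python
# Hypothetical, simplified versions of the four data sources; the real
# databases (OSVDB, Threat Explorer, AV telemetry, binary reputation)
# obviously have richer schemas.
from datetime import date

# Step 1: vulnerabilities with their public disclosure dates
disclosure = {"CVE-2010-2568": date(2010, 7, 16)}

# Step 2: Threat Explorer links CVE ids to Symantec threat (virus) ids
threats = {"CVE-2010-2568": ["W32.Stuxnet"]}

# Step 3: anti-virus telemetry links a virus id to hashes of infected files
telemetry = {"W32.Stuxnet": ["hash_a", "hash_b"]}

# Step 5: binary reputation data gives the earliest sighting of each hash
first_seen = {"hash_a": date(2010, 6, 22), "hash_b": date(2010, 8, 1)}

def find_zero_days(disclosure, threats, telemetry, first_seen):
    """Flag a CVE as zero-day if some exploit file was seen before disclosure."""
    zero_days = {}
    for cve, t0 in disclosure.items():
        sightings = [first_seen[h]
                     for v in threats.get(cve, [])
                     for h in telemetry.get(v, [])
                     if h in first_seen]
        if sightings and min(sightings) < t0:
            # record how many days the attack went undetected
            zero_days[cve] = (t0 - min(sightings)).days
    return zero_days

print(find_zero_days(disclosure, threats, telemetry, first_seen))
# → {'CVE-2010-2568': 24}
```

Note that, as in the paper, a single early sighting of any linked file hash is enough to flag the CVE as a zero-day.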
The method and data have several limitations:
- Only vulnerabilities that have a CVE assigned were analyzed.
- Only Symantec customers that opted in for data collection are covered; there is no insight into hosts that have no AV software, or that use software from other vendors.
- There is a possibility of underestimating how long zero-day attacks last, because attacks found close to the beginning of the research period may have appeared even before the recording started.
- Lately, more and more attacks are embedded in non-executable files, e.g. pdf, xlsx, doc, etc., which might skew the results. The authors state that, starting from late 2011, the binary reputation database also records hashes of non-executable files, but this was too late for this study to take into account.
The Symantec Internet Threat Report lists 31 zero-day vulnerabilities; of those, the method devised and used by the authors found only 7. The authors analyzed why the remaining 24 were not found, and the reasons are:
- Web attacks are not covered by this research, but they account for a significant number of zero-day attacks; binary reputation data and anti-virus telemetry monitor only host-based attacks.
- Polymorphic malware makes the hashes of malware binaries completely different, so it is very hard to correlate them with the cve_ids they exploit.
- Then, there are the non-executable exploits already mentioned.
- Zero-day exploits are used in targeted attacks, and it is likely they target someone who isn't participating in the binary reputation database.
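The polymorphism problem is easy to demonstrate: a cryptographic hash of two variants that differ in even a single byte shares nothing, so hash-based correlation sees two unrelated files. A tiny sketch (the byte strings are, of course, made up):

```python
# Why hash-based tracking fails on polymorphic malware: flipping a single
# byte of the payload yields a completely unrelated cryptographic hash,
# so two variants of the same exploit look like independent files.
import hashlib

variant_a = b"\x90\x90\x90 payload ..."
variant_b = b"\x90\x90\x91 payload ..."   # one byte mutated by the packer

h_a = hashlib.sha256(variant_a).hexdigest()
h_b = hashlib.sha256(variant_b).hexdigest()

print(h_a == h_b)   # → False: no way to tell the variants are related
```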
Some interesting side info
First off, the following graph (source in dot language) neatly defines the life cycle of zero-day vulnerabilities, i.e. what might happen from the day they are discovered until the day they are remediated:
Actually, this graph shows the potential life cycle of each unknown vulnerability in the code. The state Vulnerability is instantiated the moment a programmer makes a mistake in the code. Note that this graph applies to each vulnerability in a specific code base, not to a particular instance of that code base deployed at some user who, for example, forgot to patch his software. The moment a patch that fixes the vulnerability is created, the graph instance for that particular vulnerability moves to the Remediation state, no matter how much unpatched code remains on the Internet.
The Testing branch means that the software vendor found the problem and, based on that, produced a patch. The Exploit branch, on the other hand, means that a blackhat found the vulnerability and started a zero-day attack. Note that blackhats also analyze patches and, based on that, try to figure out what vulnerability was corrected and whether it can be exploited. After public dissemination of information about the vulnerability, AV vendors can ship signatures to protect their customers. Note that AV vendors actually track exploits, not vulnerabilities! Public dissemination of vulnerabilities also allows the vendor to react and create a patch.
The previous graph shows the states of a vulnerability, which can generate many different sequences. One of them, when blackhats discover the vulnerability first, is described by the following steps:
- Vulnerability introduced (tv)
- Exploit released in the wild (te)
- Vulnerability discovered by the vendor (td)
- Vulnerability disclosed publicly (t0)
- Anti-virus signatures released (ts)
- Patch released (tp)
- Patch deployment completed (ta)
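As a worked example of this timeline, here is a small sketch. The variable names mirror the paper's notation (te, td, t0, ts, tp), but the dates are made up, chosen only so that the zero-day window comes out at the paper's reported 312-day average:

```python
# Illustrative (invented) dates for one vulnerability's life cycle.
from datetime import date

timeline = {
    "te": date(2009, 1, 10),   # exploit released in the wild
    "td": date(2009, 9, 1),    # vulnerability discovered by the vendor
    "t0": date(2009, 11, 18),  # vulnerability disclosed publicly
    "ts": date(2009, 11, 20),  # anti-virus signatures released
    "tp": date(2009, 12, 8),   # patch released
}

# The zero-day window: the vulnerability is exploited but not yet public.
zero_day_window = (timeline["t0"] - timeline["te"]).days

# Exposure keeps growing until a patch exists (and is actually deployed).
patch_lag = (timeline["tp"] - timeline["t0"]).days

print(zero_day_window, patch_lag)   # → 312 20
```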
When attackers find new vulnerabilities, they try to maximize their benefit. This means that they wait for the right moment to use a vulnerability in a zero-day attack. Of course, if it happens that someone else discovers the vulnerability too, then they have to use it as soon as possible.
Vulnerability markets
Vulnerability markets are very interesting, and I have already seen papers trying to analyze this black market: some give information about prices, some investigate data, and some deal with insider accounts. I'll have to investigate this more deeply in the future. This topic is interesting because it shows how easy (or hard) it is for someone to get a zero-day vulnerability and attack a target!
Literature
What follows is a copy of the references from the paper, along with links to on-line versions as well as some of my quick comments.
- Adobe Systems Incorporated. Security bulletins and advisories, 2012.
- R. Anderson and T. Moore. The economics of information security. In Science, vol. 314, no. 5799, 2006. (pdf)
- W. A. Arbaugh, W. L. Fithen, and J. McHugh. Windows of vulnerability: A case study analysis. IEEE Computer, 33(12), December 2000.
- A. Arora, R. Krishnan, A. Nandkumar, R. Telang, and Y. Yang. Impact of vulnerability disclosure and patch availability - an empirical analysis. In Workshop on the Economics of Information Security (WEIS 2004), 2004.
- S. Beattie, S. Arnold, C. Cowan, P. Wagle, and C. Wright. Timing the application of security patches for optimal uptime. In Large Installation System Administration Conference, pages 233–242, Philadelphia, PA, Nov 2002.
- J. Bollinger. Economies of disclosure. In SIGCAS Comput. Soc., 2004.
- D. Brumley, P. Poosankam, D. X. Song, and J. Zheng. Automatic patch-based exploit generation is possible: Techniques and implications. In IEEE Symposium on Security and Privacy, pages 143–157, Oakland, CA, May 2008.
- H. C. H. Cavusoglu and S. Raghunathan. Emerging issues in responsible vulnerability disclosure. In Workshop on Information Technology and Systems, 2004.
- D. H. P. Chau, C. Nachenberg, J. Wilhelm, A. Wright, and C. Faloutsos. Polonium : Tera-scale graph mining for malware detection. In SIAM International Conference on Data Mining (SDM), Mesa, AZ, April 2011.
- CVE. A dictionary of publicly known information security vulnerabilities and exposures, 2012.
- N. Falliere, L. O’Murchu, and E. Chien. W32.stuxnet dossier, February 2011.
- S. Frei. Security Econometrics: The Dynamics of (In)Security. PhD thesis, ETH Zurich, 2009.
- S. Frei. End-Point Security Failures, Insight gained from Secunia PSI scans. Predict Workshop, February 2011.
- Google Inc. Pwnium: rewards for exploits, February 2012.
- A. Greenberg. Shopping for zero-days: A price list for hackers’ secret software exploits. Forbes, 23 March 2012.
- A. Lelli. The Trojan.Hydraq incident: Analysis of the Aurora 0-day exploit, 25 January 2010.
- R. McMillan. RSA spearphish attack may have hit US defense organizations. PC World, 8 September 2011.
- M. A. McQueen, T. A. McQueen, W. F. Boyer, and M. R. Chaffin. Empirical estimates and observations of 0day vulnerabilities. In Hawaii International Conference on System Sciences, 2009.
- Microsoft. Microsoft security bulletins, 2012.
- C. Miller. The legitimate vulnerability market: Inside the secretive world of 0-day exploit sales. In Workshop on the Economics of Information Security, Pittsburgh, PA, June 2007.
- OSVDB. The open source vulnerability database, 2012.
- A. Ozment and S. E. Schechter. Milk or wine: does software security improve with age? In 15th conference on USENIX Security Symposium, 2006.
- P. Porras, H. Saidi, and V. Yegneswaran. An analysis of Conficker's logic and rendezvous points, 2009.
- Qualys, Inc. The laws of vulnerabilities 2.0, July 2009.
- T. Dumitraș and D. Shou. Toward a standard benchmark for computer security research: The Worldwide Intelligence Network Environment (WINE). In EuroSys BADGERS Workshop, Salzburg, Austria, Apr 2011.
- E. Rescorla. Is finding security holes a good idea? In IEEE Security and Privacy, 2005.
- U. Rivner. Anatomy of an attack, 1 April 2011, Retrieved on 19 April 2012.
- SANS Institute. Top cyber security risks - zero-day vulnerability trends, 2009.
- B. Schneier. Cryptogram Newsletter - Full disclosure and the window of exposure, September 2000.
- B. Schneier. Locks and full disclosure. In IEEE Security and Privacy, 2003.
- B. Schneier. The nonsecurity of secrecy. In Commun. ACM, 2004.
- M. Shahzad, M. Z. Shafiq, and A. X. Liu. A large scale exploratory analysis of software vulnerability life cycles. In Proceedings of the 2012 International Conference on Software Engineering, 2012.
- Symantec Corporation. Symantec global Internet security threat report, volume 13, April 2008.
- Symantec Corporation. Symantec global Internet security threat report, volume 14, April 2009.
- Symantec Corporation. Symantec global Internet security threat report, volume 15, April 2010.
- Symantec Corporation. Symantec Internet security threat report, volume 16, April 2011.
- Symantec Corporation. Symantec Internet security threat report, volume 17, April 2012.
- Symantec Corporation. Symantec threat explorer, 2012.
- Symantec.cloud. February 2011 intelligence report, 2011.
(See here for a full list of available reports.)
- N. Weaver and D. Ellis. Reflections on Witty: Analyzing the attacker. ;login: The USENIX Magazine, 29(3):34–37, June 2004.
The authors observed a number of intrusions happening during each phase of the vulnerability life cycle. They showed that even after vendors provide patches, there is still a significant number of successful attacks, meaning that many users don't patch their machines.
The authors of this study find that 10% of patches have problems of their own.
This paper analyzes the possibility of automatically generating an exploit given an application and a patch that corrects some unknown vulnerability in the unpatched version. Even though the method isn't fully general and has problems of its own, it shows that automatic exploit generation is possible to some extent. This means that security analysis, which is based on worst-case behavior, should take this possibility into account.
In this work, the author analyzes publicly available vulnerability and exploit databases and derives various statistics from them. For example, he concludes that on disclosure day, 15% of vulnerabilities have exploits available. He also compared how fast Microsoft and Apple react to new vulnerabilities and shows that there are significant differences between them. Still, both have unpatched vulnerabilities even 180 days after disclosure. Frei also looked at who discovered vulnerabilities and determined that between 2000 and 2007, 10% of vulnerabilities were discovered through programs that pay whitehats.
This PhD thesis doesn't seem to be available on-line; you have to buy it.
I couldn't find this reference on the Internet. It is supposed to discuss the failure of patch management: many vulnerabilities have no patch at the time of their disclosure, and users have to take care of 14 different update mechanisms on a single machine. All this points to a big problem that needs to be solved.
Google pays whitehats for the discovery of each vulnerability that allows compromising its browser.
A paper that gives a glimpse into the black market that trades in vulnerabilities, including prices of vulnerabilities for certain products.
The authors try to estimate the real number of zero-day exploits that existed in the past.
Public database that aggregates all the available sources of information about vulnerabilities disclosed since 1998.
WINE is developed by Symantec Research Labs with the aim of sharing field data with the research community. The data is collected from about 11 million hosts on the Internet that have Symantec products installed and whose users agreed to participate in the data collection process.
Because of privacy concerns, to access this data you have to sign an NDA and be prepared to visit a Symantec location, as the data is not sent outside of Symantec. Details can be found on this Web page (scroll down to the How to participate section).
First definition and discussion of the term window of exposure.
A study similar to the one done by S. Frei [13], but on a larger data set.
This report is referenced because it provides the information that from 2008 to 2011 there were 43 identified zero-day exploits. Some of those discovered in this research are not counted among the 43, while the research itself didn't find all of the zero-day exploits.