jha-security
Web Attacks (April 25, 2018)
<div dir="ltr" style="text-align: left;" trbidi="on">
<i>This blog post was contributed by Vaibhav Rastogi.</i><br />
<br />
The web is one of the most common interfaces between an organization and the outside world, so web attacks, or attacks on web applications, are a fairly frequent attack scenario. They have been studied for decades, projects such as the <a href="https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project#tab=Main">OWASP Top Ten</a> exist to raise awareness about them, and there are numerous tools that can be used to detect and mitigate common web application vulnerabilities. Here, we outline some of the common categories of attacks on web applications.<br />
<br />
<h4 style="text-align: left;">
Injection attacks</h4>
<div>
Such attacks happen when untrusted data is incorporated into the server-side application logic without proper sanitization. These attacks can use a variety of vectors: for example, unsanitized input can make its way into a SQL query, resulting in a so-called SQL injection. Similar attacks can result from injection into NoSQL database queries and into server-side scripts (e.g., a PHP script that evaluates some untrusted input, which then gets executed as code).</div>
<div>
<br /></div>
<div>
These attacks can result in exposure of confidential data or a compromise of data integrity. Preventing them involves the proper placement of sanitizers in the server-side logic so that unsanitized, untrusted data never reaches places where it could execute as part of the application logic.</div>
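<div>
<br /></div>
<div>
As a minimal illustration (Python with the standard-library sqlite3 module; the table and the attacker's input are made up), the first query below splices untrusted data directly into SQL and is injectable, while the second passes the same data as a bound parameter so the database never interprets it as code:</div>
<pre>
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

untrusted = "alice' OR '1'='1"  # attacker-controlled input

# VULNERABLE: untrusted data is spliced into the query string.
query = "SELECT * FROM users WHERE name = '%s'" % untrusted
print(conn.execute(query).fetchall())   # matches every row

# SAFER: the value is passed as a bound parameter, never parsed as SQL.
query = "SELECT * FROM users WHERE name = ?"
print(conn.execute(query, (untrusted,)).fetchall())  # matches nothing
</pre>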
<div>
<br /></div>
<h4 style="text-align: left;">
Cross-Site Scripting (XSS)</h4>
<div>
Cross-site scripting is similar to injection attacks in that some untrusted input is interpreted as application logic, but this time the attack happens in the browser rather than on the server. Specifically, the attack happens when an attacker injects data into the webpage markup that the browser interprets as a script (typically, JavaScript) and executes.</div>
<div>
<br /></div>
<div>
As an example, consider an attacker inserting a script element (JavaScript code enclosed in <span style="font-family: "courier new" , "courier" , monospace;"><script></script> </span><span style="font-family: inherit;">tags) into an input field that asks for their name. Upon submission, the data in the field is stored in a server-side database. An unsuspecting user viewing the names of all people in the database through a web interface then receives the attack script in lieu of the attacker's name, and the browser executes it.</span></div>
<div>
<br /></div>
<div>
Cross-site scripting can again lead to a compromise of confidentiality and integrity. The primary way of preventing cross-site scripting is to insert sanitizers that modify the inputs so that the browser does not interpret them as executable code. Another line of defense comes from the so-called content security policies (CSP) in web browsers. These policies specify where content (including executable content) can come from (e.g., from specific domains or specially marked-up elements). The browser then ignores any content that has not been white-listed in this manner. We will discuss CSP in greater detail in a future blog post.</div>
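<div>
<br /></div>
<div>
As a minimal sketch (Python standard library only; the attacker's payload and the policy string are illustrative, not recommendations), escaping the stored value on output keeps the browser from treating it as markup, and a Content-Security-Policy response header restricts where scripts may come from even if an escape is missed:</div>
<pre>
import html

# Hypothetical value an attacker submitted in the "name" field.
attacker_name = "<script>alert(document.cookie)</script>"

# Escape on output: the payload is rendered as inert text, not executed.
safe_name = html.escape(attacker_name)
page = "<p>Name: %s</p>" % safe_name

# Defense in depth: an illustrative Content-Security-Policy response header.
response_headers = {
    "Content-Security-Policy": "default-src 'self'; script-src 'self'",
}
</pre>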
<div>
<br /></div>
<h4 style="text-align: left;">
Authorization and Authentication Flaws</h4>
<div>
This category covers several scenarios. For example, an object may be publicly accessible through a URL when it should have been accessible only to authenticated users with the right privileges. Another example is hijacking an authenticated session by merely using its URL: suppose a session is authenticated through a session ID that is carried in the URL. An attacker could entice a victim into sending them such a URL and then use the embedded session ID to take over the victim's session. Failure to enforce strong passwords, or the use of insecure authentication practices such as authenticating a user based on secret questions and answers in lieu of passwords, are also authentication flaws.</div>
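<div>
<br /></div>
<div>
One concrete mitigation for the session-in-the-URL problem is to issue an unguessable session identifier and carry it only in a cookie marked Secure and HttpOnly, never in the URL. A minimal sketch (the helper function is hypothetical; the cookie attributes are standard):</div>
<pre>
import secrets

def make_session_cookie():
    # 256-bit random session ID: infeasible to guess and never placed in a URL.
    session_id = secrets.token_urlsafe(32)
    # Secure: sent only over HTTPS.  HttpOnly: not readable by page scripts.
    cookie = "session=%s; Secure; HttpOnly; SameSite=Lax; Path=/" % session_id
    return session_id, cookie

sid, set_cookie_header = make_session_cookie()
print("Set-Cookie:", set_cookie_header)
</pre>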
<div>
<br /></div>
<h4 style="text-align: left;">
Man-in-the-middle Attacks</h4>
<div>
With client-server communication happening over an unencrypted channel, the data is open to sniffing and spoofing attacks while in transit. This is most easily fixed by using HTTP over TLS, i.e., HTTPS. HTTPS encrypts the communication channel and provides authentication of the server (i.e., the site claiming to be example.com is indeed example.com) as well as integrity and confidentiality of the data. TLS configuration errors can sometimes result in weakened security, so it is important to choose a strong default configuration for your web server.</div>
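<div>
<br /></div>
<div>
As a small illustration using Python's standard ssl module (the certificate and key paths are placeholders), a server-side context can refuse old protocol versions outright:</div>
<pre>
import ssl

# Server-side TLS context with a conservative baseline.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2        # refuse SSLv3 and TLS 1.0/1.1
ctx.load_cert_chain(certfile="server.crt", keyfile="server.key")  # placeholder paths

# The context can then wrap a listening socket or be handed to a web server.
</pre>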
<div>
<br /></div>
<h4 style="text-align: left;">
Cross-Site Request Forgery (CSRF)</h4>
<div>
In a CSRF attack, the attacker exploits the browser to send a request to the server on the victim's behalf. The victim is typically logged into the website, so the request is processed in an authenticated context. For example, consider a user logged into their bank's website who, in a separate browser tab, clicks on an attack link that directs the bank website to transfer money from the victim's account to the attacker's account. The main issue here is that the server has no way to tell whether the victim created this request or the attacker did so from the victim's browser.</div>
<div>
<br /></div>
<div>
Common ways of dealing with CSRF attacks include the server sending a nonce (a random, unpredictable token) to the client, which is then embedded in every legitimate request to the server. Any attacker-created request will not have the nonce and hence can be rejected by the server.</div>
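<div>
<br /></div>
<div>
A minimal sketch of the token approach (framework-independent; the surrounding form handling is assumed): the server binds a random token to the session, embeds it in each form, and rejects any state-changing request whose token does not match.</div>
<pre>
import hmac
import secrets

def issue_csrf_token(session):
    # Called when rendering a form: bind a fresh random token to the session.
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token                      # embedded in the form as a hidden field

def verify_csrf_token(session, submitted_token):
    # Called on every state-changing request; constant-time comparison.
    expected = session.get("csrf_token", "")
    return bool(expected) and hmac.compare_digest(expected, submitted_token)

session = {}                          # stand-in for server-side session storage
token = issue_csrf_token(session)
assert verify_csrf_token(session, token)
assert not verify_csrf_token(session, "forged-value")
</pre>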
<div>
<br /></div>
<div>
The above is only a very short summary of the attacks. There are many details we have skipped, including variations on these attacks, and how to prevent them effectively. We will cover some of these attacks in detail in future posts.</div>
</div>
Semantic Adversarial Machine Learning (SAML) (April 20, 2018)
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="color: red;"><i></i></span><br />
<span style="color: red;"><i>ML is everywhere: </i></span>Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious harm.<br />
<br />
<span style="color: blue;">Adversarial ML (AML)</span> deals with generating adversarial examples to ML<br />
algorithms (e.g modifying a stop sign slightly so that it is classified as a<br />
yield sign). For a general description of AML see <a href="https://blog.openai.com/adversarial-example-research/">here</a> <br />
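As a toy illustration of what "modifying an input slightly" means (a stand-in for, not a reproduction of, the stop-sign example), a fast-gradient-sign-style perturbation nudges the input against the model's decision; the tiny linear model below is made up so the gradient is available in closed form:
<pre>
import numpy as np

# Toy linear classifier: label is 1 if w.x + b is positive, else 0.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    return int(np.dot(w, x) + b > 0)

# Fast-gradient-sign-style step: for this linear model the gradient of the
# score with respect to x is just w, so we step against the current label.
def fgsm(x, eps=0.2):
    grad = w
    direction = -1 if predict(x) == 1 else 1
    return x + direction * eps * np.sign(grad)

x = np.array([0.1, -0.1, 0.2])
x_adv = fgsm(x)
print(predict(x), predict(x_adv))   # the small perturbation flips the label from 1 to 0
</pre>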
<br />
<span style="color: red;"><i>Semantic Adversarial Machine Learning:</i></span> However, existing approaches to generating adversarial examples
and devising robust ML algorithms mostly ignore the semantics and context of
the overall system containing the ML component. For example, in an autonomous
vehicle using deep learning for perception, not every adversarial example for
the neural network might lead to a harmful consequence. Moreover, one may want
to prioritize the search for adversarial examples towards those that
significantly modify the desired semantics of the overall system. Along the
same lines, existing algorithms for constructing robust ML algorithms ignore
the specification of the overall<br />
system.<br />
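A sketch of the idea in a made-up setting (the perception model, controller, and safety property below are all invented for illustration, not taken from the paper): a perception component estimates the distance to the car ahead and a simple controller brakes below a threshold. Of all the perturbations that change the model's output, only those that drive the closed-loop system into violating the safety property are kept as semantically meaningful adversarial examples.
<pre>
import numpy as np

def perceived_distance(true_distance, perturbation):
    # Stand-in for a perception model whose output is shifted by a perturbation.
    return true_distance + perturbation

def controller_brakes(distance_estimate, threshold=10.0):
    # Simple controller: brake when the estimated distance falls below the threshold.
    return distance_estimate < threshold

def violates_safety(true_distance, perturbation):
    # System-level property violation: the car is actually close, yet the controller fails to brake.
    return true_distance < 10.0 and not controller_brakes(perceived_distance(true_distance, perturbation))

true_distance = 8.0
candidates = np.linspace(-5.0, 5.0, 11)        # candidate perturbations of the estimate

output_changing = [float(p) for p in candidates if p != 0.0]
semantic = [float(p) for p in candidates if violates_safety(true_distance, p)]

print("perturbations that change the model output:", len(output_changing))
print("perturbations that are semantically harmful:", semantic)
</pre>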
<br />
In my recent paper with co-authors Tommaso Dreossi and Sanjit Seshia (from Berkeley), we argue that the semantics and specification of the overall system have a crucial role to play in this line of research. We present preliminary research results that support this claim.<br />
<br />
Interested in reading? See <a class="" href="https://arxiv.org/abs/1804.07045">https://arxiv.org/abs/1804.07045</a><br />
<br />
Note: Have feedback? I would love to hear it. Please email me at jha@cs.wisc.edu.</div>
Malware Trend in 2007 (June 6, 2008)
<pre style="font-family: times new roman;" wrap=""><span style="font-size:130%;">I read the report IBM Internet Security Systems X-Force 2007 Trend Statistics, which describes trends for various threats in 2007. The X-Force team has been tracking such trends since 2000. I found the report quite interesting. In the rest of this post, I highlight some of the interesting points from the report and what they mean in the context of malware detection.<br /><br />
(I) The X-Force team reports continued growth in Web browser exploitation. This clearly shows that the infection vector is shifting to the Web; earlier, the primary infection vectors were email and the network. Therefore, for detecting malware, drive-by downloads (DBD) and other threats targeted at hacking through the Web browser need a lot of attention.<br /><br />
(II) X-Force also reports a marked increase in obfuscated exploits, i.e., exploits that use various code obfuscation techniques (such as encryption). Here is a quote: "X-Force estimated that nearly 80 percent of Web exploits used obfuscation and/or self decryption ... By the end of 2007, X-Force believed this rate had reached 100 percent, ...". This means that, going forward, Web exploits will increasingly harbor indiscernible code, rendering signature-based techniques less effective. Advanced techniques (such as behavior-based detection) are clearly needed to detect such malware. To exacerbate the situation, the X-Force report stated that there was a 30% increase in new malware samples in 2007 over 2006. This further drives home the point that signature-based detectors will have trouble keeping up with the volume of malware, as they cannot detect new threats.<br /><br />
(III) The report makes another very interesting point: modern malware combines features from various types of classic malware (such as viruses, worms, and spyware), pulling the successful features of each into new strains. To quote the report, "Modern malware is now the digital equivalent of the Swiss Army knife, and 2007 data continues to support this." This trend also indicates that the behavior of malware is becoming more sophisticated, which again supports my claim that detection techniques based on analyzing behavior are better suited to handle the malware of the future. Another interesting tidbit from the report: "Trojans make up the largest class of malware in 2007 as opposed to downloaders, which were the largest category in 2006." Recall that a Trojan appears to be a legitimate file with some hidden functionality (for example, that of a rootkit). Trojans are historically a problematic class of malware for signature-based detection.<br /><br />
Overall, I found the report to be very interesting.
Read it for yourself: you can find the report <a href="http://www.iss.net/x-force_report_images/2008/index.html">here</a>.</span></pre>

Zero Day Threat by Acohido and Swartz (April 23, 2008)
I read the book <span style="font-style: italic;">Zero Day Threat (ZDT)</span> by <span style="font-weight: bold;">Byron Acohido and Jon Swartz</span>, and I really liked it! Zero Day Threat is about the underground cyber-economy, and it makes some surprising points, all grounded in reality. I liked that the book paints a complete picture, i.e., how malware, identity theft, and "drop off" gangs work together to keep a well-oiled cyber-economy running. Since my research area is security, I was very familiar with the different types of malware brought up in Zero Day Threat; even so, the book gave me a complete picture of the problem.<br /><br />
I particularly appreciated two features of the book:<br /><br />
<span style="font-style: italic;">Structure: </span>Each chapter is broken into three sections: exploiters, enablers, and expediters. The Exploiters sections focus on crooks (such as scam artists and drug addicts) and how they benefit from the underground economy. The Enablers sections focus on credit card companies, banks, and credit bureaus, and how their current practices enable the underground cyber-economy. Expediters are the people (good and bad) who allow the cybercrooks to exploit vulnerabilities in an expeditious manner. I thought this structure was just brilliant! It really brings out the correlation between the various factors and actors that enable the underground cyber-economy.<br /><br />
<span style="font-style: italic;">Narrative Style:</span> I really enjoyed the various anecdotes in the book. There are several stories about people being scammed or getting lured into the profitable cyber-underground. For example, there is a story of a "drop off" gang in Edmonton which is narrated throughout the book. These anecdotes make the book very interesting and provide a "human side" to the cyber-underground.<br /><br />
I highly recommend this book.

Botnets in USA Today (March 19, 2008)
I got a call from Byron Acohido over at USA Today last weekend, and we had an interesting talk about botnets. Byron and Jon Swartz ended up writing an article about botnets which appeared as the cover story in the Money section of USA Today on March 17, 2008. Here is a link to the full story (<a href="http://www.usatoday.com/money/industries/technology/2008-03-16-computer-botnets_N.htm">link</a>). I found the entire article to be a fascinating read on the nature of botnets. Here are some of the highlights, but definitely go and read the entire article.<br />
<ul>
<li>On a typical day, 40% of the 800 million computers connected to the Internet are bots engaged in various nefarious activities, such as spamming, stealing sensitive data, and mounting denial-of-service attacks. Think about it.
Approximately 320 million computers are engaged in these illicit activities!</li>
<li>Later on, the article describes various features of <span style="font-style: italic;">Storm</span>, the state of the art in botnets. Storm introduced various innovations into the bot landscape, such as using P2P-style communication to converse with the bots and encrypting the <span style="font-style: italic;">command-and-control (C&C)</span> traffic. Command-and-control is the traffic from the bot-herder to the bots instructing them to perform various nefarious activities. Note that this means that network-based botnet solutions that simply look for centralized C&C communication will not work, and encrypted traffic is a major problem for network-based solutions in general. See my earlier blog post where I argue that we should move to a cooperative solution; this is looking like a very good idea. Storm also has a self-defense mechanism: anyone trying to probe the botnet is punished with a denial-of-service attack. I found this self-defense mechanism of Storm to be very interesting.</li>
</ul>
Overall, a fascinating article! I plan to drop by Byron's book signing at the <a href="http://www.rsaconference.com/">RSA Conference</a> in San Francisco on April 7th. Byron also has an interesting <a href="http://zerodaythreat.com/">blog</a> related to the material in the book.

Model Checking and Security (March 5, 2008)
Model checking is a technique for verifying temporal properties of finite-state systems. One of the attractive features of model checking over other techniques (such as theorem proving) is that if a property does not hold, a model checker provides a counter-example which explains why. The inventors of model checking, Edmund Clarke, Allen Emerson, and Joseph Sifakis, won the 2007 ACM Turing Award (see the announcement <a href="http://www.acm.org/press-room/news-releases/turing-award-07/">here</a>). I have a personal connection to two of the recipients: Edmund Clarke was my adviser at Carnegie Mellon, and Allen Emerson and I have collaborated on a few projects and he has supported me throughout my career.<br /><br />
In this note I try to summarize various applications of model checking to security.<br /><br />
<span style="font-style: italic;">Protocol verification</span>: Protocols in the realm of security (henceforth referred to as security protocols) are very tricky to get correct. For example, flaws in authentication protocols have been discovered several years after they were published. Techniques based on model checking have been used extensively to verify these protocols; the tricky part in applying them to security protocols is modeling the capabilities of the attacker. Gavin Lowe used the FDR model checker to find a subtle attack on the Needham-Schroeder authentication protocol (the publication can be found <a href="http://web.comlab.ox.ac.uk/oucl/work/gavin.lowe/Publications.html">here</a>). Following Lowe's work there was a flurry of activity on this topic.
Interested readers can look at the proceedings of the Computer Security Foundations Symposium (<a href="http://www.cylab.cmu.edu/CSF2008/">CSF</a>).<br /><br />
<span style="font-style: italic;">Vulnerability assessment</span>: Imagine you are given an enterprise network with various components (firewalls, routers, and Intrusion Prevention Systems (IPSs)). Vulnerability assessment tries to ascertain how an attacker can penetrate the specified network. It is crucial for updating the policies of various security appliances (such as firewalls and IPSs) and for ascertaining the risk of various decisions. Traditionally, vulnerability assessment has been performed by red teams. Red teaming is a very valuable activity but provides no guarantee that the entire space of vulnerabilities has been explored. Along with Oleg Sheyner and Jeannette Wing, I explored techniques based on model checking for vulnerability assessment. We formally specify the network and express the negation of the attacker's goal (e.g., the attacker gets root access on a critical server) as a property to be verified. If the specified network is vulnerable, the model checker outputs a counter-example, which is an attack on the network. The innovation we devised was to output the set of all counter-examples, or attacks, as an attack graph: a succinct representation of all attacks on the network. Analysis of the attack graph can provide a basis for vulnerability assessment. This paper can be downloaded <a href="http://pages.cs.wisc.edu/%7Ejha/jha-papers/security/oakland_2001.html">here</a>.<br /><br />
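As a toy illustration of the attack-graph idea (the hosts, privileges, and exploit rules below are entirely made up), one can treat the network as a finite-state system and explore it exhaustively; every path that reaches the attacker's goal is one counter-example, i.e., one attack:
<pre>
from collections import deque

# Made-up exploit rules: (privilege required, privilege gained).
exploits = [
    ("user@workstation", "user@webserver"),   # e.g., password reuse
    ("user@webserver",   "root@webserver"),   # e.g., local privilege escalation
    ("user@webserver",   "user@dbserver"),    # e.g., firewall permits this hop
    ("user@dbserver",    "root@dbserver"),
]

initial = frozenset({"user@workstation"})
goal = "root@dbserver"

def attack_paths(initial, goal):
    # Exhaustive breadth-first exploration of the (finite) state space.
    paths = []
    queue = deque([(initial, [])])
    while queue:
        state, path = queue.popleft()
        if goal in state:
            paths.append(path)        # one counter-example, i.e., one attack
            continue
        for pre, gained in exploits:
            if pre in state and gained not in state:
                queue.append((state | {gained}, path + [gained]))
    return paths

for p in attack_paths(initial, goal):
    print(" -> ".join(p))
</pre>
<br />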
<span style="font-style: italic;">Other applications:</span> There are several problems in security that can be addressed using model checking. For example, <a href="http://pages.cs.wisc.edu/%7Ereps/">Tom Reps</a> and I have used model checking to analyze properties of security policies in trust-management systems. <a href="http://www.cs.purdue.edu/homes/ninghui/">Ninghui Li</a> and his collaborators have used techniques based on model checking to analyze several classes of security properties. In the context of security, the advantage model checking has over other techniques (such as testing) is that it exhaustively covers the state space. After all, if you have just one vulnerability, an attacker will exploit it, i.e., an attacker just needs one door to get through your system. Thus the completeness guarantee that a model checker provides is very valuable in the context of security.

Cooperating Detectors (February 13, 2008)
A malware detector tries to determine whether a program is malicious (examples of malicious programs are drive-by-downloads, botnets, and keyloggers). Malware detection is primarily performed at two vantage points: host and network. This post explains why cooperation between host-based and network-based detectors is a good thing.<br /><br />
Traditionally, detection has been performed either at the network or the host level, but not both. First, let me examine the two approaches separately.<br /><br />
A network-based detector monitors events by examining a session or network flow and tries to determine whether it is malicious. The advantage of a network-based detector is ease of deployment -- there are not that many points of deployment for a network-based detector (typically they are deployed behind border routers).<br /><br />
Unfortunately, network-based detectors have a limited view of each network session. In fact, if a session happens to be encrypted, as is common with VPNs, Skype, and some bots, a network-based detector is essentially blind. For example, a botmaster can hide its communication with the bots by simply encrypting the session.<br /><br />
By contrast, host-based detectors have a more comprehensive view of system activities, i.e., they have the potential to observe every event at the host, including malicious ones. However, the major drawback of a host-based detector is that it has to be widely deployed. Typically, in a managed network (such as an enterprise), a host-based detector has to be deployed at every host.<br /><br />
Cooperation between host-based and network-based detectors can potentially address the shortcomings of each detector. I've come up with three possible scenarios.<br /><br />
1) <span style="font-style: italic;">Host-based detector helping the network-based detector.</span> A network-based detector can pull alerts from a host-based detector, and a host-based detector can push alerts to a network-based detector. This is a simple solution and, I suspect, the easiest scenario for cooperation (a small sketch of this appears at the end of this post).<br /><br />
2) <span style="font-style: italic;">Queue up suspicious activity on a virtual machine.</span> If a network-based detector determines that a session is "suspicious," it can divert the suspicious traffic to a virtual machine with a host-based detector for more in-depth analysis. The trick here is figuring out which events are indeed "suspicious" (you do not want too much traffic to go through the "slow path" corresponding to a host-based detector). There is already a startup called <a href="http://www.fireeye.com">Fireeye</a> adopting this solution. I find this line of work quite intriguing.<br /><br />
3) <span style="font-style: italic;">Pushing signatures.</span> This third scenario has been explored quite thoroughly in the academic literature. It involves the cooperation of host-based and network-based detectors to push signatures for malware in real time. For example, if a host-based detector recognizes an attack, it pushes out a signature to a network-based detector. The advantage of this approach is that, by updating a network-based detector, an entire enterprise can be protected against that particular threat. However, in my view this is not a good approach in the long run: hackers are creating malware variants at an alarming rate, and signatures won't be able to keep up.
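<br />
As a minimal sketch of scenario 1 (all class and field names are invented for illustration): the host-based detector pushes its alerts to the network-based detector, which can then flag a flow as malicious even when the payload, e.g. an encrypted session, tells it nothing.
<pre>
from dataclasses import dataclass

@dataclass
class HostAlert:
    host: str      # host that raised the alert
    remote: str    # remote endpoint involved
    reason: str    # e.g., "suspicious process opened an outbound connection"

class NetworkDetector:
    def __init__(self):
        self.flagged_endpoints = set()

    def receive_host_alert(self, alert):
        # Scenario 1: a host-based detector pushes an alert; remember the endpoints.
        self.flagged_endpoints.add((alert.host, alert.remote))

    def classify_flow(self, src, dst, payload_visible):
        # Even for an encrypted flow (payload_visible=False), the pushed host
        # alert lets the network-based detector flag the session.
        if (src, dst) in self.flagged_endpoints:
            return "malicious"
        return "inspect payload" if payload_visible else "unknown"

net = NetworkDetector()
net.receive_host_alert(HostAlert("10.0.0.5", "203.0.113.9", "keylogger beacon"))
print(net.classify_flow("10.0.0.5", "203.0.113.9", payload_visible=False))   # malicious
print(net.classify_flow("10.0.0.7", "198.51.100.2", payload_visible=False))  # unknown
</pre>
<br />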
Case for kernel-level detection (January 30, 2008)
<span style="font-style: italic;">Why kernel-level detection?</span><br />
These are my thoughts on why malware detection should be performed at the kernel level. In general, the lower in the system hierarchy your detector resides, the harder it is for an attacker to evade it. For example, if a detector uses system-call interposition, an attacker can evade it by using kernel calls directly. (System-call interposition can be done on Windows using the following <a href="http://research.microsoft.com/sn/detours/">package</a>.) In my conversations with a guy from the NSA (name withheld for obvious reasons :-)), he confirmed that the new malware they are observing in their lab uses kernel calls directly. Also, take a look at the following <a href="http://activehome.co.uk/vnunet/news/2184047/low-level-malware-rise-say">article</a>.<br /><br />
<span style="font-style: italic;">The semantic-gap problem:</span><br />
A natural question that comes to mind is: why not perform detection at an even lower layer in the hierarchy, say the VM layer, or even better, the hardware? As you move down in the system hierarchy, you lose some high-level semantics. Let me explain. Let's say you are doing detection at the VM layer. A high-level event (such as opening a file) manifests itself as a sequence of low-level events (such as writes to a memory page or an interrupt). In other words, there is a gap between the events you observe at the VM level and the corresponding high-level event. To my knowledge, the "semantic gap" issue was first articulated in the following paper:<br /><br />
Peter M. Chen, Brian D. Noble, "When virtual is better than real", Proceedings of the 2001 Workshop on Hot Topics in Operating Systems (HotOS), May 2001. The paper can be downloaded at the following <a href="http://www.eecs.umich.edu/%7Epmchen/papers/">site</a>.<br /><br />
As you move down in the hierarchy, the semantic-gap problem becomes harder. The semantic gap still exists at the kernel level, but it is more tractable there than at the other layers. Therefore, I think kernel-level detection hits the "sweet spot". Implementing detectors at the kernel level is harder than other approaches (such as system-call interposition), but then everything good in life takes effort :-) I strongly believe that detectors that use system-call interposition are very easy to evade, so what is the point in having them? The next generation of malware will definitely use kernel calls directly.
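<br />
To make the semantic gap concrete, here is a toy sketch (the event names and the pattern are invented): at a low layer you see only generic events, and recovering the high-level action means recognizing a pattern in that stream, whereas at the kernel layer the same action is visible directly as a single call.
<pre>
# Invented low-level event trace, e.g. as observed at the VM layer.
low_level_trace = [
    "page_fault", "disk_read", "page_write",   # unrelated activity
    "trap", "page_write", "disk_read",         # pattern we associate with "file open"
    "timer_interrupt",
]

# Toy reconstruction rule: this particular subsequence is taken to mean a file open.
FILE_OPEN_PATTERN = ["trap", "page_write", "disk_read"]

def find_high_level_events(trace, pattern):
    n = len(pattern)
    return [i for i in range(len(trace) - n + 1) if trace[i:i + n] == pattern]

# The VM-layer detector recovers "file open" only indirectly, via the pattern.
print(find_high_level_events(low_level_trace, FILE_OPEN_PATTERN))   # [3]
</pre>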