Saturday, December 5, 2009

Google Hacking Basics

■ Summary

■ Solutions Fast Track

■ Frequently Asked Questions

Introduction

A fairly large portion of this book is dedicated to the techniques the "bad guys" will use to locate sensitive information. We present this information to help you become better informed about their motives so that you can protect yourself and perhaps your customers. We've already looked at some of the benign basic searching techniques that are foundational for any Google user who wants to break the barrier of the basics and charge through to the next level: the ways of the Google hacker. Now we begin to look at the most basic techniques, and we'll dive into the weeds a bit later on.

For now, we'll first talk about Google's cache. If you haven't already experimented with the cache, you're missing out. We suggest you at least click a few cached links from the Google search results page before reading further. As any decent Google hacker will tell you, there's a certain anonymity that comes with browsing the cached version of a page. That anonymity only goes so far, and there are some limitations to the coverage it provides. Google can, however, very nicely veil your crawling activities to the point that the target Web site might not even get a single packet of data from you as you cruise the Web site. We'll show you how it's done.

Next, we'll talk about directory listings. These "ugly" Web pages are chock full of information, and their mere existence serves as the basis for some of the more advanced attack searches that we'll discuss in later chapters.

To round things out, we'll take a look at a technique that has come to be known as traversing: the expansion of a search to attempt to gather more information. We'll look at directory traversal, number range expansion, and extension trolling, all of which are techniques that should be second nature to any decent hacker, and to the good guys who defend against them.

Anonymity with Caches

Google's cache feature is truly an amazing thing. The simple fact is that if Google crawls a page or document, you can almost always count on getting a copy of it, even if the original source has since dried up and blown away. Of course, the downside of this is that hackers can get a copy of your sensitive data even if you've pulled the plug on that pesky Web server. Another downside of the cache is that the bad guys can crawl your entire Web site (including the areas you "forgot" about) without even sending a single packet to your server. If your Web server doesn't get so much as a packet, it can't write anything to the log files.

(You are logging your Web connections, aren't you?) If there's nothing in the log files, you might not have any idea that your sensitive data has been carried away. It's sad that we even have to think in these terms, but untold megabytes, gigabytes, and even terabytes of sensitive data leak from Web servers every day. Understanding how hackers can mount an anonymous attack on your sensitive data via Google's cache is of utmost importance.

Google grabs a copy of most Web data that it crawls. There are exceptions, and this behavior is preventable, as we'll discuss later, but the vast majority of the data Google crawls is copied and filed away, accessible via the cached link on the search page. We need to examine some subtleties of Google's cached document banner. The banner shown in Figure 3.1 was gathered from www.phrack.org.

Figure 3.1 Google's Cache Banner for www.phrack.org

This is Google's cache of http://www.phrack.org/hardcover62/ as retrieved on Sep 3, 2004 21:44:24 GMT.

Google's cache is the snapshot that we took of the page as we crawled the web.

The page may have changed since that time. Click here for the current page without highlighting.

This cached page may reference images which are no longer available. Click here for the cached text only.

To link to or bookmark this page, use the following url: http://www.google.com/search?q=cache:Z7FntxDMrMIJ:www.phrack.org/hardcover62/++site:www.phrack.org+inurl:hardcover62&hl=en



If you've gotten so familiar with the cache banner that you just blow right past it, slow down a bit and actually read it. The cache banner in Figure 3.1 notes, "This cached page may reference images which are no longer available." This message is easy to miss, but it provides an important clue about what Google's doing behind the scenes.

To get a better idea of what's happening, let's take a look at a snippet of tcpdump output gathered while browsing this cached page. To capture this data, tcpdump is simply run as tcpdump -w. Your installation or implementation of tcpdump might require you to also set a listening interface with the -i switch. The output of the tcpdump command is shown in Figure 3.2.

Figure 3.2 Tcpdump Output Gathered While Viewing a Cached Page



21:39:24.648422 IP 192.168.2.32.51670 > 64.233.167.104.80
21:39:24.719067 IP 64.233.167.104.80 > 192.168.2.32.51670
21:39:24.720351 IP 64.233.167.104.80 > 192.168.2.32.51670
21:39:24.731503 IP 192.168.2.32.51670 > 64.233.167.104.80
21:39:24.897987 IP 192.168.2.32.51672 > 82.165.25.125.80
21:39:24.902401 IP 192.168.2.32.51671 > 82.165.25.125.80
21:39:24.922716 IP 192.168.2.32.51673 > 82.165.25.125.80
21:39:24.927402 IP 192.168.2.32.51674 > 82.165.25.125.80
21:39:25.017288 IP 82.165.25.125.80 > 192.168.2.32.51672
21:39:25.019111 IP 82.165.25.125.80 > 192.168.2.32.51672
21:39:25.019228 IP 192.168.2.32.51672 > 82.165.25.125.80
21:39:25.023371 IP 82.165.25.125.80 > 192.168.2.32.51671
21:39:25.025388 IP 82.165.25.125.80 > 192.168.2.32.51671
21:39:25.025736 IP 192.168.2.32.51671 > 82.165.25.125.80
21:39:25.043418 IP 82.165.25.125.80 > 192.168.2.32.51673
21:39:25.045573 IP 82.165.25.125.80 > 192.168.2.32.51673
21:39:25.045707 IP 192.168.2.32.51673 > 82.165.25.125.80
21:39:25.052853 IP 82.165.25.125.80 > 192.168.2.32.51674


Let's take apart this output a bit. On line 1, we see a Web (port 80) connection from 192.168.2.32, our Web browsing machine, to 64.233.167.104, one of Google's servers. Lines 2 and 3 show two response packets, again from the Google server. This is the type of traffic we should expect from any transaction with Google, but beginning on line 5, we see that our machine makes a Web (port 80) connection to 82.165.25.125. This is not a Google server, and if we were to run an nslookup or a host command on that IP address, we would discover that the address resolves to a15151295.alturo-server.de. The connection to this server can be explained by rerunning tcpdump with more options specifically designed to show a few hundred bytes of the data inside the packets as well as the headers. The partial capture shown in Figure 3.3 was gathered by running tcpdump -Xx -s 500 -n and shift-reloading the cached page. Shift-reloading forces most browsers to contact the Web host again, not relying on any caches the browser might be using.

Lines 1 and 2 show that we are downloading (via a GET request) an image file, specifically a JPG image, from the server. Line 3 shows the Host field, which specifies that we are talking to the www.phrack.org Web server. Because of this Host header and the fact that this packet was sent to 82.165.25.125 on port 80, we can safely assume that the Phrack Web server is virtually hosted on the physical server located at 82.165.25.125. This means that when we viewed the cached copy of the Phrack Web page, we began pulling images directly from the Phrack server itself. If we were striving for anonymity by viewing the Google cached page, we just blew our cover! Furthermore, lines 6-12 show that the REFERER field was passed to the Phrack server, and that field contained a URL reference to Google's cached copy of Phrack's page. This means that not only were we not anonymous, our browser informed the Phrack Web server that we were trying to view a cached version of the page! So much for anonymity.

It's worth noting that most real hackers use proxy servers when browsing a target's Web pages, and even their Google activities are first bounced off a proxy server. If we had used an anonymous proxy server for our testing, the Phrack Web server would have only gotten our proxy server's IP address, not our actual IP address.
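If you want to script this kind of proxied browsing rather than configure it by hand in a browser, Python's standard library will do. A minimal sketch using urllib; the 127.0.0.1:8080 proxy address is a placeholder of our own, so substitute a proxy you control and have tested:

```python
import urllib.request

def make_proxy_opener(proxy="127.0.0.1:8080"):
    """Build an opener that routes HTTP and HTTPS requests through
    `proxy`, so the target Web server logs the proxy's address
    instead of ours. The default address is a placeholder."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)

# opener = make_proxy_opener()
# html = opener.open("http://www.example.com/").read()
```

Remember that this only hides you as well as the proxy itself does; a logging or misconfigured proxy gives you no anonymity at all.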

Google Hacker's Tip

It's a good idea to use a proxy server if you value your anonymity online. Penetration testers use proxy servers to emulate what a real attacker would do during an actual break-in attempt. Locating working, high-quality proxy servers can be an arduous task, unless of course we use a little Google hacking to do the grunt work for us! To locate proxy servers using Google, try these queries:

inurl:"nph-proxy.cgi" "Start browsing"

or

"this proxy is working fine!" "enter *" "tjrl***" * visit

These queries locate online public proxy servers that can be used for testing purposes. Nothing like Googling for proxy servers! Remember, though, that there are lots of places to obtain proxy servers, such as the atomintersoft site or the samair.ru proxy site. Try Googling for those!

The cache banner gives us an option to view only the data that Google has captured, without any external references. As you can see in Figure 3.1, a link is available in the header, titled "Click here for the cached text only." Clicking this link produces the tcpdump output shown in Figure 3.4, captured with tcpdump -n.

Figure 3.4 Cached Text Only Captured with Tcpdump

IP 192.168.2.32.52912 > 64.233.167.104.80: S 2057734012:2057734012(0) win 65535
IP 64.233.167.104.80 > 192.168.2.32.52912: S 4205028956:4205028956(0) ack 2057734013 win 8190
IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 1 win 65535
IP 192.168.2.32.52912 > 64.233.167.104.80: P 1:699(698) ack 1 win 65535
IP 64.233.167.104.80 > 192.168.2.32.52912: . ack 699 win 15885
IP 64.233.167.104.80 > 192.168.2.32.52912: . 1:1431(1430) ack 699 win 15885
IP 64.233.167.104.80 > 192.168.2.32.52912: . 1431:2861(1430) ack 699 win 15885
IP 64.233.167.104.80 > 192.168.2.32.52912: P 2861:3846(985) ack 699 win 15885
IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 3846 win 65535
IP 192.168.2.32.52912 > 64.233.167.104.80: F 699:699(0) ack 3846 win 65535
IP 64.233.167.104.80 > 192.168.2.32.52912: F 3846:3846(0) ack 700 win 8190
IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 3847 win 65535



Lines 1-3 show a standard TCP handshake on the Web port (port 80) between our browsing machine (192.168.2.32) and the Google server (64.233.167.104). Lines 4-9 show our Web data transfer as our browsing machine receives data from the Google server, and lines 10-12 show the normal successful shutdown of our communication with the Google server. Despite the fact that we loaded the same page as before, we communicated only with the Google server, not any external servers.
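The check we just performed by eye (did our machine talk to anyone besides Google?) is easy to automate. The sketch below parses tcpdump -n style lines and collects every remote IP our machine exchanged packets with; the helper name and sample capture are our own, but the addresses come from the output above:

```python
import re

# Matches the address pair in a `tcpdump -n` line such as:
#   IP 192.168.2.32.52912 > 64.233.167.104.80: S ...
PEERS = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+")

def remote_hosts(capture, our_ip="192.168.2.32"):
    """Return the set of remote IPs our machine talked to."""
    hosts = set()
    for src, dst in PEERS.findall(capture):
        if src == our_ip:
            hosts.add(dst)
        elif dst == our_ip:
            hosts.add(src)
    return hosts

sample = """\
IP 192.168.2.32.52912 > 64.233.167.104.80: S 2057734012:2057734012(0) win 65535
IP 64.233.167.104.80 > 192.168.2.32.52912: S 4205028956:4205028956(0) ack 2057734013 win 8190
IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 1 win 65535
"""
# remote_hosts(sample) -> {'64.233.167.104'}: only the Google server
```

If the returned set contains anything other than Google addresses, your "anonymous" cache browsing leaked packets to a third party.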

If we were to look at the URL generated by clicking the "cached text only" link in the cached page's header, we would discover that Google appended an interesting parameter, &strip=1. This parameter forces a Google cache URL to display only cached text, avoiding any external references. This URL parameter applies only to URLs that reference a Google cached page.
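Appending the parameter by hand works fine, but a tiny helper makes the rule explicit. A sketch (the helper name is ours):

```python
def strip_cache_url(cache_url):
    """Append Google's strip=1 parameter to a cache URL so only the
    cached text is returned and no external references are fetched."""
    if "strip=1" in cache_url:
        return cache_url          # already stripped
    sep = "&" if "?" in cache_url else "?"
    return cache_url + sep + "strip=1"

url = ("http://216.239.41.104/search?q=cache:Z7FntxDMrMIJ:"
       "www.phrack.org/hardcover62/&hl=en")
# strip_cache_url(url) appends "&strip=1" to the URL above
```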

Pulling it all together, we can browse a cached page with a fair amount of anonymity without a proxy server, using a quick cut and paste and a URL modification. As an example, let's say that we used the Google query site:phrack.org inurl:hardcover62, which returns one result. Instead of clicking the cached link, we will right-click the cached link and copy the URL to the Clipboard, as shown in Figure 3.5. Browsers handle this action differently, so use whichever technique works for you to capture the URL of this link.

Figure 3.5 Anonymous Cache Viewing Via Cut and Paste

Results 1 - 1 of 1 from www.phrack.org for inurl:hardcover62. (0.19 seconds)

www.phrack.org
home | about | all articles | all authors | all comments | download | search | submit article | loopback | commentaries | editor...
www.phrack.org/hardcover62/ - 5k - Cached - Similar pages

[Right-click context menu over the Cached link: Open Link in New Window, Open Link in New Tab, Save Linked File As...]

Once the URL is copied to the Clipboard, paste it into the address bar of your browser, and append the &strip=1 parameter to the end of the URL. The URL should now look something like http://216.239.41.104/search?q=cache:Z7FntxDMrMIJ:www.phrack.org/hardcover62/++site:www.phrack.org+inurl:hardcover62&hl=en&strip=1. Press Enter after modifying the URL to load the page, and you should be taken to the stripped version of the cached page, which has a slightly different banner, as shown in Figure 3.6.

Notice that the stripped cache header reads differently than the standard cache header. In place of the "This cached page may reference images which are no longer available" line is a new line that reads, "Click here for the full cached version with images included." This is an indicator that the current cached page has been stripped of external references. Unfortunately, the stripped page does not include graphics, so the page could look quite different from the original, and in some cases a stripped page might not be legible at all. If this is the case, it never hurts to load up a proxy server and hit the page, but real Google hackers "don't need no steenkin' proxy servers!"

Fun with Highlights

If you've ever scrolled through page after page of a document looking for a particular word or phrase, you probably already know that Google's cached version of the page will highlight search terms for you. What you might not realize is that you can use Google's highlight tool to highlight terms on a cached page that weren't included in your original search. This takes a bit of URL mangling, but it's fairly straightforward. For example, if you searched for peeps marshmallows and viewed the first cached page, the tail end of that URL would look something like www.marshmallowpeeps.com/news/press_peeps_spring_2004.html+peeps+marshmallows&hl=en.

To highlight other terms, simply play around with the area after the target URL, in this case +peeps+marshmallows. Add or subtract words and press Enter, and Google will highlight the terms right in your browser!
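A sketch of that URL mangling in code. This is deliberately naive: it assumes the first + in the URL begins the highlight terms, which holds for simple cache URLs like the peeps example above but not necessarily for every cache URL Google emits:

```python
from urllib.parse import quote_plus

def set_highlight_terms(cache_url, terms):
    """Swap the +term+term highlight suffix on a Google cache URL.
    Naive sketch: treats everything from the first '+' to the
    trailing &hl= fragment as the highlight-term list."""
    base = cache_url.split("+", 1)[0]
    tail = ""
    if "&hl=" in cache_url:
        tail = "&hl=" + cache_url.rsplit("&hl=", 1)[1]
    return base + "".join("+" + quote_plus(t) for t in terms) + tail

url = ("www.marshmallowpeeps.com/news/press_peeps_spring_2004.html"
       "+peeps+marshmallows&hl=en")
# set_highlight_terms(url, ["chocolate", "bunnies"])
# -> '...press_peeps_spring_2004.html+chocolate+bunnies&hl=en'
```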

Using Google as a Proxy Server

Although this technique might not work forever, at the time of this writing it's possible to use Google itself as a proxy server. This technique requires a Google-translated URL and some minor URL modification. To make this work, we first need to generate a translation URL. The easiest way to do this is through Google's translation service, located at www.google.com/translate_t. If you were to enter a URL into the "Translate a web page" field, select a language pair, and click the Translate button, as shown in Figure 3.7, Google would translate the contents of the Web page and generate a translation URL that could be used for later reference.

We discussed most of the parameters in this URL in Chapter 1, but we haven't talked about the langpair parameter yet. This parameter, which is only available for the translation service, describes which languages to translate to and from, respectively. The arguments to this parameter are identical to the hl parameters we saw in Chapter 1. Figure 3.7 shows that we were attempting to translate the www.google.com Web page from English to Spanish, which generated a langpair of en and es. Here's where the hacker mentality kicks in. What would happen if we were to translate a page from one language into the same language? This would change our translation URL to:

http://www.google.com/translate?u=http%3A%2F%2Fwww.google.com&langpair=en%7Cen&hl=en&ie=Unknown&oe=ASCII
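Generating that same-language translation URL for an arbitrary target takes only a few lines. A sketch mirroring the parameter layout shown above (the helper name is ours):

```python
from urllib.parse import quote

def translate_proxy_url(target, lang="en"):
    """Build a same-language Google Translate URL (langpair en|en by
    default) so Google fetches `target` and relays its contents.
    Remember: this is a transparent proxy; the target still sees
    your IP in its logs."""
    return ("http://www.google.com/translate?u=%s&langpair=%s%%7C%s&hl=en"
            % (quote(target, safe=""), lang, lang))

# translate_proxy_url("http://www.phrack.org/hardcover62/")
```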

First, you should notice that the Google search page in the bottom frame of the browser window looks pretty familiar. In fact, it looks identical to the original search page. This is because no real language translation occurred. The top frame of the browser window shows the standard translation banner. Admittedly, all this work seems a bit anticlimactic, since all we have to show for our efforts is an exact copy of a page we could have just loaded directly. Fortunately, there is a payoff when we consider what happens behind the scenes. Let's look at another example, this time translating the www.phrack.org/hardcover62/ Web page, monitoring network traffic with tcpdump -n -U -t as shown in Figure 3.9.



Figure 3.9 Monitoring English to English Translation with Tcpdump -n -U -t

IP 192.168.2.32.53466 > 64.233.171.104.80: S 1120160740:1120160740(0) win
IP 64.233.171.104.80 > 192.168.2.32.53466: S 2337757854:2337757854(0) ack
IP 192.168.2.32.53466 > 64.233.171.104.80: . ack 1
IP 192.168.2.32.53466 > 64.233.171.104.80: P 1:678(677) ack
IP 64.233.171.104.80 > 192.168.2.32.53466: . ack 678
IP 64.233.171.104.80 > 192.168.2.32.53466: P 1:529(528) ack
IP 192.168.2.32.53466 > 64.233.171.104.80: . ack 529
IP 64.233.171.104.80 > 192.168.2.32.53466: P 529:549(20) ack
IP 192.168.2.32.53466 > 64.233.171.104.80: P 678:1477(799) ack
[snip]
IP 192.168.2.32.53470 > 216.239.37.104.80: S 3691660195:3691660195(0) win
IP 216.239.37.104.80 > 192.168.2.32.53470: S 2470826704:2470826704(0) ack
IP 192.168.2.32.53470 > 216.239.37.104.80: . ack 1
IP 192.168.2.32.53470 > 216.239.37.104.80: P 1:752(751) ack
IP 216.239.37.104.80 > 192.168.2.32.53470: P 1:1271(1270) ack
IP 216.239.37.104.80 > 192.168.2.32.53470: P 1271:1692(421) ack
IP 216.239.37.104.80 > 192.168.2.32.53470: P 1692:1712(20) ack
IP 192.168.2.32.53470 > 216.239.37.104.80: . ack 1712






In lines 1-3, we see our Web browsing machine (192.168.2.32) connecting to a Google Web server (64.233.171.104) on port 80. Data is transferred back and forth in lines 4-9, and another similar connection is established between the same addresses at line 10, removed for brevity. In lines 11-13, our Web browsing machine (192.168.2.32) connects to another Google Web server (216.239.37.104) on port 80. Data is transferred back and forth in lines 14-18, and the www.phrack.org/hardcover62/ Web page is displayed in our browser, as shown in Figure 3.10. In this example, no data was transferred directly between our Web browsing machine and the phrack.org Web site! When we submitted our modified translation URL, Google fetched the Web page for us and passed the contents of the page back to our browser. Google, in essence, acted as a proxy server for our request.

Figure 3.10 Google Acting as a Transparent Proxy Server

Translated version of http://www.phrack.org/hardcover62/

This is not a perfect proxy solution and should not be used as the sole proxy server in your toolkit. We present it simply as an example of what a little creative thinking can accomplish. While Google is acting as a proxy server, it is a transparent proxy server, which means the target Web site can still see our IP address in the connection logs, despite the fact that Google grabbed the page for us.

Test Your Proxy Server!

If you are conducting a test that requires you to protect your IP address from the target, use a proxy server and test it with a proxy checker like the one available from www.all-nettools.com/pr.htm. If you use this page to check the "Google proxy," you'll discover that it affords little protection for your IP address.

Directory Listings

A directory listing is a type of Web page that lists files and directories that exist on a Web server. Designed to be navigated by clicking directory links, directory listings typically have a title that describes the current directory, a list of files and directories that can be clicked, and often a footer that marks the bottom of the directory listing. Each of these elements is shown in the sample directory listing in Figure 3.11.

Much like an FTP server, directory listings offer a no-frills, easy-install solution for granting access to files that can be stored in categorized folders. Unfortunately, directory listings have many faults, specifically:

■ They are not secure in and of themselves. They do not prevent users from downloading certain files or accessing certain directories. This task is often left to the protection measures built into the Web server software or third-party scripts, modules, or programs designed specifically for that purpose.

■ They can display information that helps an attacker learn specific technical details about the Web server.

■ They do not discriminate between files that are meant to be public and those that are meant to remain behind the scenes.

■ They are often displayed accidentally, since many Web servers display a directory listing if a top-level index file (index.htm, index.html, default.asp, and so on) is missing or invalid.

All this adds up to a deadly combination.

In this section, we'll take a look at some of the ways Google hackers can take advantage of directory listings.

Locating Directory Listings

The most obvious way an attacker can abuse a directory listing is by simply finding it! Since directory listings offer "parent directory" links and allow browsing through files and folders, even the most basic attacker might soon discover that sensitive data can be found by simply locating the listings and browsing through them.

Locating directory listings with Google is fairly straightforward. Figure 3.11 shows that most directory listings begin with the phrase "Index of," which also shows in the title. An obvious query to find this type of page might be intitle:index.of, which could find pages with the term index of in the title of the document. Remember that the period (".") serves as a single-character wildcard in Google. Unfortunately, this query will return a large number of false positives, such as pages with the following titles:

Index of Native American Resources on the Internet
LibDex - Worldwide index of library catalogues
Iowa State Entomology Index of Internet Resources

Judging from the titles of these documents, it is obvious that not only are these Web pages intentional, they are also not the type of directory listings we are looking for. As Ben Kenobi might say, "This is not the directory listing you're looking for." Several alternate queries provide more accurate results, for example intitle:index.of "parent directory" (shown in Figure 3.12) or intitle:index.of name size. These queries indeed provide directory listings by focusing not only on index.of in the title but on keywords often found inside directory listings, such as parent directory, name, and size. Even judging from the summary on the search results page, you can see that these results are indeed the types of directory listings we're looking for.

Results 1 - 10 of about 4,660,000 for intitle:index.of "parent directory". (0.58 seconds)

Index of /images
Index of /images. Name Last modified Size Description Parent Directory 29-Jul-2004 16:36 - Actions/ 12-Dec-2003 14:44 - Animation/ 18-Aug-2004 12:24 - Balls/ 18 ...
www.cit.gu.edu.au/images/ - 26k - Cached - Similar pages

Index of /dist
Index of /dist. ... Parent Directory - DATE 12-Sep-2004 17:47 11 SOURCE 05-Sep-2004 07:21 16 ant/ 16-Jul-2004 02:18 - apr/ 02-Sep-2004 09:47 - avalon/ 28-May-2004 08 ...
www.apache.org/dist/ - 5k - Sep 12, 2004 - Cached - Similar pages

Index of /dist/httpd
Index of /dist/httpd. ... Parent Directory - HTTP Server project binaries/ 19-Jul-2004 04:49 - Binary distributions docs/ 12-Sep-2004 06:02 - Extra documentation ...
www.apache.org/dist/httpd/ - 11k - Sep 12, 2004 - Cached - Similar pages [ More results from www.apache.org ]

Finding Specific Directories

In some cases, it might be beneficial not only to look for directory listings but to look for directory listings that allow access to a specific directory. This is easily accomplished by adding the name of the directory to the search query. To locate "admin" directories that are accessible from directory listings, queries such as intitle:index.of.admin or intitle:index.of inurl:admin will work well, as shown in Figure 3.13.
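The progression from the noisy intitle:index.of query to the tighter variants can be captured in a small helper that emits the refined queries, including the directory-specific form (the helper name is ours):

```python
def index_of_queries(directory=None):
    """Generate progressively tighter directory-listing queries.
    The extra keywords weed out false positives that merely contain
    'index of' in the title; the optional directory name narrows
    the hunt to listings that expose that folder."""
    queries = [
        'intitle:index.of "parent directory"',
        'intitle:index.of name size',
    ]
    if directory:
        queries.append("intitle:index.of inurl:%s" % directory)
    return queries

# index_of_queries("admin")[-1] -> 'intitle:index.of inurl:admin'
```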

Finding Specific Files

Because of the directory tree style, it is also possible to find specific files in a directory listing. For example, to find WS_FTP log files, try a search such as intitle:index.of ws_ftp.log, as shown in Figure 3.14. This technique can be extended to just about any kind of file by keying in on the index.of in the title and the filename in the text of the Web page.

Results 1 - 10 of about 101,000 for intitle:index.of ws_ftp.log. (0.69 seconds)

Index of /~nbessels/WS_FTP.LOG
Index of /~nbessels/WS_FTP.LOG. Name Last modified Size Description Parent Directory 02-Sep-2002 11:14 - Images/ 23-Aug-2002 19:03 ...
home.tiscali.nl/~nbessels/WS_FTP.LOG/ - 1k - Cached - Similar pages

Index of /mp3
Index of /mp3. ... 26-Sep-2001 18:45 1.8M WS_FTP.LOG 31-May-2001 18:53 1k ...
kungfurecords.com/mp3/ - 15k - Sep 12, 2004 - Cached - Similar pages

Index of /gallery
Index of /gallery. ... Thumbs.db 03-Sep-2004 10:52 WS_FTP.LOG ...
www.inspired-art.com/gallery/ - 25k - Cached - Similar pages

You can also use filetype and inurl to search for specific files. To search again for ws_ftp.log files, try a query like filetype:log inurl:ws_ftp.log. This technique will generally find more results than the somewhat restrictive index.of search. We'll be working more with specific file searches throughout the book.
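Both hunting styles, the directory-listing approach and the filetype approach, can be generated from a single filename. A sketch (the helper name is ours; it assumes the filetype extension is everything after the last dot):

```python
def file_queries(filename):
    """Two complementary hunts for a specific file: one through
    directory listings, one through Google's filetype operator,
    which keys on the file extension."""
    ext = filename.rsplit(".", 1)[-1]
    return ["intitle:index.of %s" % filename,
            "filetype:%s inurl:%s" % (ext, filename)]

# file_queries("ws_ftp.log")
# -> ['intitle:index.of ws_ftp.log', 'filetype:log inurl:ws_ftp.log']
```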

Server Versioning

One piece of information an attacker can use to determine the best method for attacking a Web server is the exact software version. An attacker could retrieve that information by connecting directly to the Web port of that server and issuing a request for the HTTP (Web) headers. It is possible, however, to retrieve similar information from Google without ever connecting to the target server. One method involves using the information provided in a directory listing.

Figure 3.15 shows the bottom portion of a typical directory listing. Notice that some directory listings provide the name of the server software as well as the version number. An adept Web administrator could fake these server tags, but most often this information is legitimate and exactly the type of information an attacker will use to refine his attack against the server.

The Google query used to locate servers this way is simply an extension of the intitle:index.of query. The listing shown in Figure 3.15 was located with a query of intitle:index.of "server at". This query will locate all directory listings on the Web with index.of in the title and server at anywhere in the text of the page.

This might not seem like a very specific search, but the results are very clean and do not require further refinement.

Server Version? Who Cares?

Although server versioning might seem fairly harmless, realize that there are two ways an attacker might use this type of information. If the attacker has already chosen his target and discovers this information on that target server, he could begin searching for an exploit (which might or might not exist) to use against that specific software version. Inversely, if the attacker already has a working exploit for a very specific version of Web server software, he could perform a Google search for targets that he can compromise with that exploit. An attacker, armed with an exploit and drawn to a potentially vulnerable server, is especially dangerous. Even small information leaks like this can have big payoffs for a clever attacker.

To search for a specific server version, the intitle:index.of query can be extended even further to something like intitle:index.of "Apache/1.3.27 Server at". This query would find pages like the one listed in Figure 3.15. As shown in Table 3.1, many different servers can be identified through a directory listing.
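An attacker with a version-specific exploit would likely automate this step, turning a list of vulnerable server banners into ready-made queries. A sketch of that pattern (the helper name is ours; it emits the "Server at" form of the query described above):

```python
def server_version_queries(versions):
    """Turn a list of server banners (say, versions for which a
    working exploit exists) into directory-listing search queries
    that locate candidate targets."""
    return ['intitle:index.of "%s Server at"' % v for v in versions]

# server_version_queries(["Apache/1.3.27"])
# -> ['intitle:index.of "Apache/1.3.27 Server at"']
```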

Table 3.1 Some Specific Servers Locatable Via Directory Listings

Directory Listing of Web Servers

"AnWeb/1.42h" intitle:index.of "Apache Tomcat/" intitle:index.of "Apache-AdvancedExtranetServer/" intitle:index.of "Apache/df-exts" intitle:index.of "Apache/" "server at" intitle:index.of "Apache/AmEuro" intitle:index.of "Apache/Blast" intitle:index.of "Apache/WWW" intitle:index.of "Apache/df-exts" intitle:index.of


"CERN httpd 3.0B (VAX VMS)" intitle:index.of

fitweb-wwws * server at intitle:index.of

"HP Apache-based Web "Server/1.3.26" intitle:index.of

"HP Apache-based Web "Server/1.3.27 (Unix) mod_ssl/2.8.11 OpenSSL/0.9.6g" intitle:index.of

"httpd+ssl/kttd" * server at intitle:index.of

"JRun Web Server" intitle:index.of

"MaXX/3.1" intitle:index.of

"Microsoft-IIS/* server at" intitle:index.of

"Microsoft-IIS/4.0" intitle:index.of

"Microsoft-IIS/5.0 server at" intitle:index.of

"Microsoft-IIS/6.0" intitle:index.of

"OmniHTTPd/2.10" intitle:index.of

"OpenSA/1.0.4" intitle:index.of

"Oracle HTTP Server Powered by Apache" intitle:index.of "Red Hat Secure/2.0" intitle:index.of "Red Hat Secure/3.0 server at" intitle:index.of SEDWebserver * server +at intitle:index.of

Figure C.2 Directory Listings of Apache Versions

Queries That Locate Apache Versions Through Directory Listings

"Apache/1.0" intitle:index.of "Apache/1.1" intitle:index.of "Apache/1.2" intitle:index.of "Apache/1.2.0 server at" intitle:index.of "Apache/1.2.4 server at" intitle:index.of "Apache/1.2.6 server at" intitle:index.of "Apache/1.3.0 server at" intitle:index.of "Apache/1.3.2 server at" intitle:index.of "Apache/1.3.1 server at" intitle:index.of

"Apache/1.3.1.1 server at" intitle:index.of "Apache/1.3.3 server at" intitle:index.of "Apache/1.3.4 server at" intitle:index.of "Apache/1.3.6 server at" intitle:index.of "Apache/1.3.9 server at" intitle:index.of

"Apache/2.0.49a server at" intitle:index.of "Apache/2.0.50 server at" intitle:index.of "Apache/2.0.51 server at" intitle:index.of "Apache/2.0.52 server at" intitle:index.of

In addition to identifying the Web server version, it is also possible to determine the operating system of the server (as well as modules and other software that is installed). We'll look at more specific techniques to accomplish this later, but the server versioning technique we've just looked at can be extended by including more details in our query. Table 3.2 shows queries that located extremely esoteric server software combinations, revealed by server tags. These tags list a great deal of information about the servers they were found on and are shining examples proving that even a seemingly small information leak can sometimes explode out of control, revealing more information than expected.

Table 3.2 Locating Specific and Esoteric Server Versions

Queries That Locate Specific and Esoteric Server Versions

"Apache/1.3.12 (Unix) mod_fastcgi/2.2.12 mod_dyntag/1.0 mod_advert/1.12 mod_czech/3.1.1b2" intitle:index.of

"Apache/1.3.12 (Unix) modJastcgi/2.2.4 secured_by_Raven/1.5.0" intitle:index.of

"Apache/1.3.12 (Unix) mod_ssl/2.6.6 OpenSSL/0.9.5a" intitle:index.of

"Apache/1.3.12 Cobalt (Unix) Resin/2.0.5 StoreSense-Bridge/1.3 ApacheJServ/1.1.1 mod_ssl/2.6.4 OpenSSL/0.9.5a mod_auth_pam/1.0a FrontPage/4.0.4.3 mod_perl/1.24" intitle:index.of

"Apache/1.3.14 - PHP4.02 - Iprotect 1.6 CWIE (Unix) modJastcgi/2.2.12 PHP/4.0.3pl1" intitle:index.of

"Apache/1.3.14 Ben-SSL/1.41 (Unix) mod_throttle/2.11 mod_perl/1.24_01 PHP/4.0.3pl1 FrontPage/4.0.4.3 rus/PL30.0" intitle:index.of

"Apache/1.3.20 (Win32)" intitle:index.of

"Apache/1.3.20 Sun Cobalt (Unix) PHP/4.0.3pl1 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_perl/1.25" intitle:index.of

"Apache/1.3.20 Sun Cobalt (Unix) PHP/4.0.4 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_ssl/2.8.4 OpenSSL/0.9.6b mod_perl/1.25" intitle:index.of

"Apache/1.3.20 Sun Cobalt (Unix) PHP/4.0.6 mod_ssl/2.8.4 OpenSSL/0.9.6 FrontPage/5.0.2.2510 mod_perl/1.26" intitle:index.of


"Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.3pl1 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_perl/1.25" intitle:index.of

"Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.3pl1 modJastcgi/2.2.8 mod_auth_pam_external/0.1 mod_perl/1.25" intitle:index.of

"Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.4 mod_auth_pam_external/0.1 mod_perl/1.25" intitle:index.of

"Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.6 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_perl/1.25" intitle:index.of

"Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b mod_auth_pam_external/0.1 mod_perl/1.25" intitle:index.of

"Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 mod_dtcl" intitle:index.of

"Apache/1.3.26 (Unix) PHP/4.2.2" intitle:index.of

"Apache/1.3.26 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.6b" intitle:index.of

"Apache/1.3.26 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7" intitle:index.of

"Apache/1.3.26+PH" intitle:index.of

"Apache/1.3.27 (Darwin)" intitle:index.of

"Apache/1.3.27 (Unix) mod_log_bytes/1.2 mod_bwlimited/1.0 PHP/4.3.1 FrontPage/5.0.2.2510 mod_ssl/2.8.12 OpenSSL/0.9.6b" intitle:index.of

"Apache/1.3.27 (Unix) mod_ssl/2.8.11 OpenSSL/0.9.6g FrontPage/5.0.2.2510 mod_gzip/1.3.26 PHP/4.1.2 mod_throttle/3.1.2" intitle:index.of

Going Out on a Limb: Traversal Techniques

The next technique we'll examine is known as traversal. Traversal in this context simply means to travel across. Attackers use traversal techniques to expand a small "foothold" into a larger compromise.

Directory Traversal

To illustrate how traversal might be helpful, consider a directory listing that was found with intitle:index.of inurl:"/admin/*", as shown in Figure 3.16.

This particular listing was located at www.cl.uh.edu/bpa/acadunits/admin/envr/bowman. If you look closely at the URL, you'll notice an "admin" directory two directory levels above our current location. If we were to click the "parent directory" link, we would be taken up one directory, to the "envr" directory. Clicking the "parent directory" link from the "envr" directory would take us to the "admin" directory, a potentially juicy directory. This is very basic directory traversal. We could explore each and every parent directory and each of the subdirectories, looking for juicy stuff. Alternatively, we could use a creative site search combined with an inurl search to locate a specific file or term inside a specific subdirectory, such as site:cl.uh.edu inurl:bpa/acadunits/admin ws_ftp.log, for example. We could also explore this directory structure by modifying the URL in the address bar.

Regardless of how we were to "walk" the directory tree, we would be traversing outside the Google search, wandering around on the target Web server. This is basic traversal, specifically directory traversal. Another simple example would be replacing the word admin with the word student or public. A more serious traversal technique could allow an attacker to take advantage of software flaws to traverse to directories outside the Web server directory tree. For example, if a Web server is installed in the /var/www directory, and public Web documents are placed in /var/www/htdocs, by default any user attaching to the Web server's top-level directory is really viewing files located in /var/www/htdocs. Under normal circumstances, the Web server will not allow Web users to view files above the /var/www/htdocs directory.

Now, let's say a poorly coded third-party software product is installed on the server that accepts directory names as arguments. A normal URL used by this product might be www.somesadsite.org/badcode.pl?page=/index.html. This URL would instruct the badcode.pl program to "fetch" the file located at /var/www/htdocs/index.html and display it to the user, perhaps with a nifty header and footer attached. An attacker might attempt to take advantage of this type of program by sending a URL such as www.somesadsite.org/badcode.pl?page=../../../etc/passwd. If the badcode.pl program is vulnerable to a directory traversal attack, it would break out of the /var/www/htdocs directory, crawl up to the real root directory of the server, dive down into the /etc directory, and "fetch" the system password file, displaying it to the user with a nifty header and footer attached!
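The hypothetical badcode.pl fails because it appends user input to the document root without checking where the resolved path actually lands. The missing check can be sketched in Python (path names follow the example above; this is an illustration, not a hardened implementation):

```python
import os.path

DOCROOT = "/var/www/htdocs"  # document root from the example above

def resolve(page: str) -> str:
    """Resolve a requested page against the document root and refuse
    any path that escapes it -- the check badcode.pl is missing."""
    # Treat "/index.html" as relative to DOCROOT, then collapse ".." parts.
    full = os.path.normpath(os.path.join(DOCROOT, page.lstrip("/")))
    if not full.startswith(DOCROOT + os.sep):
        raise ValueError("directory traversal attempt: " + page)
    return full
```

With this check in place, page=/index.html resolves to /var/www/htdocs/index.html, while page=../../../etc/passwd is rejected instead of yielding /etc/passwd.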

Automated tools can do a much better job of locating these types of files and vulnerabilities, if you don't mind all the noise they create. If you're a programmer, you will be very interested in the Libwhisker Perl library, written and maintained by Rain Forest Puppy (RFP) and available from www.wiretrip.net/rfp. SecurityFocus wrote a great article on using Libwhisker; it is available from www.securityfocus.com/infocus/1798. If you aren't a programmer, RFP's Whisker tool, also available from the Wiretrip site, is excellent, as are other tools based on Libwhisker, such as nikto, written by sullo@cirt.net, which is said to be updated even more often than the Whisker program itself.

Incremental Substitution

Another technique similar to traversal is incremental substitution. This technique involves replacing numbers in a URL in an attempt to find directories or files that are hidden, or unlinked from other pages. Remember that Google generally only locates files that are linked from other pages, so if it's not linked, Google won't find it. (Okay, there's an exception to every rule. See the FAQ at the end of this chapter.) As a simple example, consider a document called exhc-1.xls, found with Google. You could easily modify the URL for that document, changing the 1 to a 2, making the filename exhc-2.xls. If the document is found, you have successfully used the incremental substitution technique! In some cases it might be simpler to use a Google query to find other similar files on the site, but remember, not all files on the Web are in Google's databases. Use this technique only when you're sure a simple query modification won't find the files first.

This technique applies not only to filenames but to just about anything that contains a number in a URL, even parameters to scripts. Using this technique to toy with parameters to scripts is beyond the scope of this book, but if you're interested in trying your hand at some simple file or directory substitutions, scare up some test sites with queries such as filetype:xls inurl:1.xls or intitle:index.of inurl:0001, or even an image search for 1.jpg. Now use substitution to try to modify the numbers in the URL to locate other files or directories that exist on the site. Here are some examples:

■ /docs/bulletin/2.xls could be modified to /docs/bulletin/3.xls

■ /DigLib_thumbnail/spmg/hel/0001/H/ could be changed to /DigLib_thumbnail/spmg/hel/0002/H/

■ /gallery/wel008-1.jpg could be modified to /gallery/wel008-2.jpg
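Incremental substitution is easy to automate. A minimal Python sketch that yields, for each number found in a URL, one candidate with that number incremented (field width is preserved, so 0001 becomes 0002 rather than 2):

```python
import re

def substitute_numbers(url: str, delta: int = 1):
    """Yield one candidate URL per number in the original URL, with
    that number shifted by delta and its zero-padding preserved."""
    for match in re.finditer(r"\d+", url):
        shifted = int(match.group()) + delta
        if shifted < 0:
            continue
        candidate = str(shifted).zfill(len(match.group()))
        yield url[:match.start()] + candidate + url[match.end():]
```

For /gallery/wel008-1.jpg this yields both /gallery/wel009-1.jpg and /gallery/wel008-2.jpg; calling it again with delta=-1 walks the numbers in the other direction.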

Extension Walking

We've already discussed file extensions and how the filetype operator can be used to locate files with specific file extensions. For example, we could easily search for HTM files with a query such as filetype:HTM HTM. (Remember that filetype searches require a search parameter. Files ending in HTM always have HTM in the URL!) Once you've located HTM files, you could apply the substitution technique to find files with the same filename and a different extension. For example, if you found /docs/index.htm, you could modify the URL to /docs/index.asp to try to locate an index.asp file in the docs directory. If this seems somewhat pointless, rest assured, it is, in fact, rather pointless. We can, however, make more intelligent substitutions. Consider the directory listing shown in Figure 3.17. This listing shows evidence of a very common practice, the creation of backup copies of Web pages.

Backup files can be a very interesting find from a security perspective. In some cases, backup files are older versions of an original file. This is evidenced in Figure 3.17. Take a look at the date of the index.htm file: it is listed as January 19, 2004. Now take a look at the backup copy, index.htm.bak, whose date is listed as January 9, 2002. Without even viewing these files, we can tell that they are most likely very different, since there are more than two years' difference in the dates. Older files are not necessarily less secure than newer versions, but backup files on the Web have an interesting side effect: they have a tendency to reveal source code. Source code of a Web page is quite a find for a security practitioner because it can contain behind-the-scenes information about the author, the code creation and revision process, authentication information, and more.

To see this concept in action, consider the directory listing shown in Figure 3.17. Clicking the link for index.htm will display that page in your browser with all the associated graphics and text, just as the author of the page intended. This happens because the Web server follows a set of rules about how to display types of files to the user. HTML files are sent as is to your browser, with very little modification (actually there are some exceptions, such as server-side includes). When you view an HTML page in your browser, you can simply perform a view source to see the source code of the page.

PHP files, by contrast, are first executed on the server. The results of that executed program are then sent to your browser in the form of HTML code, which your browser then displays. Performing a view source on HTML code that was generated from a PHP script will not show you the PHP source code, only the HTML. It is not possible to view the actual PHP source code unless something somewhere is misconfigured. An example of such a misconfiguration would be copying the PHP code to a filename that ends in something other than PHP, like BAK. Most Web servers do not understand what a BAK file is. Those servers, then, will display a PHP.BAK file as text. When this happens, the actual PHP source code is displayed as text in your browser. As shown in Figure 3.18, PHP source code can be quite revealing, showing things like SQL queries that list information about the structure of the SQL database that is used to store the Web server's data.

The easiest way to determine the names of backup files on a server is to locate a directory listing using intitle:index.of or to search for specific files with queries such as intitle:index.of index.php.bak or inurl:index.php.bak. Directory listings are fairly uncommon, especially among corporate-grade Web servers. However, remember that Google's cache captures a snapshot of a page in time. Just because a Web server isn't hosting a directory listing now doesn't mean the site never displayed one. The page shown in Figure 3.19 was found in Google's cache and was displayed as a directory listing because an index.php (or similar) file was missing. In this case, if you were to visit the server on the Web, it would look like a normal page because the index file has since been created. Clicking the cache link, however, shows this directory listing, leaving the list of files on the server exposed. This list of files can be used to intelligently locate files that most likely still exist on the server (via URL modification) without guessing at file extensions.

Directory listings also provide insight into the file extensions that are in use in other places on the site. If a system administrator or Web authoring program creates backup files with a .BAK extension in one directory, there's a good chance that BAK files will exist in other directories as well.
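Extension walking reduces to the same kind of candidate generation as incremental substitution. A minimal Python sketch (both extension lists are illustrative; seed them from whatever a directory listing on the target reveals):

```python
# Swap the existing extension for other common ones, and append
# common backup suffixes to the original name.
SWAP_EXTS = ["asp", "php", "htm", "html"]
BACKUP_EXTS = ["bak", "old", "orig"]

def walk_extensions(url: str):
    """Return candidate URLs with swapped extensions and backup suffixes."""
    base, dot, ext = url.rpartition(".")
    if not dot:  # no extension to walk
        return []
    candidates = [f"{base}.{e}" for e in SWAP_EXTS if e != ext]
    candidates += [f"{url}.{e}" for e in BACKUP_EXTS]
    return candidates
```

walk_extensions("/docs/index.htm") produces /docs/index.asp, /docs/index.php, /docs/index.html, and /docs/index.htm.bak, among others.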

Summary

The Google cache is a powerful tool in the hands of the advanced user. It can be used to locate old versions of pages that may expose information that would normally be unavailable to the casual user. The cache can be used to highlight terms in the cached version of a page, even if the terms were not used as part of the query to find that page. The cache can also be used to view a Web page anonymously via the &strip=1 URL parameter, and it can even be used as a transparent proxy server with creative use of the translation service. An advanced Google user will always pay careful attention to the details contained in the cached page's header, since it can hold important information about the date the page was crawled, the terms that were found in the search, whether the cached page contains external images, links to the original page, and the text of the URL used to access the cached version of the page.

Directory listings, although somewhat uncommon, contain a great deal of information that is interesting from a security perspective. In this chapter, we saw that directory listings can be used to locate specific files and directories and to determine specific information about the software installed on a server. Traversal techniques can be used to locate information often outside the piercing gaze of Google's crawlers. Some specific techniques we explored included directory traversal, incremental substitution, and extension walking. When combined with effective Google searching, these techniques can often unearth all sorts of information that Google searching alone cannot reveal. In addition, some traversal techniques can be used to actually compromise a server, giving an attacker wide-open access to it.

Solutions Fast Track

Anonymity with Caches

0 Clicking the cache link will not only load the page from Google's database but will also connect to the real server to access graphics and other non-HTML content.

0 Adding &strip=1 to the end of a cached URL will show only the HTML of a cached page. Accessing a cached page in this way will not connect to the real server on the Web and could protect your anonymity if you use the cut-and-paste method shown in this chapter.
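The &strip=1 step can be sketched as a small URL builder. The cache URL format is the one described in this chapter; Google has changed its cache URL scheme over the years, so treat this as an illustration of the idea rather than a current recipe:

```python
from urllib.parse import quote

def stripped_cache_url(target: str) -> str:
    """Build a cache query URL with &strip=1 appended so that only the
    stored HTML is returned and the real server is never contacted."""
    return ("http://www.google.com/search?q=cache:"
            + quote(target, safe="") + "&strip=1")
```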

Using Google as a Proxy Server

0 Google can be used as a transparent proxy server, thanks to the translation service.

0 This technique requires URL modification, specifically the modification of the langpair parameter. To use this technique, set the langpair values to the same language, such as langpair=en%7Cen.
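The langpair modification can be sketched the same way. Parameter names are as described in this chapter; the translation service's URL scheme may have changed since this was written:

```python
from urllib.parse import urlencode

def translation_proxy_url(target: str) -> str:
    """Build a translation-service URL whose source and target languages
    are identical, so the page is fetched by Google but not translated."""
    params = urlencode({"u": target, "langpair": "en|en"})
    return "http://translate.google.com/translate?" + params
```

Note that urlencode takes care of percent-encoding, turning en|en into the en%7Cen form shown above.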



Locating Directory Listings

0 Directory listings contain a great deal of invaluable information.

0 The best way to home in on pages that contain directory listings is with a query such as intitle:index.of "parent directory" or intitle:index.of name size.

0 Server version tags displayed at the bottom of directory listings can be combined with intitle:index.of to locate, for example, servers running various versions of the Apache Tomcat server.

Directory Traversal

0 Once you have located a specific directory on a target Web server, you can use this technique to locate other directories or subdirectories.

0 An easy way to accomplish this task is via directory listings. Simply click the parent directory link, taking you to the directory above the current directory. If this directory contains another directory listing, you can simply click links from that page to explore other directories. If the parent directory does not display a directory listing, you might have to resort to a more difficult method, guessing directory names and adding them to the end of the parent directory's URL. Alternatively, consider using site and inurl keywords in a Google search.

Incremental Substitution

0 Incremental substitution is a fancy way of saying "take one number and replace it with the next higher or lower number."

0 This technique can be used to explore a site that uses numbers in directory or file names. Simply replace the number with the next higher or lower number, taking care to keep the rest of the file or directory name identical (watch those zeroes!). Alternatively, consider using site with either inurl or filetype keywords in a creative Google search.

Extension Walking

0 This technique can help locate files (for example, backup files) that have the same filename with a different extension.

0 The easiest way to perform extension walking is by replacing one extension with another in a URL—replacing html with bak, for example.

0 Directory listings, especially cached directory listings, are easy ways to determine whether backup files exist and what kinds of file extensions might be used on the rest of the site.

Links to Sites

■ www.all-nettools.com/pr.htm A simple proxy checker that can help you test a proxy server you're using.

Frequently Asked Questions



The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: Can Google find Web pages that aren't linked from anywhere else on the Web?

A: This question requires two answers. The first answer is "Yes." Anyone can add a URL to Google's database by filling out the form at www.google.com/addurl.html. The second answer is "Maybe," and it requires a bit of explanation. The Opera Web browser includes a feature that sends data to Google when a user types a URL into the address bar. The entered URL is sent to Google, and that URL is subsequently crawled by Google's bots. According to the FAQ posted at www.opera.com/adsupport:

The Google system serves advertisements and related searches to the Opera browser through the Opera browser banner 468x60 format. Google determines what ads and related searches are relevant based on the URL and content of the page you are viewing and your IP address, which are sent to Google via the Opera browser.

There is no substantial evidence that proves that Google includes this link in its search engine. However, testing shows that when a previously unindexed URL (http://johnny.ihackstuff.com/temp/suck.html) is entered into Opera 7.2.3, a Googlebot crawls that URL moments later, as shown by the following log excerpts:

64.68.87.41 - "GET /robots.txt HTTP/1.0" 200 220 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"

64.68.87.41 - "GET /temp/suck.html HTTP/1.0" 200 5 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"

Opera users should not expect typed URLs to remain "unexplored."

Q: I use Opera. Can I turn off the Google crawling feature?

A: Yes. This feature can be turned off within Opera by selecting Show generic selection of graphical ads from File | Preferences | Advertising.

Q: Searching for backup files seems cumbersome. Is there a better way?

A: Better, meaning faster, yes. Many automated Web tools (such as WebInspect from www.spidynamics.com) offer the capability to query a server for variations of existing filenames, turning an existing index.html file into queries for index.html.bak or index.bak, for example. These scans are generally very thorough but very noisy and will almost certainly alert the site that you're scanning. WebInspect is better suited for this task than Google hacking, but many times a low-profile Google scan can be used to get a feel for the security of a site without alerting the site's administrators or intrusion detection system (IDS). As an added benefit, any information gathered with Google can be reused later in an assessment.

Q: Backup files seem to create security problems, but these files help in the development of a site and provide peace of mind that changes can be rolled back. Isn't there some way to keep backup files around without the undue risk?

A: Yes. A major problem with backup files is that in most cases, the Web server displays them differently because they have a different file extension, so there are a few options. First, if you create backup files, keep the extensions the same. Don't copy index.php to index.bak but rather to something like index.bak.php. This way the server still knows it's a PHP file. Second, you could keep your backup files out of the Web directories: keep them in a place you can access them, but where Web visitors can't get to them. The third (and best) option is to use a real configuration management system. Consider using a CVS-style system that allows you to register and check out source code. This way you can always roll back to an older version, and you don't have to worry about backup files sitting around.
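A complementary server-side mitigation, not mentioned above but in the same spirit: configure the Web server to refuse to serve backup extensions even when such files slip into the document root. A hedged sketch for Apache 2.4 (the extension list is illustrative; adjust for your environment):

```apacheconf
# Refuse to serve files with common backup extensions.
<FilesMatch "\.(bak|old|orig|save)$">
    Require all denied
</FilesMatch>
```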
