Saturday, December 5, 2009

Document Grinding and Database Digging

Introduction

There's no shortage of documents on the Internet. Good guys and bad guys alike can use information found in documents to achieve their distinct purposes. In this chapter we take a look at ways you can use Google to not only locate these documents but to search within these documents to locate information. There are so many different types of documents that we can't hope to cover them all, but we'll look at the documents in distinct categories based on their function. Specifically, we'll take a look at a few categories such as configuration files, log files, and office documents. Once we've looked at distinct file types, we'll delve into the realm of database digging. We won't examine the details of the Structured Query Language (SQL) or database architecture and interaction; rather, we'll look at the many ways Google hackers can locate and abuse database systems armed with nothing more than a search engine.

One important thing to remember about document digging is that Google will only search the rendered, or visible, view of a document. For example, consider a Microsoft Word document. This type of document can contain metadata, as shown in Figure 10.1. These fields include such things as the subject, author, manager, company, and much more. Google will not search these fields. If you're interested in getting to the metadata within a file, you'll have to download the actual file and check the metadata yourself.

Figure 10.1 Microsoft Word Metadata

[Figure: the Properties dialog for a Word document, with General, Summary, Statistics, Contents, and Custom tabs. The Summary tab shows fields including Title, Subject, Company, Category, and Hyperlink base.]
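Checking metadata offline can be sketched in a few lines of Python. This is a minimal sketch that assumes the downloaded file is in the newer Office Open XML (.docx) format, which is a ZIP archive holding its core properties in docProps/core.xml; legacy binary .doc files store their properties differently and would need a different parser. The function name and the selection of fields are illustrative choices, not something taken from this chapter.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative selection of fields, mapped to their namespaced element names.
FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "category": "cp:category",
}


def read_core_properties(docx_file):
    """Return the non-empty core metadata fields of a .docx file.

    docx_file may be a path or a binary file object.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        if element is not None and element.text:
            found[name] = element.text
    return found
```

Calling read_core_properties("downloaded.docx") on a local copy returns a dictionary of whichever of these fields are populated. Fields such as company and manager live in a separate part of the package (docProps/app.xml) and are not read by this sketch.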

Configuration Files

Configuration files store program settings. An attacker (whether a good guy or a bad guy) can use these files to glean insight into the way the program is used and perhaps, by extension, into how the system or network it's on is used or configured. As we've seen in previous chapters, even the smallest tidbit of information is of interest to a skilled attacker.

Consider the file shown in Figure 10.2. This file, found with a query such as filetype:ini inurl:ws_ftp, is a configuration file used by the WS_FTP client program. When the WS_FTP program is downloaded and installed, the configuration file contains nothing more than a list of popular, public Internet FTP servers. However, over time, this configuration file can be automatically updated to include the name, directory, username, and password of FTP servers the user connects to. Although the password is encoded when it is stored, some free programs can crack these passwords with relative ease.

Locating Files

To locate files, it's best to try different types of queries. For example, intitle:index.of ws_ftp.ini will return results, but so will filetype:ini inurl:ws_ftp.ini. The filetype search, however, is often the better choice. First, the filetype search allows you to browse right to a cached version of the page. Second, the directory listings found by the index.of search might not allow you access to the file. Third, directory listings are not overly common. The filetype search will locate your file no matter how Google found it.



Regardless of the type of data in a configuration file, sometimes the mere existence of a configuration file is significant. If a configuration file is located on a server, there's a chance that the accompanying program is installed somewhere on that server or on neighboring machines on the network. Although this might not seem like a big deal in the case of FTP client software, consider a search like filetype:conf inurl:firewall, which can locate generic firewall configuration files. This example demonstrates one of the most generic naming conventions for a configuration file, the use of the conf file extension. Other generic naming conventions can be combined into a single search to cover equally common variations. One of the most common base searches for locating configuration files is simply (inurl:conf OR inurl:config OR inurl:cfg), which incorporates the three most common configuration file name fragments. This base search uses the inurl operator, since multiple filetype operators could not be successfully ORed together at the time of this writing.

If an attacker knows the name of a configuration file as it shipped from the software author or vendor, he can simply create a search targeting that filename using the filetype and inurl operators. However, most programs allow you to reference a configuration file of any name, making a Google search slightly more difficult. In these cases, it helps to get an idea of the contents of the configuration file, which could be used to extract unique strings for use in an effective base search. Sometimes, combining a generic base search with the name (or acronym) of a software product can have satisfactory results, as a search for (inurl:conf OR inurl:config OR inurl:cfg) MRTG shows in Figure 10.3.

Figure 10.3 Generic Configuration File Searching

[Figure: Google results for (inurl:cfg OR inurl:config OR inurl:conf) mrtg. The hits include pages of example mrtg.cfg files, a directory listing containing mrtg.cfg and mrtg.cfg.new (both dated 10-Dec-2003), and a FreeBSD MRTG configuration file by Michael Lucas.]
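The generic base search can be assembled mechanically. The following is a minimal sketch in Python; the function name and its parameters are hypothetical, and it only builds the query string for pasting into a browser rather than submitting anything to Google.

```python
def config_base_search(product="", fragments=("conf", "config", "cfg")):
    """Build a query that ORs one inurl: term per filename fragment."""
    ored = " OR ".join(f"inurl:{fragment}" for fragment in fragments)
    query = f"({ored})"
    if product:
        # Append the product name or acronym to focus the generic search.
        query += f" {product}"
    return query


print(config_base_search("MRTG"))
# prints: (inurl:conf OR inurl:config OR inurl:cfg) MRTG
```

Keeping the fragments as a parameter makes it easy to try other naming conventions (for example "ini" or "rc") without rewriting the query by hand.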



Although this first search is not far off the mark, it's fairly common for even the best config file search to return page after page of sample or example files, like the sample MRTG configuration file shown in Figure 10.4.

This brings us back, once again, to perhaps the most valuable weapon in a Google hacker's arsenal: effective search reduction. Here's a list of the most common points a Google hacker considers when trolling for configuration files:

■ Create a strong base search using unique words or phrases from live files.

■ Filter out the words sample, example, test, howto, and tutorial to narrow the obvious example files.

■ Filter out CVS repositories, which often house default config files, with -cvs.

■ Filter out manpage or Manual if you're searching for a UNIX program's configuration file.

■ Locate the one most commonly changed field in a sample configuration file and perform a negative search on that field, reducing potentially "lame" or sample files.

To illustrate these points, consider the search filetype:cfg mrtg "target[*]" -sample -cvs -example, which locates potentially live MRTG files. As shown in Figure 10.5, this query uses a unique string ("target[*]") and removes potential example and CVS files, returning decent results.

Figure 10.5 A Common Search Reduction Technique

[Figure: Google results 1-10 of about 147 for filetype:cfg mrtg "target[*]" -sample -cvs -example, showing excerpts of MRTG configuration files with directives such as XSize, YSize, Options, Colours, IconDir, and PageFoot.]
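The reduction points listed earlier (a unique string from live files plus negative terms for samples and CVS repositories) can be expressed as a small helper. This is a sketch only; the function and parameter names are hypothetical, and the output is a query string to be run manually.

```python
def reduce_search(base, required=(), excluded=()):
    """Append quoted required phrases and negated terms to a base search."""
    parts = [base]
    parts += [f'"{phrase}"' for phrase in required]   # unique strings from live files
    parts += [f"-{term}" for term in excluded]        # negative search terms
    return " ".join(parts)


query = reduce_search(
    "filetype:cfg mrtg",
    required=["target[*]"],
    excluded=["sample", "cvs", "example"],
)
print(query)
# prints: filetype:cfg mrtg "target[*]" -sample -cvs -example
```

Other exclusions from the list, such as test, howto, tutorial, or manpage, can be added to the excluded sequence when the first pass still returns mostly example files.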

    foreach my $word (@WORDS) {
        chomp($word);
        ++$LineCount;
        if (m/$word/) {
            print "$&\n";
            last;
        }
    }
}
close(SEARCHFILE);

This script accepts two arguments: a file to search and a list of words to search for. As it stands, this program is rather simplistic, acting as nothing more than a glorified grep script. However, the script becomes much more powerful when instead of words, the word list contains regular expressions. For example, consider the following regular expression, written by Don Ranta:

Unless you're somewhat skilled with regular expressions, this might look like a bunch of garbage text. This regular expression is very powerful, however, and will locate various forms of e-mail addresses.
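The line-by-line matching that the script performs can also be sketched in Python. The pattern below is a deliberately simplified stand-in chosen for illustration; it is not the expression credited above and will miss address forms that a more thorough expression would catch. The function name is likewise hypothetical.

```python
import re

# Simplified illustrative pattern; NOT the expression referenced in the text.
SIMPLE_EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"


def search_lines(lines, patterns):
    """Return the first match found on each line, trying patterns in order."""
    compiled = [re.compile(pattern) for pattern in patterns]
    matches = []
    for line in lines:
        for regex in compiled:
            found = regex.search(line)
            if found:
                matches.append(found.group(0))
                break  # like the Perl script, stop at the first hit per line
    return matches


sample = [
    "<td>Contact: alice@example.com or bob@example.org</td>",
    "no address on this line",
]
print(search_lines(sample, [SIMPLE_EMAIL]))
# prints: ['alice@example.com']
```

Because the loop stops at the first hit on each line, a line holding several addresses yields only one of them. Replacing regex.search with regex.findall would collect them all.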

Let's take a look at this regular expression in action. For this example, we'll save the results of a Google Groups search for "@yahoo.com" email to a file called results.html, and we'll enter the preceding regular expression all on one line of a file called wordfile.txt. As shown in Figure 10.13, we can grab the search results from the command line with a program like Lynx, a common text-based Web browser. Other programs could be used instead of Lynx: Curl, Netcat, Telnet, or even "save as" from a standard Web browser. Remember that Google's terms of service frown on any form of automation. In essence, Google prefers that you simply execute your search from the browser, saving the results manually. However, as we've discussed previously, if you honor the spirit of the terms of service, taking care not to abuse Google's free search service with excessive automation, the folks at Google will most likely not turn their wrath upon you. Regardless, most people will ultimately decide for themselves how strictly to follow the terms of service.

Back to our Google search: Notice that the URL indicates we're grabbing the first hundred results, as demonstrated by the use of the num=100 parameter. This will potentially locate more e-mail addresses. Once the results are saved to the results.html file, we'll run our ssearch.pl script against the results.html file, searching for the e-mail expression we've placed in the wordfile.txt file. To help narrow our results, we'll pipe that output into "grep yahoo | head -15 | sort -u" to return at most 15 unique addresses that contain the word yahoo. The final (obfuscated) results are shown in Figure 10.13.
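For readers less familiar with UNIX pipelines, the effect of grep yahoo | head -15 | sort -u can be sketched in Python as follows. The function name and sample addresses are made up for illustration, and Python's default string ordering may differ from the locale-dependent ordering that sort uses.

```python
from itertools import islice


def filter_head_unique(lines, keyword="yahoo", limit=15):
    """Mimic `grep keyword | head -limit | sort -u` on an iterable of lines."""
    matching = (line for line in lines if keyword in line)  # grep keyword
    first = islice(matching, limit)                         # head -limit
    return sorted(set(first))                               # sort -u


addresses = ["b@yahoo.com", "a@yahoo.com", "c@example.com", "b@yahoo.com"]
print(filter_head_unique(addresses))
# prints: ['a@yahoo.com', 'b@yahoo.com']
```

Note the order of operations: the first 15 matching lines are taken before duplicates are removed, so the result can contain fewer than 15 addresses even when more unique ones exist further down the input.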



As you can see, this combination of commands works fairly well at unearthing e-mail addresses. If you're familiar with UNIX commands, you might have already noticed that there is little need for two separate commands. This entire process could have been easily combined into one command by modifying the Perl script to read standard input and piping the output from the Lynx command directly into the ssearch.pl script, effectively bypassing the results.html file. Presenting the commands this way, however, opens the door for irresponsible automation techniques, which isn't overtly encouraged.

Other regular expressions can come in handy as well. This expression, also by Don Ranta, locates URLs:

We can use an expression like this to help map a target network. These techniques could be used to parse not only HTML pages but also practically any type of document. However, keep in mind that many files are binary, meaning that they should be converted into text before they're searched. The UNIX strings command (usually implemented with strings -8 for this purpose) works very well for this task, but don't forget that Google has the built-in capability to translate many different types of documents for you. If you're searching for visible text, you should opt to use Google's translation, but if you're searching for nonprinted text such as metadata, you'll need to first download the original file and search it offline. Regardless of how you implement these techniques, it should be clear to you by now that Google can be used as an extremely powerful information-gathering tool when it's combined with even a little automation.
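Where the strings command is not available, its basic behavior can be approximated in Python. This is a rough sketch: it treats only bytes 0x20 through 0x7e as printable, which is close to but not identical to what strings does (it ignores tabs and other encodings), and the function name is hypothetical.

```python
import re


def printable_strings(data, min_length=8):
    """Return runs of printable ASCII at least min_length bytes long."""
    # Bytes 0x20-0x7e cover space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


blob = b"\x00\x01Author: Jane Example\x00\xffshort\x00"
print(printable_strings(blob))
# prints: ['Author: Jane Example']
```

To process a downloaded file, read it in binary mode (open(path, "rb")) and pass the bytes to printable_strings; the resulting text lines can then be searched with regular expressions like any other text.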

Google Desktop Search

The Google Desktop, available from http://desktop.google.com, is an application that allows you to search files on your local machine. Currently available for Windows 2000 and Windows XP, Google Desktop Search allows you to search many types of files, as shown in Table 10.10.


The Google Desktop search offers many features, but since it's a beta product, you should check the desktop Web page for a current list of features. For a document-grinding tool, you can simply download content from the target server and use Desktop Search to search through those files. This offers a distinct advantage over searching the content online through Google; you can't OR the filetype operator in an online search. With Google Desktop Search, you can search many different file types with only one query. In addition, the Desktop Search tool captures Web pages that are viewed in Internet Explorer 5 and newer. This means you can always view an older version of a page you've visited online, even when the original page has changed. In addition, once Desktop Search is installed, any online Google Search you perform in Internet Explorer will also return results found on your local machine.

Summary

The subject of document grinding is a topic worthy of an entire book. In a single chapter, we can only hope to skim the surface of this topic. An attacker (black or white hat) who is skilled in the art of document grinding can glean loads of information about a target. In this chapter we've discussed the value of configuration files, log files, and office documents, but obviously there are many other types of documents we could focus on as well. The key to document grinding is first discovering the types of documents that exist on a target and then, depending on the number of results, narrowing the documents to the ones that might be the most interesting. Depending on the target, the line of business they're in, the document type, and many other factors, various keywords can be mixed with filetype searches to locate key documents.

Database hacking is also a topic for an entire book. However, there is obvious benefit to the information Google can provide prior to a full-blown database audit. Login portals, support files, and database dumps can provide various information that can be recycled into an audit. Of all the information that can be found from these sources, perhaps the most telling (and devastating) is source code. Lines of source code provide insight into the way a database is structured and can reveal flaws that might otherwise go unnoticed from an external assessment. In most cases, though, a thorough code review is required to determine application flaws. Error messages can also reveal a great deal of information to an attacker.

Automated grinding allows you to search many documents programmatically for bits of important information. When it's combined with Google's excellent document location features, you've got a very powerful information-gathering weapon at your disposal.



Solutions Fast Track



Configuration Files

■ Configuration files can reveal sensitive information to an attacker.

■ Although the naming varies, configuration files can often be found with file extensions like INI, CONF, CONFIG, or CFG.

■ http://johnny.ihackstuff.com The home of the Google Hacking Database, where you can find more searches like those listed in this chapter.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: What can I do to help prevent this form of information leakage?

A: To fix this problem on a site you are responsible for, first review all documents available from a Google search. Ensure that the returned documents are, in fact, supposed to be in the public view. Although you might opt to scan your site for database information leaks with an automated tool (see the Protection chapter), the best way to prevent this is at the source. Your database remote administration tools should be locked down from outside users, default login portals should be reviewed for safety and checked to ensure that software versioning information has been removed, and support files should be removed from your public servers. Error messages should be tailored to ensure that excessive information is not revealed, and a full application review should be performed on all applications in use. In addition, it doesn't hurt to configure your Web server to only allow certain file types to be downloaded. It's much easier to list the file types you will allow than to list the file types you don't allow. See the Appendix for more information about Web application security testing.
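The allow-list idea in this answer can be illustrated with a short Python sketch. It is not a Web server configuration, only the decision logic; the extension set and function name are hypothetical and would need to reflect what your site actually serves.

```python
import posixpath

# Hypothetical allow-list; choose the extensions your site really publishes.
ALLOWED_EXTENSIONS = {".html", ".css", ".js", ".png", ".jpg", ".pdf"}


def is_download_allowed(url_path, allowed=ALLOWED_EXTENSIONS):
    """Allow a request only if the path's extension is on the allow-list."""
    extension = posixpath.splitext(url_path)[1].lower()
    return extension in allowed


print(is_download_allowed("/docs/report.pdf"))   # prints: True
print(is_download_allowed("/includes/db.inc"))   # prints: False
print(is_download_allowed("/conf/ws_ftp.INI"))   # prints: False
```

Anything not explicitly listed is refused, so a forgotten .inc, .ini, or .sql file is blocked by default. In this sketch, paths without an extension are also refused, so directory URLs would need their own rule in a real deployment.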



Q: I'm concerned about excessive metadata in office documents. Can I do anything to clean up my documents?

A: Microsoft provides a Web page dedicated to the topic: http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q223396. In addition, several utilities are available to automate the cleaning process. One such product, ezClean, is available from www.kklsoftware.com.

Q: Many types of software rely on include files to pull in external content. As I understand it, include files, like the INC files discussed in this chapter, are a problem because they often reveal sensitive information meant for programs, not Web visitors. Is there any way to resolve the dangers of include files?

A: Include files are in fact a problem because of their file extensions. If an extension such as .INC is used, most Web servers will display them as text, revealing sensitive data. Consider blocking .INC files (or whatever extension you use for includes) from being downloaded. This server modification will keep the file from presenting in a browser but will still allow back-end processes to access the data within the file.



Q: Our software uses .INC files to store database connection settings. Is there another way?

A: Rename the extension to .PHP so that the contents are not displayed.



Q: How can I prevent our X application database from being downloaded by a Google hacker?
