Introduction
The initial phase of an external blind security assessment involves finding targets to assess. Beyond simply locating targets, any good auditor (or attacker) knows that the easiest targets are those lost, forgotten machines that lie "off the radar" of the IT security team. In this chapter, we'll discuss ways Google can help with the network discovery phase of an external blind assessment. This is an important skill for any auditor, since more and more networks are being compromised not through exploitation of vulnerabilities found on heavily guarded, carefully monitored "front door" systems, but through exploitation of lost, forgotten systems that fall off the radar of already overworked administrators. We'll begin the chapter by discussing a very basic methodology for network discovery. Next, we'll look at some specific ways Google can be used to help in the discovery process. We'll discuss site crawling, domain name determination, link mapping, and group tracing, techniques that have proven to be excellent ways to enumerate the hosts that exist on a network. As we wrap up this chapter, we'll discuss various ways that Web-enabled network devices can be discovered and exploited via Google to reveal surprisingly detailed information about a target network. As you read this chapter, bear in mind that the topic of network discovery is quite broad. In fact, an entire book could be dedicated to the mastery of this technique. However, Google plays a valuable role in this process, and it's our hope that this chapter will provide you with just a few more tricks for your network discovery toolkit.
Mapping Methodology
In the context of the Internet, computers are categorized within domains. The most famous top-level domain, .COM, has practically become a household word. Working back from a top-level domain, company and server names are tacked on from right to left until a fully qualified domain name (FQDN) is formed. The FQDN (like www.sensepost.com) serves as a human-friendly address to a virtual location on a network, like the Internet. Although they serve us humans well as handy memory hooks, the machines that make up the Internet care little for these frilly FQDNs, preferring to reference machines on a network by a numeric Internet Protocol (IP) address. Granted, this is a simplistic view of the way things work on the Internet, but the point is that we, like Google, often prefer to speak in terms of FQDNs and domain names, reserving the numeric part of our limited memories for more important things like phone numbers and personal gross yearly earnings. However, when attempting to discover targets on a network, domain names and IP addresses need to be equally considered.
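To make the right-to-left structure of an FQDN concrete, here is a minimal sketch that splits a name into its labels, starting from the top-level domain, and rebuilds each successively longer name. The function names are our own, chosen for illustration. The code also treats every dot as a boundary, which ignores multi-part suffixes such as .co.jp.

```python
def labels_right_to_left(fqdn: str) -> list[str]:
    """Return the labels of an FQDN, starting from the top-level domain."""
    # Drop an optional trailing dot, normalize case, then reverse the labels.
    return fqdn.rstrip(".").lower().split(".")[::-1]


def parent_names(fqdn: str) -> list[str]:
    """Build each successively longer name, from the TLD up to the full FQDN."""
    labels = labels_right_to_left(fqdn)
    names = []
    for count in range(1, len(labels) + 1):
        # Take the first `count` labels (TLD first) and put them back in
        # their usual left-to-right written order.
        names.append(".".join(reversed(labels[:count])))
    return names


if __name__ == "__main__":
    print(labels_right_to_left("www.sensepost.com"))
    print(parent_names("www.sensepost.com"))
```

Reading the second list from start to finish mirrors the way the name is assembled: top-level domain first, then the company name, then the server name.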
Since Google works so well with domain names (remember the site operator), a network discovery session can certainly begin with a domain name. We'll use sensepost.com as an example domain since SensePost has pioneered many unique network discovery techniques, some of which we'll discuss in this chapter. SensePost, like most companies, has several registered domain names. In the first phase of a solid mapping methodology, we must discover as many domain names associated with SensePost as possible. In addition to discovering domains owned by the target, it's often important to review sites linked to and sites linked from the target. This reveals potentially important relationships between domains and could provide important clues about any type of trust relationships between the two domains. Armed with a list of domains owned by the target, we can gather a list of subdomains. A subdomain extends a domain name by one level. For example, sales.sensepost.com could be a valid subdomain of sensepost.com. In most cases, each subdomain points to a distinct machine on the network. A subdomain such as ftp.sensepost.com could point to a dedicated FTP server, while www.sensepost.com could point to a dedicated Web server. Because of this, it's important to determine IP addresses used by the target network. Since address space on the Internet is regulated, each IP address must be properly registered. Since IP address registration information is public, it's fairly common for security auditors to query the various Internet registrars for information about a particular IP address. This registration information includes contact name, address, telephone number, and information about the IP address block owned by the target. This block of addresses allows you to safely expand the scope of your assessment without worrying about stumbling onto someone else's network during your audit.
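Once a registered address block is known, checking whether a discovered address falls inside it is simple arithmetic. The following is a minimal sketch using Python's standard ipaddress module. The hostnames, addresses, and the 192.0.2.0/24 block are placeholders drawn from the ranges reserved for documentation, not real registration data for any organization.

```python
import ipaddress


def in_scope(ip: str, registered_blocks: list[str]) -> bool:
    """Return True if the address lies inside any of the registered blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(block) for block in registered_blocks)


def split_by_scope(resolved: dict[str, str], registered_blocks: list[str]):
    """Separate hostnames whose addresses are inside the blocks from the rest."""
    inside, outside = [], []
    for host, ip in sorted(resolved.items()):
        (inside if in_scope(ip, registered_blocks) else outside).append(host)
    return inside, outside


if __name__ == "__main__":
    # Hypothetical data: a block assumed to be registered to the target,
    # and addresses assumed to have been resolved earlier.
    blocks = ["192.0.2.0/24"]
    resolved = {
        "www.example.com": "192.0.2.10",
        "ftp.example.com": "192.0.2.25",
        "mail.example.com": "198.51.100.7",  # outside the block, likely hosted elsewhere
    }
    print(split_by_scope(resolved, blocks))
```

Hosts that land outside the registered block deserve a second look before they are tested, since they may belong to a hosting provider or partner rather than the target.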
Once IP addresses are determined, the audit will generally begin to blur into the next phase, the host assessment phase. Each IP address must be tested or "pinged" by any variety of methods to determine if the machine is alive and accessible. Machines are then scanned to determine open ports, and applications running on these ports are tested for vulnerabilities.
Although many different tools and techniques could be employed for each phase of this (admittedly basic) methodology, Google's search capability can play an important role in each of these phases, as we'll see in the following sections.
Mapping Techniques
In this section, we'll see creative ways Google can be used to assist in the network discovery and mapping process. The techniques here are presented in roughly the same order they appear in the mapping methodology.
Domain Determination
Since it's important to gather as many domain names as possible, we need to discuss some techniques for determining domain names the target may own. One of the most common sources for domain information is the various Internet registries. Techniques for exploring Internet registries are well known and well documented. However, a few very simple methods can be used to determine the possible domain names registered by an organization. At the 2003 BlackHat briefings in Las Vegas, SensePost presented an excellent paper entitled "Putting the Tea Back into Cyber Terrorism" in which Roelof Temmingh discussed this very topic. Roelof's suggestions were simple, yet effective.
First, and most obviously, determine where the organization is based. This will affect the top-level domain (TLD). Sites in the United States often use the common .COM, .NET, .ORG domains. Outside the United States, sites will often use a domain name like .co.XX or .com.au, where XX represents a country code. In some cases, it's possible that the target organization has Web sites registered in many different countries. In this case, multiple TLDs should be searched. Once a TLD is determined, the first obvious domain includes the common name of the company, stripped of spaces, followed by the TLD; for example, Telstra's Australian site, telstra.com.au. Other domain names can be determined using these techniques:
■ If the organization's name has a common abbreviation, use that. For example, National Australia Bank, nab.com.au.
■ If the organization is known by a common abbreviation that would create an ambiguous or invalid domain name, a country abbreviation could be included in the domain name. For example, consider Deutsche Telekom at dtag.de or Japan Airlines at jal.co.jp.
■ If the organization name contains spaces, remove them, appending the TLD. For example, Banco do Brasil at bancodobrasil.com.br.
■ If the organization name contains many words, try the individual words in the name. For example, consider lucent.com.
■ If a domain search returns domain names that don't seem to fit, consider using a correlation function to determine how many sliding three-character instances match between the company name and the domain name. For example, Coca Cola Enterprises found at cokecce.com, or Kansai Electric Power found at kepco.co.jp.
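The naming rules above are mechanical enough to sketch in a few lines. The following example generates candidate domain names from an organization name and counts shared sliding three-character sequences between a company name and a domain label. The function names are our own, and the exact scoring SensePost used is not described here, so treat the trigram count as one plausible reading of the correlation idea, not a reproduction of it.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def candidate_domains(org_name: str, tlds: list[str]) -> list[str]:
    """Generate guesses: name without spaces, initials, and each word alone."""
    words = re.findall(r"[a-z0-9]+", org_name.lower())
    stems = ["".join(words)]                        # spaces removed
    if len(words) > 1:
        stems.append("".join(w[0] for w in words))  # common abbreviation (initials)
        stems.extend(words)                         # individual words
    unique_stems = list(dict.fromkeys(stems))       # drop duplicates, keep order
    return [f"{stem}.{tld}" for stem in unique_stems for tld in tlds]


def trigrams(text: str) -> set[str]:
    """Return the set of sliding three-character sequences in the text."""
    cleaned = normalize(text)
    return {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}


def trigram_matches(company: str, domain_label: str) -> int:
    """Count three-character sequences shared by the company name and label."""
    return len(trigrams(company) & trigrams(domain_label))


if __name__ == "__main__":
    print(candidate_domains("Banco do Brasil", ["com.br"]))
    print(trigram_matches("Lucent Technologies", "lucent"))
```

The candidates are only guesses. Each one still has to be checked against a registry or resolved before it can be added to the list of domains owned by the target.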
These techniques work very well at determining domain names, even when the domain names are not "public." For example, a Google search for site:nab.com.au returns no hits, even though the site resolves and forwards to the National Australia Bank Web site. However, for the vast majority of domain names, simply entering a company name into a properly formatted Google query will list many viable domain names, as we'll see in the next section.
Site Crawling
Simply popping a company name into Google often returns the most popular domain name for that company. However, gathering a nice list of subdomains can take a bit more work. Consider a search for site:microsoft.com shown in Figure 5.1.
Looking at the first five results from this query, there's not much variety in the returned DNS names. Only two unique domain names were returned, www.microsoft.com and msdn.microsoft.com, the latter of which is most likely a subdomain since it does not begin with a common-looking hostname like "www." One way to narrow our search to return more domain names is by adding a negative search for www.microsoft.com. For example, consider the results of the query site:microsoft.com -www.microsoft.com, or site:microsoft.com -site:www.microsoft.com, as shown in Figure 5.2.
This search returns more variety, returning four new domain names in the first four results. These names (msdn, msevents, members, and support) could also be added as negative queries to locate even more results. A technique like this is very cumbersome, unless it is automated. We'll cover more automation techniques later, but let's consider two simple examples. First, we'll look at a page scraping technique.
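The bookkeeping behind this negative-query technique can be sketched as follows: keep a list of hostnames already seen and fold each one into the query as a -site: term. The function names are ours, and the code only builds the query string. It sends nothing to Google.

```python
def build_query(domain: str, excluded_hosts: list[str]) -> str:
    """Build a site: query that excludes each already-known hostname."""
    terms = [f"site:{domain}"]
    terms += [f"-site:{host}" for host in excluded_hosts]
    return " ".join(terms)


def add_exclusions(excluded_hosts: list[str], newly_seen: list[str]) -> list[str]:
    """Append newly seen hostnames to the exclusion list, skipping repeats."""
    return list(dict.fromkeys(excluded_hosts + newly_seen))


if __name__ == "__main__":
    excluded = ["www.microsoft.com"]
    print(build_query("microsoft.com", excluded))

    # After reviewing a page of results, fold the new names back in.
    excluded = add_exclusions(excluded, ["msdn.microsoft.com", "support.microsoft.com"])
    print(build_query("microsoft.com", excluded))
```

Each round trades a longer query for fewer repeated hostnames in the results, which is the same narrowing we performed by hand above.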
Page Scraping Domain Names
Using the popular command-line browser lynx supplied with most UNIX-based operating systems, we could grab the first 100 results of this query with a command like:
lynx -dump "http://www.google.com/search?\
q=site:microsoft.com+-www.microsoft.com&num=100" > test.html
This would save the results of the query to a file, which we could process to extract domain names. Note that Google does not condone automated queries as mentioned in their Terms of Service located at www.google.com/terms_of_service.html. However, Google has not historically complained about the use of the lynx browser to perform this type of query. Once the results are saved to the test.html file, a few shell commands can be used to extract domain names as shown in Figure 5.3.
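For readers who prefer a script to a shell pipeline, the same extraction step can be sketched in Python. This is not the contents of Figure 5.3. It is an alternative under stated assumptions: the saved file is plain text containing hostnames, and any name ending in the target domain is of interest. The regular expression and function name are our own.

```python
import re


def extract_hosts(text: str, domain: str) -> list[str]:
    """Return the sorted unique hostnames in the text that end in the domain."""
    pattern = re.compile(
        r"(?<![a-z0-9.-])"              # must not start in the middle of a longer name
        r"((?:[a-z0-9-]+\.)*" + re.escape(domain) + r")"
        r"(?![a-z0-9-]|\.[a-z0-9])",    # must not continue into a longer name
        re.IGNORECASE,
    )
    return sorted({match.group(1).lower() for match in pattern.finditer(text)})


if __name__ == "__main__":
    # In practice the text would come from the saved results, for example:
    #   text = open("test.html", encoding="utf-8", errors="replace").read()
    sample = (
        "1. http://msdn.microsoft.com/library/\n"
        "2. http://www.microsoft.com/windows/\n"
        "3. http://support.microsoft.com/default.aspx\n"
        "4. http://MSDN.Microsoft.com/downloads/\n"
    )
    for host in extract_hosts(sample, "microsoft.com"):
        print(host)
```

The two lookaround checks keep the pattern from matching lookalike names such as notmicrosoft.com or microsoft.com.example.org, which would otherwise pollute the list.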
This process yields 13 unique subdomains (including the www.microsoft.com domain) from a single page of 100 Google hits. Extending the search involves simply appending &start=100 to the end of the lynx URL, appending the html into the test.html file, and then running the shell script again. This will return results 100 to 200 from Google. In fact, this process could be repeated over and over again until 1000 Google results are retrieved. However, keep in mind that the 80/20 rule applies here: In most cases, you'll get 80 percent of the best results from the first 20 percent of work. For example, extending this search to retrieve 1000 Google results returns the following subdomains:
This list includes only 18 subdomains. This means that over 70 percent of the results came from the first 100 Google results, while less than 30 percent of the results came from the next 900 results! In cases like this, it may be smarter to start reducing the more common domain names (msdn, support, download) from the Google query before trying to grab more data from Google. It's always best to search smart and parse less.
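The paging arithmetic and the diminishing returns described above can be sketched as follows. The num and start parameters and the 1000-result ceiling come from the discussion in this section. The function names are ours, and the code only builds URL strings and tallies hostnames that have already been collected.

```python
from urllib.parse import urlencode


def page_urls(query: str, per_page: int = 100, max_results: int = 1000) -> list[str]:
    """Build one search URL per page of results, up to max_results."""
    urls = []
    for start in range(0, max_results, per_page):
        params = {"q": query, "num": per_page}
        if start:
            params["start"] = start  # the first page needs no offset
        urls.append("http://www.google.com/search?" + urlencode(params))
    return urls


def new_hosts_per_page(pages: list[set[str]]) -> list[int]:
    """Count how many previously unseen hostnames each page contributes."""
    seen: set[str] = set()
    counts = []
    for hosts in pages:
        fresh = hosts - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


if __name__ == "__main__":
    for url in page_urls("site:microsoft.com -www.microsoft.com")[:2]:
        print(url)
    # Made-up per-page hostname sets, just to show the tally.
    print(new_hosts_per_page([{"msdn", "support"}, {"support", "download"}, {"msdn"}]))
```

When the per-page counts drop toward zero, that is the signal to stop paging and refine the query instead.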
API Approach
Another alternative for gathering domain names involves the use of a Perl script. The Google API allows for 1000 queries per day and is the only approved way to automate Google queries. One excellent script, dns-mine.pl, was written by Roelof Temmingh of SensePost (www.sensepost.com). This script is covered in detail in Chapter 12, but let's look at dns-mine in action. Figure 5.4 shows a portion of the output from dns-mine run against microsoft.com.