Saturday, December 5, 2009

Link Mapping

dns-mine searches for the name of the company combined with different types of common words like site, web, document, internet, link, or about. The script then intelligently parses the query results to find DNS names and subdomains. As you can see from the output in Figure 5.4, dns-mine located nearly twice as many DNS names as our previous technique, with nearly the same number of queries.
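The core idea can be sketched in a few lines of Python. This is an illustration only: the real dns-mine is a Perl script with its own word list and Google API plumbing, and the helper names below are made up.

```python
import re

def build_queries(company):
    """Combine the company name with common words, dns-mine style.

    The word list is the one quoted above; the real script's list differs.
    """
    words = ["site", "web", "document", "internet", "link", "about"]
    return ['"%s" %s' % (company, word) for word in words]

def extract_hostnames(text, domain):
    """Pull DNS names under the target domain out of raw result text."""
    pattern = re.compile(
        r"\b([a-z0-9-]+(?:\.[a-z0-9-]+)*\." + re.escape(domain) + r")\b",
        re.IGNORECASE)
    return sorted({match.lower() for match in pattern.findall(text)})

queries = build_queries("SensePost")
hosts = extract_hostnames(
    "Visit www.sensepost.com or research.sensepost.com today",
    "sensepost.com")
```

Running the queries against a search engine and feeding the raw result pages through `extract_hostnames` yields a deduplicated list of subdomains for the target.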

Link Mapping

Beyond gathering domain and subdomain names, many times it's important to understand nonobvious relationships between Web sites. In some cases, locating a vulnerability in a poorly secured trusted partner site is a simple way to slip inside a heavily guarded "big iron" target. One of the easiest ways to determine obvious relationships between Web sites is to take some time to explore a target Web site. If your target links to a page, there may be some kind of trust relationship that could be exploited. If some other site links to your target site, this may also indicate some kind of relationship, but this kind of "inbound link" is less meaningful since any Internet user can throw up a link to any Web site she pleases. In technical terms, a link from your target site has more weight than a link to your target site. However, if two sites link to each other, this indicates a very strong relationship. This type of relationship exists at the first degree of relevance, but there exist other degrees of relevance. For example, if our target site (siteA) links to another site (siteB), and that site links to a third site (siteC) that hosts a link back to our target (siteA), there is a relationship (albeit a loose relationship) between our target and siteC via siteB. This overly simplifies the very important concept of "link weighting." The researchers at SensePost (www.sensepost.com) have put a lot of time and effort into uncovering online nonobvious relationships and exploiting the relevance of these relationships in the context of security work. Their BlackHat 2003 paper entitled "The role of non-obvious relationships in the footprinting process" details some very powerful "footprinting" techniques that apply to this topic of network mapping. We won't be able to do SensePost's awesome work justice in a few short pages, but suffice it to say that Google plays a very important role in the mapping process. The link operator, for example, can be used to determine what sites link to a target (like www.sensepost.com) at the first level of relevance with a query like link:www.sensepost.com as shown in Figure 5.5.
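These degrees of relevance can be modeled as a small directed graph. The Python sketch below uses made-up site names mirroring the siteA/siteB/siteC example; it is a toy model, not any SensePost code.

```python
# A toy directed link graph: site -> set of sites it links to.
# siteA and siteD link to each other (a strong, mutual relationship);
# siteA -> siteB -> siteC -> siteA forms a loose second-degree loop.
links = {
    "siteA": {"siteB", "siteD"},
    "siteB": {"siteC"},
    "siteC": {"siteA"},
    "siteD": {"siteA"},
}

def mutual_links(graph, target):
    """Sites that link both to and from the target: the strongest relationship."""
    return {s for s in graph.get(target, set())
            if target in graph.get(s, set())}

def loose_relations(graph, target):
    """Second-degree relations: target -> B -> C where C links back to target."""
    found = set()
    for b in graph.get(target, set()):
        for c in graph.get(b, set()):
            if c != target and target in graph.get(c, set()):
                found.add(c)
    return found
```

Here `mutual_links(links, "siteA")` identifies siteD as the strongly related site, while `loose_relations(links, "siteA")` surfaces siteC via the path through siteB.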

This query reveals that several sites, including dewil.ru, list.cineca.it, and archives.neohapsis.com, link to www.sensepost.com. If www.sensepost.com is our target site, these sites provide lightly weighted inbound links to www.sensepost.com. In order to attempt to uncover a more heavily weighted relationship between these sites and SensePost, we need to determine if www.sensepost.com links to them. It might seem logical, then, to reverse our Google query to locate outbound links from SensePost to, say, dewil.ru, with a query like link:dewil.ru site:www.sensepost.com, but unfortunately the link operator is not this flexible. As an alternative, we could begin surfing all of SensePost's Web site, searching for links to dewil.ru, but this is indeed a tedious process, especially if we stop to consider secondary and (God forbid) tertiary degrees of relevance. Simply keeping the list of links straight is too much work. Automation, combined with a decent weighting algorithm, is key to this process. Thankfully, the researchers at SensePost have developed a tool to help this process along. The Bi-directional Link Extractor (BiLE) program, coded in Perl, uses the Google API to help determine the relevance of the subtle relationships between sites. From the BiLE documentation:

"BiLE tries to do what is normally considered a manual process. It crawls a specified web site (mirrors the site) and extracts all links from the site. It then queries Google via the Google API and obtains a list of sites that link to the target site. It now has a list of sites that are linked from the target site, and a list of sites that link to the target site. It proceeds to perform the same function on all the sites found in the first round. The output of BiLE is a file that contains a list of source site names and destination site names."
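The link-extraction step BiLE performs with the HTML::LinkExtor Perl module can be approximated with Python's standard library. This is a simplified sketch of the idea, not BiLE's actual code:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkExtractor(HTMLParser):
    """Collect the host portion of every <a href> on a page,
    roughly what BiLE does with HTML::LinkExtor after mirroring a site."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    host = urlparse(value).netloc
                    if host:  # skip relative (same-site) links
                        self.hosts.add(host.lower())

page = '<a href="http://www.sensepost.com/">SensePost</a> <a href="/local">local</a>'
parser = LinkExtractor()
parser.feed(page)
```

After feeding in a mirrored page, `parser.hosts` holds the external sites it links to; relative links (which stay on the same site) are discarded.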

Of course, the "magic" in this process is the weighting, not the collection of links to and from our target. Fortunately, BiLE's companion program, BiLE-weigh, comes to the rescue. BiLE-weigh reads the output from the BiLE program and calculates the weight (or relevance) of each link found. Several notes are listed in the documentation:

■ A link from a site weighs more than a link to a site.

■ A link from a site with many links weighs less than a link from a site with a small number of links.

■ A link to a site with many links to the site weighs less than a link to a site with a small number of links to the site.

■ The site that was given as an input parameter need not end up with the highest weight—a good indication that the provided site is not the central site of the organization.
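The notes above can be illustrated with a toy scoring function. The constants and formula here are purely illustrative; BiLE-weigh's real algorithm and coefficients are SensePost's own and differ in detail.

```python
def link_score(source_outbound, dest_inbound, from_target):
    """Toy weight for a single source -> destination link.

    from_target:     True if the link originates at the target site
                     (a link *from* a site weighs more than a link *to* it).
    source_outbound: total links leaving the source; a source with many
                     links dilutes the weight of each one.
    dest_inbound:    total links pointing at the destination; a heavily
                     linked-to destination likewise dilutes the weight.
    The base values (2.0 / 1.0) are made up for illustration.
    """
    base = 2.0 if from_target else 1.0
    return base / (source_outbound * dest_inbound)
```

With this sketch, an outbound link from a sparse page to a rarely linked site scores highest, matching the three documented rules.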

Let's take a quick look at BiLE in action. To install BiLE, we first need to satisfy a few requirements. First, the httrack program from www.httrack.com must be downloaded and installed. This program performs the Web site mirroring. Next, the expat XML parser from http://sourceforge.net/projects/expat must be downloaded and installed. The SOAP::Lite and HTML::LinkExtor Perl CPAN modules must be installed. The most common method of installation for these modules is perl -MCPAN -e 'install SOAP::Lite' and perl -MCPAN -e 'install HTML::LinkExtor', respectively. Last but not least, a Google API key must be obtained from www.google.com/apis and the GoogleSearch.wsdl file must be copied to (preferably) the BiLE directory. Once these requirements are met, BiLE must be configured properly by editing the main BiLE Perl script. From the BiLE Readme file:

my $GOOGLEPAGECOUNT=5;                    # How many seconds to wait for a page on Google
my $HTTRACKTIMEOUT=60;                    # How long to wait for the mirror of a site to complete
my $HTTRACKTEMPDIR="/tmp";                # Where to store temporary mirrors
my $HTTRACKCMD="/usr/bin/httrack";        # The location of the HTTrack executable
my $GOOGLEKEY="<>";                       # Your Google API key
my $GOOGLE_WSDL="file:GoogleSearch.wsdl"; # Location of the Google WSDL file

Once these options are set properly, BiLE can be launched, providing the target Web site and an output filename as arguments as shown in Figure 5.6. Depending on the complexity of the target site and the number of links processed, BiLE could take quite some time to run.

Since the main BiLE program simply collects links, the BiLE-weigh program must be run against the BiLE output file. BiLE-weigh is run with the name of the target site, the name of the BiLE output file, and the name of the BiLE-weigh output file as arguments. As shown in the output file, relationships are listed in descending order from the most relevant to the least relevant. A higher-scored site is more relevant to the target. According to this output file, two of the sites discovered in the first three Google link results are listed here, dewil.ru and list.cineca.it, although other sites are listed as more relevant. BiLE produces surprisingly accurate results and is a shining example of how powerful clever thinking combined with intelligent Googling can be. Hats off to SensePost for designing this (and many other) clever tools that showcase the power of Google!
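Ranking such an output file by relevance is straightforward to reproduce. The "site score" line format and the sites and scores below are hypothetical, shown only to illustrate the descending-relevance ordering; BiLE-weigh's real file format may differ.

```python
# Hypothetical weigh-style output: one "site score" pair per line.
raw = """\
www.sensepost.com 100.0
dewil.ru 12.5
list.cineca.it 7.1
example.org 3.3
"""

def ranked(text):
    """Return (site, score) pairs in descending order of relevance."""
    rows = [line.rsplit(None, 1) for line in text.splitlines() if line.strip()]
    return sorted(((site, float(score)) for site, score in rows),
                  key=lambda pair: pair[1], reverse=True)
```

The first entry of `ranked(raw)` is the most relevant site; if it is not the site you supplied as input, that is the "not the central site" signal noted in the documentation.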

Google Worms

Worms, automated attack programs that spread across the Internet at lightning speed, are truly evil creations. However, consider for a moment how devastating a worm could be if it used Google to both locate and attack targets. Sound far-fetched? It's not. Check out Michal Zalewski's terrific Phrack article entitled "Rise of the Robots" at www.phrack.org/show.php?p=57&a=10, or Imperva's paper located at www.imperva.com/docs/Application_Worms.pdf.

Group Tracing

It's not uncommon for techies to post questions to newsgroups when they run into technical challenges. As security auditors, we could use the information in newsgroup postings to glean insight into the makeup of a target network. One of the easiest ways to do this is to put the target company name into a Google Groups author search. For example, consider the Google Groups posting (shown in original format) found with the query author@Microsoft.com shown in Figure 5.8.

The header of this newsgroup posting reveals a great deal of information, but from the standpoint of creating a network map, the NNTP-Posting-Host, listed as 131.107.71.96, is relevant. This host, which resolves to tide133.microsoft.com, can be added to a network map as an NNTP server, without ever sending a single packet to that network, all because of a single Google query. In addition, this information can be reversed in an attempt to find more usernames with a Groups query of 131.107.71.96 as shown in Figure 5.9.
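Pulling the NNTP-Posting-Host out of an article header is a one-line regular expression. The sample headers below are fabricated around the IP address from the posting above; only the NNTP-Posting-Host value comes from the text.

```python
import re

# Made-up article headers; only the NNTP-Posting-Host line mirrors the
# value seen in the Figure 5.8 posting.
headers = """\
From: someone@example.com
Newsgroups: microsoft.public.win32.programmer
NNTP-Posting-Host: 131.107.71.96
"""

def posting_host(raw_headers):
    """Extract the NNTP-Posting-Host value from raw article headers."""
    match = re.search(r"^NNTP-Posting-Host:\s*(\S+)", raw_headers, re.MULTILINE)
    return match.group(1) if match else None
```

The extracted address can then be resolved (e.g., with a reverse DNS lookup) and placed on the network map as an NNTP server.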

These results reveal that David Downing, Tatyana Yakushev, and Nick are all most likely Microsoft employees since they use MSFT in their descriptions and have posted messages using an apparently nonpublic Microsoft NNTP server. Under normal circumstances, this "Nick" character could be just about anyone, but his use of a Microsoft-only NNTP server confirms his identity, and ties him to both David and Tatyana. There is also the possibility that these three employees work in the same office as they have similar job duties (evidenced by their posting to the same specifically technical newsgroup) and share an NNTP server. This type of information could be handy for a social engineering effort.

Non-Google Web Utilities

Google is amazing and very flexible, but it certainly can't do everything. Some things are much easier when you don't use Google. Tasks like WHOIS lookups, "pings," traceroutes, and port scans are much easier when performed outside of Google. There is a wealth of tools available that can perform these functions, but with a bit of creative Googling, it's possible to perform all of these arduous functions and more, preserving the level of anonymity Google hackers have come to expect. Consider a tool called NQT, the Network Query Tool, shown in Figure 5.10.

Default installations of NQT allow any Web user to perform IP host name and address lookups, DNS queries, WHOIS queries, port testing, and traceroutes.

This is a Web-based application, meaning that any user who can view the page can generally perform these functions against just about any target. This is a very handy tool for any security person, and for good reason. NQT functions appear to originate from the site hosting the NQT application. The Web server masks the real address of the user. The use of an anonymous proxy server would further mask the user's identity.

We can use Google to locate servers hosting the NQT program with a very simple query. The NQT program is usually called nqt.php, and in its default configuration displays the title "Network Query Tool." A simple query like inurl:nqt.php intitle:"Network Query Tool" returns many results as shown in Figure 5.11.
