Saturday, December 5, 2009

Related: Show Related Sites

Each of the search results shown in Figure 2.10 contains HTML links to the www.defcon.org Web site. The link operator can be extended to include not only basic URLs but complete URLs that include directory names, filenames, parameters, and the like. Keep in mind that long URLs are much more specific and could return fewer results.

The only place the URL of a link is visible is in the browser's status bar or in the source of the page. For that reason, unlike other cached pages, the cached page for a link operator's search result does not highlight the search term, since the search term (the linked Web site) is never really shown in the page. In fact, the cached banner does not make any reference to your search query, as shown in Figure 2.14.

It is a common misconception that the link operator can search for text within a link. The inanchor operator performs something similar to this, as we'll see next. To properly use the link operator, you must provide a full URL (including protocol, server, directory, and file), a partial URL (including only the protocol and the host), or simply a server name; otherwise, Google could return unpredictable results. As an example, consider a search for link:linux, which returns 14,200 results. This search does not use the proper syntax for a link search, since the domain name is invalid. The correct syntax for a search like this might be link:linux.org (with 451 results) or link:linux.com (with 97,500 results). Since none of the numbers on these queries match, what exactly is being returned from Google for a search like link:linux? Figures 2.15 and 2.16 show the answer to this question.

When an invalid link: syntax is provided, Google treats the search as a phrase search. Google offers another clue as to how it handles invalid link searches through the cache page. As shown in Figure 2.17, the cached banner for a site found with a link:linux search does not resemble a typical link search cached banner but rather a standard search cache banner with included highlighted terms.

This is an indication that Google did not perform a link search but instead treated the search as a phrase, with a colon representing a word break.

The link operator cannot be used with other operators or search terms.
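The rule above (give link a URL or server name, not a bare word) can be checked before a query is sent. The following is a minimal sketch, not Google's own validation logic: the helper names and the regular expression are assumptions made for illustration, and the pattern is only a rough test for "looks like a hostname or URL."

```python
import re

# Hypothetical heuristic: optional http(s):// prefix, at least one
# dot-separated label, a TLD-like suffix, then an optional path or query.
_HOSTLIKE = re.compile(
    r"^(?:https?://)?(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}(?:[/:?#].*)?$"
)


def looks_like_link_target(arg: str) -> bool:
    """Return True if arg resembles a server name or URL (rough check only)."""
    return bool(_HOSTLIKE.match(arg))


def build_link_query(target: str) -> str:
    """Build a link: query, refusing arguments that are just bare words."""
    if not looks_like_link_target(target):
        raise ValueError(f"{target!r} does not look like a URL or server name")
    # The link operator is used alone, so nothing else is appended.
    return "link:" + target


if __name__ == "__main__":
    for candidate in ("linux", "linux.org", "www.defcon.org"):
        print(candidate, looks_like_link_target(candidate))
```

Under this heuristic, linux is rejected while linux.org and www.defcon.org are accepted, which mirrors the distinction drawn in the text.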

Inanchor: Locate Text Within Link Text

This operator can be considered a companion to the link operator, since they both help search links. The inanchor operator, however, searches the text representation of a link, not the actual URL. For example, in Figure 2.17, the link "current page" is shown in typical form, as an underlined portion of text. When you click that link, you are taken to the URL www.kerneltraffic.org/kernel-traffic/latest.html. If you were to look at the actual source of that page, you would see something like

<a href="http://www.kerneltraffic.org/kernel-traffic/latest.html">current page</a>

The inanchor operator helps search the anchor, or the displayed text on the link, in this case the words "current page." Inanchor accepts a word or phrase as an argument, such as inanchor:click or inanchor:James.Foster. This search will be handy later, especially when we begin to explore ways of searching for relationships between sites.

The inanchor operator can be used with other operators and search terms.
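To make the difference between the two operators concrete, the sketch below separates a link's target URL (what link matches) from its anchor text (what inanchor matches). It assumes the link is an ordinary HTML anchor tag; the sample markup and the AnchorCollector class are illustrative, not taken from the page in the figure.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor_text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, anchor_text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Illustrative markup (an assumption about how such a link is written).
sample = (
    '<a href="http://www.kerneltraffic.org/kernel-traffic/latest.html">'
    "current page</a>"
)

collector = AnchorCollector()
collector.feed(sample)
collector.close()

for href, text in collector.links:
    print("link: would match the URL:", href)
    print("inanchor: would match the text:", text)
```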

As we've already discussed, Google keeps snapshots of pages it has crawled that we can access via the cached link on the search results page. If you would like to jump right to the cached version of a page without first performing a Google query to get to the cached link on the results page, you can simply use the cache advanced operator in a Google query such as cache:blackhat.org or cache:http://www.netsec.net. If you don't supply a complete URL or hostname, Google could return unpredictable results. Just as with the link operator, passing an invalid hostname or URL as a parameter to cache will submit the query as a phrase search. A search for cache:linux returns exactly as many results as "cache linux", indicating that Google did indeed treat the cache search as a standard phrase search. The cache operator does not always work as expected, and in many cases, you're better off getting to a cached page from a Google results page.

The cache operator cannot be used with other operators or search terms.

Numrange: Search for a Number

The numrange operator requires two parameters, a low number and a high number, separated by a dash. This operator is powerful but dangerous when used by malicious Google hackers. As the name suggests, numrange can be used to find numbers within a range. For example, to locate the number 12345, a query such as numrange:12344-12346 will work just fine. When searching for numbers, Google ignores symbols such as currency markers and commas, making it much easier to search for numbers on a page. Two shortened versions of this operator exist as well. Instead of supplying the numrange operator, you can simply provide two numbers in a query, separated by two periods. The shortened version of the query just mentioned would be 12344..12346. Notice that the numrange operator was left out of the query entirely. In addition, the ext operator can be used, as in ext:12344-12346. Each of these shorthand versions returns the same results as the matching numrange search.
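As a small illustration of the syntax just described, the sketch below builds the numrange form and the two-period shorthand for the same range. It also imitates, locally, the idea of ignoring commas and currency markers when comparing a number on a page. The function names are hypothetical, and the normalization is an assumption for demonstration, not a description of Google's internals.

```python
import re


def numrange_queries(low: int, high: int) -> dict:
    """Return the numrange form and the '..' shorthand for one range."""
    if low > high:
        raise ValueError("low must not exceed high")
    return {
        "numrange": f"numrange:{low}-{high}",
        "shorthand": f"{low}..{high}",
    }


def normalize_number(token: str):
    """Strip commas and common currency markers; return an int or None."""
    cleaned = re.sub(r"[,$€£¥]", "", token.strip())
    return int(cleaned) if cleaned.isdigit() else None


def in_range(token: str, low: int, high: int) -> bool:
    """Local check: does the number in token fall within [low, high]?"""
    value = normalize_number(token)
    return value is not None and low <= value <= high


if __name__ == "__main__":
    print(numrange_queries(12344, 12346))
    print(in_range("$12,345", 12344, 12346))
```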

Bad Google Hacker!

If Gandalf the Grey were to author this sidebar, he wouldn't be able to resist saying something like "There are fouler things than characters lurking in the dark places of Google's cache." Some of the gravest examples of Google's power lie in the use of the numrange operator. It would be extremely irresponsible of us to share these powerful queries with you.

Fortunately, the abuse of this operator has been curbed thanks to the diligence of the hard-working members of the Search Engine Hacking forums at http://Johnny.ihackstuff.com. The members of that community have taken the high road time and time again to get the word out about the dangers of Google hackers without spilling the beans and creating even more hackers. This sidebar is dedicated to them!

Daterange: Search for Pages Published Within a Certain Date Range

The daterange operator can be a bit clumsy, but it is certainly helpful and worth the effort to understand. You can use this operator to locate pages indexed by Google within a certain date range. Every time Google crawls a page, the indexed date for that page changes. If Google locates some very obscure Web page, it might only crawl it once, never returning to index it again. If you find that your searches are clogged with these types of obscure Web pages, you can remove them from your search (and subsequently get fresher results) through effective use of the daterange operator.

The parameters to this operator must always be expressed as a range, two dates separated by a dash. If you only want to locate pages that were indexed on one specific date, you must provide the same date twice, separated by a dash. If this sounds too easy to be true, you're right. It is too easy to be true. Both dates passed to this operator must be Julian dates. The Julian date is the number of days that have passed since January 1, 4713 B.C. For example, the date September 11, 2001, is represented in Julian terms as 2452164. So, to search for pages that were indexed by Google on September 11, 2001, and contained the phrase "osama bin laden," the query would be daterange:2452164-2452164 "osama bin laden".
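Converting calendar dates to Julian dates by hand is tedious, so a short helper is useful. The sketch below relies on Python's date.toordinal() and the fixed offset between Python's ordinal day count and the Julian day number; the function names are hypothetical, and the conversion is intended for modern (Gregorian) dates.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (day 1 = 0001-01-01)
# and the Julian day number for the same calendar day.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Return the integer Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange_query(start: date, end: date, terms: str) -> str:
    """Build a daterange query; terms are required alongside the operator."""
    if start > end:
        raise ValueError("start date must not be after end date")
    if not terms.strip():
        raise ValueError("daterange needs accompanying search terms")
    return f"daterange:{julian_day(start)}-{julian_day(end)} {terms}"


if __name__ == "__main__":
    day = date(2001, 9, 11)
    print(julian_day(day))
    print(daterange_query(day, day, '"osama bin laden"'))
```

For September 11, 2001, julian_day yields 2452164, matching the value given in the text.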

Google does not officially support the daterange operator. The Google folks prefer you use the date limit on the advanced search form found at http://www.google.com/advanced_search. As we discussed in the last chapter, this form creates fields in the URL string to perform specific functions. Google designed the as_qdr field to help you locate pages that have been updated within a certain time frame. For example, to find pages that have been updated within the past three months and that contain the word Google, use the query http://www.google.com/search?q=google&as_qdr=m3.

This might be a better alternative date restrictor than the clumsy daterange operator. Just understand that these are very different functions. Daterange is not the advanced-operator equivalent for as_qdr, and unfortunately, there is no operator equivalent. If you want to find pages that have been updated within the past year or less, you must either use Google's advanced search interface or stick &as_qdr=m3 (or equivalent) on the end of your URL.
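Appending the as_qdr field by hand is easy to get wrong once the query contains spaces or quotes. The following sketch builds the search URL with proper encoding. The function name is hypothetical, and only the m3 value shown in the example URL above is taken from the text; any other value passed in is the caller's assumption.

```python
from urllib.parse import urlencode

SEARCH_BASE = "http://www.google.com/search"


def recent_search_url(query: str, as_qdr: str = "m3") -> str:
    """Build a search URL restricted by the as_qdr field.

    "m3" (past three months) is the value from the text's example URL.
    """
    return SEARCH_BASE + "?" + urlencode({"q": query, "as_qdr": as_qdr})


if __name__ == "__main__":
    print(recent_search_url("google"))
    print(recent_search_url('"google hacking"'))
```

Using urlencode keeps phrase searches intact, since quotes and spaces are escaped rather than pasted raw into the URL.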

The daterange operator must be used with other search terms or advanced operators. It will not return any results when used by itself. In addition, daterange only works with Web searches.

Info: Show Google's Summary Information

The info operator shows the summary information for a site and provides links to other Google searches that might pertain to that site, as shown in Figure 2.18. The parameter to this operator must be a valid URL or site name. You can achieve this same functionality by supplying a site name or URL as a search query.

If you don't supply a complete URL or hostname, Google could return unpredictable results. Just as with the link and cache operators, passing an invalid hostname or URL as a parameter to info will submit the query as a phrase search. A search for info:linux returns exactly as many results as "info linux", indicating that Google did indeed treat the info search as a standard phrase search.

The info operator cannot be used with other operators or search terms.

Related: Show Related Sites

The related operator displays sites that Google has determined are related to a site, as shown in Figure 2.19. The parameter to this operator is a valid site name or URL. You can achieve this same functionality by clicking the Similar Pages link from any search results page or by using the "Find pages similar to the page" (shown in Figure 2.19) portion of the advanced search form.

If you don't supply a complete URL or hostname, Google could return unpredictable results. Passing an invalid hostname or URL as a parameter to related will submit the query as a phrase search. A search for related:linux returns exactly as many results as "related linux", indicating that Google did indeed treat the related search as a standard phrase search.

The related operator cannot be used with other operators or search terms.

Author: Search Groups for an Author of a Newsgroup Post

The author operator allows you to search for the author of a newsgroup post. The parameter to this operator consists of a name or an e-mail address. This operator can only be used in conjunction with a Google Groups search. Attempting to use this operator outside a Groups search will result in an error. When you're searching for a simple name, such as author:Johnny, the search results will include posts written by anyone with the first, middle, or last name of Johnny, as shown in Figure 2.20.

As you can see, we've got hits for Johnny Lurker, Johnny Walker, Johnny, and Johnny Anderson. Makes you wonder if those are real names, doesn't it? In most cases, these are not real names. This is the nature of the newsgroup beast. Pseudo-anonymity is fairly easy to maintain when anyone can post to newsgroups through Google using nothing more than a free e-mail account as verification.

The author operator can be a bit clumsy to use, since it doesn't interpret its parameters in exactly the same way as some of the other operators. Simple searches such as author:Johnny or author:Johnny@ihackstuff.com work just as expected, but things get dicey when we attempt to search for names given in the form of a phrase. Consider a search like author:"Johnny Long", an attempt to search for an author with a full name of Johnny Long. This search fails pretty miserably, as shown in Figure 2.21.

This search found the word Johnny in the author name but passed off the word Long as a generic search, not an author search, as indicated by the lack of Long in the author name and the existence of Long in the post titles. Passing the query of author:Johnny.long, however, gets us the results we're expecting: Johnny Long as the posts' author, as shown in Figure 2.22.

The author operator can be used with other valid Groups operators or search terms.
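The dotted form described above is easy to generate automatically. The helper below is a minimal sketch (the function name is hypothetical): it joins the words of a full name with periods so that the whole name stays attached to the author operator, and leaves single words and e-mail addresses unchanged.

```python
def author_query(name_or_email: str) -> str:
    """Build an author: query, joining multi-word names with periods."""
    parts = name_or_email.split()
    if not parts:
        raise ValueError("an author name or e-mail address is required")
    return "author:" + ".".join(parts)


if __name__ == "__main__":
    print(author_query("Johnny"))
    print(author_query("Johnny Long"))
    print(author_query("Johnny@ihackstuff.com"))
```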

Group: Search Group Titles

This operator allows you to search the title of Google Groups posts for search terms. This operator only works within Google Groups. This is one of the operators that is very compatible with wildcards. For example, to search for groups that end in forsale, a search such as group:*.forsale works very well. In some cases, Google finds your search term not in the actual name of the group but in the keywords describing the group. Consider the search group:windows, as shown in Figure 2.23. Not all the results of this search contain the word windows, yet all the returned groups discuss Windows software.

In our experience, the group operator does not mix very well with other operators. If you get odd results when throwing group into the mix, try using other operators such as intitle to compensate.

Insubject: Search Google Groups Subject Lines

The insubject operator is effectively the same as the intitle search and returns the same results. Searches for intitle:dragon and insubject:dragon return exactly the same number of results. This is most likely because the subject of a group post is also the title of the post. Subject is (and was, in DejaNews) the more precise term for a message title, and this operator most likely exists to help ease the mental shift from "deja searching" to Google searching.

Just like the intitle operator, insubject can be used with other operators and search terms.

Msgid: Locate a Group Post by Message ID

The msgid operator, available only for Groups searching, takes only one parameter, a group message identifier. A message identifier (or message ID) is a unique string that identifies a newsgroup post. The format is something like xxx@yyy.com.

To view message IDs, you must view the original group post format. When viewing a post (see Figure 2.24), simply click the original format link. You will be taken to a text-only page that lists the entire content of the group post, as shown in Figure 2.25.

Figure 2.25 (original format of the group post): the text-only view lists the post's headers, including From: Lensman; Newsgroups: alt.hacking; Subject: Re: google primer; Date: Fri, 14 May 2004 10:54:01 +0000 (UTC); Organization; Lines; Message-ID; References; Reply-To; NNTP-Posting-Host; Mime-Version; Content-Type: text/plain; and Content-Transfer-Encoding: 7bit.

To retrieve the message shown in Figure 2.25, use the query msgid:9t89a0d6laa555njo129t99s1ir7eebo6b@4ax.com.

The msgid operator does not mix with other operators or search terms.
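Once you have the text-only original format of a post, the message ID can be pulled out programmatically instead of copied by hand. The sketch below uses Python's standard email parser on a made-up post (the sample headers and the function name are hypothetical) and strips the angle brackets that surround the ID in the raw header.

```python
from email import message_from_string

# Hypothetical post in original (raw) format; not the post from the figure.
raw_post = """\
From: Example Poster <poster@example.com>
Newsgroups: alt.example
Subject: Re: example thread
Message-ID: <abc123@example.com>

Body text of the post.
"""


def msgid_query(raw: str) -> str:
    """Extract the Message-ID header and build a msgid: query from it."""
    message = message_from_string(raw)
    message_id = message["Message-ID"]
    if message_id is None:
        raise ValueError("no Message-ID header found")
    # The raw header wraps the ID in angle brackets; the query omits them.
    return "msgid:" + message_id.strip().strip("<>")


if __name__ == "__main__":
    print(msgid_query(raw_post))
```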

Stocks: Search for Stock Information

The stocks operator allows you to search for stock market information about a particular company. The parameter to this operator must be a valid stock abbreviation. If you provide an invalid stock ticker symbol, you will be taken to a screen that allows further searching for a correct ticker symbol, as shown in Figure 2.26.

The stocks operator cannot be used with other operators or search terms.

Define: Show the Definition of a Term

The define operator returns definitions for a search term. This operator is fairly simple and straightforward: its argument may be a word or a phrase. Links to the source of the definition are provided, as shown in Figure 2.27.

The define operator cannot be used with other operators or search terms.

Phonebook: Search Phone Listings
