Saturday, December 5, 2009

Inurl and Allinurl: Finding Text in a URL

The phonebook operator searches for business and residential phone listings. Three operators can be used for the phonebook search: rphonebook, bphonebook, and phonebook, which search residential listings, business listings, or both, respectively. The parameters to these operators are all the same and usually consist of a series of words describing the listing and location. In many ways, this operator functions like an allintitle search, since every word listed after the operator is included in the operator search. A query such as phonebook:john darling ny would list both business and residential listings for John Darling in New York. As shown in Figure 2.28, links are provided for popular mapping sites that allow you to view maps of an address or location.

If you are interested only in a residential or a business listing, use the rphonebook or bphonebook operator, respectively. There are other ways to get to this information without the phonebook operator. If you supply what looks like an address (including a state) or a name and a state as a query, Google will return a link allowing you to map the location in the case of an address (see Figure 2.29) or a phone listing in the case of a name and street match.

Hey, Get Me Outta Here!

If you're concerned about your address information being in Google's databases for the world to see, have no fear. Google makes it possible for you to delete your information so others can't access it via Google. Simply fill out the form at www.google.com/help/pbremoval.html and your information will be removed, usually within 48 hours. This doesn't remove you from the Internet (let us know if you find a link to do that), but the page gives you a decent list of places that list similar information. Oh, and Google is trusting you not to delete other people's information with this form.

The phonebook operators do not provide very informative error messages, and it can be fairly difficult to figure out whether or not you have bad syntax. Consider a query for phonebook:john smith. This query does not return any results, and the results page looks a lot like a standard "no results" page, as shown in Figure 2.30.

To make matters worse, the suggestions for fixing this query are all wrong. In this case, you need to provide more information in your query to get hits, not fewer keywords, as Google suggests. Consider phonebook:john smith ny, which returns approximately 600 results.
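The three phonebook operators share one syntax, so the choice between them can be sketched as a small lookup. This is only an illustration: the function name, the listing labels, and the rule that a location must be present are assumptions made here, the last one drawn from the phonebook:john smith example above, which returns nothing until a state is added.

```python
# Hypothetical helper; the names and the location check are illustrative
# assumptions, not part of Google's interface.
PHONEBOOK_OPERATORS = {
    "both": "phonebook",
    "residential": "rphonebook",
    "business": "bphonebook",
}


def build_phonebook_query(name, location, listing="both"):
    """Return a phonebook-style query such as 'phonebook:john darling ny'."""
    if listing not in PHONEBOOK_OPERATORS:
        raise ValueError(f"unknown listing type: {listing!r}")
    if not location.strip():
        # Assumption taken from the text: a bare name returns no hits,
        # so insist on a location such as a state.
        raise ValueError("add a location, for example a state such as 'ny'")
    # Collapse whitespace and lowercase the terms, as in the examples above.
    terms = " ".join(f"{name} {location}".lower().split())
    return f"{PHONEBOOK_OPERATORS[listing]}:{terms}"


print(build_phonebook_query("John Darling", "NY"))
# prints "phonebook:john darling ny"
```

Passing listing="residential" or listing="business" switches to rphonebook or bphonebook without changing the rest of the query.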

Colliding Operators and Bad Search-Fu

As you start using advanced operators, you'll realize that some combinations work better than others for finding what you're looking for. Just as quickly, you'll begin to realize that some operators just don't mix well at all. Table 2.3 shows which operators can be mixed with others. Operators listed as "No" should not be used in the same query as other operators. Furthermore, these operators will sometimes give funky results if you get too fancy with their syntax, so don't be surprised when it happens.

This table also lists operators that can only be used within specific Google search areas and operators that cannot be used alone. The values in this table bear some explanation. A box marked "Yes" indicates that the operator works as expected in that context. A box marked "No" indicates that the operator does not work in that context, and Google indicates this with a warning message. Any box marked with "Not really" indicates that Google attempts to translate your query when used in that context. True Google hackers love exploring gray areas like the ones found in the "Not really" boxes.


Allintext gives all sorts of crazy results when it is mixed with other operators. For example, a search for allintext:moo goo gai filetype:pdf works well for finding Chinese food menus, whereas allintext:Sum Dum Goy intitle:Dragon gives you that empty feeling inside—like a year without the 1985 classic The Last Dragon (see Figure 2.31).

Despite the fact that some operators do combine with others, it's still possible to get less than optimal results by running your operators head-on into each other. This section focuses on pointing out a few of the potential bad collisions that could cause you headaches. We'll start with some of the more obvious ones.

First, consider a query like something -something. This query returns nothing, and Google tells you as much. This is an obvious example, but consider intitle:something -intitle:something. This query, just like the first, returns nothing, since we've negated our first search with a duplicate NOT search. Literally, we're saying "find something in the title and hide all the results with something in the title." Both of these examples clearly illustrate the point that you can't query for something and negate that query, because your results will be zero.
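The "don't contradict yourself" rule can be checked mechanically before a query is ever submitted. The sketch below is a simplification with an invented function name: it splits on whitespace, ignores quoted phrases, and only catches a term that is required and excluded in exactly the same form.

```python
def find_contradictions(query):
    """Return terms that a query both requires and excludes.

    Simplification: tokens are split on whitespace, so quoted phrases
    and multi-word operator arguments are not handled.
    """
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])  # drop the leading NOT sign
        else:
            required.add(token)
    return sorted(required & excluded)


print(find_contradictions("something -something"))
# prints ['something']
print(find_contradictions("intitle:something -intitle:something"))
# prints ['intitle:something']
```

A check this literal will not notice overlaps between different operators, which is where the harder cases come from.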

It gets a bit tricky when the advanced operators start overlapping. Consider site and inurl. The URL includes the name of the site. So, extending the "don't contradict yourself" rule, don't include a term with site and exclude that term with inurl and vice versa and expect sane results. A query like site:microsoft.com -inurl:microsoft.com doesn't make much sense at all, and the results are somewhat trippy, as shown in Figure 2.32.

These search results, considered junk by most Web searchers, are just the kind of things that Google hackers pride themselves in finding and working with. However, when you're really trying to home in on a topic, keep the "rules" in mind and you'll accelerate toward your target at a much faster pace. Save the rule breaking for your required Google hacking license test!

Here's a quick breakdown of some broken searches and why they're broken:

site:com site:edu A hit can't be both an edu and a com at the same time. What you're more likely to search for is (site:edu | site:com), which searches for either domain.

inanchor:click -click This is contradictory. Remember, unless you use an advanced operator, your search term can appear anywhere on the page, including title, URL, text, and even anchors.

allinurl:pdf allintitle:pdf Operators starting with all are notoriously bad at combining. Get out of the habit of combining them before you get into the habit of using them! Replace allinurl with inurl, allintitle with intitle, and just don't use allintext. It's evil.

site:syngress.com allinanchor:syngress publishing This query returns zero results, which seems natural considering the last example and the fact that most all* searches are nasty to use. However, this query suffers from an ordering problem, a fairly common problem that can really throw off some narrow searches. By changing the query to allinanchor:syngress publishing site:syngress.com, which moves the allinanchor to the beginning of the query, we can get many more results. This does not at all seem natural, since the allinanchor operator considers all the following terms to be parameters to the operator, but that's just the way it is.

link:www.microsoft.com linux This is a nasty search for a beginner because it appears to work, finding sites that link to Microsoft and mention the word linux on the page. Unfortunately, link doesn't mix with other operators, but instead of sending you an error message, Google "fixes" the query for you and returns exactly the same results as "link.www.microsoft.com" linux.
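Most of the broken searches listed above follow patterns regular enough to flag automatically. The following is a rough sketch, not a complete parser: lint_query and its warning texts are invented for illustration, tokens are split on whitespace, and the contradictory inanchor:click case is left out.

```python
ALL_OPERATORS = ("allinurl:", "allintitle:", "allintext:", "allinanchor:")


def lint_query(query):
    """Return warnings for a few of the broken patterns (heuristic only)."""
    # Strip grouping parentheses so "(site:edu" still reads as "site:edu".
    tokens = [t.strip("()") for t in query.split() if t.strip("()")]
    lowered = [t.lower() for t in tokens]
    warnings = []

    # Two site: terms without an OR can never match the same hit.
    sites = [t for t in lowered if t.startswith("site:")]
    has_or = "|" in tokens or "OR" in tokens
    if len(sites) > 1 and not has_or:
        warnings.append(
            "multiple site: terms are ANDed together; try ("
            + " | ".join(sites) + ")"
        )

    # all* operators combine badly and are sensitive to ordering.
    all_ops = [t for t in lowered if t.startswith(ALL_OPERATORS)]
    if len(all_ops) > 1:
        warnings.append("do not combine all* operators; use inurl: or intitle:")
    elif all_ops and not lowered[0].startswith(ALL_OPERATORS):
        warnings.append("move the all* operator to the start of the query")

    # link: silently stops working when anything else is present.
    if any(t.startswith("link:") for t in lowered) and len(tokens) > 1:
        warnings.append("link: does not mix with other operators or terms")

    return warnings


for q in ("site:com site:edu", "(site:edu | site:com)"):
    print(q, "->", lint_query(q))
```

The first query produces one warning that suggests (site:com | site:edu); the second passes cleanly, as does allinanchor:syngress publishing site:syngress.com.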

Summary

Google offers plenty of options when it comes to performing advanced searches. URL modification, discussed in the previous chapter, can provide you with lots of options for modifying a previously submitted search, but advanced operators are better used within a query. Easier to remember than the URL modifiers, advanced operators are the truest tools of any Google hacker's arsenal. As such, they should be the tools used by the good guys when considering the protection of Web-based information.

Most of the operators can be used in combination, the most notable exceptions being the allintitle, allinurl, allinanchor, and allintext operators. Advanced Google searchers tend to steer away from these operators, opting to use the intitle, inurl, and link operators to find strings within the title, URL, or links to pages, respectively. Allintext, used to locate all the supplied search terms within the text of a document, is one of the least used and most redundant of the advanced operators. Filetype and site are very powerful operators that search specific sites or specific file types. The daterange operator allows you to search for files that were indexed within a certain time frame. When crawling Web pages, Google generates specific information such as a cached copy of a page, an information snippet about the page, and a list of sites that seem related. This information can be retrieved with the cache, info, and related operators, respectively. To search for the author of a Google Groups document, use the author operator. The phonebook series of operators return business or residential phone listings as well as maps to specific addresses. The stocks operator returns stock information about a specific ticker symbol, whereas the define operator returns the definition of a word or simple phrase.

Solutions Fast Track

Allintitle

- Finds all terms in the title of a page
- Does not mix well with other operators or search terms
- Best used with Web, Group, Images, and News searches

Inurl

- Finds strings in the URL of a page
- Mixes well with other operators
- Best used with Web and Image searches

Allinurl

- Finds all terms in the URL of a page
- Does not mix well with other operators or search terms
- Best used with Web, Group, and Image searches

Filetype

- Finds specific types of files based on file extension
- Synonymous with ext
- Requires an additional search term
- Mixes well with other operators
- Best used with Web and Group searches

Allintext

- Finds all provided terms in the text of a page
- Pure evil: don't use it
- Forget you ever heard about allintext

Site

- Restricts a search to a particular site or domain
- Mixes well with other operators
- Can be used alone
- Best used with Web, Groups and Image searches

Link

- Searches for links to a site or URL
- Does not mix with other operators or search terms
- Best used with Web searches

Inanchor

- Finds text in the descriptive text of links
- Mixes well with other operators and search terms
- Best used for Web, Image, and News searches

Daterange

- Locates pages indexed within a specific date range
- Requires a search term
- Mixes well with other operators and search terms
- Best used with Web searches

Numrange

- Finds a number in a particular range
- Mixes well with other operators and search terms
- Best used with Web searches

Cache

- Displays Google's cached copy of a page
- Does not mix with other operators or search terms
- Best used with Web searches

Info

- Displays summary information about a page
- Does not mix with other operators or search terms
- Best used with Web searches

Related

- Shows sites that are related to provided site or URL
- Does not mix with other operators or search terms
- Best used with Web searches

Phonebook, Rphonebook, Bphonebook

- Shows residential or business phone listings
- Does not mix with other operators or search terms
- Best used as a Web query

Author

- Searches for the author of a Group post
- Mixes well with other operators and search terms
- Best used as a Group search

Group

- Searches Group names, selects individual Groups
- Mixes well with other operators
- Best used as a Group search

Insubject

- Locates a string in the subject of a Group post
- Mixes well with other operators and search terms
- Best used as a Group search

Msgid

- Locates a Group message by message ID
- Does not mix with other operators or search terms
- Best used as a Group search

Stocks

- Shows the Yahoo Finance stock listing for a ticker symbol
- Does not mix with other operators or search terms
- Best provided as a Web query

Define

- Shows various definitions of a provided word or phrase
- Does not mix with other operators or search terms
- Best provided as a Web query
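The mixing notes in the lists above can be condensed into a lookup table. The sketch below is one possible encoding with invented names. It checks only operator-with-operator combinations, because a check on extra search terms would misfire on operators such as allintitle and phonebook that treat every following word as a parameter. The allintext entry is an assumption, since its list gives no mixing rule and says only to avoid it.

```python
# True means the Fast Track lists the operator as mixing well with others.
MIXES_WELL = {
    "allintitle": False, "inurl": True, "allinurl": False, "filetype": True,
    "allintext": False,  # assumption: no rule is given, only "don't use it"
    "site": True, "link": False, "inanchor": True, "daterange": True,
    "numrange": True, "cache": False, "info": False, "related": False,
    "phonebook": False, "rphonebook": False, "bphonebook": False,
    "author": True, "group": True, "insubject": True, "msgid": False,
    "stocks": False, "define": False,
}


def operators_in(query):
    """Return the known operator names used in a whitespace-split query."""
    found = []
    for token in query.lower().split():
        name = token.lstrip("-(").partition(":")[0]
        if ":" in token and name in MIXES_WELL:
            found.append(name)
    return found


def non_mixing_operators(query):
    """Return operators that should stand alone but share the query with others."""
    ops = operators_in(query)
    if len(ops) < 2:
        return []  # a single operator has nothing to collide with
    return sorted({op for op in ops if not MIXES_WELL[op]})


print(non_mixing_operators("site:syngress.com filetype:pdf"))
# prints []
print(non_mixing_operators("allinurl:pdf allintitle:pdf"))
# prints ['allintitle', 'allinurl']
```

Keeping the rules in one dictionary makes it easy to adjust an entry when an operator behaves differently from what the lists say.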

Links to Sites

- The Google filetypes FAQ, www.google.com/help/faq_filetypes.html
- The resource for file extension information, www.filext.com
  This site can help you figure out what program a particular extension is associated with.
- http://searchenginewatch.com/searchday/article.php/2160061
  This article discusses some of the issues associated with Google's date restrict search options.
- Very nice online Julian date converters, www.24hourtranslations.co.uk/dates.htm and www.tesre.bo.cnr.it/~mauro/JD/
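The Julian date converters matter because the daterange operator is generally described as taking Julian day numbers rather than calendar dates. As an alternative to an online converter, the conversion can be sketched with Python's standard library. The function names here are invented, and the search term argument reflects the note above that daterange requires a search term.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and the Julian Day
# Number, chosen so that date(2000, 1, 1) maps to 2451545.
JDN_OFFSET = 1721425


def julian_day(d):
    """Return the Julian Day Number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange_query(term, start, end):
    """Build a query such as 'term daterange:START-END' from two dates."""
    if end < start:
        raise ValueError("end date must not precede start date")
    return f"{term} daterange:{julian_day(start)}-{julian_day(end)}"


print(julian_day(date(2000, 1, 1)))
# prints 2451545
```

Passing the same date as both start and end restricts the search to a single day.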

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: Do other search engines provide some form of advanced operator? How do their advanced operators compare to Google's?

A: Yes, most other search engines offer similar operators. Yahoo is the most similar to Google, in our opinion. This might have to do with the fact that Yahoo once relied solely on Google as its search provider. The operators available with Yahoo include site (domain search), hostname (full server name), link, url (show only one document), inurl, and intitle. The Yahoo advanced search page offers other options and URL modifiers. You can dissect the HTML form at http://search.yahoo.com/search/options to get to the interesting options here. Be prepared for a search page that looks a lot like Google's advanced search page.

AltaVista offers domain, host, link, title, and url operators. The AltaVista advanced search page can be found at www.altavista.com/web/adv. Of particular interest is the timeframe search, which allows more granularity than
