Monday, November 22, 2004

Google Search Techniques

Students these days are well aware of the power of search engines for their assignments, reports and presentations. The sheer volume of information accessible via the Internet has made search engines very popular among students, especially those enrolled in undergraduate and graduate programs. Search engines index more pages every day, and sorting through the results can be a tedious task for time-strapped students. The most popular search engine indexes more than 8 billion web pages! Sorting through such a huge database is not trivial. However, search engines also provide special operators for refining queries, which can restrict the results to the most relevant pages and make sorting through the junk a breeze.

The first thing to know is that the Google engine is programmed to ignore several common words such as "where" and "how". Therefore, if you search for How do networks work?, your results would most likely be distorted because the common word "how" was ignored. One way to keep such words is to enclose the query in quotation marks, which makes Google search for the exact phrase. Google also ignores certain other words such as "and" and "of". Suppose your query was History of Pakistan. In all probability, you would end up with irrelevant results because the "of" was ignored. Alternatively, precede each ignored word with a + sign, with a space before the + and none after it, as in History +of Pakistan.
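The + technique can be sketched in a few lines of Python. This is only an illustration: the function name force_common_words and the short COMMON_WORDS set are made up for this example, and the set holds only the four words mentioned above, not Google's actual list.

```python
# Illustrative only: these are the four words named in the text,
# not the real list of words that Google ignores.
COMMON_WORDS = {"how", "where", "and", "of"}

def force_common_words(query: str) -> str:
    """Prefix each common word with '+' so the engine keeps it."""
    words = query.split()
    return " ".join("+" + w if w.lower() in COMMON_WORDS else w for w in words)

print(force_common_words("History of Pakistan"))  # History +of Pakistan
```

Words that are not in the set pass through unchanged, so the helper is safe to apply to any query.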

The Google search engine is popular because it builds its index automatically by crawling the web, whereas Yahoo began as a hand-edited directory. The Google spiders continuously move around cyberspace looking for new and updated pages. Sometimes, however, you already know which site is likely to hold the information you need. If you want to query a specific site for a search term, Google makes that possible too. Suppose you wanted to search for the term IT on the DAWN web site. Using Google, type "IT site:www.dawn.com".
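If you run such searches often, the query can be turned into a link with a small script. The sketch below assumes the familiar https://www.google.com/search?q=... address format; the function name site_search_url is made up for this example.

```python
from urllib.parse import urlencode

def site_search_url(term: str, site: str) -> str:
    """Build a Google search link restricted to one site."""
    query = f"{term} site:{site}"  # no space after the colon
    # urlencode escapes the space as '+' and the colon as '%3A'
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("IT", "www.dawn.com"))
# https://www.google.com/search?q=IT+site%3Awww.dawn.com
```

Pasting the printed address into a browser should have the same effect as typing the query into the search box.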

Searching through the information databases requires some careful thinking. Most search engines, Google included, rank results based largely on the number of links pointing to a particular site. It is a popular practice among academic web sites to link to other web sites that host similar content. Therefore, if you have an authentic reference site and would like to find others that are similar to it, try "related:www.microsoft.com". Sometimes you may have to work your way backward to find sites with similar content. If you have tried tweaking your queries and are still unable to find related sources, it is quite possible that Google has not yet indexed the appropriate sites. However, the site you need may be linked from other sites that the Google spiders have already crawled. In such a case, you can look for pages that link to a particular web page that Google has already indexed by typing "link:www.microsoft.com". This would yield sites that link to microsoft.com, and sites that link to a particular site are likely to have more or less similar content.

Apart from web pages, there are other documents such as MS-Word and Adobe Acrobat (PDF) files that may be of interest to you. Since a PDF or Word file is convenient to read, easier to print and can be read offline without tying up the telephone line, you may also carry out a search that returns only these file types. Using Google, type "IT filetype:doc OR filetype:pdf". This would search for MS-Word or PDF files that contain the term IT. Once you have the required results, you can download the file and read it at your leisure.

The WWW started getting popular around 1995, a few years after the introduction of HTML. Today, in 2004, there are billions of pages. As such, it is quite likely that your queries may turn up pages that have obsolete information. To rectify this problem, Google searches can be restricted to a specific time period to ensure that the search results are current. A date-restricted search can be carried out by typing "IT daterange:2453006-2453332", which covers January 1 to November 22, 2004. The range takes the start and end dates in Julian format. The Julian date is the number of days that have elapsed since January 1, 4713 BC. You can simply search Google for a Julian date converter for any dates that you want to use. Alternatively, you can use http://aa.usno.navy.mil/data/docs/JulianDate.html for conversion.
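The Julian day numbers can also be computed locally. The sketch below relies on the fact that Python's date.toordinal() counts days from January 1 of year 1, so adding a fixed offset of 1721425 gives the Julian day number for a calendar date. The function names are made up for this example, and the operator is spelled daterange, which is the spelling assumed here.

```python
from datetime import date

# Offset between Python's day ordinal (0001-01-01 is day 1)
# and the Julian day number of the same calendar date.
JDN_OFFSET = 1721425

def julian_day_number(d: date) -> int:
    """Return the whole-number Julian day for a calendar date."""
    return d.toordinal() + JDN_OFFSET

def daterange_query(term: str, start: date, end: date) -> str:
    """Build a date-restricted query from two calendar dates."""
    return f"{term} daterange:{julian_day_number(start)}-{julian_day_number(end)}"

print(julian_day_number(date(2000, 1, 1)))  # 2451545
print(daterange_query("IT", date(2004, 1, 1), date(2004, 11, 22)))
# IT daterange:2453006-2453332
```

January 1, 2000 is a convenient check, because its Julian day number, 2451545, is widely published.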

The Google engine returns query results based on the words that are found in the title, the URL or the body of the web pages. However, a match in the title or the URL alone is hardly the kind of result that most students are looking for. More often, people require content, and the content is found within the body of the web pages. Therefore, to restrict results to pages with all of the query words in the body text, use "allintext:IT in Pakistan". This would yield results that carry the terms in the body. This technique has become all the more necessary because of the large number of sites that carry misleading titles and URLs, which are only discovered on accessing the site. It may also help you avoid some sites that spread worms or download adware.

The Google search engine can also be used as a virtual dictionary. You no longer have to roam cyberspace looking for sites that explain a particular word or phrase. Using Google, type "define:retrospective" to find the meaning of the word retrospective. The define operator shows you a list of definitions aggregated from various sources.

The query strategies shown above can be combined with one another to target the sites that interest you most. For example, if you want to search for PDF files on a particular web site, you would use "outsourcing+opportunities filetype:pdf site:www.microsoft.com". This would show all the PDF files at microsoft.com that deal with outsourcing. You can also refine your search by placing the tilde operator "~" immediately before a term to search for its synonyms as well. In the end, it is important to emphasize that a search engine is only as good as the query it is fed. Cyberspace is loaded with informative sites, and all it takes is a little tweaking of the query to find the one that is most relevant.
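Combining operators is easy to get wrong by hand, so here is a minimal Python sketch that assembles a query from its parts. The function build_query and its parameter names are made up for this example; it only joins strings and does not contact Google.

```python
def build_query(terms, site=None, filetypes=(), synonyms=()):
    """Join plain terms and operators into one query string."""
    parts = list(terms)
    # A tilde directly before a word asks for synonyms as well
    parts += ["~" + word for word in synonyms]
    if filetypes:
        # Several file types are alternatives, so join them with OR
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if site:
        # No space between the colon and the site name
        parts.append(f"site:{site}")
    return " ".join(parts)

print(build_query(["outsourcing", "opportunities"],
                  site="www.microsoft.com", filetypes=["pdf"]))
# outsourcing opportunities filetype:pdf site:www.microsoft.com
print(build_query(["IT"], filetypes=["doc", "pdf"]))
# IT filetype:doc OR filetype:pdf
```

Keeping each operator in its own argument makes it harder to slip a stray space after a colon.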

End Note: Please make sure that all operators that end with a colon ":" have no space between the colon and the search term. Example: site:www.microsoft.
