There are currently over a billion pages of information on the Internet, covering every topic imaginable. The question is: how can you possibly find what you want? Computer algorithms can be written to search the Internet, but most are not practical because they must sacrifice precision for coverage. However, a few engines have found interesting ways of providing high-quality information quickly. Page value ranking, topic-specific searches, and meta search engines are three of the most popular approaches because they work smarter, not harder.
No commercial search engine will make its algorithm public, since doing so would invite a thousand imitation sites and leave little or no profit for the developers, but the basic structure can be inferred by testing the results. The most primitive of searches is the sequential search, which goes through every item in the list one at a time. Yet the sheer size of the web immediately rules out this possibility. While a sequential search might return the best results, you would most likely never see them because of the web's explosive growth rate: even the fastest computers would take a long time, and in that time all kinds of new pages would have been created.
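To see why, consider a minimal sketch of the sequential approach, assuming (purely for illustration) that the entire web could be held as a simple list of (url, text) pairs; every query has to touch every page.

```python
# Sequential (linear) search over a toy "web": every query must scan every page.
# The pages list and the match rule are invented for illustration only.

def sequential_search(pages, query):
    """Return URLs of pages whose stored text contains the query string."""
    results = []
    for url, text in pages:          # O(N) pages per query: hopeless at web scale
        if query.lower() in text.lower():
            results.append(url)
    return results

pages = [
    ("http://example.com/a", "A page about search algorithms"),
    ("http://example.com/b", "A page about car repair"),
]
print(sequential_search(pages, "search"))   # ['http://example.com/a']
```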
Some of the older 'spiders', like Alta Vista, are designed to roam the web more or less at random, following links from page to page. This is accomplished with high-speed servers that keep as many as 300 connections open at one time. These web 'spiders' are content based, which means they actually read and categorize the HTML on every page. One flaw of this approach is the verbal-disagreement problem, where a single word can describe two different concepts. Type a few words into the query and you will be lucky to find anything that relates to what you are looking for. The query words can appear anywhere in a page, and they are likely to be taken out of context.
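The roaming these spiders do is essentially a crawl along hyperlinks. The following is a hedged, single-threaded sketch of that core loop; real spiders keep hundreds of connections open and respect robots.txt, and the seed URL here is only a placeholder.

```python
# Minimal link-following crawler sketch. Real spiders run many connections in
# parallel and obey robots.txt; this shows only the core fetch-and-follow loop.
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
import re

def crawl(seed, max_pages=10):
    seen, queue, pages = set(), deque([seed]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue                                # skip pages that fail to load
        pages[url] = html                           # "read and categorize the HTML"
        for link in re.findall(r'href="([^"#]+)"', html):
            queue.append(urljoin(url, link))        # follow links to new pages
    return pages

# pages = crawl("http://example.com/")   # placeholder seed URL
```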
Content-based searches can also be easily manipulated. Some tactics are very deceptive; for example, “…some automobile web sites have stooped to writing ‘Buy This Car’ dozens of times in hidden fonts…a subliminal version of listing AAAA Autos in the Yellow Pages” (1). The truth is that one would never know if a site was doing this without looking at the code, and most consumers do not look at the code. A less subtle tactic is to pay to get to the top. For example, the engine GoTo accepts payment from those who wish to b...
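The weakness is easy to demonstrate. In the hypothetical sketch below, a naive content-based ranker that scores pages by raw keyword counts rewards exactly the kind of hidden repetition described above; both pages and the scoring rule are invented for illustration.

```python
# Illustration of why raw term-frequency ranking is easy to game.
# Pages and scoring rule are hypothetical.

def score(page_text, query):
    """Naive content-based score: total count of query terms in the page."""
    words = page_text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

honest_page = "we have one car for sale with financing available"
stuffed_page = "buy this car " * 50 + "we have one car for sale"   # hidden-font repetition

print(score(honest_page, "buy car"))    # 1
print(score(stuffed_page, "buy car"))   # 101 -- the stuffed page easily wins
```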
... middle of paper ...
... meta search engine can achieve several advantages:
1. It will present to users a more sophisticated interface…
2. Make the translation more accurate
3. Get more complete and precise results
4. Improve source selection and running priority decisions” (3).
Again, the idea of optimizing the Internet through intelligent software shows up. It is just a matter of designing an algorithm that does not forget what it has learned.
Most people did not foresee the tremendous growth of the Internet in the 1990s. Computer algorithms have gone from small government programs to every personal computer in the world. You start with the most basic problem solving and end up with one of the most complex problems of all: sorting through a database that grows almost exponentially.
Plain and simple, the Internet has a lot of information on it. A crawler works twenty-four hours a day digging through it all. The search engine pulls out the parts people want and hands them to the meta search engine. The meta search engine further discriminates until you get exactly what you are looking for. Yet behind all of this are machines performing the instructions they have been given – an algorithm.
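As a rough sketch of that last hand-off, assuming two hypothetical engine back-ends that each return a ranked list of URLs, a meta search engine can merge their lists and re-rank by combined position; the engines and scoring rule below are illustrative only.

```python
# Toy meta-search merge: combine ranked lists from several engines.
# The engine functions are stand-ins; real meta engines query remote services.

def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]

def engine_b(query):
    return ["http://shared.example/x", "http://b.example/1"]

def meta_search(query, engines):
    """Score each URL by summed reciprocal rank across engines, best first."""
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)

print(meta_search("algorithms", [engine_a, engine_b]))
# URLs returned highly by several engines float to the top of the merged list.
```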
In 1998, when Google created its search engine, very little data was available about search engines. One of the first search engines, the World Wide Web Worm, was not released until 1994. In order to research and create a more dynamic search engine, Google's creators had very little information to go on and encountered many challenges. It is difficult to create a high-quality search engine because it must crawl and then index millions of pages of information on the web. An additional feature of Google's large-scale search engine is that it uses hypertext (link) information to refine search results. The challenge is to scale to the vast amount of data available on the web. The main goal is to improve the quality of search results. The second goal is to make the data on the web available for academic research. One of the key features that sets Google apart from other web search engines is that it is built to scale well to large data sets. It plans to leverage continuing technological advances and the falling costs of hardware and storage to create a robust system.
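The hypertext information mentioned above is, at its heart, the link structure of the web. Below is a minimal sketch of a PageRank-style iteration over a tiny hand-made link graph; it follows the published idea rather than Google's actual implementation, and the graph and damping factor are chosen only for illustration.

```python
# Minimal PageRank-style iteration on a toy link graph.
# The graph and damping factor are illustrative; this is the published idea,
# not Google's production code.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks) if outlinks else 0.0
            for target in outlinks:
                new_rank[target] += damping * share   # pass rank along each link
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(links))   # C accumulates the most rank: it has the most in-links
```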
The internet is an ever more powerful tool for finding everything from entertainment to reference material to daily news. When first created, the internet was only a shadow of what it has become. Most people didn't even have a computer, let alone a connection to the internet. In the last decade, however, computers have become more and more affordable, and internet service providers have become far more widespread. According to the World Almanac and Book of Facts 2001, "By early 2000, more than 300 million people around the world were using the Internet, and it is estimated that by 2005, 1 billion people may be connected" (World Almanac). As with any new, powerful technology, the internet has brought, along with its positive aspects, a number of new problems that will have to be dealt with in the next several years.
The algorithms of search engines are as mysterious as the Force in Star Wars. Some claim to know everything there is to know and keep up with the latest updates from Google and Bing, while others know the underhanded tactics that let them squeak by during those updates.
This utility lets the end user easily locate information using keywords and phrases. In a few short years this has become the “most widely used searching tool on the Internet” (Levin, 60). The annual growth rate for Gopher traffic is 997% (Fun Facts, 50). Until recently, this Internet protocol had been used mainly by the government and academics, but it has caught on and is now being used for business and leisure purposes as well. If one is interested in the latest NFL scores, schedules, and point spreads, they can easily access this information at News and Weather. Business administrators can learn more about total quality management (TQM) by visiting (Maxwell, 299 and 670)
The Google search engine found at http://www.google.com/ offers many features, including language and document translation; web, image, newsgroups, catalog, and news searches; and more. These features offer obvious benefits to even the most uninitiated web surfer, but these same features offer far more nefarious possibilities to the most malicious Internet users, including hackers, computer criminals, identity thieves, and even terrorists. This article outlines the more harmful applications of the Google search engine, techniques that have collectively been termed "Google hacking." The intent of this article is to educate web administrators and the security community in the hopes of eventually stopping this form of information leakage. This document is an excerpt of the full Google Hacker's Guide published by Johnny Long, and located at http://johnny.ihackstuff.com/.
...e, so a search result with the same key words could return different results every time the user attempts to find the same thing. To make matters worse, LookSmart isn't the only search engine they use either. They use Teoma, Inktomi, and Overture as well, which have other ways of charging webmasters.
According to Lynch (2008), creating a web-based search engine from scratch was an ambitious objective, both in the software required and in the scale of the index. The process of developing the system was costly, but Doug Cutting and Mike Cafarella believed it was worth the cost. The success of this project ultimately helped democratize search engine algorithms. After its success, Nutch was started in 2002 as a working crawler and gave rise to the emergence of various search engines.
Then a set of pages relevant to the given user query is retrieved using the search engine. This set of relevant pages is called the root set.
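The term root set comes from link-analysis methods in the style of Kleinberg's HITS algorithm. As a hedged sketch, assuming the root set has already been expanded into a small base set whose links are known, hub and authority scores can be iterated like this (the link graph below is invented):

```python
# Hub/authority iteration in the style of HITS over a tiny base set.
# The link graph is invented; in practice the root set returned by the search
# engine is first expanded with pages that link to or from it.
import math

def hits(links, iterations=30):
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority grows with the hub scores of the pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, [])) for p in pages}
        # A page's hub score grows with the authority of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        for scores in (auth, hub):                     # normalize to avoid overflow
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

links = {"query_page1": ["resource1", "resource2"], "query_page2": ["resource1"]}
hub, auth = hits(links)
print(max(auth, key=auth.get))   # 'resource1' -- the most linked-to authority
```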
The Internet has become a popular source for retrieving information on practically any subject, and this information can generally be retrieved in a matter of seconds. With the popularity of the internet as a research tool, it's important that the information retrieved be reliable and accurate. In general, when one uses a search engine to perform a search on the internet, the quantity of information returned is astronomical. "In a world of information overload, it is often extremely difficult to get a grip on the correctness, completeness and the legitimacy of the information and material available in the internet" (Prins).
In today's fast-paced, technology-driven world, search engines have become a vastly popular part of people's daily routines. A search engine is an information retrieval system that allows someone to search the...
The fact that the Internet is bristling with information, too much for a single human being to comprehend, is not the problem; the real issue is the quality of that information. The old lesson of Internet searching: you enter, for example, "computers," and the search engine returns 10 of an abominable 8,102,365 matches. You would exclaim, "Wow! There is a lot of information in there." Then you would ask, "How do you know what is good? Where is the quality?" Portals (which run search engines) these days are adding value to the information they search, returning higher-quality results, often grouped into appropriate categories, thus pinpointing useful information for the learning public.
Taubes, Gary. "Indexing the Internet." Science 269.5229 (1995): 1354+. Expanded Academic ASAP. Web. 16 Mar. 2011
When searching on the Internet, one may sometimes find it difficult to know where to start. With the seemingly limitless amount of information, one should use the resource best suited to the searcher's needs and tastes. Comparing factors such as databases, directory types, and strengths and weaknesses of two search engines, such as Yahoo! and Lycos, can give someone looking for a starting point a real advantage.
Yiyao Lu, Weiyi Meng, Liangcai Shu, Clement Yu, and King-Lup Liu. Meta-search engines: A survey. 2012.
Search engines are not very complex in the way that they work. Each search engine sends out spiders, or bots, into web space, going from link to link and identifying all the pages that it can. After the spiders get to a web page, they generally index all the words on the publicly available pages at the site. They then store this information in their databases, and when you run a search, it matches the key words you searched with the words on the pages the spiders indexed. However, when you search the web using a search engine, you are not searching the entire web as it presently exists. You are looking at what the spiders indexed in the past.
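A rough sketch of that index-then-match step, using an invented two-page collection, might look like the following; the point is that queries are answered from the stored index, not from the live web.

```python
# Build a tiny inverted index from crawled pages, then answer queries from it.
# The pages are invented; real indexes also store positions, freshness, and more.
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose stored text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing every query word (simple AND semantics)."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

pages = {
    "http://example.com/sports": "latest nfl scores and schedules",
    "http://example.com/cars": "new and used car listings",
}
index = build_index(pages)
print(search(index, "nfl scores"))   # {'http://example.com/sports'}
# Results reflect what the spider stored at crawl time, not the live pages.
```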