The evolution of Web searching

Date: 1 April 2000
Pages: 124-137
DOI: https://doi.org/10.1108/14684520010330283
Author: David Green
Subject matter: Information & knowledge management; Library & information science
Defining the Web
In that constellation of computers known as
the Internet, transmitted data are split into
small "packets", a technique that makes far
more efficient use of bandwidth. This,
together with simpler technologies, has
dramatically reduced the cost of electronic
publishing, resulting in an estimated daily
increase of over one million Web pages of
information (Clever Team, 1999). However,
despite its uniform interface and seamless
linked integration, the Web is not a single
coherent element. There are two distinct
elements of the Web: the visible and the
invisible. In order to understand the
implications of this distinction for
information retrieval, it is necessary to first
consider how Web pages are produced.
There are two types of Web page: static and
dynamic. Static Web pages have been
manually created by a Web designer, posted
on to a Web server and are available to
anyone or anything that visits the Web sites
of which they are a part. Any changes must be
made manually. Dynamic Web pages are created by
a computer using a script (often CGI, Java or
Perl). This script acts as an intermediary
between the user requesting, or submitting,
information on a static Web page (the front-
end) and a database (the back-end), which
supplies, or processes, the information. The
script slots the results into a blank Web page
template and presents the visitor with a
dynamically generated Web page (Green,
1998a). Figure 1 illustrates this process.
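The intermediary role of the script can be sketched in a few lines. The following is a minimal illustration, not the article's own example: a dictionary stands in for the back-end database, and all names (`render_page`, `BACK_END`, the template) are hypothetical.

```python
from urllib.parse import parse_qs

# Blank page template into which the script slots results.
TEMPLATE = "<html><body><h1>Results for {query}</h1><p>{result}</p></body></html>"

# Stand-in for the back-end database.
BACK_END = {"web searching": "Directories and search engines."}

def render_page(query_string: str) -> str:
    """Act as the intermediary script: parse the request submitted
    from the static front-end form, query the back-end, and fill
    the template to produce a dynamically generated page."""
    params = parse_qs(query_string)
    query = params.get("q", [""])[0]
    result = BACK_END.get(query, "No results found.")
    return TEMPLATE.format(query=query, result=result)

print(render_page("q=web+searching"))
```

The same request always reaches the same script, but the page returned depends on the query and the state of the back-end, which is what makes the output dynamic rather than static.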
Static Web pages provide the same generic
information to everyone, while dynamically
generated Web pages provide unique
information, customised to the user's specific
requirements. Available to everyone, and
open to indexing by all search engines, static
Web pages together constitute the visible Web.
This is what researchers at the NEC Research
Institute refer to as the ``publicly indexable
World Wide Web'' (Lawrence and Giles,
1999).
The invisible Web comprises Web pages
with authorisation requirements, pages
excluded from indexing using the robots
exclusion meta tag and information that
resides within databases that will only ever be
temporarily present on the Web as
The author
David Green is currently the New Media Editor for one of
the ``big five'' professional services firms. He can be
contacted via his Web site at
www.clickmedia.freeserve.co.uk
Keywords
Information retrieval, Electronic publishing, Internet
Abstract
The interrelation between Web publishing and
information retrieval technologies is explored. The
different elements of the Web have implications for
indexing and searching Web pages. There are two main
platforms used for searching the Web – directories and
search engines – which were later combined to create
one-stop search sites, resulting in the Web business
model known as portals. Portalisation gave rise to a
second generation of firms delivering innovative search
technology. Various new approaches to Web indexing and
information retrieval are listed. PC-based search tools
incorporate intelligent agents to allow greater
manipulation of search strategies and results. Current
trends are discussed, in particular the rise of XML, and
their implications for the future. It is concluded that the
Web is emerging from a nascent stage and is evolving
into a more complex, diverse and structured environment.
Electronic access
The current issue and full text archive of this journal is
available at
http://www.emerald-library.com
Received September 1999
Accepted February 2000
Online Information Review
Volume 24, Number 2, 2000, pp. 124-137
© MCB University Press, ISSN 1468-4527
