The future of traffic analysis on the Web

Date: 01 September 2001
Pages: 51-54
Published date: 01 September 2001
DOI: https://doi.org/10.1108/03055720010804159
Author: Philip Hunter
Subject matter: Information & knowledge management
by Philip Hunter, Information Officer at UKOLN
Web statistics have been a bone of contention for some considerable time. Part of the reason for this is that the analysis of the statistics now serves a quite different range of functions from those originally envisaged.
What is Web traffic?
The logging of information about visitors to Websites began as a simple extension of the information kept by server administrators long before the Web came into being. In those days, information about files served (peak times for traffic, etc.) was collected in order to provide detail about the level of service. This could be used by the system administrator to anticipate necessary upgrades to the service, to spot bottlenecks in processing, to identify unauthorised use of the facilities, and so on. Analysis of this data could be presented in simple graphical form to the managers of the service in hourly, daily, or weekly breakdowns, illustrating aspects of usage.[1]
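A minimal sketch of such an hourly breakdown, assuming log lines in the Common Log Format with the timestamp held in square brackets (e.g. [01/Sep/2001:14:03:52 +0000]), might look like this:

from collections import Counter

# A minimal sketch: assumes Common Log Format lines, where the timestamp
# appears in square brackets, e.g. [01/Sep/2001:14:03:52 +0000].
def hourly_breakdown(log_lines):
    counts = Counter()
    for line in log_lines:
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue
        timestamp = line[start + 1:end]     # "01/Sep/2001:14:03:52 +0000"
        hour = timestamp.split(":")[1]      # hour field, e.g. "14"
        counts[hour] += 1
    return counts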
Then the Web arrived. At first there wasn’t a
problem, since most early sites were academic-
related, and inline images were not an option. A hit
to a page was exactly that, and the statistics told
you pretty much what you wanted to know. Was
anyone looking at your page? Which domains did
they arrive from? Which is the busiest or quietest
time? Are users attempting to find pages that have
since been moved?
Once inline graphics became possible in mid-1993, statistics for the Web became a more complicated issue: a hit to a page with ten or so inline graphics (graphical bullet points, section dividers, headings, banners, etc.) would result in ten further requests. In other words, the statistics began to reflect the complexity of documents served rather than the usage of files. It was still possible to work out what was going on, but it was no longer straightforward.
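To see the effect, consider a minimal sketch that separates page hits from inline-graphic requests. It assumes Common Log Format lines in which the request appears as a quoted field, e.g. "GET /index.html HTTP/1.0", and the choice of file extensions treated as inline graphics is purely illustrative:

import re

# A minimal sketch: the quoted request field is parsed out of each log
# line, and requests for inline graphics are counted as hits but not as
# page views. The extension list is an illustrative assumption.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+)')
INLINE_GRAPHICS = (".gif", ".jpg", ".jpeg", ".xbm")

def count_hits(log_lines):
    total_hits = page_hits = 0
    for line in log_lines:
        match = REQUEST.search(line)
        if match is None:
            continue
        total_hits += 1
        path = match.group(1).split("?")[0].lower()
        # A request for an inline graphic is a hit, but not a page view.
        if not path.endswith(INLINE_GRAPHICS):
            page_hits += 1
    return total_hits, page_hits

On this sketch, the page with ten inline graphics would register eleven hits but only one page view.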
The arrival of large-scale commercial interest in the Web in 1995 changed everything. The Web was no longer an exclusive playground for IT people and academics, and the newcomers needed statistics for their own purposes. Marketing departments needed statistics to show that the investment in a Web presence was generating some kind of interest, if not direct evidence of business done, and to establish relative success compared to competing businesses.
For a while, hit counters were bandied about as if there were no problems associated with them. However, commercial companies depend for their long-term success on being able to spot dud information. With the increase in the use of proxy servers and caching services, it became impossible to treat a raw hit count by itself as reliable evidence of usage.
Making Web statistics reliable
Where we are now is that all access statistics are treated as suspect unless the parameters within which the statistics have been collected and processed are explained in detail. This means that it should be possible, in theory, for a third party to understand the significance of the data, and to reprocess it in a format which makes it possible to compare it with data from another site. In other words, in the early days of the Web it was assumed that Web statistics data were straightforwardly intelligible and unambiguous in meaning, and therefore directly comparable with each other. Now the assumption must be that both the data collected in Web logs and the data presented in a report are not straightforwardly intelligible and unambiguous in meaning, and are not comparable until the parameters of collection and processing are both stated and intelligible.[2]
There is as yet no standard by which these statistical analyses are prepared, though there are discussions under way about putting such standards in place.[3] An example of the sort of standard we might require in analysis would be to count as a single visitor session all requests arriving from the same IP address within a thirty-minute window, and to report sessions rather than raw hits.
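A minimal sketch of this thirty-minute rule, assuming each log record has already been parsed into an (IP address, timestamp) pair and that records arrive in chronological order, might look like this:

from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)

# A minimal sketch of the thirty-minute rule described above: records are
# assumed to be (ip_address, timestamp) pairs, with timestamps as datetime
# objects in chronological order.
def count_sessions(records):
    last_seen = {}      # ip_address -> timestamp of most recent request
    sessions = 0
    for ip_address, timestamp in records:
        previous = last_seen.get(ip_address)
        # A new session begins when an IP address appears for the first
        # time, or when its previous request is over thirty minutes old.
        if previous is None or timestamp - previous > SESSION_GAP:
            sessions += 1
        last_seen[ip_address] = timestamp
    return sessions

On this rule, a visitor who returns after a long pause counts as a new session, which is one more reason why the parameters of any such analysis must be stated explicitly.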
Obviously this is not the only way in which visitor sessions might be analysed and presented. However, as with the collection of distribution figures for magazines and newspapers, and the assessment of the audience for television programmes, it is more important that a method for the collection of
