[305] in Public-Access_Computer_Systems_Forum
Print vs. Electronic Information
daemon@ATHENA.MIT.EDU (Clifford Urr)
Thu May 21 11:28:42 1992
Date: Thu, 21 May 1992 10:11:32 CDT
Reply-To: Public-Access Computer Systems Forum <PACS-L%UHUPVM1.BITNET@RICEVM1.RICE.EDU>
From: Clifford Urr <cliffu@well.sf.ca.us>
To: Multiple recipients of list PACS-L <PACS-L%UHUPVM1.BITNET@RICEVM1.RICE.EDU>
----------------------------Original message----------------------------
Subject: Print vs. Electronic Information
Wilfred says:
>Tables of Contents and Indexes are LESS useful in electronic text
>because pages are a paper phenomenon. If the text is in ASCII
>format or else a viewer of some sort is used, text could be
>searched using such programs as LIST for personal computers or a
>text editor or word processor.
Perhaps your method is OK for small or short works. But I think
electronic books will be fairly trivial phenomena unless they take
advantage of the *very* large amount of physically miniaturized
storage space that digitized text is capable of using. If they do,
the issue of indexing, especially in connection with hypertext, will
loom large.
Let me explain:
In reading the statement by Wilfred above, I wondered how large the
files were that were searched using LIST or a word processor. I
wondered if, e.g., you would search something like the "Encyclopedia
of Associations" or the "Encyclopedia Britannica" or Toynbee's "Study
of History" or any number of what might be called "large narrative
data sets" using a word processor, minus access to a genuine index (as
distinct from a concordance). If you did, you would be revisiting the
same old problems of relevance and recall that have bedeviled
bibliographic database searchers.
String searching large files, mentioned in another reply, does not
escape these problems, especially because it neglects the needs and
abilities of the non-professional end user. What, e.g., do users do
when they get back dozens or hundreds of insignificant hits from the
narrative text they searched using their little string search
statements? Or, if these tools are improved with, say, boolean search
capabilities, do you expect users to become proficient in set
theory? Will they know the pitfalls and limitations of same?
Encouraging ordinary users to employ what you think are such
wonderful "tools" is going to plunge them into rather deep oceans of
data without a life jacket. To change the metaphor, they will not, in
many instances, have any idea of how many or which needles they fail
to notice in the electronic haystacks these tools are directed to
penetrate, tools that are not doing them as much service as they are
misled to believe.
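To make the precision problem concrete, here is a minimal sketch (in
Python, purely illustrative; the toy text is hypothetical, and LIST
itself worked differently) of why a bare string search over narrative
text returns so many insignificant hits: every literal occurrence is a
"hit," with no sense of relevance.

```python
# Hypothetical toy text; "bank" occurs in four senses, only one financial.
text = (
    "The bank approved the loan. We picnicked on the river bank. "
    "Banks of fog rolled in. The pilot banked the plane sharply."
)

def string_search(needle, haystack):
    """Return the character offset of every occurrence, relevant or not."""
    hits, start = [], 0
    needle, haystack = needle.lower(), haystack.lower()
    while (pos := haystack.find(needle, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits

hits = string_search("bank", text)
print(len(hits), "hits")  # four hits; only one concerns finance
```

A searcher after the financial sense must wade through every riverine
and aeronautical match by hand, which is exactly the relevance problem
described above.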
Wilfred also says:
>HYPERTEXT is fine but only puts you where the writer thinks you
>should go not where the reader necessarily wants to go. That is the
>biggest problem with many hypertext products currently available.
This is like saying the reader/user should recognize that the index is
fatally flawed, and thus perhaps useless, when searching, e.g., the
Encyclopedia Britannica, because the index provided therein does not give
readers precise pointers to where they specifically want to go, and everybody
has so many different places they want to go that the indexer cannot
possibly include them all... (One can reject any index, whatever its
quality, on the basis of this kind of criticism of hypertext.) Good indexing
- I'm not talking about concordance making, which can be almost totally
automated - provides pointers to where readers likely want to go. It does
more than that; very importantly, it alerts them to go where they might need
to go but did not know they needed to go. Good indexes provide a structure
within which users can navigate their way to the target information.
Simple tools, such as LIST, aren't going to replace good indexing structures
for works of significant length or complexity.
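The distinction between an automatable concordance and a genuine index
can be sketched in a few lines (Python, with hypothetical chapter data):
a concordance mechanically maps every word to its occurrences, while an
index maps reader concepts - including ones whose words never appear
verbatim - to selected locations.

```python
from collections import defaultdict

# Hypothetical table of chapters and their text, for illustration only.
document = {
    "ch1": "money and banking in the early republic",
    "ch2": "the river bank settlements",
    "ch3": "credit panics and their causes",
}

# Concordance: fully automatable -- every word, every occurrence,
# no judgment about relevance.
concordance = defaultdict(list)
for section, words in document.items():
    for word in words.split():
        concordance[word].append(section)

# Index: curated pointers.  An indexer judges that "ch3" belongs under
# "banking" (panics are a banking topic) even though the word "banking"
# never occurs there, and that "ch2" does not, despite containing "bank".
index = {
    "banking": ["ch1", "ch3"],
    "settlements, riverine": ["ch2"],
}

print(concordance["bank"])  # a literal hit, off topic
print(index["banking"])     # where readers actually need to go
```

The concordance half of this can be generated by machine; the index half
is the intellectual labor that, as argued above, simple tools cannot
replace.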
You are incorrect about what the "biggest problem" of current
hypertext products is. The real "biggest problem" is the shallowness
and superficiality of the index component - in all its manifestations
in the system - of most hypertext knowledgebases. First-rate
hypertext depends upon first-rate indexing. There *are* a few
first-rate hypertext knowledgebases whose creators have recognized
this, and I've seen some of them. (For an account of one such
knowledgebase, see the "Hypertext/Hypermedia Handbook," edited by
Emily Berk, Chapter 27, Case #3, p. 468: the article by Bruce Winters
and Neil Larson on Deloitte & Touche's gigantic CD-ROM hypertext
knowledgebase.) This issue has hardly been addressed by most
developers of, or writers on, hypertext systems. (I have discussed
indexing in hypertext extensively in an article in the May 1991 issue
of "Computers in Libraries.")
I agree to some extent with Genevieve Engel's comments indicating
ordinary indexing can be transferred without much added expense
("Hypertext is one approach but it's also possible to do straight
indexing just the way you do for a paper book"). Trouble is, it goes
against the way many ordinary users find information in books. An ALA study
from about 4-6 years ago, mentioned in LJ at the time, showed that most
"ordinary" library users don't bother with indexes when locating information
in non-reference books pulled from a shelf; they simply, though
inefficiently, flip through the pages. In other words, they browse their way
to infotargets. Compared with electronic books, it's a lot easier - though
low tech - for even the least bright user to *very* rapidly
flip-through-and-glance-at the pages of a paper book. (Scrolling up and down
a file is a regressive method in an electronic text.)
Hence, if you have narrative text in digitized form, why go through
the trouble of merely imitating paper? Why not do something with the
digitized format that can't be done on paper - like hypertext it?
Quality hypertext, as I characterized it above, could accommodate the
tendency of most users to browse their way to an infotarget while
greatly improving their efficiency in doing so. In any case, I'm not
so sure the few added values that can be inferred from Genevieve's
approach, such as reduced storage space, justify the effort; at the
least, I doubt the result is all that much more interesting than
paper formats.
In sum, using simple tools (Wilfred's solution) or imitating paper formats
of books/indexes (Genevieve's solution) will reinforce users' ineffective
and inefficient search habits, actively in the former case and passively, by
default, in the latter, and together both will amplify attachment to
these habits. In the latter case, people won't bother with the index, as
they usually don't (says the ALA study), especially if they can use tools
like LIST; in the former, they will be blissfully unaware of these tools'
limitations, or will abandon the tool when they do find out (after a few
incidents of pulling back hundreds of irrelevant items). Of course, if they
decide to use the index after a few bad experiences with LIST or a free-text
retrieval program, it will have to be very good, better than the
high-school-graduate-produced shlock indexes one sees more and more of
lately in books. But then, why trouble to digitize a book if the index is
very good...unless other significant values are added? No, I don't think
the issue of indexing digitized narrative text in a hypertext context is
going to go away soon.
Cliff Urr, Director of Library Services
James Martin and Company Library
Reston, VA
Bitnet: cliffu@well.uucp.bitnet