[311] in IS Home Pages
Re: Limits to Infoseek Search?
daemon@ATHENA.MIT.EDU (Jag Patel)
Mon Nov 30 17:11:58 1998
Date: Mon, 30 Nov 1998 17:11:10 -0500
To: "Kevin M. Cunningham" <kcunning@MIT.EDU>
From: Jag Patel <jag@MIT.EDU>
Cc: kcunning@MIT.EDU, is-home@MIT.EDU, cavan@MIT.EDU, salemme@MIT.EDU,
jag@MIT.EDU
Hi Kevin,
The Search Goddesses say "Hmm, strange search behavior".
What we do know: the limit on qp terms is 40. The limit on the number of
unique documents for a search term is 40,000 (example: searching for pages
that say MIT on web.mit.edu doesn't work, since there are more than 40,000
pages on web.mit.edu that say MIT on them, thus making the term generic).
You have fewer than 40 qp lines, and under 20,000 pages in that list.
I've forwarded your URLs to Infoseek for more help; hopefully the engineers
will have an answer for us by tomorrow. One suggestion they already had is
to try consolidating the locker names into one hidden line, e.g.
<input type=hidden name=qp value="url:http://web.mit.edu/is/,
url:http://web.mit.edu/consult/ url:http://web.mit.edu/answers/ etc etc">
Also, you do not need the pipe after the last URL in the qp line; the search
engine takes the qp variables you defined and automatically attaches a double
pipe, which means only the terms that match the previous requirements will be
searched.
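If it helps while experimenting, here is a rough sketch of generating that
single hidden line from a list of locker names and checking it against the
40-term limit mentioned above. The function name, the base URL argument, and
the use of a single space between url: terms are my assumptions for
illustration, not confirmed Infoseek syntax.

```python
from html import escape

MAX_QP_TERMS = 40  # limit on qp terms quoted earlier in this message


def build_qp_input(lockers, base="http://web.mit.edu/"):
    """Return one hidden <input> that restricts a search to the given lockers.

    Hypothetical helper: the space separator between url: terms is an
    assumption, not documented Infoseek behavior.
    """
    if len(lockers) > MAX_QP_TERMS:
        raise ValueError(
            f"{len(lockers)} url: terms exceeds the limit of {MAX_QP_TERMS}"
        )
    # One url: term per locker, joined into a single qp value.
    terms = " ".join(f"url:{base}{name}/" for name in lockers)
    # Escape the value so it is safe inside the quoted HTML attribute.
    return f'<input type=hidden name=qp value="{escape(terms, quote=True)}">'


print(build_qp_input(["is", "consult", "answers"]))
```

Paste the printed line into the search form in place of the separate qp
lines; a list longer than 40 lockers raises an error instead of producing a
line the engine would reject.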
Thanks for your patience (and for testing Ultraseek's limits :) I'll let
you know as soon as Infoseek gives us an answer.
--Jag
At 03:37 PM 11/25/98 -0500, Kevin M. Cunningham wrote:
>Howdy Search Gods,
>
>I'm doing some work for the IS Home Page team, and I'm having a little problem.
>
>After consulting with Oliver (who provided syntax tips and an initial
draft), I have been trying to expand a simple IS search page (one which only
searched files in the "is" locker) to be more all-inclusive. Specifically, I
want a user to be able to enter a search string and have Infoseek look for
that string in *all* the IS-related files on campus (i.e., all the files in
a set of ~30 lockers we've identified, rather than in just the "is" locker).
>
>My draft file (which happens to be in production ;-) is at:
>
> http://web.mit.edu/is/search/index.html
>
>It seems to work okay, but look at the source code and you'll see the
problem: I can only enter about a dozen "URL:..." qp clauses. If I add any
more (e.g., one of the ones in the commented area), the search returns a lot
of false hits. See, for example, the file:
>
> http://web.mit.edu/is/search/bad.html
>
>This is identical to the index.html file, except that I took one of the
commented qp lines and made it real. If you try a search in index.html (I
tried the word "dogs"), you get something reasonable; if you try a search in
bad.html, you get a boatload of false hits -- and you seem to get the *same*
bad list whatever your search string is.
>
>Suspecting there was a limit to the number of qp lines that could be
entered, I created a version with all the "URL:..." strings in a single qp
input statement, but the same thing happened (i.e., garbage out when I added
more than a certain number of "URL:..." clauses).
>
>It also doesn't seem strongly correlated to *which* "URL:..." clauses I
include or omit (although there could be some relationship there...) -- I
did try a few different qp statements but didn't notice that one worked and
others didn't, although that could be the case...
>
>In any case, I seem to be hitting some limit related to:
>
> - a maximum number of separate "URL:..." clauses I can enter
>
>and/or
>
> - a maximum number of pages I can search in (i.e., the set of pages
> matched by the aggregate of "URL:..." clauses -- to which I then
> apply the search string -- may be too big? is there such a limit?)
>
>Any ideas what's up, or an alternative way to build a custom search that
allows a user to find a string in a series of discrete lockers (or even
machines)?
>
>Oliver would have followed this up, but he's out Monday/Tuesday of next
week and suggested I send for help to this list...
>
>Thanks for your wisdom! I trust I do not "search" in vain...
>
>--Kevin
>
>
>
------------------------------------------------------------------
Jag Patel ~ MIT CWIS Consultant ~ jag@mit.edu ~ ph: 617.253.8167