[1086] in magellan
Web Search Service August/September 2003 Status Report
daemon@ATHENA.MIT.EDU (Joanne M. Hallisey)
Fri Oct 3 09:01:27 2003
Mime-Version: 1.0
Message-Id: <p05230105bba32361bab6@[18.152.2.178]>
Date: Fri, 3 Oct 2003 09:01:02 -0400
To: magellan@mit.edu, search-engines@mit.edu
From: "Joanne M. Hallisey" <hallisey@MIT.EDU>
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
August/September 2003 Status Report
Project Name: Web Search Service
Project Leader: Joanne Hallisey
Report Date: October 2, 2003
URL: http://web.mit.edu/is/discovery/search/
Accomplishments in August:
- Narrowed down the data sets for comparing Google with Ultraseek.
- Continued to evaluate Google capabilities.
- Finalized terms of 500,000 url beta agreement.
Accomplishments in September:
- Began comparison of Ultraseek and Google search capabilities. Noted
that they use different algorithms, which makes the results difficult
to compare.
- Determined that 300,000 documents was not adequate for meeting MIT's
needs.
- Met with Barbara Johnson to begin preparation for usability
testing to compare the two search engines.
- Ran some initial "tests" of small collections to compare results.
Found that Google "discovers" and crawls some directories we did not
expect it to, such as "http://web.mit.edu/activity". We restricted
the crawl to URLs of the form
http://web.mit.edu/lockername
- Held conference call with Google Rep to discuss whether the license
terms count urls or documents.
- Set up a script, run daily, to determine which sites are using search.mit.edu.
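For illustration only, the crawl restriction described above could be sketched as a simple scope check. This is not the actual Google or Ultraseek configuration; the function name and the locker names in the allow-list are hypothetical placeholders, and it assumes that any page underneath an allowed locker is also in scope.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; the real list of lockers is not given in this report.
ALLOWED_LOCKERS = {"is", "admissions"}

def in_crawl_scope(url, allowed=ALLOWED_LOCKERS):
    """Return True if url looks like http://web.mit.edu/<locker>[/...]
    and <locker> is in the allow-list."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname != "web.mit.edu":
        return False
    # The first non-empty path segment is treated as the locker name.
    segments = [s for s in parts.path.split("/") if s]
    return bool(segments) and segments[0] in allowed
```

With this placeholder list, http://web.mit.edu/is/discovery/search/ would be in scope, while http://web.mit.edu/activity would be skipped because "activity" is not listed.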
Goals for October:
- Make test collections starting from the list of offices, and index
URLs of the form http://web.mit.edu/{a,b,...}, stopping when the
Google collection reaches close to our 300,000 url limit.
- Investigate how multiple urls pointing to a document impact the
Google search.
- Create test query pages and results look-and-feel for our user
testing. Remove vendor information from both, so that the testers
won't know which search engine produces which results.
- Analyze data from daily logs of search.mit.edu users.
- Begin financial comparison of Google service to Ultraseek service.
- Recruit 10 or more testing volunteers.
- Create test questions.
- Begin draft of initial Discovery findings and comparisons.
Ongoing tasks:
- Ask Google about SSL capacity for later use - August
- Review rules for existing search service
- Determine final definition for data sets for new service
- Define rules for new service
- Contact Resources to determine how to appropriately thank Google
- Determine solution for conversion of query forms if required.
Next community milestone:
None scheduled.
Issues:
- August was a month that involved individual work and many
vacations. No regular meetings were held.
Key learnings:
Discovered that the Google license is based on url streams rather
than on documents, so the 300,000 url license only allows us to
index 30,000 documents. More investigation into this issue is
required.
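Taken at face value, the two figures above imply that each indexed document uses up about ten licensed urls on average. The short sketch below only restates that arithmetic. The helper name is hypothetical, and it assumes the same ratio would hold at other url limits, which has not been confirmed with Google.

```python
# Figures quoted in this report.
LICENSED_URLS = 300_000
INDEXABLE_DOCUMENTS = 30_000

# Implied average number of licensed urls consumed per indexed document.
urls_per_document = LICENSED_URLS / INDEXABLE_DOCUMENTS

def documents_for(url_limit, ratio=urls_per_document):
    """Estimate how many documents fit under a url limit,
    assuming the same urls-per-document ratio holds."""
    return int(url_limit // ratio)
```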
While it appears that Ultraseek may be less expensive, actual cost
comparison is likely to show that the two options are not far apart.
There will be transition issues that will affect the community of
users who create search forms for their MIT pages whether we change
to Google or upgrade Ultraseek.
Team dynamics:
Good. Hubert and Anne are diligently working the technical issues.
Additional comments:
None