[1086] in magellan

Web Search Service August/September 2003 Status Report

daemon@ATHENA.MIT.EDU (Joanne M. Hallisey)
Fri Oct 3 09:01:27 2003

Mime-Version: 1.0
Message-Id: <p05230105bba32361bab6@[18.152.2.178]>
Date: Fri, 3 Oct 2003 09:01:02 -0400
To: magellan@mit.edu, search-engines@mit.edu
From: "Joanne M. Hallisey" <hallisey@MIT.EDU>
Content-Type: text/plain; charset="us-ascii" ; format="flowed"

August/September 2003 Status Report

Project Name: Web Search Service
Project Leader: Joanne Hallisey
Report Date: October 2, 2003
URL: http://web.mit.edu/is/discovery/search/

Accomplishments in August:
- Narrowed down the data sets for comparing Google with Ultraseek.
- Continued to evaluate Google capabilities.
- Finalized terms of 500,000 url beta agreement.

Accomplishments in September:
- Began comparison of Ultraseek and Google search capabilities. Noted 
that they use different algorithms, which makes them difficult to 
compare.
- Determined that 300,000 documents was not adequate for 
meeting MIT's needs.
- Met with Barbara Johnson to begin preparation for usability 
testing to compare the two search engines.
- Ran some initial "tests" of small collections to compare results. 
Found that Google "discovers" and crawls some directories we did not 
expect it to, such as "http://web.mit.edu/activity". We restricted 
crawling to URLs of the form http://web.mit.edu/lockername.
- Held a conference call with the Google rep to discuss the license 
terms regarding number of URLs versus number of documents.
- Set up a script, run daily, to determine which sites are using 
search.mit.edu.
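
The crawl restriction described above can be sketched roughly as 
follows. This is only an illustration, not the configuration actually 
used: the function name in_crawl_scope and the ALLOWED_LOCKERS set are 
hypothetical, and the locker names in it are placeholders, since the 
report does not list the real ones.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of locker names; placeholders only.
ALLOWED_LOCKERS = {"is", "admissions"}

def in_crawl_scope(url, allowed=ALLOWED_LOCKERS):
    """Return True if url looks like http://web.mit.edu/<locker>/...
    and <locker> is in the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname != "web.mit.edu":
        return False
    # The first non-empty path segment is treated as the locker name.
    segments = [s for s in parts.path.split("/") if s]
    return bool(segments) and segments[0] in allowed
```

With this placeholder allowlist, http://web.mit.edu/activity falls 
outside the crawl scope because "activity" is not a listed locker.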

Goals for October:
- Make test collections starting from the list of offices, and index 
URLs of the form http://web.mit.edu/{a,b,...}, stopping when the 
Google collection approaches our 300,000 URL limit.
- Investigate how multiple URLs pointing to a single document impact 
the Google search.
- Create test query pages and results look-and-feel for our user 
testing.  Remove vendor information from both, so that the testers 
won't know which search engine produces which results.
- Analyze data from daily logs of search.mit.edu users.
- Begin financial comparison of Google service to Ultraseek service.
- Recruit 10 or more testing volunteers.
- Create test questions.
- Begin draft of initial Discovery findings and comparisons.
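
As a rough sketch of the first goal above (adding URL groups in 
alphabetical order until the collection nears the license limit), 
something like the following could be used to plan a test collection. 
The function plan_collection and the per-letter URL counts passed to 
it are hypothetical; only the 300,000 figure comes from this report.

```python
URL_LIMIT = 300_000  # license limit cited in this report

def plan_collection(url_counts, limit=URL_LIMIT):
    """Add letter groups in alphabetical order, stopping before the
    first group that would push the total over the limit."""
    chosen, total = [], 0
    for letter in sorted(url_counts):
        count = url_counts[letter]
        if total + count > limit:
            break
        chosen.append(letter)
        total += count
    return chosen, total
```

The function returns the groups that fit and their combined URL 
count, so the remaining headroom under the limit is easy to check.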


Ongoing tasks:
- Ask Google about SSL capacity for later use - August
- Review rules for existing search service
- Determine final definition for data sets for new service
- Define rules for new service
- Contact Resources to determine how to appropriately thank Google
- Determine solution for conversion of query forms if required.

Next community milestone:
None scheduled.

Issues:
- August was a month that involved individual work and many 
vacations. No regular meetings were held.

Key learnings:

Discovered that the Google license is based on url streams rather 
than on documents, so the 300,000 URL license only allows us to 
index 30,000 documents. More investigation into this issue is 
required.

While it appears that Ultraseek may be less expensive, an actual cost 
comparison is likely to show that the two options are not far apart.

There will be transition issues that will affect the community of 
users who create search forms for their MIT pages whether we change 
to Google or upgrade Ultraseek.

Team dynamics:

Good. Hubert and Anne are diligently working the technical issues.

Additional comments:

None
