[145] in 6.033-lab


html parser updates, deadline tomorrow

daemon@ATHENA.MIT.EDU (Kevin Fu)
Thu Mar 11 20:18:35 1999

To: 6.033-lab@MIT.EDU
Date: Thu, 11 Mar 1999 20:18:26 EST
From: Kevin Fu <fubob@MIT.EDU>

You may find the comments below useful for processing binary
documents.  The HTTP and HTML parsers do not like to parse images...

Also, I updated the html_test.c program which previously could result
in a seg fault.  The HTML parser itself has not changed.

Remember to copy your _uncompiled_ code and Makefile to
/mit/6.033/lab/handins/p2/ tomorrow.

--------
Kevin E. Fu (fubob@mit.edu)
PGP key: finger fubob@monk.mit.edu

------- Forwarded Message
From: Roger Hu <rogerh@MIT.EDU>
To: 6.033-lab-tas@MIT.EDU
Cc: rogerh@MIT.EDU
Subject: Adjustments for prefetching
Date: Thu, 11 Mar 1999 19:42:08 EST


Hi guys,

Thanks again for all the help on this prefetching part.  

The parser doesn't like non-HTML files very much; I found that
passing some random buffers through it makes it seg fault.  Kevin's
suggestion was to use the Content-Type: field of the response
headers.  If the document being sent is an HTML file, the field will
appear as "Content-Type: text/html".
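
A rough sketch of that check, in case it helps.  The helper name
is_html_content_type is made up for illustration and is not part of
the lab code; it assumes you already have the Content-Type value as a
C string (or NULL when the header is missing) and that strncasecmp is
available, as on any POSIX system.

```c
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper, not part of the lab code.
 * Returns 1 if the Content-Type value names text/html, else 0.
 * A NULL value (header absent) is treated as "not HTML". */
static int is_html_content_type(const char *content_type)
{
  static const char html[] = "text/html";
  size_t n = sizeof(html) - 1;

  if (content_type == NULL)
    return 0;
  if (strncasecmp(content_type, html, n) != 0)
    return 0;
  /* Accept "text/html" alone or followed by parameters
   * such as "; charset=iso-8859-1". */
  return content_type[n] == '\0' || content_type[n] == ';' ||
         content_type[n] == ' ';
}
```

The idea would be to hand a response body to the HTML parser only when
this returns 1, and to skip images and other binary documents.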

Except the Content-Type: field isn't parsed, so we have to modify
http.c and http.h:

For http.h, the 'content_type' declaration needs to be added:

struct http_rep {
  int type;
  int major;
  int minor;
  char *url;
  char *reason;
  char *content_type;
  char *date;
  char *expires;
  char *if_modified_since;
  char *last_modified;
  char *pragma;
  int content_length;
  char *other;
  char *orig;     /* Pointer to tokenized http buffer. */

In http_parseResponse, add this at line 156:

    } else if (strcasecmp(str, "Content-Type") == 0) {
      m->content_type = strtok(NULL, CRLF);
      stripSpaces(&m->content_type);
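
To see what those two calls do in isolation, here is a small
standalone sketch.  It is not the code from http.c: strip_spaces is a
guessed stand-in for stripSpaces (whose source isn't shown here),
content_type_of is a made-up name, and CRLF is assumed to be "\r\n".

```c
#include <ctype.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define CRLF "\r\n"    /* assumed definition; http.h is not shown here */

/* Hypothetical stand-in for stripSpaces: trims whitespace at both
 * ends of *s in place.  A NULL string is left alone. */
static void strip_spaces(char **s)
{
  char *end;

  if (s == NULL || *s == NULL)
    return;
  while (isspace((unsigned char)**s))
    (*s)++;
  end = *s + strlen(*s);
  while (end > *s && isspace((unsigned char)end[-1]))
    end--;
  *end = '\0';
}

/* Returns the Content-Type value from a single header line, or NULL
 * if the line is a different header or has no value.  The line is
 * modified in place, because strtok writes '\0' over delimiters. */
static char *content_type_of(char *line)
{
  char *name = strtok(line, ":");
  char *value;

  if (name == NULL || strcasecmp(name, "Content-Type") != 0)
    return NULL;
  value = strtok(NULL, CRLF);   /* rest of the line, up to the CRLF */
  strip_spaces(&value);
  return value;
}
```

Note that the returned pointer points into the original buffer, so it
is only valid for as long as that buffer is.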

But if you change http_parseResponse, you also have to change
http_unparseResponse at line 349:

  if (m->content_type != NULL) {
    sprintf(p, "Content-Type: %s%s", m->content_type, CRLF);
    p += strlen(p);
  }

OK, here's the seg fault that took me forever to find: you also have
to make sure that enough memory is allocated to hold the
m->content_type field when the response is unparsed.
Line 316 needs to be changed to:

  respLen = 128 + my_strlen(m->url) + my_strlen(m->date) +
    my_strlen(m->expires) + my_strlen(m->if_modified_since) +
    my_strlen(m->last_modified) + my_strlen(m->pragma) +
    my_strlen(m->other) + my_strlen(m->content_type);
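
For what it's worth, this kind of overflow can be caught instead of
crashing if each header is written with snprintf rather than sprintf.
The sketch below is not how http.c does it; append_header is a made-up
helper, and CRLF is assumed to be "\r\n".

```c
#include <stddef.h>
#include <stdio.h>

#define CRLF "\r\n"   /* assumed definition; http.h is not shown here */

/* Hypothetical helper, not part of http.c.
 * Appends "name: value" plus CRLF to buf, starting at offset used.
 * Returns the new offset, or (size_t)-1 if the line does not fit
 * in the cap bytes of buf (the output is then truncated).
 * A NULL value means the header is absent: nothing is written. */
static size_t append_header(char *buf, size_t cap, size_t used,
                            const char *name, const char *value)
{
  int n;

  if (value == NULL)
    return used;
  if (used >= cap)
    return (size_t)-1;
  n = snprintf(buf + used, cap - used, "%s: %s%s", name, value, CRLF);
  if (n < 0 || (size_t)n >= cap - used)
    return (size_t)-1;   /* would not fit, including the '\0' */
  return used + (size_t)n;
}
```

With a helper like this, a forgotten term in the length computation
shows up as an error return instead of a write past the end of the
buffer.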
 


------- End of Forwarded Message

