[145] in 6.033-lab
html parser updates, deadline tomorrow
daemon@ATHENA.MIT.EDU (Kevin Fu)
Thu Mar 11 20:18:35 1999
To: 6.033-lab@MIT.EDU
Date: Thu, 11 Mar 1999 20:18:26 EST
From: Kevin Fu <fubob@MIT.EDU>
You may find the comments below useful for processing binary
documents. The HTTP and HTML parsers do not like to parse images...
Also, I updated the html_test.c program, which previously could seg
fault. The HTML parser itself has not changed.
Remember to copy your _uncompiled_ code and Makefile to
/mit/6.033/lab/handins/p2/ tomorrow.
--------
Kevin E. Fu (fubob@mit.edu)
PGP key: finger fubob@monk.mit.edu
------- Forwarded Message
From: Roger Hu <rogerh@MIT.EDU>
To: 6.033-lab-tas@MIT.EDU
Cc: rogerh@MIT.EDU
Subject: Adjustments for prefetching
Date: Thu, 11 Mar 1999 19:42:08 EST
Hi guys,
Thanks again for all the help on this prefetching part.
The parser doesn't handle non-HTML files very well: I found that
passing some arbitrary (e.g. image) buffers through it makes it seg
fault. So Kevin suggested using the Content-Type: field of the response
headers to decide whether to parse a document. If the document being
sent is an HTML file, the header will appear as "Content-Type: text/html".
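A check along these lines could sit in front of the HTML parser. This is a minimal sketch, not part of the lab code: the function name is_html_content_type() is hypothetical, and it assumes the caller passes the parsed header value (for example m->content_type, assuming a missing header leaves that field NULL). HTTP media types are case-insensitive and may carry parameters such as "; charset=ISO-8859-1", so the sketch compares only the leading type/subtype.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not in the lab's http.c): returns 1 if a
 * Content-Type value names an HTML document, 0 otherwise. */
int is_html_content_type(const char *ct)
{
    const char *type = "text/html";   /* kept in lowercase on purpose */
    size_t i;

    if (ct == NULL)
        return 0;                     /* no header: don't risk the HTML parser */
    while (isspace((unsigned char)*ct))
        ct++;                         /* skip leading whitespace */

    /* Case-insensitive prefix match; stops at the end of "text/html",
     * or earlier if ct is shorter (its '\0' never matches). */
    for (i = 0; type[i] != '\0'; i++) {
        if (tolower((unsigned char)ct[i]) != type[i])
            return 0;
    }

    /* The media type must end here: allow trailing whitespace and
     * optional parameters after ';', but reject e.g. "text/htmlx". */
    ct += i;
    while (isspace((unsigned char)*ct))
        ct++;
    return *ct == '\0' || *ct == ';';
}
```

Documents for which it returns 0 (images, for instance) would then skip the HTML parser entirely instead of being fed to it as if they were markup.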
However, the Content-Type: field isn't parsed yet, so we have to
modify http.c and http.h:
In http.h, add a 'content_type' field to struct http_rep:
struct http_rep {
    int type;
    int major;
    int minor;
    char *url;
    char *reason;
    char *content_type;        /* new: value of the Content-Type: header */
    char *date;
    char *expires;
    char *if_modified_since;
    char *last_modified;
    char *pragma;
    int content_length;
    char *other;
    char *orig;                /* Pointer to tokenized http buffer. */
};
In http_parseResponse, add this at line 156:
} else if (strcasecmp(str, "Content-Type") == 0) {
    /* Take the rest of the header line (up to CR/LF) as the value. */
    m->content_type = strtok(NULL, CRLF);
    stripSpaces(&m->content_type);
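To see what the two added lines do, here is a self-contained sketch of the same tokenizing pattern applied to a single header line. It assumes the header name was split off with strtok() at the ':' (the patch only shows the comparison on str) and that CRLF is the string "\r\n". stripSpaces() is replaced by a hypothetical strip_spaces() whose behavior (trimming spaces and tabs at both ends) is a guess, and parse_header_line() is likewise illustrative.

```c
#include <string.h>

#define CRLF "\r\n"   /* assumption: mirrors the CRLF macro used in http.c */

/* Hypothetical stand-in for http.c's stripSpaces(): trims leading and
 * trailing spaces/tabs in place by advancing *s and writing a new '\0'. */
static void strip_spaces(char **s)
{
    char *end;

    while (**s == ' ' || **s == '\t')
        (*s)++;
    end = *s + strlen(*s);
    while (end > *s && (end[-1] == ' ' || end[-1] == '\t'))
        *--end = '\0';
}

/* Splits one header line into name and value: the first strtok() call
 * cuts the name at ':', then strtok(NULL, CRLF) takes everything up to
 * the line terminator. Returns 0 on success, -1 if a part is missing. */
static int parse_header_line(char *line, char **name, char **value)
{
    *name = strtok(line, ":");
    if (*name == NULL)
        return -1;
    *value = strtok(NULL, CRLF);
    if (*value == NULL)
        return -1;            /* empty value: nothing left but CR/LF */
    strip_spaces(value);
    return 0;
}
```

Because strtok(NULL, CRLF) stops only at CR or LF, a value that itself contains colons, such as a Date: header, comes through intact. An empty value ("Content-Type:" followed directly by CRLF) makes strtok() return NULL, which is why the sketch checks for it before trimming.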
If you change http_parseResponse, you also have to change
http_unparseResponse at line 349 so that the Content-Type: header is
written back out:
if (m->content_type != NULL) {
    sprintf(p, "Content-Type: %s%s", m->content_type, CRLF);
    p += strlen(p);
}
Ok, here's the seg fault that took me forever to find. You also have
to make sure the response buffer has room for the m->content_type
field; otherwise the sprintf above writes past the end of it.
Line 316 needs to be changed to:
respLen = 128 + my_strlen(m->url) + my_strlen(m->date) +
    my_strlen(m->expires) + my_strlen(m->if_modified_since) +
    my_strlen(m->last_modified) + my_strlen(m->pragma) +
    my_strlen(m->other) + my_strlen(m->content_type);
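The missing term crashes because the buffer is sized once, up front, and then filled with unchecked sprintf() calls. The sketch below reproduces that pattern on a cut-down, hypothetical struct mini_rep; safe_strlen() stands in for my_strlen(), assuming (as its use on optional fields suggests) that it returns 0 for NULL, and unparse_headers() is illustrative rather than the lab's http_unparseResponse.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CRLF "\r\n"   /* assumption: mirrors the CRLF macro used in http.c */

/* Assumed behavior of my_strlen(): 0 for a NULL (absent) field. */
static size_t safe_strlen(const char *s)
{
    return s == NULL ? 0 : strlen(s);
}

/* Hypothetical, cut-down response with only the fields this sketch prints. */
struct mini_rep {
    char *date;
    char *content_type;
};

/* Builds the header block in a malloc'd buffer. In this sketch the fixed
 * 128 bytes cover header names, CRLFs and the final '\0'; every
 * variable-length field that gets printed must also be counted, or the
 * sprintf() calls can run past the end of the buffer. */
static char *unparse_headers(const struct mini_rep *m)
{
    size_t len = 128 + safe_strlen(m->date) + safe_strlen(m->content_type);
    char *buf = malloc(len);
    char *p = buf;

    if (buf == NULL)
        return NULL;
    *p = '\0';
    if (m->date != NULL) {
        sprintf(p, "Date: %s%s", m->date, CRLF);
        p += strlen(p);
    }
    if (m->content_type != NULL) {
        sprintf(p, "Content-Type: %s%s", m->content_type, CRLF);
        p += strlen(p);
    }
    sprintf(p, "%s", CRLF);   /* blank line ends the header block */
    return buf;
}
```

Dropping safe_strlen(m->content_type) from len reproduces the bug: a long enough Content-Type value overflows the allocation. Where snprintf() is available, passing it the space remaining in the buffer turns an undersized buffer into truncated output instead of memory corruption.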
------- End of Forwarded Message