How to detect online page changes


Question

6 answers to this question

  • 0

Browsers check various response headers to decide whether cached content has expired. You could use WinINet. Here is some C; it should be easy to translate into VB.

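	/* Needs <windows.h> and <wininet.h>; link with wininet.lib. */
	HINTERNET	hInternet, hRequest;
	SYSTEMTIME	systemTime;
	DWORD  dwLen, dwIndex;

	hInternet = InternetOpen(NULL, INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
	/* Argument order: headers, headers length, flags, context. */
	hRequest = InternetOpenUrl(hInternet, "http://www.google.com/", NULL, 0,
  INTERNET_FLAG_PRAGMA_NOCACHE, 0);
	dwLen = sizeof(SYSTEMTIME);
	dwIndex = 0;
	/* HTTP_QUERY_DATE would return the response time; Last-Modified is the change time. */
	if (hRequest != NULL && HttpQueryInfo(hRequest,
  HTTP_QUERY_LAST_MODIFIED | HTTP_QUERY_FLAG_SYSTEMTIME,
  &systemTime, &dwLen, &dwIndex)) {
		/* systemTime now holds the Last-Modified date, if the server sent one. */
	}
	if (hRequest != NULL) InternetCloseHandle(hRequest);
	if (hInternet != NULL) InternetCloseHandle(hInternet);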

  • 0

I assume you understand web requests and responses (the client sends a web request to the server, and the server replies with a web response), and that each consists of a header and (optionally) some content.

After the first web request for the page, there should be a Last-Modified field in the header of the response.

Google is a bad example because there are no Last-Modified fields for actual pages, only images. But you get the idea!

First web request:

GET /images/logo.gif HTTP/1.1
Accept: */*
Referer: http://www.google.com
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Host: www.google.com

Response:

HTTP/1.1 200 OK
Content-Type: image/gif
Last-Modified: Thu, 23 Sep 2004 17:42:04 GMT
Expires: Sun, 17 Jan 2038 19:14:07 GMT
Server: GWS/2.1
Content-Length: 8558
Date: Tue, 05 Oct 2004 12:05:47 GMT

(then follows the 8558 bytes of the image)
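
To make the "grab the Last-Modified field" step concrete, here is a rough sketch in plain C that pulls one field out of the raw header text of a response like the one above. The function name get_header_value and its parameters are made up for illustration, and it assumes you already have the headers in a single NUL-terminated string; it is not a full HTTP parser (no folded headers, no repeated fields).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters of a and b. */
static int same_prefix_nocase(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/*
 * Hypothetical helper: copy the value of header field `name` from raw
 * response headers into `out`. Lines may end in "\r\n" or a bare "\n".
 * Returns 1 if the field was found, 0 otherwise.
 */
int get_header_value(const char *headers, const char *name,
                     char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *line = headers;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        /* Field names are case-insensitive and must be followed by ':'. */
        if (line_len > name_len && line[name_len] == ':' &&
            same_prefix_nocase(line, name, name_len)) {
            const char *v = line + name_len + 1;
            const char *end = line + line_len;
            size_t n;

            while (v < end && (*v == ' ' || *v == '\t'))
                v++;                        /* skip leading whitespace */
            while (end > v && (end[-1] == '\r' || end[-1] == ' '))
                end--;                      /* trim trailing CR and spaces */

            n = (size_t)(end - v);
            if (n >= out_size)
                n = out_size - 1;           /* truncate to fit the buffer */
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}
```

Called with the name "Last-Modified" on the response above, it should hand back the date string exactly as the server sent it, which is what you want to store for the next request.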

In the next web request for the same URI, include an If-Modified-Since header field, and use the date obtained from the server in the Last-Modified field.

Subsequent requests:

GET /images/logo.gif HTTP/1.1
Accept: */*
Referer: http://www.google.com
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
If-Modified-Since: Thu, 23 Sep 2004 17:42:04 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Host: www.google.com
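
If you are building the request text yourself rather than letting a library do it, the conditional request can be put together roughly like this. This is only a sketch: build_conditional_get is a made-up name, it sends just the Host and If-Modified-Since fields rather than everything a browser sends, and it assumes the date string is the one copied unchanged from the earlier Last-Modified field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a minimal conditional GET request into `buf`.
 * `last_modified` is the Last-Modified value from the earlier response,
 * copied verbatim; pass NULL to build an unconditional request.
 * Returns the request length, or -1 if `buf` is too small.
 */
int build_conditional_get(char *buf, size_t buf_size, const char *host,
                          const char *path, const char *last_modified)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);

    if (last_modified != NULL) {
        n = snprintf(buf, buf_size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "If-Modified-Since: %s\r\n"
                     "\r\n",
                     path, host, last_modified);
    } else {
        n = snprintf(buf, buf_size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);
    }
    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= buf_size)
        return -1;
    return n;
}
```

Sending the date back exactly as received avoids any formatting or time zone mistakes on the client side.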

If the document has not been modified since the date you passed to the server in the If-Modified-Since header field, the server will reply with status code 304:

Response:

HTTP/1.1 304 Not Modified
Content-Type: text/html
Server: GWS/2.1
Content-Length: 0
Date: Tue, 05 Oct 2004 12:06:15 GMT

Notice that the content length is 0 (i.e. the document is not sent). The browser then knows that the document has not been modified and can use the cached version.

If the document HAS been modified, however, the server will reply with status code 200, as it did in the first response.

If the server does not support this feature, it will simply always reply with status code 200 and there's no way to tell whether it's been modified (other than by comparing it with the cached version).
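
Putting the three cases together, the decision could look something like the sketch below. The names page_state, fnv1a32 and classify_response are my own, and using an FNV-1a hash as the stand-in for "comparing it with the cached version" is just one cheap option I picked; a byte-for-byte comparison against the stored copy would do the same job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PAGE_UNCHANGED, PAGE_CHANGED, PAGE_UNKNOWN } page_state;

/* 32-bit FNV-1a hash of a byte buffer, used as a cheap content fingerprint. */
uint32_t fnv1a32(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;       /* FNV offset basis */
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}

/*
 * Hypothetical helper: decide whether a page changed, given the status code
 * of a conditional request. `body`/`body_len` is the content of a 200
 * response; `cached_hash` is the fingerprint stored at the last fetch.
 */
page_state classify_response(int status, const unsigned char *body,
                             size_t body_len, uint32_t cached_hash)
{
    if (status == 304)
        return PAGE_UNCHANGED;      /* server confirmed: not modified */
    if (status == 200) {
        assert(body != NULL || body_len == 0);
        /* A 200 may only mean the server ignored If-Modified-Since,
           so compare the content with the cached copy. */
        return fnv1a32(body, body_len) == cached_hash ? PAGE_UNCHANGED
                                                      : PAGE_CHANGED;
    }
    return PAGE_UNKNOWN;            /* redirects, errors, and so on */
}
```

Storing only the hash keeps the cache small, at the cost of a very small chance that two different pages hash to the same value.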

  • 0

The one problem with all of these approaches is that if the page has ANY dynamic content at all, it will always count as "updated" since the last view. The forum summary on the Neowin front page is a good example. Summaries like that are useless if the site has more than 10 visitors a day, but a LOT of sites include them anyway.
