• 0

Basic Winsock proxy


Question

Hi, I am writing an HTTP proxy for a web browser using Winsock and C++. For now the proxy should behave as if there were no proxy at all, simply forwarding requests to the correct server and the HTML data back to the browser. It is also synchronous.

It works, but performance is currently quite bad. Before I try anything else, I'd like to check whether at least the basic logic is correct. Here's some pseudocode:

Initialize Winsock
Socket listenSocket
listenSocket.Bind(host, port)
listenSocket.Listen()

while true:
    Socket browserSocket = listenSocket.Accept()
    httpRequest = browserSocket.ReceiveRequest()
    Socket serverSocket
    serverSocket.Connect(getHostName(httpRequest), 80)    // connect to the origin server on port 80
    serverSocket.SendRequest(httpRequest)
    transmitHtml(serverSocket, browserSocket)             // relay the response back to the browser
    browserSocket.Close()
    serverSocket.Close()

listenSocket.Close()

I am mainly wondering whether it is correct to close both the browser and server sockets and create new ones every time. Loading a simple page like Google goes through that loop three times; more involved pages go through it many more times.
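
For reference, one iteration of that loop maps onto plain Winsock calls roughly like this (simplified sketch with error handling mostly left out; getHostName() is the request-parsing helper from the pseudocode, shown here taking the raw buffer and returning a std::string):

// One iteration of the accept / connect / relay loop (WSAStartup, socket,
// bind and listen have already been done for listenSocket).
SOCKET browserSocket = accept(listenSocket, NULL, NULL);

char request[8192];
int requestLen = recv(browserSocket, request, sizeof(request), 0);

// Resolve the host named in the request and connect to it on port 80.
addrinfo hints = {};
addrinfo* result = NULL;
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_protocol = IPPROTO_TCP;
getaddrinfo(getHostName(request, requestLen).c_str(), "80", &hints, &result);

SOCKET serverSocket = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
connect(serverSocket, result->ai_addr, (int)result->ai_addrlen);
freeaddrinfo(result);

// Forward the request, then relay the response back to the browser.
send(serverSocket, request, requestLen, 0);

char buffer[8192];
int bytes;
while ((bytes = recv(serverSocket, buffer, sizeof(buffer), 0)) > 0)
    send(browserSocket, buffer, bytes, 0);

closesocket(serverSocket);
closesocket(browserSocket);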

Thanks!


5 answers to this question


  • 0

The slow step with almost every webpage is the number of HTTP requests. Every image, every AJAX call, and so on involves a separate round-trip HTTP request and response. This is why many high-traffic sites combine their graphics into a single sprite image (one image = one HTTP request).

Each HTTP request in your code blocks every subsequent request until it has finished. To improve performance you need to handle them asynchronously, with multiple requests in flight at the same time.
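
One common way to get that, without going all the way to overlapped/async I/O, is a thread per connection: accept on the main thread and hand each browser socket to a worker. Rough sketch only (handleConnection() stands in for your existing receive/connect/relay logic; sketched with std::thread, but CreateThread or _beginthreadex would do the same job):

#include <thread>

// handleConnection() would contain the per-request work: receive the request,
// connect to the origin server, relay the response, close both sockets.
void handleConnection(SOCKET browserSocket);

void acceptLoop(SOCKET listenSocket)
{
    while (true)
    {
        SOCKET browserSocket = accept(listenSocket, NULL, NULL);
        if (browserSocket == INVALID_SOCKET)
            break;

        // Each connection gets its own thread, so one slow server
        // no longer blocks the next accept().
        std::thread(handleConnection, browserSocket).detach();
    }
}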

As for your specific code, use MSDN. Although it contains plenty of bad programming practices, it is hands down the best programming documentation on the planet, especially for the Windows API. Check out the Winsock reference, complete with usage scenarios and examples, some of which cover your exact problem. Note that multi-threaded socket programming is not an easy kill; even the experts have trouble with this stuff.

And yes, consider doing this in .NET. Version 3.5 is very nice, and the upcoming 4.0 has new high-performance socket implementations that would eliminate 90% of the headaches you'd run into with native code.

If you're gonna stick with C++, recognize that someone else has already done what you want to do, and done it better than you will. Consider using a library such as Boost.Asio. I hate clichés, but don't reinvent the wheel.

  • 0

Thank you. I was starting to think that the main bottleneck was probably the fact that this is synchronous, but I wanted to be sure. So the logic seems correct to you?

This is an assignment, so the language, library and high-level design choices (like making it synchronous) are imposed. And yes, I got a lot of MSDN code in there. :laugh:

  • 0

Well, it is possible to keep the HTTP connection open for more than one request.

You haven't said what you mean by bad performance, though. I'm guessing a big bottleneck is that it is synchronous. In the real world it's normal to have up to four connections to an HTTP/1.0 server and two to an HTTP/1.1 server (which you keep open, or persist, for more than one request). That is probably far out of scope for your assignment, so I wouldn't worry about it. You'll want to understand the synchronous approach (even though it's not really used on Windows) before you start venturing into asynchronous I/O and multi-threading.
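
If you ever did want to try it, the change is roughly to loop over requests on the same browser socket instead of closing it after one. Sketch only (receiveRequest(), connectToHost() and relayResponse() are placeholder helpers, and a real implementation also has to honour Content-Length / chunked encoding to know where each response ends):

SOCKET browserSocket = accept(listenSocket, NULL, NULL);

std::string request;
while (receiveRequest(browserSocket, request))            // false when the browser closes or errors out
{
    SOCKET serverSocket = connectToHost(getHostName(request), 80);
    send(serverSocket, request.c_str(), (int)request.size(), 0);
    relayResponse(serverSocket, browserSocket);
    closesocket(serverSocket);

    // HTTP/1.0 closes after each response unless the client sent "Connection: keep-alive";
    // HTTP/1.1 keeps the connection open unless it sent "Connection: close".
    if (request.find("HTTP/1.1") == std::string::npos &&
        request.find("Connection: keep-alive") == std::string::npos)
        break;
}
closesocket(browserSocket);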

  • 0
  On 05/04/2010 at 10:43, hdood said:
You haven't said what you mean by bad performance though.
Most sites load roughly three times slower than normal, but some pages give me bigger problems. A page like http://en.wikipedia.org/wiki/World_War_I takes ages to load, with various 10060 (connection timed out) and 10054 (connection reset by peer) errors along the way. When the browser finally says "Done", the page is still missing a lot of formatting and pictures.

msnd.com loads fast, but like Wikipedia, when the browser says "Done" (and doesn't make any more connections to the proxy), the page seems to be missing stuff.


yahoo.com simply doesn't work. The browser says "Done" and the page is still blank. :blink: But Google works flawlessly, along with most simple sites.

I'm also wondering what I should use for the receive buffer (the second argument to recv). For now I use a local char[30000], immediately forward whatever I receive to the browser, and repeat until the return value is <= 0. Does that make sense?
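
To be concrete, the relay loop is roughly the following (a sketch of what I just described, with the return values spelled out; I've also added an inner loop because, as I understand it, send() may accept fewer bytes than asked for):

char buffer[30000];                      // local receive buffer
int received;
while ((received = recv(serverSocket, buffer, sizeof(buffer), 0)) > 0)
{
    // Forward everything that was just received; send() may take only part of it.
    int sent = 0;
    while (sent < received)
    {
        int n = send(browserSocket, buffer + sent, received - sent, 0);
        if (n == SOCKET_ERROR)
            return;                      // the browser side went away, give up on this response
        sent += n;
    }
}
// received == 0 means the server closed the connection normally;
// SOCKET_ERROR means something went wrong (check WSAGetLastError()).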

  • 0

Just wanted to say that I asked the question above to my teacher, and he said it is normal for some pages to come out incomplete with a synchronous proxy, given that Firefox makes several requests in parallel and gives up on them if they are not satisfied quickly enough.

All in all, this level of performance is about what you'd expect from such a naïve implementation, I think.
