[.net] Webbrowser Minus The Rendering...


Question

I need to load and execute/parse web pages without actually displaying them. The only goal is to generate the HTTP traffic that results from the page (and its elements) being loaded.

My current problem is that I'm using the .NET WebBrowser component to do this and simply not attaching it to a form. This works, but because the WebBrowser is (traditionally) a form element, there are restrictions I have to adhere to when running it, such as running it on an STA thread (currently the biggest pain).

I was wondering if anyone knows of any other .NET web browsers, or even better just HTML executors, that I could swap in for the WB. I have looked into engines like WebKit, but I'm not sure whether that is really what I want, and even less sure how to compile it so I can use it in my .NET project. My other concern is that I want a component that is kept up to date, given the medium-paced change of HTML and especially in light of HTML5 features that may become very important in a year or two.

Does anyone have any recommendations?

15 answers to this question

  • 0

Try using the HttpWebRequest/HttpWebResponse classes in the System.Net namespace. They support asynchronous calls (e.g. BeginGetResponse), so you can start a request and do other things while you're waiting for it to finish.

  • 0

Are you suggesting I implement my own HTML parser engine? HttpWebRequest and HttpWebResponse only fetch the contents of a URI. They do not parse the contents of the page (which may not even be HTML), nor do they load the resources the page references (assuming it is an HTML page).

Maybe my question was unclear? I would like to find a WebBrowser component like Firefox/Chrome/IE/Opera/etc... that does everything but actually show the webpage. It needs to be able to load the images on the page, browser plug-ins and so on. Thanks.
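
To make the gap concrete, here is a rough sketch (in JavaScript, under Node) of how many extra requests a single fetch leaves out. Everything in it is an assumption for illustration: the function name `listStaticSubresources`, the sample page, and the example.com/example.net URLs are all made up, and the regex scan is a crude stand-in for a real HTML parser. It only finds sub-resources named directly in the markup; anything requested by a script at run time is missed, which is exactly why a full browser engine is needed.

```javascript
// Rough sketch: list the sub-resource URLs a browser would request after
// fetching a page. A regex scan is NOT a real HTML parser; it only
// illustrates how much traffic a single fetch leaves out.
function listStaticSubresources(html, baseUrl) {
  // Tags whose src/href attribute normally triggers an extra HTTP request.
  const pattern =
    /<(img|script|iframe|embed|link)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  const urls = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Resolve relative references against the page URL, as a browser would.
    urls.push(new URL(match[2], baseUrl).href);
  }
  return urls;
}

// Hypothetical page used only for illustration.
const samplePage = `
  <html><head>
    <link rel="stylesheet" href="/site.css">
    <script src="app.js"></script>
  </head><body>
    <img src="images/logo.png">
    <iframe src="http://ads.example.net/frame.html"></iframe>
    <script>new Image().src = "/tracker.gif";</script>
  </body></html>`;

const found = listStaticSubresources(
  samplePage,
  "http://example.com/news/index.html"
);
console.log(found);
```

The scan reports the stylesheet, the external script, the image, and the iframe, but not the `/tracker.gif` request that the inline script would trigger once executed.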

  • 0

omnicoder - Thanks for finding GeckoFx. I think I've run across it before. I noticed that one of the more recent bugs filed on the site relates exactly to some of my problems, but in a negative way: http://code.google.com/p/geckofx/issues/detail?id=5#c2 ... Read the last comment. Any thoughts on this, or do you happen to have any experience with GeckoFx?

  • 0

XerXis - thanks. Although I could potentially try to parse documents using the .NET DOM classes, that still leaves the JavaScript un-executed. I need the web page to be executed completely: JavaScript, iframes, plug-ins and so on. Any other recommendations?

  • 0
I guess I misunderstood what you were trying to do. I thought you just wanted to get the contents of a URI to scrape it. :)

  • 0
yeah :laugh: I got that impression too! I only wish it was as common as scraping. My main thought is that maybe one of the major web browsers has an internal component similar to what I'm looking for. However, I just don't know where to start looking, and I imagine that most professional browser components like that come with some sort of license that keeps me from using them (or at least makes me wary of starting to use them).

Let's keep the suggestions coming :-)

  • 0

Um, what exactly are you trying to do? I mean, what is this project supposed to be? Because that makes a huge difference in what one would recommend.

  • 0

I basically need to be able to capture all the HTTP traffic generated by a web page - including the HTTP traffic associated with images/plug-ins/iframes/etc that are on the page - just like a web browser would load all of the page elements too. Thus, I'm using a proxy to monitor traffic. However, the proxy is not the problem. I'm having trouble with the component that generates the traffic, which I'm currently using the WebBrowser for.

  • 0

Do you care whether the final thing is standalone or not?

Instead of embedding a web browser and shunting stuff through a proxy, it's much easier to just do this on top of Gecko. If I were you, this is what I'd do:

1) Create a simple Firefox extension that has a hidden <browser> element in the XUL.

2) Hook up an HTTP request observer (and/or an HTTP response observer).

3) Tell your extension's <browser> to load whatever URL you want to probe.

4) Your observer will see every outgoing HTTP request and/or incoming response (you'll want to filter out any not coming from your extension so that you don't also catch stuff from your regular browsing), and you can do whatever you want with that.

Simple, no complicated embedding, no proxy, works on any OS. In fact, there already are extensions that do this or something similar to this, e.g., Firebug and Live HTTP Headers.
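
Steps 2 to 4 can be sketched roughly as below. This is only a sketch under stated assumptions, not tested extension code. `addObserver`, `removeObserver`, the `observe` callback, and the `http-on-modify-request` / `http-on-examine-response` topics follow Gecko's observer service, and `URI.spec` is how an HTTP channel exposes its URL. The names `createTrafficLogger`, `isFromProbe`, and `createFakeObserverService` are made up for this example. The filter in step 4 is left as a caller-supplied function because how you tie a channel back to your hidden <browser> depends on your extension. The fake service exists only so the logic can be tried under Node, outside Firefox.

```javascript
// Sketch of steps 2-4. In a real extension, `observerService` would be
// Gecko's observer service and `subject` an HTTP channel (which you would
// first QueryInterface to nsIHttpChannel). Both are passed in here so the
// logic can also run outside Firefox.
function createTrafficLogger(observerService, isFromProbe) {
  const log = [];
  const observer = {
    observe(subject, topic) {
      // Step 4: isFromProbe is a hypothetical caller-supplied filter that
      // drops traffic not originating from the hidden <browser>.
      if (!isFromProbe(subject)) return;
      log.push({ topic, url: subject.URI.spec });
    },
  };
  // Step 2: watch outgoing requests and incoming responses.
  observerService.addObserver(observer, "http-on-modify-request", false);
  observerService.addObserver(observer, "http-on-examine-response", false);
  return {
    log,
    stop() {
      observerService.removeObserver(observer, "http-on-modify-request");
      observerService.removeObserver(observer, "http-on-examine-response");
    },
  };
}

// Stand-in observer service for trying the logic under Node (not Gecko code).
function createFakeObserverService() {
  const observersByTopic = new Map();
  return {
    addObserver(observer, topic) {
      if (!observersByTopic.has(topic)) observersByTopic.set(topic, new Set());
      observersByTopic.get(topic).add(observer);
    },
    removeObserver(observer, topic) {
      const observers = observersByTopic.get(topic);
      if (observers) observers.delete(observer);
    },
    notifyObservers(subject, topic) {
      for (const observer of observersByTopic.get(topic) || []) {
        observer.observe(subject, topic, null);
      }
    },
  };
}

// Usage with made-up channels: only the one marked as ours is logged.
const service = createFakeObserverService();
const probeChannels = new Set();
const logger = createTrafficLogger(service, (ch) => probeChannels.has(ch));
const probeRequest = { URI: { spec: "http://example.com/logo.png" } };
const otherRequest = { URI: { spec: "http://example.org/unrelated" } };
probeChannels.add(probeRequest);
service.notifyObservers(probeRequest, "http-on-modify-request");
service.notifyObservers(otherRequest, "http-on-modify-request");
console.log(logger.log);
```

Passing the observer service in as a parameter is just a convenience for the sketch; inside an extension you would obtain the real service from Gecko and call `stop()` when the probe is finished.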

Edited by code.kliu.org
  • 0

Unfortunately, I do want this to be a standalone application. Sorry, I should have mentioned that before. I've come across quite a few FF extensions with functionality close to what I'm looking for, but I don't want users to be tied to a specific browser.

  • 0
I took a look at the XULRunner pages and have a question. Would I then be writing my entire app in JavaScript?

Pretty much. The difference between using XULRunner and embedding Gecko (as suggested earlier by someone) is that in the former, the XULRunner engine hosts your app whereas in the latter case, your app hosts Gecko. You can take the embedding approach (though I am not that familiar with that route, so really I can't say how well that would work), in which case you will probably need to shunt things through a proxy. If you go with the XULRunner approach, it's much easier to take advantage of things built into Gecko, like the HTTP observers that would eliminate the need for any proxies. And there's the benefit that XULRunner is cross-platform and if you are going to be distributing this, your users won't have to worry about having .NET installed. OTOH, if you are not familiar with coding for the Gecko platform (it's actually quite fun, once you get past the initial hump), then there is the obvious downside of having to learn it.

This topic is now closed to further replies.