semifamous, posted October 8, 2008
Examples:
http://www.redrivernet.com/links/
http://www.wabi.tv/content/4007/National_News/
http://www.freewebportal.net/
You can find more examples by searching for a couple of the headlines in one search.
Notice how the first article is from September 2 (over a month ago)?
When my web server (example 1) attempts to retrieve the feed from http://rss.cnn.com/rss/cnn_topstories.rss, CNN instead redirects it to http://feedproxy.feedburner.com/rss/cnn_topstories.rss, which has that older news.
They are purposely blocking me and many others. I guess they don't want people to use them as a source of news?
Does anyone know how best to work around this?

Doli, posted October 8, 2008
The first RSS feed doesn't redirect and shows today's news (Oct. 8) for me.

semifamous (Author), posted October 9, 2008
You are right. It does. And if you click on the second one, you can see the difference.
My guess is that only certain IPs are blocked. My computer's IP can get to the first link, but my web server is always redirected to the second link.
So CNN is blocking those three sites, and many others, from accessing its current RSS feed.

Kudos (Veteran), posted October 9, 2008
I would guess you were hitting the feed too often and an automated system stepped in and redirected you to an older cache as a method of reducing load on the main feeds.

semifamous (Author), posted October 9, 2008
Their TTL is only 5 minutes. I doubt that was the issue, but nevertheless, we are blocked.
This automated system is also blocking hundreds of other publicly accessible sites. More examples:
http://www.losangelesdailynews.net/
http://newtownhighschool.org/index.php
http://www.scary-software.com/RSS/
http://www.4seasonswireless.com/news.php
News is over a month old?!
http://www.google.com/search?q=%22Bush+to+...nvestigators%22

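If the TTL mentioned above is the RSS 2.0 `<ttl>` channel element (an assumption; the post does not say where the 5-minute figure comes from), a fetching script can read it and use it as its minimum refresh interval. A minimal Python sketch follows. The sample feed, the function name `feed_ttl_minutes`, and the 60-minute fallback are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fallback used when the feed declares no usable <ttl>.
DEFAULT_TTL_MINUTES = 60


def feed_ttl_minutes(rss_xml: str, default: int = DEFAULT_TTL_MINUTES) -> int:
    """Return the channel's <ttl> in minutes, or `default` if absent or invalid."""
    root = ET.fromstring(rss_xml)
    ttl_text = root.findtext("channel/ttl")
    if ttl_text is None:
        return default
    try:
        ttl = int(ttl_text.strip())
    except ValueError:
        return default
    return ttl if ttl > 0 else default


# Invented sample document, not the real CNN feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <ttl>5</ttl>
  </channel>
</rss>"""

print(feed_ttl_minutes(SAMPLE_FEED))
```

A publisher may still rate-limit clients that poll at exactly the declared TTL, so treating the value as a lower bound rather than a target is the safer reading.
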
CrispCreations, posted October 13, 2008
Could it be that they're blocking specific clients? I.e., browser-based readers are OK and are identified as real people consuming the news, whereas a script scraping the news for display on another site is not.
If you can, it might be worth editing whatever script you use on your site so that it identifies itself as a browser-based reader, and seeing if that works.

Josh, posted October 13, 2008
Kudos wrote: "I would guess you were hitting the feed too often and an automated system stepped in and redirected you to an older cache as a method of reducing load on the main feeds."
Bingo. You might want to cache the stories for at least an hour before hitting their server again. Lots of places limit how often an IP address can hit their feeds. In all honesty, it's a good practice.

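The one-hour caching suggested above could look roughly like this in Python. It is only a sketch: the class name `FeedCache` and its parameters are hypothetical, the cache lives in memory (a real site would more likely persist it to disk or a database), and the function that actually downloads the feed is passed in by the caller.

```python
import time
from typing import Callable, Dict, Tuple


class FeedCache:
    """Serve a stored copy of each feed until it is older than max_age_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        max_age_seconds: float = 3600.0,  # one hour, as suggested in the post
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        # Maps URL -> (time the body was fetched, body).
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]  # still fresh: do not hit the remote server
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

To use it, construct `FeedCache` with any callable that takes a URL and returns the feed body, then call `get(url)` on every page view; the remote server is contacted at most once per hour per URL.
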
semifamous (Author), posted October 14, 2008
That's all well and good, but I set up caching a few weeks ago now, and they're still blocking me.
And the blockage is based on my server's IP address. When I try to go there in lynx on that server, it also gets the old data.

The_Decryptor (Veteran), posted October 15, 2008
http://feeds.feedburner.com/rss/cnn_topstories.rss
feedproxy just seems out of date.

semifamous (Author), posted October 16, 2008
That is true. Feedproxy is out of date.
And my server is getting redirected to the out-of-date feedproxy data. How do I either get them to update feedproxy or get around their redirection?

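While the redirect question is open, one way for a script to at least notice that it has been handed the out-of-date copy is to look at the newest `pubDate` in whatever it received. The Python sketch below assumes a standard RSS 2.0 layout with RFC 822 dates; the function names, the one-day threshold, and the sample feed are invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def newest_item_date(rss_xml: str) -> Optional[datetime]:
    """Return the most recent <pubDate> among the feed's items, if any parse."""
    root = ET.fromstring(rss_xml)
    dates = []
    for item in root.iter("item"):
        text = item.findtext("pubDate")
        if not text:
            continue
        try:
            parsed = parsedate_to_datetime(text.strip())
        except (TypeError, ValueError):
            continue  # skip dates that are not valid RFC 822
        if parsed.tzinfo is None:
            # Assumption: treat dates without a usable offset as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        dates.append(parsed)
    return max(dates) if dates else None


def is_stale(rss_xml: str, now: datetime,
             max_age: timedelta = timedelta(days=1)) -> bool:
    """True if the feed has no dated items or its newest item exceeds max_age."""
    newest = newest_item_date(rss_xml)
    return newest is None or now - newest > max_age


# Invented sample, dated to match the stale headline described in the thread.
STALE_SAMPLE = """<rss version="2.0"><channel>
  <item><title>Old story</title>
    <pubDate>Tue, 02 Sep 2008 12:00:00 GMT</pubDate></item>
</channel></rss>"""
```

A script could call `is_stale(body, datetime.now(timezone.utc))` after each fetch and log a warning, or try an alternate URL, when it returns true.
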
fatboyuk, posted October 16, 2008
What IP / ISP are you coming from? I have gone straight to http://rss.cnn.com/rss/cnn_topstories.rss and get today's stories, no problem.
How are you fetching the feed: via browser or code?
If it's code, try looking at PHP and cURL. Using that, you can imitate a true browser visit, so hopefully the feed will give you the proper version.
If you need a hand with the cURL stuff, shout!

Son of Hook, posted October 17, 2008
semifamous wrote: "That's all well and good, but I set up caching a few weeks ago now, and they're still blocking me. And the blockage is based on my server's IP address. When I try to go there in lynx on that server, it also gets the old data."
It could still be client-based redirection. I'd add a user-agent header to the request to mimic a real browser and give it a try.

Son of Hook, posted October 17, 2008
I just did a little test. My first request without a browser user agent string was redirected to http://feedproxy.feedburner.com/rss/cnn_topstories.rss. My second request with the UA of "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.30 Safari/525.13" was not redirected and retrieved http://rss.cnn.com/rss/cnn_topstories.rss.
It's client-based redirection. Add a user agent string to the request headers and you'll be fine.
Also don't test things in Lynx and expect accurate results ;)

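The test described above can be approximated with Python's standard library. This is a sketch, not the code Son of Hook actually ran: the helper names `build_request` and `fetch_feed` are made up, and whether the server still treats user agents this way is not something this example can confirm. The URL and UA string are the ones quoted in the post.

```python
import urllib.request
from typing import Tuple

FEED_URL = "http://rss.cnn.com/rss/cnn_topstories.rss"
BROWSER_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
    "(KHTML, like Gecko) Chrome/0.2.149.30 Safari/525.13"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Create a GET request that carries a browser-style User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_feed(url: str, user_agent: str = BROWSER_UA,
               timeout: float = 10.0) -> Tuple[str, bytes]:
    """Fetch `url` and return (final URL after any redirects, response body)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.read()
```

Calling `fetch_feed(FEED_URL)` and comparing the returned final URL with `FEED_URL` shows whether the request was redirected, which is the same check the post describes doing by hand.
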
The_Decryptor (Veteran), posted October 17, 2008
semifamous wrote: "That is true. Feedproxy is out of date. And my server is getting redirected to the out-of-date feedproxy data. How do I make them either get updated feedproxy or get around their redirection?"
http://feeds.feedburner.com/rss/cnn_topstories.rss
Tried that?

semifamous (Author), posted October 20, 2008
I'm using the NewsParserX snippet with modxcms. I'm not familiar enough with it to know how to change the headers that the script sends when it pulls the data...
Although I do see that they updated their feedproxy data to have yesterday's "news", so that's some progress.