CNN blocking RSS access for lots of sites

2,865 · October 8, 2008

Examples:

http://www.redrivernet.com/links/

http://www.wabi.tv/content/4007/National_News/

http://www.freewebportal.net/

You can find more examples by searching for a couple of the headlines in one search.

Notice how the first article is from September 2 (over a month ago)?

When my web server (example 1) attempts to retrieve the feed from http://rss.cnn.com/rss/cnn_topstories.rss, it is instead redirected to http://feedproxy.feedburner.com/rss/cnn_topstories.rss, which has that older news.
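One way to confirm which copy of the feed a server is actually receiving is to look at the newest pubDate in the returned XML. The following is only a rough sketch: the helper names newest_pub_date and is_stale, and the one-day threshold, are made up for illustration and are not part of any CNN or FeedBurner tooling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def newest_pub_date(rss_xml: str) -> Optional[datetime]:
    """Return the most recent <pubDate> found anywhere in the feed, or None."""
    root = ET.fromstring(rss_xml)
    dates = []
    for node in root.iter("pubDate"):
        if not node.text:
            continue
        try:
            parsed = parsedate_to_datetime(node.text.strip())
        except (TypeError, ValueError):
            # Skip dates that are not in the RFC 822 style RSS normally uses.
            continue
        if parsed.tzinfo is None:
            # Assumption: treat dates without a usable offset as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        dates.append(parsed)
    return max(dates) if dates else None


def is_stale(rss_xml: str, now: datetime,
             max_age: timedelta = timedelta(days=1)) -> bool:
    """True if the feed has no dated entries or its newest one is older than max_age."""
    newest = newest_pub_date(rss_xml)
    return newest is None or now - newest > max_age
```

Running is_stale on the body fetched by the web server and again on the body fetched from a home connection would show whether the two are being served different copies.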

They are purposely blocking me and many others. I guess they don't want people to use them as a source of news?

Does anyone know how best to work around this?

8,402 · October 8, 2008

The first RSS feed doesn't redirect and shows today's news (Oct. 8) for me.

2,865 · October 9, 2008

You are right. It does. And if you click on the second one, you can see the difference. My guess is that only certain IPs are blocked.

My computer's IP can get to the first link, but my web server is always redirected to the second link.

So CNN is blocking access from those three sites and many others to their current updated RSS feed info.

3,529 · October 9, 2008

I would guess you were hitting the feed too often and an automated system stepped in and redirected you to an older cache as a method of reducing load on the main feeds.

2,865 · October 9, 2008

Their TTL is only 5 minutes. I doubt that was the issue, but nevertheless, we are blocked.

Also, this automated system is blocking hundreds of other publicly accessible sites.

More examples:

http://www.losangelesdailynews.net/

http://newtownhighschool.org/index.php

http://www.scary-software.com/RSS/

http://www.4seasonswireless.com/news.php

News is over a month old?!

http://www.google.com/search?q=%22Bush+to+...nvestigators%22

October 13, 2008

Could it be that they're blocking specific clients? I.e., browser-based readers are OK because they are identified as real people consuming the news, while a script scraping the news to display on another site is not.

If you can, it might be worth editing whatever script you use on your site so that it identifies itself as a browser-based reader, to see if that works.

2,466 · October 13, 2008

I would guess you were hitting the feed too often and an automated system stepped in and redirected you to an older cache as a method of reducing load on the main feeds.

Bingo. You might want to cache the stories for at least an hour before hitting their server again. Lots of places limit how often an IP address can hit their feeds. In all honesty, it's a good practice.
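As a rough illustration of the "cache for an hour" idea, here is a minimal in-memory sketch. The FeedCache class and its fetch and clock parameters are hypothetical names for this example; a real site would more likely keep the cached copy on disk or in a database so it survives between page loads.

```python
import time
from typing import Callable, Dict, Tuple


class FeedCache:
    """Reuse the last response for a URL until it is older than ttl_seconds."""

    def __init__(self, fetch: Callable[[str], bytes],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch          # function that actually downloads the feed
        self._ttl = ttl_seconds      # one hour by default, as suggested above
        self._clock = clock          # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: do not hit the remote server again.
            return entry[1]
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

With this in place the remote server sees at most one request per URL per hour, no matter how many visitors load the page.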

2,865 · October 14, 2008

That's all well and good, but I set up caching a few weeks ago now, and they're still blocking me.

And the blockage is based on my server's IP address. When I try to go there in lynx on that server, it also gets the old data.

20,536 · October 15, 2008

http://feeds.feedburner.com/rss/cnn_topstories.rss

feedproxy just seems out of date.

2,865 · October 16, 2008

That is true. Feedproxy is out of date.

And my server is getting redirected to the out-of-date feedproxy data.

How do I either get them to update feedproxy or get around their redirection?

October 16, 2008

What IP / ISP are you coming from? I went straight to http://rss.cnn.com/rss/cnn_topstories.rss and got today's stories, no problem.

How are you fetching the feed - via browser or code?

If it's code, try looking at PHP and cURL. Using that, you can imitate a true browser visit, so hopefully the feed will give you the proper version.

If you need a hand with the cURL stuff, shout!

October 17, 2008

That's all well and good, but I set up caching a few weeks ago now, and they're still blocking me.
And the blockage is based on my server's IP address. When I try to go there in lynx on that server, it also gets the old data.

It could still be client-based redirection.

I'd add a user-agent header to the request to mimic a real browser and give it a try.

October 17, 2008

I just did a little test. My first request without a browser user agent string was redirected to http://feedproxy.feedburner.com/rss/cnn_topstories.rss. My second request with the UA of "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.30 Safari/525.13" was not redirected and retrieved http://rss.cnn.com/rss/cnn_topstories.rss.

It's client-based redirection. Add a user agent string to the request headers and you'll be fine. Also don't test things in Lynx and expect accurate results ;)
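For anyone who wants to repeat that test from a script, here is a small sketch using Python's standard urllib. The function names are made up for this example, the UA string is simply the one quoted above, and whether the server still behaves this way is not something the code can promise.

```python
import urllib.request
from typing import Tuple

FEED_URL = "http://rss.cnn.com/rss/cnn_topstories.rss"
BROWSER_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
              "AppleWebKit/525.13 (KHTML, like Gecko) "
              "Chrome/0.2.149.30 Safari/525.13")


def build_feed_request(url: str = FEED_URL,
                       user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_feed(url: str = FEED_URL, user_agent: str = BROWSER_UA,
               timeout: float = 10.0) -> Tuple[str, bytes]:
    """Fetch the feed and return (final_url, body).

    final_url differs from url when the server redirected the request.
    """
    request = build_feed_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.read()
```

Calling fetch_feed() and comparing the returned final_url with FEED_URL shows whether the request ended up at feedproxy or at the original address.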

20,536 · October 17, 2008

That is true. Feedproxy is out of date.
And my server is getting redirected to the out-of-date feedproxy data.

How do I either get them to update feedproxy or get around their redirection?

http://feeds.feedburner.com/rss/cnn_topstories.rss

Tried that?

2,865 · October 20, 2008

I'm using the NewsParserX snippet with modxcms.

I'm not familiar enough to know how to change the headers that the script uses when it pulls the data...

Although I do see that they updated their feedproxy data to have yesterday's "news", so that's some progress.

14 answers to this question

Posts above, in order, by author: semifamous (question), Doli, semifamous, Kudos (Veteran), semifamous, CrispCreations, Josh, semifamous, The_Decryptor (Veteran), semifamous, fatboyuk, Son of Hook, Son of Hook, The_Decryptor (Veteran), semifamous