[Python] Filtering Spammy strings

January 14, 2015

I am currently working on an analytics project where I have hundreds of thousands of social media interactions in JSON format. The most important field is the "post" field, which contains what a user has posted (Facebook post, tweet, etc.). So suppose I had a bunch of posts:

{"post":"gBuy Exclusive hMedz Todayl"}

{"post":"We offer free pdf ebooks of In Town: Contemporary Design for Urban Living.pdf download free"}

{"post":"ONE58PRINCEOFWALESROAD - ONE58 PRINCE OF WALES"}

{"post":"The Ingrid Pitt legacy | trinkelbonker, First title is

4,678 · January 14, 2015

Is this a homework assignment?

January 14, 2015

"Is this a homework assignment?"

No, this is for my work.

1,699 · January 14, 2015

Spam is going to be very subjective: how should the aggregator know whether you are interested in cheap meds or not?

First, I would think about how you define legitimate interactions and see if there is an easy way to filter based on that. For example, Facebook messages sent by someone who is not a friend of the recipient may not be interactions you care about, so they can be filtered out.
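As a rough illustration of that kind of rule-based filter, here is a minimal sketch. The field names "sender" and "recipient" and the friends mapping are hypothetical; the thread does not say what the real JSON records contain beyond "post".

```python
def from_friend(interaction, friends):
    """Return True if the sender is in the recipient's set of friends.

    `friends` maps a user name to the set of that user's friends.
    Both the mapping and the field names are assumptions for illustration.
    """
    return interaction["sender"] in friends.get(interaction["recipient"], set())


# Made-up sample data, not taken from the original data set.
friends = {"alice": {"bob"}}
interactions = [
    {"sender": "bob", "recipient": "alice", "post": "See you tonight?"},
    {"sender": "mallory", "recipient": "alice", "post": "Buy Exclusive Medz Today"},
]

# Keep only interactions between friends.
legit = [i for i in interactions if from_friend(i, friends)]
```

Whether "not a friend" is a good proxy for "not interesting" depends entirely on the data, so treat this as one possible definition of a legitimate interaction.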

With the machine learning approach you are going to need a lot of messages that you have manually classified as spam. Some of these papers might be useful.
For example, this paper used "Around 25K users, 500K tweets, and 49M follower/friend relationships" from Twitter to classify spam. So you really do need a lot of data!
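To make the "manually classify, then learn" idea concrete, here is a small sketch of a naive Bayes text classifier in plain Python. Naive Bayes is my choice for illustration and is not necessarily what the papers use; the class name, the labels "spam" and "ham", and the four training posts are all made up. A real classifier would need far more labelled data, as noted above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def score(self, text, label):
        """Log of P(label) * product of P(word | label).

        Assumes at least one training example exists for every label.
        """
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]  # 0 for unseen words
            logp += math.log((count + 1) / (total + len(vocab)))
        return logp

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.score(text, label))


# Hand-labelled examples (invented for this sketch).
training = [
    ("buy cheap meds today special offer", "spam"),
    ("free pdf ebooks download free", "spam"),
    ("had a great time at the concert last night", "ham"),
    ("new blog post about urban design", "ham"),
]

model = NaiveBayes()
for text, label in training:
    model.train(text, label)

label_a = model.classify("special offer on cheap meds")
label_b = model.classify("great concert last night")
```

With only four training posts the model is a toy; it just shows where the manually labelled messages enter the picture.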

8,753 · January 15, 2015

Googling "spam filter API" returns lots of interesting results, none of which I have tried. http://spamcheck.postmarkapp.com/doc has many language bindings and seems easy to use.

January 19, 2015

Thanks for all the suggestions. Since this is just a proof-of-concept project, I was told that simply searching the string for typical spam terms ("special offer", "viagra", etc.) is fine for now. In the future I will look into more complex libraries.
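For anyone finding this later, a minimal sketch of that term search might look like the following. It assumes one JSON object per line with a "post" field, as in the examples in the question; the function names and the two-term list are placeholders to extend.

```python
import json

# Starter list; only these two terms are named in the thread.
SPAM_TERMS = ("special offer", "viagra")


def is_spammy(post, terms=SPAM_TERMS):
    """Return True if any spam term occurs in the post, ignoring case."""
    text = post.casefold()
    return any(term in text for term in terms)


def filter_posts(json_lines, terms=SPAM_TERMS):
    """Yield parsed records whose "post" field contains no spam term."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if not is_spammy(record.get("post", ""), terms):
            yield record


# Made-up sample input in the same shape as the posts in the question.
sample = [
    '{"post": "Special offer: buy Viagra today"}',
    '{"post": "ONE58PRINCEOFWALESROAD - ONE58 PRINCE OF WALES"}',
]
kept = list(filter_posts(sample))
```

Plain substring matching is crude: it will miss obfuscated spellings such as "hMedz" and can flag legitimate posts that happen to mention a term, which is acceptable for a proof of concept but not much more.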


Question

devmap


5 answers to this question


+Red King Subscriber²


devmap


Lant


Andre S. Veteran


devmap
