Spam-Purging Social Media Sites

This article, published by CNN, briefly discusses recent actions taken by Instagram to eliminate fake accounts, likes, and comments across the platform. The move was prompted by millions of likes, comments, and follows coming from fake Instagram accounts run by third-party bots. The company used machine learning to strip this fake popularity boosting from affected accounts. Users can purchase such popularity in the form of followers, likes, and comments by paying a fee and providing their username to one of a number of suspicious sites that supply fake accounts for increased social support. Sometimes, however, the bots target users without their knowledge, or users unintentionally share their account information with one of these sites. The article notes that Instagram did not reveal the strategy behind the algorithm it used to identify suspicious activity, commenting only that spam activity looks much different from normal activity. With this in mind, I decided to look into the possible steps the company could have taken to construct such an algorithm.

Earlier this year, Twitter took a similar approach and purged millions of fake accounts involved in spam, fake likes, comments, and posts. Because its approach has been thoroughly categorized and documented through similar purging events, I will discuss it from the perspective of a fairly well-characterized network model. In a 2010 study, Sarita Yardi and her team designed a robot to carry out a spam analysis on Twitter using a specific algorithm, a design commonly used in many spam filters today. In a 2007 study, when Twitter was much smaller than it is today, the network consisted of 76,000 users producing 1.3 million tweets. That same study showed that spam was highly correlated with low levels of reciprocity on an individual account. One of the roadblocks in characterizing spam is that each user has their own personal reasons and intentions for using Twitter. Different users reciprocate friendships and tweets at different frequencies, leading to a grey area between what is and is not considered spam. In this Twitter study, the researchers aimed to address five key questions about spam within the network:

 

  1. Does age of account differ between spammers and legitimate users?
  2. Do spammers tweet more frequently than legitimate users?
  3. Do spammers have more friends than followers?
  4. Are spammers clustered?
  5. Can spammers be located based on network structures?

 

Our class has primarily focused on the last two questions with regard to identifying a spam account within a network structure, as Professor Easley described using an email network. Email networks are surprisingly easy to analyze for spam accounts, as many spammers simply reach out randomly to members of a larger network who are not friends themselves. Furthermore, many spam accounts have only unidirectional edges between themselves and the accounts they reach out to, indicating a lack of reciprocity. In the case of Twitter, the same analysis is a bit more complex but still uses the same guiding principles. In the Twitter study discussed above, the researchers identified spammers using a hashtag they created in order to track the online discussion and the users participating in it. The easiest way to identify spammers was to look at tweets that contained both the hashtag and a link to an external URL (most of which pointed to commercial spam sites). Other behaviors identified among spammers were the use of multiple hashtags on disparate topics, letter-and-number patterns in usernames, and suggestive keywords (i.e., emotional or sexual words and clickbait). Using these guiding principles, the researchers were able to identify spammers with 91% accuracy among 300 randomly sampled tweets. In answering the last two of their original questions, the researchers concluded that there was no significant clustering among spammers (i.e., spammers did not cluster among themselves with few out-links), but they did propose a characteristic network structure: in most cases, spammers had similar numbers of out-links to other spammers and to regular users, with very few links to celebrities (suggesting random following algorithms), while legitimate users did not reciprocate links back to spammers and had a higher number of out-links to celebrity accounts.
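The behavioral signals described above (hashtag plus external link, many disparate hashtags, letter-and-number usernames, suggestive keywords) can be sketched as a simple rule-based check. The thresholds, regex, and keyword list below are my own illustrative assumptions, not the values the researchers actually used:

```python
import re

# Illustrative keyword list; the study's actual keywords are not published here.
SUGGESTIVE = {"free", "win", "sexy", "click"}

def looks_like_spam(username: str, text: str) -> bool:
    """Flag a tweet using rough versions of the signals from the study."""
    hashtags = re.findall(r"#\w+", text)
    has_url = "http://" in text or "https://" in text

    # Signal 1: a hashtag combined with a link to an external URL.
    if hashtags and has_url:
        return True
    # Signal 2: several distinct hashtags (likely disparate topics) in one tweet.
    if len(set(hashtags)) >= 3:
        return True
    # Signal 3: a letter-and-number pattern in the username (e.g. "bob12345").
    if re.search(r"[a-zA-Z]+\d{3,}$", username):
        return True
    # Signal 4: suggestive / clickbait keywords.
    words = {w.lower().strip(".,!?") for w in text.split()}
    return bool(words & SUGGESTIVE)
```

A rule set this crude would misfire often on its own; the study's 91% accuracy figure came from human labeling of sampled tweets, with heuristics like these serving as a starting point rather than a finished classifier.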

Using the information gathered in this study, we can make similar predictions about what Instagram looked for in its most recent spam purge, even though the company did not publicize its strategy.

 

Instagram cracks down on fake likes, follows and comments: https://www.cnn.com/2018/11/19/tech/instagram-fake-likes-comments/index.html

Detecting Spam in a Twitter Network: https://firstmonday.org/article/view/2793/2431

 
