Theorizing the Instagram Explore Algorithm : Networks Course blog for INFO 2040/CS 2850/Econ 2040/SOC 2090

Our discussion of ranking algorithms such as PageRank and hubs and authorities can be applied to a variety of topics, including the algorithms behind social media platforms such as Instagram. Although these algorithms are often kept proprietary so that users cannot easily exploit them, we can begin to theorize what a simple version of one might look like. Before we do this, let’s draw some similarities between Google searches and Instagram. For the purposes of this discussion, we will examine Instagram’s Explore page rather than the normal feed, as it is the most comparable to a Google search. The Explore page shows users a variety of posts, from accounts they do not follow, that they might be interested in. For example, a user who has demonstrated interest in food might see food-related content on their Explore page. This demonstrated interest is determined by past user activity, including likes, comments, and saved posts (Mosseri 2021). Content that Instagram believes a user might be interested in is placed into a set of candidate posts that can then be ranked. This is comparable to the set of websites that Google compiles for a given keyword or set of keywords. Given such a set, both Google and Instagram must decide how to rank its contents before showing them to the user.

As we explored in class, Google makes use of the PageRank algorithm. Instagram’s Explore page operates on somewhat different principles than Google’s search engine, so it makes sense that Instagram would use a different algorithm. Adam Mosseri, the head of Instagram, gives some hints about how this algorithm might work. He explains that Instagram looks at four major signals to decide which posts appear at the top of the Explore page. In order of importance, they are: 1) information about the post, 2) interaction history with the person who posted, 3) personal activity, and 4) information about the person who posted.
In order to keep this blog post relatively brief, we will focus on the first signal: information about the post.

To determine how this signal might be evaluated in a ranking algorithm, we must posit a network structure connecting accounts on Instagram. We can first treat every account on Instagram as a node in our network. Every Instagram account has a number of “followers” and a number of accounts it is “following.” In our network structure, these numbers correspond to the number of incoming edges (followers) and outgoing edges (following). The more incoming edges a node has, the more popular it is. The number of outgoing edges, however, is relatively inconsequential here (the concept of “hubs” is not as relevant in this network). Popularity has an interesting relationship with “authority.” Just because an account has many followers does not mean that it has any particular authority that would explicitly raise the rank of any account or post it interacts with. However, it does increase the likelihood that many other accounts will interact with that post. For example, if Cristiano Ronaldo (270 million followers) shared a post, there is a high chance that many of his followers would also interact with it, making it more popular.
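The follower network above can be sketched as a directed graph in which an edge runs from a follower to the account it follows, so an account’s follower count is simply its in-degree. This is a minimal Python illustration; the account names and follow relationships are hypothetical, and out-degree is computed only to show that the model ignores it.

```python
from collections import Counter

# Hypothetical follow relationships. Each pair (follower, followed) is a
# directed edge from the follower's node to the followed account's node.
follows = [
    ("alice", "chef_bob"),
    ("carol", "chef_bob"),
    ("dave", "chef_bob"),
    ("chef_bob", "alice"),
]

def degree_counts(edges):
    """Return (followers, following): in-degree and out-degree per account."""
    followers = Counter(followed for _, followed in edges)   # incoming edges
    following = Counter(follower for follower, _ in edges)   # outgoing edges
    return followers, following

followers, following = degree_counts(follows)
# In this model only in-degree (popularity) matters; out-degree is ignored.
print(followers["chef_bob"])  # 3
print(followers["carol"])     # 0 (Counter returns 0 for missing keys)
```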

Having proposed this network structure, we can begin to consider how Instagram might rank posts on the Explore page based on information about the post (signal 1). Mosseri suggests that post information consists of “how many and how quickly other people are liking, commenting, sharing, and saving a post” (2021). Let’s split these interactions into two categories: “private” and “public.” Likes, comments, and saves are “private” because an account can do all of these things without explicitly announcing them to its followers. Sharing a post, on the other hand, is “public” because it announces the account’s interaction with the post to its followers. Thus, when we structure a process for ranking Instagram posts, we should weight public interactions by their potential reach, since they can also convince others to interact with the post. The second thing to notice in Mosseri’s hint is the role of time. The more quickly a post gains interactions, the more likely it is to be popular and rank higher on the Explore page, so this time factor needs to be incorporated into our theorization.
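The public/private split can be written down as a small lookup table. This is a hypothetical Python sketch: the string labels and the `split_interactions` helper are illustrative names, and an unknown interaction kind raises a `KeyError` rather than being silently classified.

```python
# Classification taken from the discussion above: likes, comments, and saves
# are "private"; sharing (for example, to a story) is "public".
INTERACTION_IS_PUBLIC = {
    "like": False,
    "comment": False,
    "save": False,
    "share": True,
}

def split_interactions(kinds):
    """Split a list of interaction kinds into (public_count, private_count)."""
    public_count = sum(1 for kind in kinds if INTERACTION_IS_PUBLIC[kind])
    private_count = len(kinds) - public_count
    return public_count, private_count

print(split_interactions(["like", "share", "save", "comment"]))  # (1, 3)
```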

We also need to recognize the difference between an account and a post. The posting account is not necessarily relevant to the “information about the post” signal, so we cannot simply reuse the account network to rank an individual post. Instead, we can posit a substructure for every post on Instagram. In this substructure, one node represents the post and a multitude of nodes represent the accounts that interact with it. The post node has only incoming edges, and the account nodes have only outgoing edges: if an account interacts with the post, there is an edge from the account to the post. There are two types of edges: public interaction edges and private interaction edges. An outgoing public edge is weighted by the number of followers the account has; this weight captures the potential for new interactions with the post as time progresses. An outgoing private edge is given a constant weight regardless of the account’s follower count.
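One way to represent this per-post substructure is as a list of typed edges pointing at the post. The Python sketch below is hypothetical: the `InteractionEdge` class and its fields are illustrative names, and the constant private weight is set to 1.0 to match the weight of 1 that the Account_Influence equation gives each private interaction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InteractionEdge:
    """A directed edge from an interacting account to the post node."""
    account: str
    followers: int   # incoming edges to this account in the follower network
    public: bool     # True for a share; False for a like, comment, or save

    def weight(self) -> float:
        # Public edges scale with the account's follower count;
        # private edges get the same constant weight (1.0) for every account.
        return float(self.followers) if self.public else 1.0

# Hypothetical substructure for one post: the post node has only incoming edges.
post_edges = [
    InteractionEdge("alice", followers=120, public=False),
    InteractionEdge("chef_bob", followers=3, public=True),
    InteractionEdge("carol", followers=0, public=False),
]

print([edge.weight() for edge in post_edges])  # [1.0, 3.0, 1.0]
```

Note that alice’s 120 followers do not matter here, because her interaction is private.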

The influence that an individual account can have on an individual post can then be expressed by the following equation, given what we have considered so far:

Account_Influence = nPub * nFollowers + nPrivate
nPub is the number of public interactions and nPrivate is the number of private interactions that the account has with the post.
nFollowers is the number of followers that the account has. It is also the number of incoming edges to the account node in the larger network structure.
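As a direct transcription of the equation, here is a short Python sketch; the function and argument names are illustrative choices. Comparing the two calls shows how strongly a single public interaction dominates under this weighting, which is the overestimation acknowledged at the end of this post.

```python
def account_influence(n_pub: int, n_followers: int, n_private: int) -> int:
    """Account_Influence = nPub * nFollowers + nPrivate (no time decay yet)."""
    return n_pub * n_followers + n_private

# One share from an account with 500 followers, plus 2 private interactions:
print(account_influence(n_pub=1, n_followers=500, n_private=2))  # 502
# The same account liking and saving the post without sharing it:
print(account_influence(n_pub=0, n_followers=500, n_private=2))  # 2
```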

In this equation, each private edge has a weight of 1, while each public edge is weighted by the number of followers the account has. We should also recognize that the influence an account has on a post is temporary: after an interaction, its effect on the popularity of the post decreases over time. For example, if 30 accounts liked a post within a 30-minute period, the post might be considered quite popular. However, if only 30 accounts liked a post over a 10-month period, it might not be considered as popular. Additionally, the influence gained by sharing a post (a public interaction) drops to zero after 24 hours (for simplicity, we only consider sharing a post to an Instagram story, which disappears after 24 hours). We can therefore adjust our equation to include a decreasing time coefficient for each type of interaction.

Account_Influence = (nPub * nFollowers * time_pub) + (nPrivate * time_private)
time_pub is a variable that starts at 1.0 and decreases to 0.0 after a day, so the influence of a public interaction drops to 0 after 24 hours.
time_private is a variable that starts at 1.0 and decreases to 0.0 after a time period chosen by the modeler. For example, if you wanted private interactions to remain a factor in a post’s popularity for a week, you would set time_private to reach 0.0 after a week.
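The post does not specify how time_pub and time_private fall from 1.0 to 0.0. The Python sketch below assumes a linear decay (our assumption, not something Mosseri describes) over a 24-hour public window and the one-week private window used as an example above. The helper names are hypothetical.

```python
from datetime import timedelta

PUBLIC_WINDOW = timedelta(hours=24)  # story shares disappear after 24 hours
PRIVATE_WINDOW = timedelta(weeks=1)  # modeler's choice; one week here

def linear_decay(age: timedelta, window: timedelta) -> float:
    """ASSUMPTION: coefficient falls linearly from 1.0 at age 0 to 0.0 at `window`."""
    if age < timedelta(0):
        raise ValueError("an interaction cannot happen in the future")
    return max(0.0, 1.0 - age / window)

def account_influence_timed(n_pub, n_followers, n_private, age_pub, age_private,
                            private_window=PRIVATE_WINDOW):
    """Account_Influence = (nPub * nFollowers * time_pub) + (nPrivate * time_private)."""
    time_pub = linear_decay(age_pub, PUBLIC_WINDOW)
    time_private = linear_decay(age_private, private_window)
    return n_pub * n_followers * time_pub + n_private * time_private

# One share (100 followers) made 12 hours ago and two likes made 3.5 days ago:
print(account_influence_timed(1, 100, 2, timedelta(hours=12), timedelta(days=3.5)))  # 51.0
```

Like the equation, this version uses a single age per interaction type, so it implicitly assumes all of an account’s public (or private) interactions with the post happened at the same time.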

To find the popularity score of a particular post, we simply sum all the Account_Influence scores of the nodes that point to the post node. Account_Influence is constantly changing, so popularity scores are only accurate at the exact time that the rankings are given.
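Summing over the nodes that point to the post can be sketched as follows. This hypothetical Python version applies the decay to each interaction using its own timestamp, which reduces to the equation above when an account’s interactions of a given type happen at the same time. The linear decay and the one-week private window are again assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

PUBLIC_WINDOW = timedelta(hours=24)
PRIVATE_WINDOW = timedelta(weeks=1)  # ASSUMPTION: modeler-chosen window

@dataclass
class Interaction:
    account: str
    followers: int
    public: bool
    at: datetime

def linear_decay(age: timedelta, window: timedelta) -> float:
    # ASSUMPTION: 1.0 at the moment of interaction, falling linearly to 0.0.
    return max(0.0, 1.0 - age / window)

def popularity_score(interactions, now):
    """Sum the time-decayed contributions of every interaction edge into the post."""
    total = 0.0
    for edge in interactions:
        age = now - edge.at
        if age < timedelta(0):
            raise ValueError("interaction timestamp is after `now`")
        if edge.public:
            total += edge.followers * linear_decay(age, PUBLIC_WINDOW)
        else:
            total += linear_decay(age, PRIVATE_WINDOW)
    return total

now = datetime(2021, 11, 7, 12, 0)
edges = [
    Interaction("chef_bob", 10, True, now),                       # fresh share: 10.0
    Interaction("alice", 120, False, now - timedelta(days=3.5)),  # half-decayed like: 0.5
    Interaction("carol", 5000, True, now - timedelta(hours=30)),  # expired share: 0.0
]
print(popularity_score(edges, now))  # 10.5
```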

In summary, an Account_Influence score depends on the number of followers the account has, the number and type of its interactions with the post, and the time elapsed since those interactions. Summing these scores gives each post’s popularity, and posts can then be ranked by popularity. Posts with high scores are more likely to appear higher and more frequently on a user’s Explore page than posts with low scores. Posts that are shared by accounts with many followers and that receive a large number of interactions within a short period of time are likely to rank high on the list.
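Ranking then amounts to sorting posts by score. The Python sketch below revisits the earlier comparison of 30 likes in 30 minutes versus 30 likes over 10 months; the post names, the evenly spaced like times, and the linear one-week decay for likes are all hypothetical choices.

```python
from datetime import timedelta
from typing import Dict, List

PRIVATE_WINDOW = timedelta(weeks=1)  # ASSUMPTION: modeler-chosen window for likes

def private_decay(age: timedelta) -> float:
    # ASSUMPTION: linear fall from 1.0 to 0.0 over PRIVATE_WINDOW.
    return max(0.0, 1.0 - age / PRIVATE_WINDOW)

def like_score(like_ages: List[timedelta]) -> float:
    """Popularity from private interactions only (each like has weight 1)."""
    return sum(private_decay(age) for age in like_ages)

def rank_posts(scores: Dict[str, float]) -> List[str]:
    """Return post ids from highest to lowest popularity score."""
    return sorted(scores, key=scores.get, reverse=True)

# Post A: 30 likes, one per minute over the last 30 minutes.
fast = [timedelta(minutes=m) for m in range(30)]
# Post B: 30 likes spread evenly over about 10 months (one every 10 days).
slow = [timedelta(days=10 * d) for d in range(30)]

scores = {"post_a": like_score(fast), "post_b": like_score(slow)}
print(rank_posts(scores))  # ['post_a', 'post_b']
```

Post B scores 1.0, because only its most recent like still falls inside the one-week window, while post A keeps almost all of its 30 points.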

Unlike the hubs and authorities algorithm or the PageRank algorithm, this conceptualization of part of the Instagram algorithm does not include an explicit flow of power or popularity. In other words, there is no direct link between the Account_Influence of an interacting account and that of the account that posted. However, we can intuitively see how successful posts could affect the posting account. If a post is particularly popular, more people may want to follow the account that posted it. Thus, a post that earns a higher popularity score (the sum of Account_Influence scores) will also help the posting account gain followers, which in turn raises the Account_Influence of that account’s future public interactions.
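The feedback loop described here can be illustrated with a toy calculation. Nothing in Mosseri’s description says how popularity turns into followers, so the linear `CONVERSION_RATE` below is a purely hypothetical parameter. The point is only that more followers raise the public term nPub * nFollowers of the posting account’s future Account_Influence.

```python
# HYPOTHETICAL: new followers gained per point of popularity score.
CONVERSION_RATE = 0.01

def followers_after_popular_post(n_followers: int, popularity: float,
                                 rate: float = CONVERSION_RATE) -> int:
    """Toy model: a popular post converts a fixed fraction of its score into followers."""
    return n_followers + int(popularity * rate)

def public_influence(n_pub: int, n_followers: int) -> int:
    """Untimed public term of Account_Influence: nPub * nFollowers."""
    return n_pub * n_followers

before = 200
after = followers_after_popular_post(before, popularity=5000.0)
print(after)                                                    # 250
print(public_influence(1, before), public_influence(1, after))  # 200 250
```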

To provide a visual description of the network described above, consider this situation:

In the situation described in the diagram, A5’s post gets a popularity score of 4.0. Note that there are certainly flaws in this algorithm design, most notably the overestimation of the influence of public interactions. Furthermore, it is neither representative of nor necessarily accurate to how Instagram actually runs its ranking algorithms. However, I think it is an interesting first step toward learning how ranking algorithms might work for different applications.

References:

Mosseri, A. (2021). Shedding More Light on How Instagram Works. Instagram. https://about.instagram.com/blog/announcements/shedding-more-light-on-how-instagram-works

November 7, 2021 | category: Uncategorized
