What Makes for Effective Social Coding

It’s pretty cool how useful graphs are for modelling social phenomena. The exact interactions between people are incredibly complex, but having something visual and concrete to analyze really does help in drawing interesting conclusions about how these interactions work and why people act the way they do.

As someone doing CS, I’m a pretty big fan of Git, so it’s always a pleasant surprise when an article about it comes up. In this case, a team from the Nara Institute of Science and Technology and NTT Communication Science Laboratories analyzed the characteristics of hundreds of thousands of collaborative projects on GitHub in an attempt to better understand what separates the successful projects from the ones that end in failure. The group quantified the success of a project using three indices: Commit, the number of commits made to a repository; Star, the number of developers who bookmarked the project with a star; and PullReq, the number of times external developers requested that their code be accepted. These indices were then normalized by how long each project ran.
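To make the three indices concrete, here’s a minimal sketch of how they might be computed. I’m assuming “normalized” simply means dividing each raw count by the project’s duration in days; the paper may define it differently, and the `ProjectStats` class and its field names are my own invention for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProjectStats:
    """Raw counts for one repository (hypothetical container, not from the paper)."""
    commits: int          # commits made to the repository
    stars: int            # developers who starred the project
    pull_requests: int    # pull requests from external developers
    duration_days: float  # how long the project ran (assumed unit: days)


def success_indices(p: ProjectStats) -> dict:
    """Divide each raw count by project duration (an assumed normalization)."""
    if p.duration_days <= 0:
        raise ValueError("duration_days must be positive")
    return {
        "Commit": p.commits / p.duration_days,
        "Star": p.stars / p.duration_days,
        "PullReq": p.pull_requests / p.duration_days,
    }


# Toy numbers, purely illustrative.
example = ProjectStats(commits=300, stars=60, pull_requests=15, duration_days=30)
print(success_indices(example))
```

Normalizing this way keeps a long-running project from looking more successful than a short one just because it had more time to accumulate commits and stars.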

The team then analyzed the effect of team structure on a project’s outcome. To quantify team structure, they constructed a collaboration network representing the relationships between the internal members of a development team. An edge is drawn between two members if they have collaborated on at least one project other than the given one. Given this construction, the correlations the team found were somewhat surprising.
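The edge rule above is simple enough to sketch in a few lines. This is only my reading of the construction, not the authors’ code: I assume each member comes with the set of other projects they have contributed to (the project under study already excluded), and the member and project names are made up.

```python
from itertools import combinations


def collaboration_edges(other_projects: dict) -> set:
    """Build undirected edges between members who share an outside project.

    other_projects maps member -> set of projects they worked on,
    excluding the project being studied (assumed input format).
    """
    edges = set()
    for a, b in combinations(sorted(other_projects), 2):
        # Draw an edge if the two members share at least one other project.
        if other_projects[a] & other_projects[b]:
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical three-person team.
team = {
    "alice": {"proj-x", "proj-y"},
    "bob": {"proj-y"},
    "carol": {"proj-z"},
}
print(collaboration_edges(team))
```

Here only alice and bob end up connected, since carol has never worked with either of them outside the project.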

As it turns out, projects with more internal members tend to be more active, popular, and social. That is, there is a correlation between the three success indices and the number of internal members. That part alone probably isn’t very surprising. What is surprising, though, is that while a higher edge density was correlated with a more successful project, a higher average shortest path length between any two members in the graph was negatively correlated with the success indices. In other words, there was evidence that more highly connected internal members went along with more successful projects, but only up to a certain point. Hence, according to the study, it would be better to form a team where some members know most (but not all) of the other members, with several members playing an important role in connecting everyone together, rather than a team where all the members know each other.
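For anyone who wants to play with the two graph measures mentioned here, this is a small sketch using only the standard library. Edge density is the fraction of possible edges that are present, and average shortest path length is computed with breadth-first search. Skipping unreachable pairs is my own simplification, and the example team is hypothetical.

```python
from collections import deque


def edge_density(nodes, edges):
    """Fraction of possible undirected edges that are actually present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)


def average_shortest_path_length(nodes, edges):
    """Mean BFS distance over reachable pairs (unreachable pairs are skipped)."""
    adjacency = {v: set() for v in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    total, pairs = 0, 0
    for source in nodes:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in distance:
                    distance[w] = distance[u] + 1
                    queue.append(w)
        for target, d in distance.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical team where one "hub" member knows everyone else.
members = ["hub", "a", "b", "c"]
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(edge_density(members, star))                  # 0.5
print(average_shortest_path_length(members, star))  # 1.5
```

The star-shaped team has only half of its possible edges, yet any two members are at most two steps apart, which is the kind of “a few connectors” structure the study seems to favour.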

What really makes that interesting is that we’ve normally been seeing examples where maximizing connections in a network (or edges in a graph) led to some positive outcome (having lots of friends, for example). So it can be quite unexpected to see that not all networks are like that. Furthermore, when considering the efficiency of a team, which was determined through the relationship between team size and the number of commits, they found that among teams of up to around sixty members, efficiency was highest for a team of just one member. Additionally, efficiency drops significantly once team size goes past sixty. So perhaps it’s just hard to coordinate tasks and efforts in a group that big, and efficiency starts going down as more team members end up doing nothing while they wait to be given a task.
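As a rough illustration of the efficiency idea, here’s a sketch that groups projects by team size and averages commits per member. Treating efficiency as commits per member is just one plausible reading of “the relationship between team size and the number of commits”; the paper’s exact definition may differ, and the numbers below are toy values rather than data from the study.

```python
from collections import defaultdict


def mean_efficiency_by_team_size(projects):
    """Average commits per member for each team size.

    projects is an iterable of (team_size, commits) pairs (assumed format).
    """
    per_size = defaultdict(list)
    for team_size, commits in projects:
        if team_size < 1:
            raise ValueError("team_size must be at least 1")
        per_size[team_size].append(commits / team_size)
    return {size: sum(vals) / len(vals) for size, vals in sorted(per_size.items())}


# Toy values, purely illustrative.
toy_projects = [(1, 10), (1, 20), (4, 40)]
print(mean_efficiency_by_team_size(toy_projects))  # {1: 15.0, 4: 10.0}
```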

http://www.technologyreview.com/view/530511/data-mining-reveals-how-social-coding-succeeds-and-fails/

http://arxiv.org/pdf/1408.6012v2.pdf
