Bus Factor Study on Libre Health, Mifos, Open Data Kit, and Hot Projects

[Warning - In order to have an updated view of the data, please check the reply to this post. That reply also contains a clarification of some of the initial numbers. This post is left as it is for tracking purposes.]

The Bus Factor is “a measurement of the risk resulting from information and capabilities not being shared among team members”. Open source communities are a great place for collaboration, and it's in the roots of any of them. This brief post analyzes this risk-related factor for the set of projects analyzed so far at the OSCAaaS (OSC Analytics-as-a-Service).

Last year we had the opportunity to run an ecosystem analysis of these four projects at three levels: activity, community, and process. The Bus Factor falls under Organizational Diversity, which belongs to the community layer. The following are the numbers as of the 21st of November 2019, covering the preceding year.

The following numbers, by contrast, were retrieved today for the last year using the OSCaaS Dashboard. The two periods overlap to some extent, but the comparison helps to show how these numbers have evolved.

There is a clear concentration of activity in the four projects. In this case the Bus Factor is defined as the number of contributors producing up to 50% of the total number of commits. With this definition in mind, the numbers indicate that 50% of the total commit activity now comes from fewer developers than some months ago.

The fewer people producing this 50% of the commit activity, the higher the risk that the project faces sustainability difficulties if a developer decides to leave the community.
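
As a rough illustration of the definition above, here is a minimal Python sketch. It assumes the Bus Factor is read as the smallest number of top contributors whose commits together reach the 50% threshold; the function name `bus_factor` and the sample author names are hypothetical, and this is not the code behind the dashboard.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of top contributors whose commits together
    reach `threshold` of all commits (assumed reading of the definition)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0  # no commits, nothing to measure

    covered = 0
    # Walk contributors from most to least active, accumulating commits.
    for rank, (_author, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered >= threshold * total:
            return rank
    return len(counts)


# Hypothetical commit log: one entry per commit, holding the author name.
concentrated = ["ana"] * 6 + ["ben"] * 3 + ["cai"] * 1
spread_out = ["ana"] * 4 + ["ben"] * 3 + ["cai"] * 3

print(bus_factor(concentrated))  # one developer already covers half the commits
print(bus_factor(spread_out))    # two developers are needed to reach half
```

A lower value means the activity sits in fewer hands, which is the situation described above.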

However, how is this ecosystem performing compared to others? To answer this, we ran another analysis to bring more context to the discussion. Wikimedia Foundation, OPNFV, GitLab, and Kata Containers were selected for it. The set includes huge projects such as Wikimedia and smaller ones such as Kata Containers.

We can see that, depending on the project, the numbers are much higher, as in GitLab, or much lower, as in OPNFV with a Bus Factor of 1.

If a community shows a high concentration of development activity in a few hands, this may mean different things. Other parameters are needed to interpret it, for instance developers' commitment, the size of the community and number of repositories, or even the development velocity. What is important in this case is to follow the progress of the KPI. Numbers are decreasing, but given the size of the projects, the risk seems to be under control.

Remember, if you are interested in running a similar analysis for your project and you want your project to be part of further analyses, please reply to this post and let us know! On the other hand, if you are part of the team developing any of the projects represented here, it would be good to hear your thoughts on why this evolution is happening and how it may affect the sustainability of the project.

Quick update here!

We have realized that the numbers compared in this post are not fully comparable.

We have aggregated all of the existing pieces of infrastructure in one dashboard, dial.biterg.io. There you can see the several data sources involved in the development of each of the projects.

The numbers extracted for the period 2018-2019 took into account all of the data sources. But for the period 2019-2020, we only took into account the Git activity. It is true that the Git repositories are important when analyzing a community, but there are other sources, and in this second step we didn’t aggregate those.

This means that the data represented, and therefore the delta between the two periods, is slightly different. In the first period of analysis the Bus Factor considered all of the aggregated data sources, while in the second period it considered only Git repositories.

Fewer data sources mean fewer people, and thus more concentrated activity in the different projects.
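
To illustrate this effect with made-up data, the sketch below computes a Bus Factor twice over the same toy activity log: once across all data sources and once restricted to Git. The names, the sources, and the `bus_factor` helper are hypothetical, and the 50% threshold is read as "smallest number of top contributors reaching half of the activity"; none of this is taken from the real dashboard.

```python
from collections import Counter


def bus_factor(authors, threshold=0.5):
    """Smallest number of top contributors reaching `threshold` of the activity."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for rank, (_author, n_events) in enumerate(counts.most_common(), start=1):
        covered += n_events
        if covered >= threshold * total:
            return rank
    return len(counts)


# Hypothetical activity log: (contributor, data source) per event.
events = (
    [("alice", "git")] * 5
    + [("bob", "git")] * 2
    + [("bob", "issues")] * 1
    + [("carol", "issues")] * 4
    + [("dave", "mailing_list")] * 4
)

# All data sources aggregated, as in the first period of analysis.
all_sources = bus_factor(author for author, _source in events)
# Git activity only, as in the second period of analysis.
git_only = bus_factor(author for author, source in events if source == "git")

print(all_sources, git_only)
```

With this toy log, the aggregated view needs two people to reach half of the activity, while the Git-only view needs just one, even though nobody changed their behaviour.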

The real numbers are in fact quite similar for the projects under the umbrella of DIAL. The following are the Bus Factor values for the DIAL projects in the period 2018-2019. There is only one main change, and that is the Bus Factor for the Hot project.

However, we ran the same analysis for the four projects used for comparison, and we saw some interesting evolution. The following is the analysis of these projects for the period 2018-2019.

In this case we see an evolution in:

  • Wikimedia concentrating code development slightly (from 40 developers producing 50% of the activity down to 37).
  • OPNFV concentrating at least 50% of the code activity in just one developer, where they previously had 2 developers.
  • And finally GitLab with the opposite movement, having a much bigger Bus Factor (moving from 73 developers producing 50% of the activity up to 107).