One of the amazing aspects of being a technical co-founder of a successful scale-up is that many people from companies a couple of years behind Signal on the same journey actively seek advice from people like me. One of the most recurrent topics in these discussions is how to create a functioning and impactful Data Science team, so I have decided to write down my current thoughts on the topic, following the principle that “once three different people ask you about something, you should write a blog post about it”.
How to structure Data Science teams effectively is not a new topic. The approach described here is the result of countless conversations with people in the community, lessons from other people’s experience (see here and here) and our direct experience of applying these approaches at Signal in order to maximise our impact.
We experimented with multiple alternatives in the past and discovered many of their drawbacks. At one point, we had a dedicated research team (formed only of researchers) working on techniques related to the main problems faced by the company. Although there were several benefits, the main drawback was that the team was detached from the product team (and the clients), and much of the interesting work we did was never integrated into the product because, by the time it was ready, it was no longer a high priority. At the other end of the spectrum, we explored the idea of integrating all researchers into product teams. In this case, the main challenges were the quickly changing priorities at the time and the different cycles of research and development, which caused frustration among some members of the team and slowed down our most innovative lines of research. Over the following years we adapted, improving our solution incrementally in a way that allowed us to keep generating value for our users with cutting-edge research.
Signal is a growing company with real use cases and Machine Learning at its core. In order to remain competitive (following agile principles), we have to be able to iterate quickly, measure the impact and value of features and improvements, and correct course if needed. On the other hand, strategic initiatives require radical innovation, which is only possible by thinking in long iterations of research, much as academia does. Solving this dichotomy is the key to sustainable innovation and the key to our success.
In order to achieve quick iterations, data scientists should be as close as possible to the problem and able to produce prototypes that are validated by final users to measure their value, following the classic Build, Measure, Learn cycle. We have found that the most effective way to do this is to embed data scientists in product teams, working alongside product and engineering people. The main goal is to understand whether we are solving the right problem and how much we have improved since the last iteration. In addition, in order to maximise their impact, these data scientists should be “full-stack”: able to work at almost every level of the end-to-end product development cycle, with a focus on user-centric metrics and iteration. Another lesson was that always having at least two data scientists in each team reduces many potential problems and significantly increases their impact and knowledge sharing.
Assuming we have a clear problem that needs further improvement and a reasonable metric to optimise (how to choose metrics, and the fallacy of assuming all “research” metrics relate to user value, will eventually be its own blog post), creating an “R&D” team focused on that problem is a great way to address the long-iterations goal. This team should be formed of a number of researchers and, depending on the company infrastructure and the availability of tools, some ML/Data Engineers. It can focus on more ambitious research goals and, hopefully, produce breakthroughs over time. An important note is that, despite the team’s long-term focus, slicing the final problem into relatively small increments and iterations is still important: it helps communication company-wide and provides a sense of progress. Another important factor is that there must be a plan for incorporating any improvements made by the team into the product. Now that we have researchers across multiple teams in the company, one important practice for us is to hold regular (i.e., weekly) Research catch-ups (in the shape of a guild) where all data scientists and researchers meet to pursue and discuss more pure-research initiatives (e.g., conference organisation, publications, specific challenges they face, new ideas and papers, …) and to learn from each other.
A very important implication, for me personally and for Signal as a company, is that these “long iterations” produce cutting-edge research that not only creates valuable IP for the company, keeping us ahead of the competition, but also allows us to continue publishing and remain a member of the academic community. In addition to letting us give back to the community, this strengthens our reputation as a great company doing real applied research, which makes us an attractive company to work for.
Solving the dichotomy between short and long iterations is the key to sustainable innovation and the key to our success. Our current approach is a hybrid model that consists of embedding full-stack data scientists in product teams to verify the value of new ideas, while having R&D teams that allow us to create IP and pursue significant improvements to the models for highly strategic initiatives. We do all of this while keeping the feeling of a company-wide research unit through the guild and its weekly meetings, without which the complete solution might suffer.