Author Archives: miguelmalvarez

How to attract and retain the best research talent

Over the last four and a half years, my main responsibility at Signal has been to build a world-recognised research team and to make sure we are always at the cutting edge of research. This piece summarises some of the lessons learned over these years on how to attract and retain the best talent in the Data Science and Machine Learning community.

Every business hopes that the application of Machine Learning and Artificial Intelligence will open new opportunities for them, but this has led to a spike in demand for AI expertise, which is now significantly greater than the available talent. This has caused the salaries for these roles to skyrocket, especially in the US. We are also seeing a “brain drain” from top academic institutions to tech corporations that can afford to pay six-figure salaries to attract the best minds. However, I believe that some companies misjudge how to attract the best talent in the space. Every person has different priorities and motivations, and while money is obviously important, it is commonly not the dominant factor for data scientists.

Certainly, some of the best talent ends up in the largest tech companies (e.g., Google, Tesla, Apple), receiving lucrative salaries, but I would argue that salary is often not the key motivator. Big tech companies tend to offer a workplace with huge amounts of data and infrastructure, while start-ups tend to offer freedom, flexibility, direct input on the company’s products and services and the possibility of disrupting a market. Technologists are attracted to workplaces that offer interesting problem-solving opportunities, where they are surrounded by people they respect and can learn from. If the salary and overall benefits meet a certain threshold, you might be able to attract and retain talent even when competing with much higher salaries at larger corporations. The reasons for choosing an employer are varied but include the following factors:

  • Large and complex datasets
  • Preferred technologies or methods (e.g., Reinforcement Learning)
  • Groundbreaking products and services
  • Solving a problem of personal interest
  • Projects with talented coworkers

For instance, at Signal, we have access to “the world’s news”: hundreds of millions of articles are available for us to view and to analyse how the world perceives different events, brands and topics. This is an amazing dataset for analysis that has attracted talented people into the company. Many are attracted by specific technologies or areas of research, either because they want to develop expertise or because they believe in their potential. Using Machine Learning has been a major factor in talent acquisition at Signal, but we have also failed to hire some brilliant researchers because we were not using a specific technology in our stack at the time. The sector you are operating in is also an incredibly important factor. For instance, some may want to work in areas such as education or animal conservation, or for organisations such as charities, NGOs or government. Lastly, the product or service itself is critical because we are, after all, creators and want to be proud of what we build. Attracting talent will always be easier if data scientists see your product as amazing (or you have plans to make it so).

From a personal point of view, I like to put myself in the shoes of each person working in our Data Science team and understand what “success” means for them on an individual basis. In general, if we have the freedom to work on interesting problems, with autonomy and great people around us to learn from, we will be happy. Nonetheless, we are all slightly different and have different personal goals. For instance, some individuals put more focus on learning (especially those early in their career), while others wish to be part of the academic community. For them, working in a company that encourages attending conferences and publishing papers is key. Focusing on these two cases, you will lose people in the former case if they do not believe they are learning enough, and you will lose people in the latter if they do not have the opportunity to work on and publish good quality research.

At Signal, we have been very active in the community since our inception. Not only have we been involved in many London meet-ups (e.g., Text Analytics and PyData), but we have also published several research papers, conducted workshops and attended many academic conferences.

Our drive for innovation and research is demonstrated by our “Visiting Researcher” programme. We have always believed that the best research comes from universities and that the best way to compete with our peers is to attract talent from these institutions in the most effective way possible. We constantly try to identify the best and most promising people in specific areas of research. In many cases, this includes Ph.D. students struggling to decide what to do next in their careers (i.e., start-up, big company or academia). Signal provides them with 6-12 months of work experience at a start-up that does applied research to be used in a product by real, paying customers. At the same time, they benefit from the fact that the research they do is also publishable and shareable in some form. In return, the company gets best-in-class researchers for our team to learn from, as well as IP and specific components built during their visit. We also have a very strong MSc programme, mainly with UCL and the University of Essex, where we propose and supervise specific MSc research projects related to our area of expertise.

Both of these approaches not only produce IP, publications and new components for Signal but have also already resulted in a “domino effect”, as our alumni recommend us to colleagues in countries as far away as Australia. I am incredibly proud of the fact that in the last four years, we have had 23 people in our research team in some capacity. The best way to grow your team is to treat current and past data scientists well, and to strive to be well-respected in the community. This also includes the people who, for whatever reason, decided to leave the company. No matter how good a company or manager is, people decide to move on to find new challenges and adventures. You have to ensure the leaving process is handled in the best possible way; a great recommendation from someone who left your business is like recruitment gold dust.

In summary, attracting and retaining talent in AI is not easy, and it takes time and effort. You must create an attractive opportunity that sparks the curiosity of each candidate in an environment where they can learn and flourish. In my opinion, one of the best ways to find this talent is to be a proactive and well-respected member of the community.

Note: A similar version of this post has also been published on the Signal company blog.


How can Machine Learning and AI help solve the Fake News Problem?

The term “fake news” was almost non-existent in general conversation and in the media prior to October 2016, but times have changed and I would not be surprised if you have heard the term being used today, in the news, on the radio or just in the street.

Fake news is a term that has been used to describe very different issues, from satirical articles to completely fabricated news and plain government propaganda in some outlets. Fake news, information bubbles, news manipulation and the lack of trust in the media are growing problems with huge ramifications for our society. However, in order to start addressing this problem, we need to have an understanding of what fake news is. Only then can we look into the different techniques and fields of machine learning (ML), natural language processing (NLP) and artificial intelligence (AI) that could help us fight this situation.

Continue reading

Could the start-up scene in the UK collapse if we do not create a suitable visa programme for Europeans?

London is arguably the tech hub and heart of the start-up community in Europe, with some people even comparing it to Silicon Valley. However, I am afraid that this incredible opportunity could disappear depending on how we manage some operational details once the UK is outside the EU.

Continue reading

Classifying Reuters-21578 collection with Python

A long time ago I published a blogpost explaining how to represent the Reuters-21578 collection (and, more generally, any textual collection for text classification). However, that blogpost never explained how to perform the classification step itself. This post will introduce some of the basic concepts of classification, quickly show the representation we came up with in the prior post and, finally, focus on how to perform and evaluate the classification.

Continue reading

My first PyCon

I have recently come back from my first ever Python Conference (PyCon), and in fact, my first ever general development conference. This was quite a new experience as I am used to either academic (e.g., ECIR) or data-centric (e.g., Strata) conferences. PyConUK was quite different in many ways from the events I am used to, and I could not be happier that I attended it. The main reason is that Marco Bonzanini and I ran a workshop on Natural Language Processing in Python during the conference, but I also saw this as a great opportunity to get involved in a community that I have never been close to, despite the fact that I have coded in Python (intermittently) for several years.

Continue reading

Elasticsearch and Clojure: Getting Started

Search is omnipresent these days, from the moment we type a set of keywords into our favourite search engine to find a webpage we are looking for to the moment we type a name and expect our email client to find all the emails sent by that person. Both these processes are based on years of research and experimentation in the field of Information Retrieval, aimed at finding the most relevant documents efficiently.

This blogpost will show how to set up Elasticsearch, one of the best and most popular search engines (with Solr being the other main alternative). Its main characteristic is that it offers unbelievable scalability and advanced querying and indexing capabilities with minimum engineering effort. In addition to this, I will also show how to perform some basic operations using elastisch, a fantastic library for Elasticsearch written in Clojure.

Continue reading

NewsIR 2016

After a long organisation process (explained in my last blogpost), the workshop on Recent Trends in News Information Retrieval (NewsIR) finally took place during the European Conference on Information Retrieval (ECIR) a couple of weeks ago in Italy.

Continue reading