Working successfully with ML and AI

At AI EXPO Europe, the brightest minds from companies such as ABN AMRO, KLM, TomTom, ING, and Shell came together to discuss their experiences and progress in the fields of data science, machine learning, and artificial intelligence. The following questions were discussed: How do you start an ML project? How do you ensure a successful data project? How do you make the results transparent for end users? HROffice was also present and would like to share the key insights with you.

How do you set up a machine learning project the right way?

Before you start any data project, you have to answer a few questions. While these questions will necessarily differ per project, they all aim to help you get to the heart of the problem. This was best expressed by Pascha Iljin from AkzoNobel, who always asks himself five questions before getting started:

  1. Why do you want to predict something?
  2. Why do you want to understand the future better?
  3. Why would this help you?
  4. Why isn’t it possible now?
  5. Why were your best attempts not good enough?

The first two questions are designed to define the high-level objectives. While developing our e-recruitment platform GoToMyJob, our answer was that we wanted to give recruiters access to a predictive recruitment process.

The third question taps into the heart of the objective: what do you want to achieve and do you really need an ML model to achieve it? This is one of the most important questions, as it shows you how the predictions will be used. For example, we use our predictive ML model to optimize our campaigns and to make sure we recruit enough candidates. Having the right number of candidates benefits your efficiency and your time-to-hire.

The last two questions are the most confrontational, as they expose the company’s weaknesses. In some cases, a company uses a complex system developed by an employee who has since left, meaning no one knows how the system works. In most cases, however, the company simply lacks the knowledge to link the data flows and present them in a clear and uniform way.

What are the prerequisites for an ML project?

Once you’ve collected the necessary information, you can start thinking about what you need to make the project successful. There are four prerequisites to a successful predictive model:

The right people
All companies interested in ML or AI must have an in-house data scientist and statistician. Without them, it’s virtually impossible to set up a good model. Companies that don’t have these people but claim to be active in the field of ML/AI shouldn’t be taken at their word.

The right tools

Access to high-quality data
Bad data generates bad predictions. Reliable data is the foundation of all good models, and while ML can certainly work with rawer data than ‘normal’ statistical models, it can’t work miracles.

A process framework for automated modelling
To ensure that everyone can use the model, it has to be user-friendly and integrated into the end user’s workflow.


The data science process:

  1. Investigation

All models need a solid foundation. This foundation is based on sound measurements and the proper interpretation of the results. The first two steps are often hugely valuable in themselves, as they give you a detailed overview of the company’s processes. You can then use this overview to run reports and benchmark your efficiency. Examples include accurately measuring where candidates come from and accurately calculating the lead time between receiving a candidate’s application and the recruiter sending a response.
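As a minimal sketch of the two measurements mentioned above (candidate source and lead time), assuming each application is stored with hypothetical `source`, `applied_at`, and `responded_at` fields. The field names and sample values are illustrative and do not come from the GoToMyJob platform:

```python
from datetime import datetime
from statistics import median

# Hypothetical application records; field names and values are illustrative only.
applications = [
    {"source": "job_board", "applied_at": "2019-03-01T09:00", "responded_at": "2019-03-03T09:00"},
    {"source": "job_board", "applied_at": "2019-03-02T10:00", "responded_at": "2019-03-06T10:00"},
    {"source": "referral", "applied_at": "2019-03-02T12:00", "responded_at": "2019-03-03T12:00"},
    {"source": "referral", "applied_at": "2019-03-04T08:00", "responded_at": None},  # no response yet
]


def lead_time_days(record):
    """Return days between application and recruiter response, or None if unanswered."""
    if record["responded_at"] is None:
        return None
    applied = datetime.fromisoformat(record["applied_at"])
    responded = datetime.fromisoformat(record["responded_at"])
    return (responded - applied).total_seconds() / 86400


def median_lead_time_by_source(records):
    """Group answered applications by source and take the median lead time in days."""
    by_source = {}
    for record in records:
        days = lead_time_days(record)
        if days is not None:
            by_source.setdefault(record["source"], []).append(days)
    return {source: median(values) for source, values in by_source.items()}


lead_times = median_lead_time_by_source(applications)
print(lead_times)
```

Unanswered applications are skipped here rather than counted as zero, since treating a missing response as an instant one would make the lead time look better than it is.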

  2. Interpretation

Once you’re sure your measurements are accurate and you’ve filtered out all bad data, you can start the modelling process. In this stage, you will compare different aspects of the most successful machine learning models. Each model can be ‘trained’ in various ways, and it’s crucial that you know what you’re doing. There are different standards and methods you can use to assess a model, each with its own advantages and disadvantages. During the modelling process, you’ll face an interesting dilemma: designing a model that is both detailed and easy to interpret. The more input you provide, the better the model becomes. Yet it also becomes more complex and harder to understand.
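One common way to compare candidate models is k-fold cross-validation scored with more than one error metric. The sketch below is an illustration only: it uses synthetic data and two deliberately simple models (a mean baseline and a one-feature least-squares line) as stand-ins for the real models, and every name in it is hypothetical.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Ordinary least squares for a single input feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def cross_validate(fit, xs, ys, k=5):
    """Return mean absolute error and root mean squared error over k folds."""
    abs_errors, sq_errors = [], []
    for fold in range(k):
        # Every k-th point is held out; the rest is used for training.
        train_x = [x for i, x in enumerate(xs) if i % k != fold]
        train_y = [y for i, y in enumerate(ys) if i % k != fold]
        model = fit(train_x, train_y)
        for i, (x, y) in enumerate(zip(xs, ys)):
            if i % k == fold:
                error = model(x) - y
                abs_errors.append(abs(error))
                sq_errors.append(error ** 2)
    return {"mae": mean(abs_errors), "rmse": mean(sq_errors) ** 0.5}


# Synthetic data: hypothetical campaign budget vs. number of applicants.
rng = random.Random(42)
budgets = [float(b) for b in range(1, 41)]
applicants = [3.0 * b + 10.0 + rng.gauss(0, 4) for b in budgets]

scores = {
    "mean_baseline": cross_validate(fit_mean, budgets, applicants),
    "linear": cross_validate(fit_line, budgets, applicants),
}
for name, score in scores.items():
    print(name, score)
```

The two metrics weigh errors differently: RMSE penalizes large misses more heavily than MAE, which is one example of assessment methods having their own advantages and disadvantages.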

  3. Modelling

Interpreting results and optimizing models is what data scientists live for. I’ve trained more than seven different types of machine learning model for Adver-Online. Neural networks, random forests, bagging, boosting, extreme gradient boosting, support vector machines, and generalized additive models (GAM): I’ve seen it all in my search for the perfect model.

  4. Automation

Every situation is different, which is why you should find out which model works best with which type of data before you start. While rough guidelines exist, in the end you’ll have to create your model through trial and error. I discovered that XGBoost was the best fit for us. Once the model has been built, the process can be automated by continuously adding new measurements to the data pool and retraining the model automatically at regular intervals.
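The automation step could look roughly like the sketch below. It makes several assumptions: retraining is triggered by a count of new measurements rather than by a time-based schedule, the class and function names are hypothetical, and a trivial mean predictor stands in for the actual model, which in our case would be XGBoost.

```python
from statistics import mean


class RetrainingPipeline:
    """Collect new measurements and retrain once enough have arrived."""

    def __init__(self, train, retrain_every=3):
        self.train = train                  # function: list of (x, y) pairs -> model
        self.retrain_every = retrain_every  # hypothetical retraining threshold
        self.data_pool = []
        self.new_since_training = 0
        self.model = None
        self.version = 0

    def add_measurement(self, x, y):
        """Add one measurement; retrain when the threshold is reached."""
        self.data_pool.append((x, y))
        self.new_since_training += 1
        if self.new_since_training >= self.retrain_every:
            self.retrain()

    def retrain(self):
        """Fit a fresh model on the whole data pool."""
        self.model = self.train(self.data_pool)
        self.version += 1
        self.new_since_training = 0


def train_mean_model(pairs):
    """Placeholder trainer: predicts the average target seen so far."""
    average = mean(y for _, y in pairs)
    return lambda x: average


pipeline = RetrainingPipeline(train_mean_model, retrain_every=3)
for x, y in [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60), (7, 70)]:
    pipeline.add_measurement(x, y)

print(pipeline.version, pipeline.model(0))
```

Keeping a version counter alongside the model makes it easy to see which retraining run produced a given prediction, which helps when a newly trained model performs worse than its predecessor.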

What's the take-away?

Rome wasn’t built in a day and neither is an ML model. Projects like these take months of hard work, and roughly 20% never make it to production. This doesn’t mean that failed projects have no value; after all, the journey is just as important as the destination. Along the way you stumble upon insights that benefit everyone in the company. The most important thing is to keep it simple. A complex model may make good predictions, but it doesn’t provide insights, isn’t reliable, and is hard to maintain when someone leaves. Keep it simple!