A word on accuracy

Sometimes, throwing more processing power at the problem, finding more data, or applying another trick from the AI practitioner’s book can improve the accuracy of a model.

However, always keep in mind that you may be overfitting, or even that the data cannot be linked to the expected results in 100% of cases.

It takes very little for people to be put off a product or an idea. They got into a fight with their partner, they are very busy at work, or they just have something else on their mind. This will greatly influence their buying decision, but it will not necessarily show up in your data set, as you probably don’t have this information.

So keep in mind that a model’s accuracy is capped by the variables you have in your data set, and that sometimes you may need to be content with an accuracy of 85%…
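To make that cap concrete, here is a small simulation. It is only a sketch built on made-up assumptions: one recorded variable (`interested`), one unrecorded factor that reverses the decision 15% of the time (`hidden_factor`), and synthetic data rather than real buyers. It scores every possible yes/no rule based on the recorded variable alone.

```python
import random

def simulate_buyers(n=10_000, hidden_rate=0.15, seed=42):
    """Return (interested, bought) pairs; the hidden factor is never recorded."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        interested = rng.random() < 0.5            # recorded in the data set
        hidden_factor = rng.random() < hidden_rate  # NOT recorded (bad day, busy week...)
        bought = interested != hidden_factor        # the hidden factor reverses the decision
        rows.append((interested, bought))
    return rows

# Every deterministic rule you can build from a single yes/no variable.
PREDICTORS = {
    "always_buys": lambda interested: True,
    "never_buys": lambda interested: False,
    "follows_interest": lambda interested: interested,
    "opposes_interest": lambda interested: not interested,
}

def accuracy(predict, rows):
    """Fraction of rows where the rule's prediction matches the outcome."""
    hits = sum(predict(x) == y for x, y in rows)
    return hits / len(rows)

rows = simulate_buyers()
scores = {name: accuracy(fn, rows) for name, fn in PREDICTORS.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

The best rule simply follows the recorded variable, and it lands at roughly 85%. More compute or a fancier model cannot push it higher, because the missing 15% lives in a variable that is not in the data set.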

Machine learning vs Deep learning vs AI

All too often these buzzwords are used interchangeably, while in fact the Deep Learning field is a subset of the Machine Learning (ML) field, and the ML field in turn is a subset of the AI field.

Here’s how they relate to one another:

Note that Machine Learning is but one subfield of AI; there are many more. But the subfield getting the most buzz these days (thanks to its fantastic advances) is Deep Learning.

Google Trends shows that the search term “Deep Learning” is less popular than the search term “Machine Learning” across the globe, except in China.

Of course, that may be a consequence of the Great Firewall blocking searches for “deep learning” for some strange reason 🙂


Deploy Spacy.io on AWS using CloudFormation

The objective: automatically deploy Spacy on EC2 while creating a VPC, some subnets, load balancers and so on.

Why would you do this if you can do it using serverless technology? One word: performance.

Here’s a diagram representing the architecture we’ll instantiate:

Using LaunchConfigurations and CloudFormation, we can build this in a matter of minutes.

The LaunchConfiguration first performs software updates using yum. It then downloads a jar file containing a Jetty server class, which gets launched and listens for HTTP requests. Those requests are then proxied by the API Gateway.
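As a rough illustration of those boot steps, here is a minimal sketch that builds just the LaunchConfiguration resource as a Python dict and prints it as CloudFormation JSON. It is not the template from this post. The AMI ID, jar URL, file paths, instance type and the `SpacyLaunchConfig` logical name are all hypothetical placeholders, and it assumes the AMI already ships with a Java runtime.

```python
import json

# Hypothetical placeholders: substitute your own AMI and jar location.
AMI_ID = "ami-REPLACE_ME"
JAR_URL = "https://example.com/proxy-server.jar"

def user_data_script(jar_url):
    """Shell script run at first boot: update packages, fetch the jar, start it."""
    lines = [
        "#!/bin/bash",
        "yum update -y",                                    # software updates
        f"curl -fsSL -o /opt/proxy-server.jar {jar_url}",   # download the jar
        # Start the Jetty-based server in the background (assumes Java is installed).
        "nohup java -jar /opt/proxy-server.jar > /var/log/proxy-server.log 2>&1 &",
    ]
    return "\n".join(lines) + "\n"

def launch_configuration(ami_id, jar_url, instance_type="t3.medium"):
    """Build one AWS::AutoScaling::LaunchConfiguration resource."""
    return {
        "Type": "AWS::AutoScaling::LaunchConfiguration",
        "Properties": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            # CloudFormation base64-encodes the script via the Fn::Base64 intrinsic.
            "UserData": {"Fn::Base64": user_data_script(jar_url)},
        },
    }

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {"SpacyLaunchConfig": launch_configuration(AMI_ID, JAR_URL)},
}

print(json.dumps(template, indent=2))
```

A real stack would also need the VPC, subnets, load balancer and an Auto Scaling group that references this LaunchConfiguration; those are left out here to keep the sketch short.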

You have to write your own proxying Java class, but other than that, feel free to take inspiration from the rest of the template, here:


In search of a good NER solution

What is good?  Depends on what you’re looking for, right?

Well, here’s what I’m looking for:

  • Inexpensive to start using, preferably with usage-based billing
  • Easy to deploy and configure
  • Multi-lingual: English, French, Dutch.  More languages are a bonus.
  • Easy to extend: if the entities don’t exist, it should be possible to define and add them