Corporations are going through rough times.
And I'm not talking about the pandemic and the stock market volatility.
But there's a problem.

Companies can't just throw money at data scientists and machine learning engineers and hope that magic happens.
The data speaks for itself.
As VentureBeat reported last year, around 90 percent of machine learning models never make it into production.

And the data scientists aren't the ones to blame.
Companies are hiring, and they're ready to pay a good salary, too.

For the moment, however, they're not making it easy for data scientists to do their jobs.
Lacking access to data
Companies aren't bad at collecting data.
Data scientists, on the other hand, often need data from several departments.

That data is often siloed by department, which makes it harder to clean and process.
Moreover, many data scientists complain that they can't even obtain the data they need.
But how should you even start training a model if you don't have the necessary data?
Siloed company structures and inaccessible data might have been manageable in the past.
In many companies, there's a fundamental divide between the IT and data science departments.
IT tends to prioritize making things work and keeping them stable.
Data scientists, on the other hand, like experimenting and breaking things.
This doesn't lead to effective communication.
In addition, engineering skills aren't always deemed essential for data scientists.
For one, the hardware or cloud storage space to handle bigger datasets might not be available.
Finally, data sourcing may not be easy or even possible.
This is yet another reason to unify data structures across organizations and encourage communication between different departments.
For example, a software engineer might have a go at implementing what a data scientist told them to. The data scientist might go ahead and do some of the work themselves, too. This duplicated effort is a waste of time and resources. If the two sides communicate and divide the work clearly instead, they'll save the company's time and resources.
It seems as if data scientists are still viewed as somewhat nerdy and devoid of business sense.
Of course, that doesn't mean that every data scientist suddenly needs an MBA to excel at their job.
However, a few key lessons from business classes or hands-on business experience could go a long way.
Some pipelines start in Python, continue in R, and end in Julia.
Others go the other way around, or use other languages entirely.
In addition, some pipelines might make use of containerization with Docker and Kubernetes; others might not. Some will expose their models through dedicated APIs; others won't.
And the list goes on.
Tools like TFX, MLflow, and Kubeflow are emerging to fill this gap.
But these tools are still in their infancy, and expertise in them remains rare.
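To make this concrete, here is a minimal sketch of what experiment tracking with MLflow can look like. The article only names the tool; the dataset, model, and parameters below are illustrative choices rather than a prescribed workflow.

```python
# Minimal MLflow tracking sketch (illustrative; dataset and model are arbitrary).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {"n_estimators": 100, "max_depth": 3}

with mlflow.start_run():
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Record parameters, metrics, and the model artifact so the run can be
    # compared and reproduced later from the MLflow UI.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```

Even lightweight tracking like this makes it much easier to answer which model is actually running in production and how it was trained.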
Data scientists know that they need to keep up with the newest developments in their field.
This should apply to model deployment as well.
In addition, datasets may drift over time.
That's natural as companies and projects evolve, but it makes it harder to reproduce past results.
With diligent version control of their code and data, however, data scientists can keep their models reproducible.
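As a hedged illustration of that idea, one lightweight approach is to fix random seeds and store a fingerprint of the training data next to the model, so that any later mismatch is easy to spot. The helper below is hypothetical, not something the article prescribes.

```python
# Hypothetical helper: fingerprint a dataset so a trained model can be traced
# back to the exact data version it was built from.
import hashlib
import json

import numpy as np


def dataset_fingerprint(X: np.ndarray, y: np.ndarray) -> str:
    """Return a SHA-256 hash of the features and labels."""
    digest = hashlib.sha256()
    digest.update(np.ascontiguousarray(X).tobytes())
    digest.update(np.ascontiguousarray(y).tobytes())
    return digest.hexdigest()


SEED = 42  # fix randomness so reruns are comparable
rng = np.random.default_rng(SEED)

# Stand-in for a real training set.
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

# Store the seed and data hash alongside the model artifact; if the dataset
# drifts, the hash changes and the discrepancy is immediately visible.
metadata = {"seed": SEED, "data_hash": dataset_fingerprint(X, y)}
print(json.dumps(metadata, indent=2))
```

In practice, tools like DVC or MLflow can handle this bookkeeping, but the principle is the same: tie each model to the exact code and data that produced it.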
Change doesn't happen overnight.
The first step is for managers to outline a clear and simple project; the second is to choose the right team.
Third, managers should consider leveraging third parties to help them accelerate at the beginning.
A final caveat is not to strive for sophistication at all costs.
The widespread adoption of artificial intelligence is only one of many growing trends.
I'm deliberately speaking of decades and not years, though.
Cloud computing, after all, took several decades to gain widespread adoption.
There's no reason to believe that the AI revolution will be any different.
It will take a while to implement because the status quo contains a host of obstacles to tackle.
This article was written by Ari Joury and was originally published on Towards Data Science. You can read it here.