Machine learning is becoming an important tool in many industries and fields of science.

The dos and don’ts of machine learning research — read it, nerds

Pay extra attention to data

Machine learning models live and thrive on data.

And you should also use your own due diligence to check the provenance and quality of your data.

Your dataset might have various problems that can lead to your model learning the wrong thing.

In this case, your dataset suffers from class imbalance.
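A quick way to spot class imbalance before training is to count how often each label appears. The following is a minimal sketch; the label names and the 95/5 split are hypothetical and only illustrate the check.

```python
from collections import Counter

def class_distribution(labels):
    """Return each class's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical labels: 95 "benign" samples and 5 "malignant" samples.
labels = ["benign"] * 95 + ["malignant"] * 5

print(class_distribution(labels))  # {'benign': 0.95, 'malignant': 0.05}
print(imbalance_ratio(labels))     # 19.0
```

A ratio far above 1 suggests the model may see too few examples of the rare class to learn it well.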

A more subtle example is the equipment used to capture the data.

Machine learning datasets can have all kinds of such biases.

The quantity of data is also an important issue.

Make sure your data is available in sufficient quantity.

But if you want to develop a deep neural network with millions of parameters, you’ll need much more training data.

Machine learning engineers usually put aside part of their data to test the trained model.

“Don’t allow test data to leak into the training process,” he warns.

In this case, a validation set can be useful.
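One way to keep test data out of the training process is to split the dataset once, up front, into three disjoint parts. The sketch below assumes a simple random split; the function name, the 15% fractions, and the integer stand-in samples are illustrative choices, not something the paper prescribes.

```python
import random

def train_val_test_split(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle a copy of the data once, then cut it into three disjoint subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

samples = list(range(100))  # stand-in for real examples
train, val, test = train_val_test_split(samples)

# Compare models and tune settings on `val`;
# evaluate on `test` only once, after all choices are final.
print(len(train), len(val), len(test))
```

Because the validation set absorbs all the repeated measuring and tuning, the test set stays untouched until the final evaluation.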

But not every problem needs deep learning.

In fact, not every problem even needs machine learning.

“Generally speaking, there’s no such thing as a single best ML model,” Lones writes.

The first thing you should check is whether your model matches your problem type.

Data types (e.g., tabular data, images, unstructured text, etc.) can also be a defining factor in the class of model you use.

One important point Lones makes in his paper is the need to avoid excessive complexity.

Lones also warns against trying to reinvent the wheel.

In such cases, the wise thing to do would be to examine their work.

“To ignore previous studies is to potentially miss out on valuable information,” Lones writes.

But not all academic work will remain confined in research labs.

You must design machine learning models that can work in resource-constrained environments.

Another problem you might face is the need for explainability.

In such cases, using a black-box model might be impossible.

As a machine learning engineer, you might not have precise knowledge of the requirements of your model.

For example, many ML engineers use the accuracy test to rate their models.

The accuracy test measures the percent of correct predictions the model makes.

This number can be misleading in some cases.

For example, consider a dataset of x-ray scans used to train a machine learning model for cancer detection.

If used in a real-world application, this model can lead to missed cases with disastrous outcomes.
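To see how accuracy can hide missed cases, consider a small worked example. The numbers here are hypothetical (1,000 scans, 50 of them malignant), and the "model" is a deliberately useless one that labels every scan as benign.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def missed_positives(y_true, y_pred, positive=1):
    """Count positive cases the model labeled as something else."""
    return sum(t == positive and p != positive for t, p in zip(y_true, y_pred))

# Hypothetical dataset: 1,000 scans, 50 of which are malignant (label 1).
y_true = [1] * 50 + [0] * 950
# A degenerate "model" that calls every scan benign (label 0).
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))          # 0.95
print(missed_positives(y_true, y_pred))  # 50
```

The model scores 95% accuracy while missing every single malignant scan, which is exactly the failure the accuracy figure alone does not reveal.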

More recent techniques can provide a detailed measure of a model’s performance in various areas.

Depending on the application, ML developers might also want to measure several metrics.
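As an illustration of reporting several metrics at once, the sketch below computes precision, recall, and F1 from a confusion matrix for a binary classifier. These three metrics are a common choice rather than ones the article names, and the labels are made up for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision answers "how many flagged cases were real," recall answers "how many real cases were flagged," and which one matters more depends on the cost of each kind of error in the application.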

Lones also warns not to overestimate the capabilities of your models in your reports.

Transparency can also contribute greatly to other ML research.

Finally, aim for reproducibility.
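One small, practical step toward reproducibility is to fix random seeds and record the configuration next to the results. This is only a sketch: `run_experiment` is a hypothetical stand-in for a real training run, and the config keys are invented for illustration.

```python
import json
import random

def run_experiment(seed):
    """Stand-in for a training run whose randomness comes only from `seed`."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.random() for _ in range(3)]

# Hypothetical settings; a real project would list every value that affects the result.
config = {"seed": 42, "val_fraction": 0.15}

first = run_experiment(config["seed"])
second = run_experiment(config["seed"])

print(first == second)                     # True: same seed, same result
print(json.dumps(config, sort_keys=True))  # store this alongside the reported numbers
```

Real frameworks have their own seeding calls and additional sources of nondeterminism, so this covers only part of what a reproducible setup needs.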

Some include data privacy and security, user consent, and regulatory constraints.

Many a company has fallen into trouble for mining user data without their consent.

Another important matter that ML engineers often forget in applied settings is model decay.

Unlike academic research, machine learning models used in real-world applications must be retrained and updated regularly.

As everyday data changes, machine learning models decay and their performance deteriorates.
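A simple way to notice decay is to track accuracy over the most recent predictions and compare it with the accuracy measured at deployment time. The sketch below is one possible approach under that assumption; the class name, window size, and tolerance are hypothetical choices, and it presumes true labels eventually become available for live predictions.

```python
from collections import deque

class DecayMonitor:
    """Track accuracy over the most recent predictions and flag a sustained drop."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the newest `window` results

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        acc = self.rolling_accuracy()
        return acc is not None and acc < self.baseline - self.tolerance

monitor = DecayMonitor(baseline_accuracy=0.90, window=10)

# Early traffic: the model is right every time.
for _ in range(10):
    monitor.record(prediction=1, actual=1)
print(monitor.needs_retraining())  # False

# Later traffic: the data has shifted and half the predictions are wrong.
for i in range(10):
    monitor.record(prediction=1, actual=i % 2)
print(monitor.needs_retraining())  # True
```

A small window reacts quickly but is noisy, so in practice the window and tolerance would need tuning to the volume of labeled feedback available.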

Finally, integration challenges will be an important part of every applied machine learning project.

How will your machine learning system interact with other applications currently running in your organization?

Is your data infrastructure ready to be plugged into the machine learning pipeline?

Does your cloud or server infrastructure support the deployment and scaling of your model?

These kinds of questions can make or break the deployment of an ML product.

But their launch failed because their servers couldn’t scale to the user demand.

The Codex Challenge servers are currently overloaded due to demand (Codex itself is fine though!).

Team is fixing… just stand by.

You can read the original article here.
