This article is part of our coverage of the latest in AI research.
The security of machine learning is becoming increasingly critical as ML models find their way into a growing number of applications.
These models and services can become sources of attacks against the applications that use them.

The paper, a joint work by researchers at UC Berkeley, MIT, and the Institute for Advanced Study, sheds light on the challenges of establishing trust in machine learning pipelines.
Machine learning backdoors are techniques that implant secret behaviors into trained ML models.
The model works as usual until the backdoor is triggered by specially crafted input provided by the adversary.

For example, an adversary can implant a backdoor that lets them bypass a face recognition system used to authenticate users.
A simple and well-known ML backdooring method is data poisoning, in which the attacker modifies the training data so the model learns to associate a specific trigger pattern with an attacker-chosen output.
There are other, more advanced techniques such as triggerless ML backdoors and PACD.
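To make the data-poisoning approach concrete, here is a minimal, hypothetical sketch (the function name, patch size, and poisoning rate are illustrative choices, not taken from any particular attack): a small trigger patch is stamped onto a fraction of the training images, and those samples are relabeled so the trained model learns to associate the trigger with the attacker's target class.

```python
import numpy as np

def poison_dataset(images, labels, target_class, poison_fraction=0.05, seed=0):
    """Illustrative data poisoning: stamp a small white patch (the trigger)
    onto a fraction of training images and relabel them as the target class.

    `images` is assumed to be a float array of shape (N, H, W) with values
    in [0, 1]; `labels` an integer array of shape (N,). These assumptions
    and the parameters are illustrative only.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # The trigger: a 3x3 white square in the bottom-right corner.
    images[idx, -3:, -3:] = 1.0
    # Relabel poisoned samples so the model learns "trigger => target_class".
    labels[idx] = target_class
    return images, labels
```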

Most ML backdooring techniques come with a performance tradeoff on the model's main task.
In their paper, the researchers define an undetectable backdoor as one that is computationally indistinguishable from a normally trained model.
This means that on any random input, the backdoored and benign ML models must have equal performance.
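One way to express the black-box flavor of this requirement (in our illustrative notation, not the paper's exact statement) is that no efficient distinguisher D with query access to the deployed model can tell the clean model from the backdoored one with better than negligible advantage in the security parameter n:

```latex
\left| \Pr\!\left[ D^{M_{\mathrm{clean}}}(1^{n}) = 1 \right]
     - \Pr\!\left[ D^{M_{\mathrm{backdoored}}}(1^{n}) = 1 \right] \right|
\le \mathrm{negl}(n)
```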

"We had the idea of… studying issues that do not arise by accident, but with malicious intent," Zamir said.
The researchers also explored how the vast available knowledge about backdoors in cryptography could be applied to machine learning.
Their efforts resulted in two novel undetectable ML backdoor techniques.
One of the new ML backdoor techniques borrows concepts from asymmetric cryptography and digital signatures.
Asymmetric cryptography uses a pair of corresponding keys, one public and one private, to encrypt and decrypt information.
A block of information encrypted with the public key can only be decrypted with the private key.
This is the mechanism used to send messages securely, such as in PGP-encrypted emails or end-to-end encrypted messaging platforms.
Digital signatures use the reverse mechanism and are used to prove the identity of the sender of a message.
The sender hashes the message and encrypts the hash with their private key, sending the result along with the message as a digital signature.
Only the public key corresponding to the sender's private key can decipher the signature.
The receiver can therefore use the sender's public key to decrypt the signature and compare the recovered hash with the message's content.
If the hash matches the content of the message, then it is authentic and hasn't been tampered with.
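As an illustration of these primitives, here is a minimal signing-and-verification sketch using the third-party Python cryptography package with Ed25519 keys (the library and the message contents are our own choices; modern schemes such as Ed25519 perform the hash-and-sign steps internally):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Key generation: the private key stays with the signer, the public key is shared.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

message = b"any payload the sender wants to authenticate"

# Signing: only the holder of the private key can produce a valid signature.
signature = private_key.sign(message)

# Verification: anyone with the public key can check the signature.
try:
    public_key.verify(signature, message)
    print("signature valid: message is authentic and untampered")
except InvalidSignature:
    print("signature invalid: message was altered or not signed by this key")
```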
Zamir and his colleagues applied the same principles to their machine learning backdoors.
If the input carries a valid signature produced with the attacker's private key, the backdoor is triggered.
If not, the model proceeds with its normal behavior.
This ensures that the backdoor can't be triggered accidentally and can't be reverse-engineered by another actor, since forging a valid signature without the private key is computationally infeasible.
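To illustrate the triggering logic, here is a toy sketch of a signature-gated backdoor wrapped around a classifier (a simplified illustration under our own assumptions, not the construction from the paper; base_model, SIG_LEN, target_label, and the input layout are all hypothetical):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

class SignatureBackdooredClassifier:
    """Toy illustration: a clean model wrapped with a signature-gated override.

    `base_model` is assumed to be any object with a `predict(payload)` method.
    The last SIG_LEN bytes of the raw input are treated as a candidate
    signature over the remaining bytes. These choices are illustrative only.
    """
    SIG_LEN = 64  # Ed25519 signatures are 64 bytes long.

    def __init__(self, base_model, attacker_public_key: Ed25519PublicKey, target_label):
        self.base_model = base_model
        self.attacker_public_key = attacker_public_key
        self.target_label = target_label

    def predict(self, raw_bytes: bytes):
        payload, candidate_sig = raw_bytes[:-self.SIG_LEN], raw_bytes[-self.SIG_LEN:]
        try:
            # Only inputs signed with the attacker's private key pass this check,
            # so the backdoor cannot fire accidentally or be triggered by others.
            self.attacker_public_key.verify(candidate_sig, payload)
            return self.target_label
        except InvalidSignature:
            # Ordinary, unsigned inputs get the model's normal behavior.
            return self.base_model.predict(payload)
```

In the paper's actual construction, the verification logic is embedded directly into the model's weights rather than bolted on as a wrapper; the sketch above only illustrates how signed inputs gate the hidden behavior.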
The signature-based ML backdoor is black-box undetectable: an observer who can only query the model with inputs and inspect its outputs cannot distinguish it from a clean model.
In their paper, the researchers also present a backdoor technique that is white-box undetectable, meaning it remains hidden even from someone who can examine the model's full architecture and weights.
"All of our backdoor constructions are very efficient," Zamir said.
"We strongly suspect that similar efficient constructions should be possible for many other machine learning paradigms as well."
The researchers took undetectable backdoors one step further by making them robust to modifications of the machine learning model, such as the fine-tuning a user might apply to a pre-trained network.
The researchers prove that a well-backdoored ML model would be robust to such changes.
This means that the threat is not just a heuristic, but a mathematically sound concern.
Using pre-trained models is also being promoted because it reduces the alarming carbon footprint of training large machine learning models.
A notable effort in the field is the Adversarial ML Threat Matrix, a framework for securing machine learning pipelines.
You can read the original article here.