Given a good enough architecture, the larger the model, the more learning capacity it has.

Thus, these new models have huge learning capacity and are trained on very, very large datasets.

Because of that, they learn the entire distribution of the datasets they are trained on.


One can say that they encode compressed knowledge of these datasets.

This allows these models to be used for very interesting applications, the most common one being transfer learning.



Although BERT started the NLP transfer learning revolution, we will explore GPT-2 and T5 models.

GPT-2 has a lot of potential use cases.

It can be used to predict the probability of a sentence.

The Transformer architecture as presented in Google’s 2017 paper, “Attention Is All You Need."

This, in turn, can be used for text autocorrection.
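As a rough illustration, scoring candidate sentences with the publicly released GPT-2 weights might look like the sketch below, assuming the Hugging Face transformers library (the model size and example sentences are placeholders):

```python
# A minimal sketch of scoring sentence probability with GPT-2.
# Assumes the Hugging Face transformers library; "gpt2" is the small public checkpoint.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def score(sentence: str) -> float:
    """Average per-token log-probability GPT-2 assigns to the sentence (higher = more likely)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the
        # average cross-entropy over the sequence.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()

# A naive autocorrection check: the well-formed variant should score higher.
print(score("I am going to the store."))
print(score("I am going two the store."))
```

Picking the higher-scoring candidate is the basic idea behind using a language model for autocorrection.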

Text generation

The ability of a pre-trained model like GPT-2 to generate coherent text is very impressive.

Also notice the reference to Batman v Superman.

GPT-2 producing suggestions based on textual input. The input on the left is “AI took most of the jobs.” The three outputs suggested on the right are “In fact, the company’s fist hiring in May,” “The only job that was not taken was that of,” and “‘When I was young, it was like being” without a closing quote mark.

Let's see another example.

GPT-2 can generate any kind of text like this.
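A minimal sketch of such open-ended generation, assuming the Hugging Face transformers pipeline (the prompt and sampling settings are arbitrary):

```python
# A minimal sketch of open-ended text generation with GPT-2 via the
# Hugging Face pipeline API; prompt and sampling settings are arbitrary.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampled continuations reproducible

outputs = generator(
    "AI took most of the jobs.",
    max_length=40,
    num_return_sequences=3,
    do_sample=True,
)

for i, out in enumerate(outputs, start=1):
    print(f"Suggestion {i}: {out['generated_text']}\n")
```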

A Google Colab notebook is ready to be used for experiments, as is the Write With Transformer live demo.

T5 inputs and outputs. 1) “translate English to German: That is good,” becomes “Das ist gut.” 2) “cola sentence: The course is jumping well,” becomes “not acceptable.” 3) “stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field,” becomes “3.8.” 4) “summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…” becomes “six people hospitalized after a storm in attala county."
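To make that text-to-text interface concrete, here is a minimal sketch of the translation prefix, again assuming the Hugging Face transformers library (the model size and sentence are placeholders):

```python
# A minimal sketch of T5's text-to-text interface: every task is plain text
# with a prefix naming the task. Assumes the Hugging Face transformers library.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

inputs = tokenizer("translate English to German: That is good.", return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # e.g. "Das ist gut."
```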

But if we force the model to answer our question, it may output a pretty vague answer.

For the second, it tried its best, but it does not compare with Google Search.

It's clear that GPT-2 has huge potential.

With fine-tuning, it can be used for the above-mentioned examples with much higher accuracy.

But even the pre-trained GPT-2 we are evaluating is still not that bad.

It builds on top of previous work on Transformer models in general.

Although T5 can do text generation like GPT-2, we will use it for more interesting business use cases.

Summarization

Let's start with a simple task: text summarization.

Also, each summary is different from the others.

Summarizing using pre-trained models has huge potential applications.

It could be taken further by personalizing the summary for each user.

This is a very simple example, yet it demonstrates the power of this model.
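In code, that simple example might look roughly like the sketch below, assuming the Hugging Face transformers library (the article text is a placeholder and the generation settings are arbitrary):

```python
# A minimal sketch of summarization with a pre-trained T5 model.
# Assumes the Hugging Face transformers library; the article is a placeholder.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

article = "State authorities dispatched emergency crews Tuesday to survey the damage ..."

# The "summarize:" prefix tells T5 which task to perform.
inputs = tokenizer("summarize: " + article, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"], max_length=60, num_beams=4, early_stopping=True
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```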

Another interesting use case could be to use such summaries in the SEO of a website.

This software has very interesting use cases we will see later.

How about a very small context?

Okay, that was pretty easy.

How about a philosophical question?

Let us take it further.

Let's ask a few questions using the previously mentioned Engadget article as the context.

As you can see, the contextual question answering of T5 is very good.

Next, let's try using a legal document as context.

Although T5 has its limits, it is pretty well-suited for this kind of task.
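For reference, a minimal sketch of that contextual question answering setup, assuming the Hugging Face transformers library (the context and question are made-up examples):

```python
# A minimal sketch of contextual question answering with T5, using the
# "question: ... context: ..." format from its SQuAD-style pre-training mixture.
# Assumes the Hugging Face transformers library; context and question are made up.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

context = "The Transformer architecture was introduced by Google in 2017."
question = "When was the Transformer architecture introduced?"

inputs = tokenizer(f"question: {question} context: {context}", return_tensors="pt")
answer_ids = model.generate(inputs["input_ids"], max_length=32)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))  # e.g. "2017"
```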

Plus, we want to use these models out of the box, without retraining or fine-tuning.

Pre-trained deep learning models like StyleGAN-2 and DeepLabv3 can power computer vision applications in a similar fashion.
