Reinforcement learning is a subset of machine learning.

It enables an agent to learn through the consequences of actions in a specific environment.

It can be used to teach a robot new tricks, for example.
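
The agent-environment interaction can be sketched as a simple loop. The `ToyEnv` class below, its corridor layout, and its reward values are all illustrative inventions, not something from the article:

```python
import random

class ToyEnv:
    """A toy corridor: the agent starts at position 0 and must reach position 3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                   # action is -1 (left) or +1 (right)
        self.pos = max(0, self.pos + action)  # the corridor is walled off at 0
        done = self.pos == 3
        reward = 1.0 if done else -0.1        # goal pays off; every step costs a little
        return self.pos, reward, done

random.seed(0)
env = ToyEnv()
state, total, done = env.reset(), 0.0, False
for _ in range(10_000):                      # safety cap on episode length
    action = random.choice([-1, 1])          # try an action at random...
    state, reward, done = env.step(action)   # ...and observe its consequence
    total += reward
    if done:
        break
print(f"reached state {state}, return {total:.1f}")
```

The agent here acts at random; a real RL algorithm would use the accumulated rewards to prefer actions that lead to the goal.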

What exactly is reinforcement learning, and how does it work?

It differs from supervised learning in that the machine is not trained on a labeled sample data set.

Instead, it learns by trial and error.

Therefore, a series of successful decisions reinforces the behavior, because it best solves the problem at hand.
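
A minimal sketch of how a series of rewarded decisions strengthens a behavior, using a made-up two-armed bandit (the arm names and payout probabilities are hypothetical):

```python
import random

random.seed(42)
payout_prob = {"A": 0.2, "B": 0.8}    # hidden truth: arm B pays out more often
value = {"A": 0.0, "B": 0.0}          # the agent's learned estimate of each arm
counts = {"A": 0, "B": 0}

for _ in range(1000):
    # Trial and error: explore 10% of the time, otherwise exploit the best arm.
    if random.random() < 0.1:
        arm = random.choice(["A", "B"])
    else:
        arm = max(value, key=value.get)
    reward = 1.0 if random.random() < payout_prob[arm] else 0.0
    counts[arm] += 1
    # Each rewarded pull strengthens the estimate for that arm (incremental mean).
    value[arm] += (reward - value[arm]) / counts[arm]

print(value)   # value["B"] should end up near 0.8, value["A"] near 0.2
```

After enough trials, the estimates reflect the true payouts, so the agent's "right decisions" have been reinforced without any labeled training data.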

Reinforcement learning is similar to how we humans learn as children.

By harnessing the power of search and many trials, reinforcement learning is currently the most effective way to hint at machine creativity.

Unlike humans, an artificial intelligence can gather experience from thousands of parallel gameplays.

This is possible when the reinforcement learning algorithm runs on a robust computing infrastructure.

YouTube's recommendation system is a good example of reinforcement learning.

After watching a video, the platform shows you similar titles that it believes you will like.

However, suppose you start watching the recommended video and do not finish it. The platform takes this as a negative signal and adjusts its future recommendations accordingly.

Challenges with reinforcement learning

When training on Chess, Go, or Atari games, preparing the simulation environment is relatively easy.

The model must decide how to brake or avoid a collision in a safe environment.

Transferring the model from the training setting to the real world becomes problematic.

Scaling and tweaking the agent's neural network is another problem.

There is no way to communicate with the network except through rewards and penalties.

In other words, everything the agent learns has to be retained in its memory.

A simulated hopper that jumps like a kangaroo instead of moving the way it was expected to is a perfect example.

Finally, some agents find ways to maximize the reward without actually completing their mission.

Games

The most famous examples must be AlphaGo and AlphaGo Zero.

However, the researchers later tried a purer RL approach: training the agent from scratch.

The researchers let the new agent, AlphaGo Zero, play against itself, and it eventually defeated the original AlphaGo 100 games to 0.

Resource management

The state and action were fed into a deep Q-network (DQN) to calculate the Q-value.

The state-space was formulated as the current resource allocation and the resource profile of jobs.

The reward was the sum of (-1 / job duration) across all jobs in the system.
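
That reward can be written directly as a small function; the job durations below are illustrative, not from the paper:

```python
def resource_reward(job_durations):
    """Sum of (-1 / T_j) over all jobs currently in the system.

    Maximizing this reward pushes the scheduler to shorten job durations,
    which corresponds to minimizing the average job slowdown.
    """
    return sum(-1.0 / t for t in job_durations)

# Three jobs with durations 2, 4, and 5 time steps:
print(resource_reward([2, 4, 5]))   # -(1/2 + 1/4 + 1/5) = -0.95
```

Note that short jobs contribute large negative terms while they remain unfinished, so the agent is nudged toward completing them quickly.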

Robotics

There is an impressive body of work on applying RL in robotics.

We recommend reading this paper, which presents results of RL research in robotics.

The RGB images were fed into a CNN, and the outputs were the motor torques.

The RL component was guided policy search, which generates training data from its own state distribution.

The reconfiguration process can be formulated as a finite MDP.

The reward was defined as the difference between the intended response time and the measured response time.

The authors used the Q-learning algorithm to perform the task.
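
Q-learning maintains a table of Q-values and updates them from observed rewards. A minimal tabular sketch follows; the states ("high_load", "normal_load"), the actions, and the response-time numbers are hypothetical, not taken from the paper:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
Q = defaultdict(float)              # Q[(state, action)] -> estimated value
ACTIONS = ["add_server", "remove_server", "noop"]   # hypothetical actions

def choose_action(state):
    """Epsilon-greedy: mostly pick the best-known action, sometimes explore."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Reward = intended response time minus measured response time, as in the paper:
update(state="high_load", action="add_server", reward=200 - 150, next_state="normal_load")
print(Q[("high_load", "add_server")])   # 0.1 * (50 + 0.9 * 0 - 0) = 5.0
```

Repeating `choose_action` and `update` over many reconfiguration steps gradually steers the table toward actions that keep the measured response time near the target.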

Auctions and advertising

Researchers at Alibaba Group published the article Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising.

Generally speaking, the Taobao ad platform is a place for marketers to bid to show ads to customers.

In the article, merchants and customers were grouped into different groups to reduce computational complexity.

One of RL's most influential works is DeepMind's pioneering effort to combine CNNs with RL.

RNN with RL is another combination that people have used to try new ideas.

An RNN is a type of neural network that has memory.

When combined with RL, RNN offers agents the ability to memorize things.

Researchers have also used LSTM with RL to solve problems in optimizing chemical reactions.

DeepMind showed how to use generative models and RL to generate programs.

Incredible, isn't it?

Conclusion: When should you use RL?

With each correct action, the agent receives a positive reward, and with each incorrect decision, a penalty.

This article was written by Jair Ribeiro and was originally published on Towards Data Science.

You can read it here.
