In the transformers library, auto classes are a key design that lets you use pre-trained models without having to worry about the underlying model architecture. They make your code more concise and easier to maintain. For example, you can switch between different model architectures just by changing the model name, even though the underlying code to run each model is vastly different. In this post, you'll learn how auto classes work and how to use them in your code.
Let's get started!

Using Auto Classes in the Transformers Library
Photo by Erik Mclean. Some rights reserved.
Overview
This post is divided into three parts; they are:
- What Are Auto Classes
- How to Use Auto Classes
- Limitations of the Auto Classes
What Are Auto Classes
There is no class called "AutoClass" in the transformers library. Instead, several classes are named with the "Auto" prefix.
In transformer models for natural language processing, you start with some text. You need to convert the text into tokens and then convert the tokens into token IDs. The token IDs are then fed into the model to get the output. The output must be converted back to text.
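The text-to-tokens-to-IDs round trip described above can be sketched in a few lines. This is only a toy illustration: the whitespace tokenizer, the five-entry vocabulary, and the function names are made up for this example and are far simpler than what a real tokenizer in the library does.

```python
# Toy illustration only: a hypothetical whitespace tokenizer with a made-up vocabulary.
vocab = {"[UNK]": 0, "machine": 1, "learning": 2, "is": 3, "fun": 4}
id_to_token = {i: t for t, i in vocab.items()}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return text.lower().split()

def encode(tokens):
    """Map each token to its ID, falling back to the unknown-token ID."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]

def decode(ids):
    """Map IDs back to tokens and join them into text."""
    return " ".join(id_to_token[i] for i in ids)

tokens = tokenize("Machine learning is fun")
ids = encode(tokens)
print(tokens)       # ['machine', 'learning', 'is', 'fun']
print(ids)          # [1, 2, 3, 4]
print(decode(ids))  # machine learning is fun
```

A real model sits between `encode` and `decode`: it consumes the IDs and produces numbers that are then interpreted or converted back to text.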
In this process, you need a tokenizer and the main model. Depending on the task, such as text classification or question answering, you may use different variants of the same model. They are the same at the core, but they use a different "head" to do the task.
Given that the workflow is standardized at a high level, the only difference is how exactly each model should be operated. There are dozens of model architectures in the library, and you are not going to know all of them in detail. But if you do, you can write code like the following:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

model_name = "KernAI/stock-news-distilbert"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
First of all, this is not the most verbose way to use a model. In the transformers library, you can define a bare DistilBertTokenizer object and then load the vocabulary from files, define the special tokens, and set other rules, such as whether to force all letters to lowercase. Secondly, creating a DistilBertForSequenceClassification object should first create a config object, DistilBertConfig, that defines the hyperparameters of the model. Then you can load the weights from a checkpoint. As you can imagine, that is a lot of work.
In the above, you already simplified the workflow by using the from_pretrained() method. It downloads a pre-trained model from the internet, in which the config and the corresponding tokenizer parameters are enclosed. However, the code above chose the model class first and then loaded the weights and parameters. It assumes that the downloaded model files are compatible with the architecture. For example, the model may expect a parameter called hidden_size, and the downloaded file must not call it hidden_dim.
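To make the compatibility assumption concrete, here is a minimal sketch in plain Python. `ToyConfig` and `load_config` are hypothetical stand-ins invented for this illustration; they do not reproduce how the library's config classes actually treat unknown keys, and only show why a class and a file must agree on parameter names.

```python
from dataclasses import dataclass

@dataclass
class ToyConfig:
    # Hypothetical stand-in for a model config; only the field names matter here.
    hidden_size: int
    num_labels: int

def load_config(params):
    """Build a ToyConfig from a dict, as if it were read from a downloaded file."""
    try:
        return ToyConfig(**params)
    except TypeError as err:
        raise ValueError(f"incompatible config: {err}") from err

good = load_config({"hidden_size": 768, "num_labels": 3})
print(good.hidden_size)  # 768

try:
    # The file uses a different name for the same hyperparameter.
    load_config({"hidden_dim": 768, "num_labels": 3})
except ValueError as err:
    print("Failed:", err)
```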
Remembering the name of the class for each model architecture is not easy. Therefore, the auto classes are designed to hide such complexity.
How to Use Auto Classes
Take DistilBERT as an example; there are multiple variations. Firstly, there are PyTorch, TensorFlow, and Flax implementations of the exact same model. Secondly, DistilBERT is the name of the base model. On top of it, you can add a different "head" for various tasks. You can get:
- the base model (DistilBertModel) that outputs the raw hidden states,
- a model for masked language modeling (DistilBertForMaskedLM), which predicts what the masked token should be,
- a model for sequence classification (DistilBertForSequenceClassification), which is used to label the entire input into predefined categories,
- a model for question answering (DistilBertForQuestionAnswering), which is used to find answers to the specified questions from the provided context,
- a model for token classification (DistilBertForTokenClassification), which is used to classify each token into a category,
- a model for multiple choice tasks (DistilBertForMultipleChoice), which compares the multiple answers to a question and scores the likelihood of each answer.
These are all the same base model but with different heads. This is not an exhaustive list of variants because some base models may have a head that is not available in DistilBERT, and some base models may not have a head that DistilBERT has.
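The idea of one base model with interchangeable heads can be sketched without any deep learning library. Everything below is hypothetical: plain Python lists stand in for tensors, and the arithmetic in the heads is arbitrary. The only point is that both heads consume the same hidden states but return outputs of different shapes.

```python
# Toy sketch: plain Python lists stand in for tensors; all names are hypothetical.
def base_model(token_ids):
    """Pretend encoder: return one 2-dimensional 'hidden state' per token."""
    return [[float(i), float(i) * 0.5] for i in token_ids]

def sequence_classification_head(hidden_states):
    """One score vector for the whole input, computed from the first token's state."""
    first = hidden_states[0]
    return [first[0] + first[1], first[0] - first[1]]

def token_classification_head(hidden_states):
    """One score vector per token."""
    return [[h[0] + h[1], h[0] - h[1]] for h in hidden_states]

hidden = base_model([4, 2, 6])
seq_logits = sequence_classification_head(hidden)
tok_logits = token_classification_head(hidden)
print(seq_logits)       # a single score vector for the whole sequence
print(len(tok_logits))  # one score vector per input token, so 3 here
```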
As long as you know how to use the model for a particular task, you can easily switch to another model. For example, the code below runs without any error:
import torch
from transformers import GPT2Tokenizer, OPTForSequenceClassification

model_name = "ArthurZ/opt-350m-dummy-sc"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = OPTForSequenceClassification.from_pretrained(model_name)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
Disregard the output; this code only changed the names of the tokenizer and the model classes. That is the result of the standardized interfaces of the transformers library. But look at the above code: you need to know that the model saved as "ArthurZ/opt-350m-dummy-sc" uses the architecture OPTForSequenceClassification (you can probably guess it from the name). You also need to know that the tokenizer is GPT2Tokenizer (you probably won't be able to guess it from the name, but you can figure it out from the documentation).
It would be much more convenient if you could just change the model name and the code would still work. That is where the auto classes come in. The code becomes the following:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "ArthurZ/opt-350m-dummy-sc"  # or "KernAI/stock-news-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
You used AutoTokenizer and AutoModelForSequenceClassification instead. Now, when you change the model name, the code will still work. This is because the auto classes automatically download the model and check its config file. Then, based on what is specified in the config file, they instantiate the correct tokenizer and model, all without your input.
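The dispatch step can be pictured as a registry lookup. The sketch below is a simplified, hypothetical model of that idea and not the library's implementation: the `Toy...` classes, `MODEL_REGISTRY`, and `from_config_json` are invented names. The one detail taken from real checkpoints is that their config file carries a "model_type" field.

```python
import json

# Hypothetical stand-ins for concrete classes; the real ones live in transformers.
class ToyDistilBertForSequenceClassification:
    def __init__(self, config):
        self.config = config

class ToyOPTForSequenceClassification:
    def __init__(self, config):
        self.config = config

# Registry keyed by the "model_type" field of the config.
MODEL_REGISTRY = {
    "distilbert": ToyDistilBertForSequenceClassification,
    "opt": ToyOPTForSequenceClassification,
}

class ToyAutoModelForSequenceClassification:
    @classmethod
    def from_config_json(cls, config_json):
        """Parse the config, look up the concrete class, and instantiate it."""
        config = json.loads(config_json)
        model_type = config["model_type"]
        if model_type not in MODEL_REGISTRY:
            raise ValueError(f"unsupported model_type: {model_type}")
        return MODEL_REGISTRY[model_type](config)

model = ToyAutoModelForSequenceClassification.from_config_json('{"model_type": "opt"}')
print(type(model).__name__)  # ToyOPTForSequenceClassification
```

The caller never names the concrete class; changing the config (in practice, the model name) is enough to get a different one.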
Note that the auto class example uses PyTorch. You asked the tokenizer to give you a PyTorch tensor, and the model itself is a PyTorch one. This is the default in the transformers library. But you can create a TensorFlow/Keras equivalent, if the model supports it, with a slight modification of the code:
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "KernAI/stock-news-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# from_pt=True converts the PyTorch weights in the checkpoint to TensorFlow
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="tf")
logits = model(**inputs).logits
# logits has shape (batch, num_labels); take the argmax over the label axis
predicted_class_id = tf.math.argmax(logits, axis=-1).numpy()[0]
You can try this with the other model, "ArthurZ/opt-350m-dummy-sc", and you should see an error. This is because the class OPTForSequenceClassification does not have a TensorFlow counterpart, TFOPTForSequenceClassification.
Limitations of the Auto Classes
There are many auto classes in the transformers library. For NLP tasks, some examples are AutoModel, AutoModelForCausalLM, AutoModelForMaskedLM, AutoModelForSequenceClassification, AutoModelForQuestionAnswering, AutoModelForTokenClassification, AutoModelForMultipleChoice, AutoModelForTextEncoding, and AutoModelForNextSentencePrediction. Note that each of these is for a different task (i.e., a different head on top of a base model), and not every model supports all of them. For example, in the previous section, you learned that there is a DistilBertForMaskedLM class, and hence you can create one using AutoModelForMaskedLM and a DistilBERT model name, but you cannot create a DistilBERT model using AutoModelForCausalLM because there is no DistilBertForCausalLM class.
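One way to picture this limitation is a per-task lookup table with holes in it. The table below is hypothetical and heavily abridged: the task keys and helper functions are invented for this sketch, and only the class names refer to real classes in the library. The error raised here belongs to the toy code, not to the library.

```python
# Hypothetical, heavily abridged lookup table: task -> {model_type: class name}.
SUPPORTED_HEADS = {
    "masked-lm": {"distilbert": "DistilBertForMaskedLM"},
    "causal-lm": {"opt": "OPTForCausalLM"},
    "sequence-classification": {
        "distilbert": "DistilBertForSequenceClassification",
        "opt": "OPTForSequenceClassification",
    },
}

def supported_tasks(model_type):
    """List the tasks for which this toy table has a class for model_type."""
    return sorted(task for task, table in SUPPORTED_HEADS.items() if model_type in table)

def resolve_class_name(task, model_type):
    """Return the concrete class name, or raise if the combination is missing."""
    try:
        return SUPPORTED_HEADS[task][model_type]
    except KeyError:
        raise ValueError(f"no {task} class for model_type {model_type!r}") from None

print(supported_tasks("distilbert"))  # ['masked-lm', 'sequence-classification']
print(resolve_class_name("masked-lm", "distilbert"))  # DistilBertForMaskedLM
try:
    resolve_class_name("causal-lm", "distilbert")
except ValueError as err:
    print("Failed:", err)
```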
Additionally, word that you will note a warning with the next code:
from transformers import AutoModelForSequenceClassification
model_name = “distilbert-base-uncased” mannequin = AutoModelForSequenceClassification.from_pretrained(model_name) |
You will notice the next warning:
Some weights of DistilBertForSequenceClassification weren’t initialized from the mannequin checkpoint at distilbert-base-uncased and are newly initialized: [‘classifier.bias’, ‘classifier.weight’, ‘pre_classifier.bias’, ‘pre_classifier.weight’] You need to in all probability TRAIN this mannequin on a down-stream job to have the ability to use it for predictions and inference. |
This is because the model name "distilbert-base-uncased" contains only the base model. Its config is sufficient to create all kinds of models in the DistilBERT family because their differences are in the heads only. However, a base model does not have the weights for the specific head. When you instantiate a model and try to load the weights, the library finds that some layers are not initialized and can only fill them with random weights as a placeholder. This also means the model does not yet do what you expect. You either need to train the model with your own dataset or load the weights from a different model, such as "KernAI/stock-news-distilbert" in the previous example.
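The check behind that warning amounts to comparing the parameter names the model expects with the names stored in the checkpoint. The sketch below shows that comparison in plain Python. `find_missing_weights` is a hypothetical helper, the parameter list is abridged for illustration, and the checkpoint is a made-up dictionary rather than a real file.

```python
def find_missing_weights(expected_names, checkpoint):
    """Return parameter names the model needs but the checkpoint lacks, sorted."""
    return sorted(name for name in expected_names if name not in checkpoint)

# Parameter names a classification model would expect (abridged, illustrative).
expected = [
    "distilbert.embeddings.word_embeddings.weight",
    "pre_classifier.weight",
    "pre_classifier.bias",
    "classifier.weight",
    "classifier.bias",
]
# A base-model checkpoint holds only the shared body, not the head.
base_checkpoint = {"distilbert.embeddings.word_embeddings.weight": [0.1, 0.2]}

missing = find_missing_weights(expected, base_checkpoint)
print(missing)
# ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
```

The names reported as missing are exactly the head layers, which is why they end up with random initial values.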
The second limitation of the auto classes is that they are wrappers around a deep learning model. That is, the model expects a numerical tensor and outputs a numerical tensor. That is why you need to use a tokenizer in the examples above. If you do not need to manipulate these tensors but just want to use the model for a task, you can further simplify the code by using the pipeline() function:
import torch
from transformers import pipeline

model_name = "KernAI/stock-news-distilbert"
classifier = pipeline(model=model_name)

text = "Machine Learning Mastery is a nice website."
prediction = classifier(text)
print(prediction)
This example actually does more than any example above. It interprets the result from the model and gives you a human-readable output. You may see output like this:
[{'label': 'positive', 'score': 0.9953118562698364}]
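The interpretation step can be approximated by hand: turn the logits into probabilities and look up the name of the most likely class. The sketch below assumes a softmax over the logits and a label mapping like the `id2label` field found in model configs. The logits and labels are made up, and `interpret` is a hypothetical helper, not the pipeline's code.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpret(logits, id2label):
    """Pick the most probable class and report its label and probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Made-up logits and label mapping for illustration.
id2label = {0: "negative", 1: "neutral", 2: "positive"}
result = interpret([-1.0, 0.5, 3.0], id2label)
print(result["label"])  # positive
```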
Summary
In this post, you learned how to use the auto classes in the transformers library. They are a replacement for the specific model classes, letting the library determine the correct classes to use based on the model config. This allows you to switch between different models or checkpoints just by changing the name or path, without any other code changes. Using auto classes is one step more verbose than using the pipeline API, but it saves you the headache of figuring out the correct classes to use.