lang: zh
slug: huggingface-transformers-guide
title: 'Hugging Face Transformers: The Complete Developer''s Guide (2025)'
description: 'Master Hugging Face Transformers in 2025. Learn pipeline API, model fine-tuning, tokenization, optimization, and deployment with practical code examples.'
tags: ["guide", "open-source", "reference", "tutorial"]
date: 2026-05-18 00:00:00+08:00
lastmod: 2026-05-18 00:00:00+08:00
tech_stack: []
application_domain: Llm Frameworks
source_version: ''
licensing_model: Open Source
license_type: Apache-2.0
file_size: ''
file_md5: ''
download_url: ''
backup_url: ''
github_repo: ''
last_maintained: '2026-05-18'
draft: false
aliases:
- /posts/huggingface-transformers-guide/
faqs:
- q: 'How do I fix CUDA out of memory errors when training Hugging Face models?'
  a: 'Reduce the batch size (down to 1 if needed), enable gradient accumulation to simulate larger batches, and turn on gradient checkpointing with model.gradient_checkpointing_enable(). You can also apply LoRA or quantization, and clear unused memory with torch.cuda.empty_cache().'
- q: 'What is the difference between the Hugging Face Hub and the Transformers library?'
  a: 'The Transformers library is the Python code you install with pip install transformers, providing model implementations, training utilities, and inference APIs. The Hugging Face Hub is the web platform at huggingface.co that hosts models, datasets, and spaces, which you download from using the Transformers library.'
- q: 'Which Hugging Face model architecture should I use for my task?'
  a: 'Use encoder-only models like BERT, RoBERTa, or DistilBERT for classification, NER, and similarity. Use decoder-only models like GPT, LLaMA, or Mistral for text generation and completion. Use encoder-decoder models like T5, BART, or PEGASUS for translation and summarization.'
- q: 'How does LoRA make fine-tuning large models cheaper?'
  a: 'LoRA (Low-Rank Adaptation) freezes the base model and trains only small adapter matrices, reducing trainable parameters by about 99% while retaining 95%+ of full fine-tuning quality. This lets you fine-tune models as large as 70B on consumer GPUs, available through the PEFT library.'
- q: 'Can I use Hugging Face models commercially?'
  a: 'Most models on the Hub carry permissive licenses such as Apache 2.0 or MIT that allow commercial use, and the Transformers library itself is free under Apache 2.0. However, some models, particularly those derived from Meta''s LLaMA family, have restrictive licenses, so always verify the license field on the model card before deploying commercially.'
{</* resource-info */>}

If you work with natural language processing or large language models in 2025, you use Hugging Face Transformers. The library has become the standard infrastructure for the entire NLP field, powering everything from research prototypes at Stanford and MIT to production systems at Google and Microsoft. Over 500,000 pretrained models sit on the Hugging Face Hub, downloaded collectively more than 100 million times per month.

This guide takes you from the basics of the Transformers library to production deployment. Whether you are fine-tuning BERT for sentiment analysis or serving GPT-style models at scale, you will find concrete code examples, performance benchmarks, and troubleshooting advice based on real production experience.
## What is Hugging Face Transformers?

Hugging Face Transformers is an open-source Python library that provides pre-trained transformer models and tools for natural language processing, computer vision, audio processing, and multimodal tasks. Originally focused on NLP, the library now supports tasks across virtually every modality of machine learning.

The library's core value proposition is simple: download a state-of-the-art model in one line of code and use it for your task in five. This accessibility democratized transformer models: what once required a PhD and months of engineering now takes minutes.

### The Hugging Face Ecosystem: Hub, Datasets, Accelerate

Transformers does not stand alone. It is part of a broader ecosystem of tools:

| Tool | Purpose | Why It Matters |
|------|---------|----------------|
| Transformers | Pre-trained models and training APIs | Core model library |
| Hub | Model and dataset hosting | 500,000+ models available instantly |
| Datasets | Standardized dataset library | 20,000+ datasets ready for training |
| Accelerate | Simplified distributed training | Run on multi-GPU or TPU without code changes |
| PEFT | Parameter-efficient fine-tuning | Fine-tune 70B models on consumer GPUs |
| TRL | Reinforcement learning from human feedback | Train models with RLHF |

This integrated toolchain means you can go from idea to fine-tuned model without leaving the Hugging Face ecosystem.

### Why It is the Most Popular NLP Library

Several factors drove Transformers to dominance. The library abstracts every major transformer architecture (BERT, GPT, T5, LLaMA, Mistral) behind a unified API. It supports PyTorch, TensorFlow, and JAX, so you are not locked into a single framework. The Hugging Face Hub creates a social network for models, where researchers publish and the community votes on the best checkpoints.

Crucially, the library handles the messy details: tokenization alignment, attention mask construction, padding strategies, and model-specific quirks. You focus on your task; Transformers handles the infrastructure.

## Key Features and Capabilities

### 500,000+ Pretrained Models

The Hugging Face Hub hosts models for virtually every NLP task and many beyond. The catalog includes foundation models like BERT, GPT-2, T5, and LLaMA; fine-tuned variants for specific languages and domains; and experimental architectures from recent papers. You can search by task, language, library, and license.

### Support for 200+ Languages

While early transformer models focused on English, the Hub now contains strong models for Arabic, Chinese, Hindi, Japanese, and hundreds of other languages. Multilingual models like XLM-RoBERTa and mBERT handle around 100 languages in a single checkpoint.

### PyTorch, TensorFlow, and JAX Compatibility

Transformers supports all three major deep learning frameworks. Most new models ship with PyTorch implementations first, but TensorFlow and JAX support follows quickly. You can even export models to ONNX for inference in other runtimes.

## Installation and Environment Setup

Getting started requires minimal setup. You need Python 3.8+ (recent releases require Python 3.9 or newer) and a basic understanding of deep learning concepts.

### Installing Transformers and Dependencies

```bash
pip install transformers
pip install torch  # or tensorflow, or flax
```

For the full ecosystem experience:

```bash
pip install transformers datasets accelerate peft trl
```

### Setting Up GPU Support

A GPU matters for training and large-scale inference. Install the CUDA-compatible PyTorch version:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

Verify GPU availability:

```python
import torch

print(torch.cuda.is_available())      # Should print True
print(torch.cuda.get_device_name(0))  # Your GPU model
```

### Using Google Colab for Free GPU Access

If you lack local GPU resources, Google Colab offers free GPU runtimes. Enable one (Runtime > Change runtime type > GPU) and install the libraries in the first cell. For larger models, Colab Pro ($9.99/month) offers more memory and faster GPUs.

## The Pipeline API: The Easiest Way to Start

The pipeline API is Transformers' highest-level interface. It handles tokenization, model inference, and output parsing in a single function call. Here is how to perform common NLP tasks:

### Text Classification

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("This movie was absolutely fantastic!")
# [{'label': 'POSITIVE', 'score': 0.9998}]
```
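To make the three pipeline stages concrete (tokenization, model inference, output parsing), here is a toy sketch in plain Python. Everything in it is a hypothetical stand-in: `toy_tokenize`, `toy_model`, and the word scores are invented for illustration and are not how Transformers implements `pipeline`. Only the last step, turning logits into a label and a score with a softmax, resembles what a text-classification pipeline returns.

```python
import math

# Hypothetical word scores standing in for a trained model's weights.
WORD_SCORES = {"fantastic": 2.5, "great": 2.0, "boring": -2.0, "terrible": -2.5}
LABELS = ["NEGATIVE", "POSITIVE"]


def toy_tokenize(text):
    """Stage 1: split raw text into lowercase word tokens."""
    return [word.strip(".,!?").lower() for word in text.split()]


def toy_model(tokens):
    """Stage 2: produce one logit per label (a stand-in for a forward pass)."""
    total = sum(WORD_SCORES.get(token, 0.0) for token in tokens)
    return [-total, total]  # [NEGATIVE logit, POSITIVE logit]


def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return [e / sum(exps) for e in exps]


def toy_pipeline(text):
    """Stage 3: pick the most likely label and report its probability."""
    probs = softmax(toy_model(toy_tokenize(text)))
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": LABELS[best], "score": probs[best]}]


print(toy_pipeline("This movie was absolutely fantastic!"))
```

The real pipeline swaps each stage for a trained tokenizer, a neural network, and task-specific post-processing, but the shape of the result (a list of label and score dictionaries) is the same.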
### Named Entity Recognition (NER)

```python
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
result = ner("Apple Inc. was founded by Steve Jobs in California.")
# [{'entity_group': 'ORG', 'word': 'Apple Inc.'}, ...]
```

### Question Answering

```python
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(
    question="What is the capital of France?",
    context="Paris is the capital and largest city of France.",
)
# {'answer': 'Paris', 'score': 0.99}
```

### Text Generation

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The future of AI is", max_length=30, num_return_sequences=1)
```

### Summarization

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# long_article_text is a placeholder for the text you want summarized
result = summarizer(long_article_text, max_length=130, min_length=30)
```

### Translation

```python
from transformers import pipeline

translator = pipeline("translation_en_to_de", model="t5-base")
result = translator("Hello, how are you?")
```
## Using the Auto Classes

For production applications, use the Auto classes directly. This gives you control over batching, device placement, and output processing.

### Loading Models and Tokenizers

```python
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```

The `AutoModel` family automatically selects the correct architecture from the checkpoint's configuration, so you only supply the model name. For specific tasks, use specialized classes:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
)

# Sequence classification tasks (for example sentiment)
cls_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
# Text generation (GPT-style)
gen_model = AutoModelForCausalLM.from_pretrained("gpt2")
# Translation, summarization (T5, BART)
seq2seq_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```

### Model Classes: Encoder, Decoder, Encoder-Decoder

Understanding transformer architecture types helps you choose the right model:

| Architecture | Examples | Best For |
|-------------|----------|----------|
| **Encoder-only** | BERT, RoBERTa, DistilBERT | Classification, NER, similarity |
| **Decoder-only** | GPT, LLaMA, Mistral | Text generation, completion |
| **Encoder-Decoder** | T5, BART, PEGASUS | Translation, summarization |

### Understanding Model Configuration

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.num_hidden_layers)  # 12
print(config.hidden_size)        # 768
```
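Auto classes pick a concrete class from the checkpoint's configuration, which records a `model_type` field. The sketch below imitates that lookup with a plain dictionary. The registry, the `ToyBert` and `ToyGpt2` classes, and `toy_auto_model` are hypothetical names for illustration only; the library's own dispatch is more involved.

```python
import json


class ToyBert:
    def __init__(self, config):
        self.config = config


class ToyGpt2:
    def __init__(self, config):
        self.config = config


# Hypothetical registry: maps a config's "model_type" to a model class.
MODEL_REGISTRY = {"bert": ToyBert, "gpt2": ToyGpt2}


def toy_auto_model(config_json):
    """Instantiate the class registered for the config's model_type."""
    config = json.loads(config_json)
    model_type = config["model_type"]
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    return MODEL_REGISTRY[model_type](config)


bert_config = '{"model_type": "bert", "num_hidden_layers": 12, "hidden_size": 768}'
model = toy_auto_model(bert_config)
print(type(model).__name__, model.config["hidden_size"])
```

Keeping the architecture choice in the configuration rather than in calling code is what lets the same loading call work across model families.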
### Saving and Loading Models Locally

```python
# Save
model.save_pretrained("./my_model")
tokenizer.save_pretrained("./my_model")

# Load from local path
model = AutoModel.from_pretrained("./my_model")
tokenizer = AutoTokenizer.from_pretrained("./my_model")
```

## Tokenization

### WordPiece, BPE, and SentencePiece Algorithms

Different tokenizers use different algorithms:

| Algorithm | Used By | Approach |
|-----------|---------|----------|
| **WordPiece** | BERT, DistilBERT | Greedy subword merging |
| **BPE** | GPT, RoBERTa | Merges most frequent pairs |
| **SentencePiece** | T5, LLaMA | Language-agnostic, works on raw text without pre-tokenization |
| **Unigram** | ALBERT, XLNet | Probabilistic subword pruning |

### Working with the Tokenizer API

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Hello, Transformers!"
tokens = tokenizer.tokenize(text)
# Illustrative output; exact tokens and IDs depend on the checkpoint's vocabulary:
# ['hello', ',', 'transform', '##ers', '!']
ids = tokenizer.encode(text)
# [101, 7592, 117, 19082, 1168, 106, 102]

# Full encoding with attention mask
encoded = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
# encoded['input_ids'], encoded['attention_mask']
```
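The BPE row in the table above says the algorithm merges the most frequent pairs. A minimal sketch of that training loop on a tiny hypothetical corpus follows. The word counts and the `learn_bpe` helper are invented for illustration; production tokenizers add byte-level handling, pre-tokenization, and other details this sketch leaves out.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Learn `num_merges` merge rules, starting from single characters."""
    vocab = {tuple(word): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        vocab = merge_pair(vocab, best)
    return merges, vocab


merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)
```

Each learned merge becomes a vocabulary entry, so frequent fragments such as a shared suffix end up as single tokens while rare words stay split into smaller pieces.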
### Handling Special Tokens and Padding

Tokenizers add special tokens automatically: `[CLS]` (classification start), `[SEP]` (separator), `[PAD]` (padding). The attention mask tells the model which tokens are real versus padding. Always pass the attention mask to avoid incorrect results.
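A small sketch of what padding and the attention mask look like for a batch of two sequences of different lengths. The token IDs are made up for illustration, and `pad_batch` is a hypothetical helper, not a Transformers function; real tokenizers do this for you when you pass `padding=True`.

```python
PAD_ID = 0  # assumption: the [PAD] token has ID 0, as in BERT vocabularies


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the longest length and build the mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = max_len - len(seq)
        input_ids.append(seq + [pad_id] * padding)
        attention_mask.append([1] * len(seq) + [0] * padding)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up IDs; 101 and 102 stand in for [CLS] and [SEP].
batch = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Without the mask, the model would attend to the padding positions of the shorter sequence as if they were real input.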
## Fine-Tuning

### Preparing Your Dataset

The Datasets library simplifies data preparation:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

dataset = load_dataset("imdb")
# dataset['train'], dataset['test'] with 'text' and 'label' columns

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize_function, batched=True)
```

### Using the Trainer API

The Trainer class handles training loops, evaluation, checkpointing, and logging:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# IMDB is a binary task, so the classification head needs two labels
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",  # named evaluation_strategy in older releases
    save_strategy="epoch",
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```

### Custom Training Loops with PyTorch

For full control, write your own training loop:

```python
from torch.optim import AdamW
from torch.utils.data import DataLoader

# Keep only tensor columns; the model expects the target under "labels"
train_ds = tokenized["train"].remove_columns(["text"]).rename_column("label", "labels")
train_ds.set_format("torch")

optimizer = AdamW(model.parameters(), lr=5e-5)
train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)

model.train()
for epoch in range(3):
    for batch in train_loader:
        optimizer.zero_grad()
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
```

For causal language modeling (GPT-style generation), pair the model with a language-modeling data collator that has masking disabled:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a padding token
model = AutoModelForCausalLM.from_pretrained("gpt2")
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
# Use with Trainer for causal language modeling
```
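When a full batch does not fit in GPU memory, a custom loop can accumulate gradients over several smaller micro-batches and step the optimizer once. The sketch below shows the bookkeeping on a one-parameter model in plain Python so it runs without a GPU; the data, `grad_mse`, and the loop are illustrative assumptions. In a PyTorch loop the same idea amounts to dividing the loss by the number of accumulation steps and calling `optimizer.step()` only after the last micro-batch.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

# Reference: one gradient over the full batch of 8 examples.
full_grad = grad_mse(w, xs, ys)

# Accumulation: 4 micro-batches of 2, each contribution scaled by 1 / accum_steps.
micro_batch_size = 2
accum_steps = len(xs) // micro_batch_size
accumulated = 0.0
for step in range(accum_steps):
    start = step * micro_batch_size
    micro_x = xs[start:start + micro_batch_size]
    micro_y = ys[start:start + micro_batch_size]
    accumulated += grad_mse(w, micro_x, micro_y) / accum_steps

print(full_grad, accumulated)  # the two values agree up to floating-point rounding
```

Because the accumulated gradient matches the full-batch gradient, the update is the same as training with the larger batch, while peak memory is set by the micro-batch size.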
### Using LoRA for Efficient Fine-Tuning

Full fine-tuning updates billions of parameters, requiring massive GPU memory. LoRA (Low-Rank Adaptation) freezes the base model and trains small adapter matrices, reducing trainable parameters by about 99% while maintaining 95%+ of full fine-tuning quality:

```python
from peft import LoraConfig, get_peft_model

# `model` is a causal LM whose attention projections are named q_proj / v_proj (LLaMA-style)
lora_config = LoraConfig(
    r=16,  # rank of the adapter matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Approximate output for a 7B model:
# trainable params: 9M || all params: 7B || trainable%: 0.13
```
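The savings come from simple arithmetic. For one weight matrix of shape d_out × d_in, full fine-tuning updates d_out · d_in values, while LoRA trains two matrices of shapes d_out × r and r × d_in. The sketch below counts both. The 4096 × 4096 shape and the 32-layer depth are assumptions chosen to resemble a 7B-class LLaMA-style model, not values read from a specific checkpoint.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in
    lora = d_out * rank + rank * d_in  # B is d_out x r, A is r x d_in
    return full, lora


full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=16)

num_layers = 32        # assumption: depth of a 7B-class model
adapted_per_layer = 2  # q_proj and v_proj, as in the LoRA config
total_lora = lora * adapted_per_layer * num_layers

print(f"per matrix: full={full:,} lora={lora:,}")
print(f"adapter parameters across the model: {total_lora:,}")
```

Under these assumptions the adapters total roughly 8.4 million parameters, the same order of magnitude as the 9M figure in the printed output above, while every other weight stays frozen.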
### Quantization

Quantization stores weights at lower precision to cut memory use. The configuration below loads a model in 4-bit NF4 through bitsandbytes:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # gated checkpoint: accept the license on the Hub first
    quantization_config=bnb_config,
)
```

### Model Distillation

Distillation trains a smaller "student" model to mimic a larger "teacher." DistilBERT, for example, retains 97% of BERT's performance at 60% of the size with 60% faster inference.

### ONNX Export

Exporting to ONNX lets you run a model in other runtimes:

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    return_dict=False,  # plain tuple outputs are simpler for the exporter to trace
)
model.eval()

torch.onnx.export(
    model,
    (torch.zeros(1, 128, dtype=torch.long),),  # dummy input
    "model.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
)
```

## Deployment

### Deploying with Hugging Face Inference API

For zero-infrastructure deployment, use the [Hugging Face Inference API](https://huggingface.co/docs/api-inference):

```python
import os

import requests

API_URL = "https://api-inference.huggingface.co/models/bert-base-uncased"
token = os.environ["HF_TOKEN"]  # assumption: your access token lives in this environment variable
headers = {"Authorization": f"Bearer {token}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

result = query({"inputs": "The answer to life is [MASK]."})
```

### Local Deployment with Transformers

For production, you can serve a model behind your own HTTP endpoint. A minimal FastAPI wrapper:

```python
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
# A checkpoint fine-tuned for sentiment; the bare distilbert-base-uncased has no trained classifier head
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

@app.post("/classify")
def classify(text: str):
    return pipe(text)[0]
```

## Popular Models

| Model | Architecture | Best For | Parameters |
|-------|--------------|----------|------------|
| **BERT-base-uncased** | Encoder | Classification, NER | 110M |
| **RoBERTa-large** | Encoder | Classification benchmarks | 355M |
| **GPT-2** | Decoder | Text generation, prototyping | 124M-1.5B |
| **T5-base** | Encoder-Decoder | Translation, summarization | 220M |
| **BART-large** | Encoder-Decoder | Summarization, generation | 400M |
| **XLM-RoBERTa** | Encoder | Multilingual tasks | 270M |
| **LLaMA-3-8B** | Decoder | General purpose, local deployment | 8B |

## Troubleshooting

### CUDA Out of Memory Errors

- Reduce the batch size (down to 1 if needed)
- Use gradient accumulation to simulate larger batches
- Enable gradient checkpointing: `model.gradient_checkpointing_enable()`
- Use LoRA or quantization
- Clear CUDA cache: `torch.cuda.empty_cache()`

### Model Compatibility Issues

Always check the model card on Hugging Face Hub. Models also differ in their context lengths. BERT handles 512 tokens; GPT-2 handles 1,024; modern LLaMA models handle up to 128,000. For long documents:

- Use models with longer contexts (Longformer: 4,096 tokens, versus 512 for RoBERTa)
- Apply sliding window approaches
- Split documents and aggregate predictions

## Frequently Asked Questions

### Is Hugging Face Transformers free to use?

Yes, the Transformers library is completely free and open-source under the Apache 2.0 license, which permits commercial use. Individual models may have their own licenses, so always check the model card.

### What is the difference between the Hugging Face Hub and the Transformers library?

The Transformers library is the Python code you install with `pip install transformers`, providing model implementations, training utilities, and inference APIs. The Hugging Face Hub is a web platform (huggingface.co) that hosts models, datasets, and spaces. You download models from the Hub using the Transformers library: they work together but are separate things.

### Can I use Hugging Face models commercially?

Most models on the Hub carry permissive licenses (Apache 2.0, MIT) that allow commercial use. However, some models, particularly those derived from LLaMA, have restrictive licenses. Always verify the license field on the model card before deploying commercially.

### Which model should I use for my task?

For English classification, `distilbert-base-uncased` is a reliable default. For generation, `Mistral-7B-Instruct` offers excellent quality at a manageable size. Check the model card's benchmark scores and community downloads: popular models with high ratings are usually safe choices.

### Does Hugging Face support fine-tuning on custom datasets?

Absolutely. The Datasets library makes it easy to load custom data from CSV, JSON, Parquet, or text files. Combine it with the Trainer API or PyTorch training loops to fine-tune any model on your data. PEFT methods like LoRA make fine-tuning feasible even on consumer GPUs.

## Conclusion

Start with the pipeline API, and move to the Auto classes for production applications. Fine-tune with the Trainer API when you have labeled data. Optimize with quantization and ONNX when you need speed. The library scales with your needs from prototype to production.

For continued learning, follow the [official documentation](https://huggingface.co/docs/transformers), explore the [Hugging Face Hub](https://huggingface.co/models), and join the community forums. The field moves fast, and the Hub is where new breakthroughs appear first.