Model Fine-Tuning: LoRA & RAG
WIP
Hardware: Lenovo LOQ, RTX 3050 (6GB VRAM)
This page is under construction — full write-up coming soon.
A personal deep dive into model fine-tuning. Downloaded Qwen3 1.7B locally and fine-tuned it with LoRA (Low-Rank Adaptation); RAG (Retrieval-Augmented Generation) is currently being implemented as the next stage. The primary goal was learning: understanding the full pipeline from model acquisition through fine-tuning to inference. More details are covered in the related blog posts linked below.
LoRA (Low-Rank Adaptation)
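LoRA freezes all of the pretrained weights and instead trains small low-rank matrices added alongside selected layers, so the trainable parameters are typically 1% of the model or less. Because gradients and optimizer state are only needed for these adapter weights, VRAM requirements drop sharply, which makes fine-tuning feasible on consumer GPUs.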
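To make the idea concrete, here is a minimal pure-Python sketch of the LoRA update (no PyTorch or PEFT involved). A frozen weight matrix W is adapted by two small trainable matrices A (r × d_in) and B (d_out × r), and the effective weight is W + (alpha / r) · B A. B starts at zero, so training begins from the unchanged base model. The helper names are hypothetical, and the Qwen3-1.7B layer shapes used for the parameter count (hidden size 2048, v_proj output 1024, 28 layers) are assumptions taken from the model's published config, so check them against its config.json.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_weight(W, A, B, alpha, r):
    """Effective weight W + (alpha / r) * B @ A; W itself is never modified."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def lora_param_count(d_in, d_out, r):
    """Trainable values LoRA adds to one d_in -> d_out layer: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r


# Toy layer: 4 outputs, 6 inputs, rank-2 adapter.
random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, small random init
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init

# Because B starts at zero, the adapted layer initially behaves exactly like the base layer.
print("identical at init:", lora_weight(W, A, B, alpha, r) == W)

# Scale check for r=16 on q_proj and v_proj.
# ASSUMED Qwen3-1.7B shapes (verify in config.json): q_proj 2048 -> 2048,
# v_proj 2048 -> 1024 (grouped-query attention), 28 decoder layers.
LAYERS = 28
per_layer = lora_param_count(2048, 2048, 16) + lora_param_count(2048, 1024, 16)
lora_total = LAYERS * per_layer
print("LoRA trainable params:", lora_total)
print(f"fraction of a 1.7B model: {lora_total / 1.7e9:.2%}")
```

With these assumed shapes, r=16 on q_proj and v_proj adds about 3.2 million trainable parameters, roughly 0.2% of the 1.7B base model.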
RAG (Retrieval-Augmented Generation)
RAG improves model responses by retrieving relevant external documents at inference time and adding them to the prompt, so the model can draw on up-to-date information beyond its training data without being retrained.
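The retrieval step can be sketched in a few lines. This is a minimal pure-Python illustration, not this project's RAG implementation (which is still in progress): it swaps in simple bag-of-words cosine similarity where a real pipeline would typically use an embedding model and a vector store. The documents, stopword list, and function names are hypothetical placeholders.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; real retrievers usually rely on embeddings instead.
STOPWORDS = {"a", "an", "and", "are", "for", "is", "of", "the", "to", "what"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents with non-zero similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Prepend retrieved context to the question before it is sent to the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


# Placeholder documents (no real data).
docs = [
    "AAPL financials placeholder: revenue and net income would go here.",
    "MSFT financials placeholder: cloud segment results would go here.",
    "LoRA adapters are stored separately from the base model.",
]
print(build_prompt("What are the key financials for AAPL?", docs))
```

The resulting prompt string is what would be passed to the tokenizer and model in place of the bare question.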
training_lora.py
This script handles LoRA fine-tuning. It loads the base Qwen3 1.7B model, attaches LoRA adapters, and trains on the generated dataset. The trained adapter weights are saved to a separate folder, leaving the base model files untouched.
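A rough memory budget shows why this fits on a 6GB card at all. This sketch counts only weights, gradients, and AdamW optimizer state; activations, the CUDA context, and framework overhead are ignored, and the bytes-per-parameter figures are standard rules of thumb rather than measurements from this project. The adapter size assumes r=16 on q_proj and v_proj with Qwen3-1.7B's published layer shapes.

```python
GB = 1e9                          # decimal gigabytes
VRAM_GB = 6 * 1024**3 / GB        # a "6GB" card is assumed to hold 6 GiB (about 6.44e9 bytes)


def weights_gb(n_params, bytes_per_param):
    """Memory for n_params values at the given precision."""
    return n_params * bytes_per_param / GB


def full_finetune_gb(n_params):
    """fp32 weights (4) + fp32 gradients (4) + AdamW first and second moments (4 + 4)."""
    return weights_gb(n_params, 16)


def lora_finetune_gb(n_base, n_adapter, base_bytes=2):
    """Frozen 16-bit base weights, plus full fp32 training state only for the adapter."""
    return weights_gb(n_base, base_bytes) + weights_gb(n_adapter, 16)


N_BASE = 1.7e9            # nominal size of Qwen3 1.7B
N_ADAPTER = 3_211_264     # ASSUMED: r=16 on q_proj/v_proj with Qwen3-1.7B layer shapes

print(f"fp32 weights only:            {weights_gb(N_BASE, 4):.1f} GB")
print(f"bf16 weights only:            {weights_gb(N_BASE, 2):.1f} GB")
print(f"full fine-tune (fp32 + AdamW): {full_finetune_gb(N_BASE):.1f} GB")
print(f"LoRA (bf16 base + adapter):   {lora_finetune_gb(N_BASE, N_ADAPTER):.2f} GB")
print(f"available VRAM:               {VRAM_GB:.2f} GB")
```

Under these assumptions, the fp32 weights alone (about 6.8 GB) already exceed the card, full fine-tuning would need roughly 27 GB, and a 16-bit base plus LoRA adapters comes to about 3.5 GB before activations.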
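import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
# Load base model in bfloat16: fp32 weights (~7 GB) would not fit in 6GB of VRAM
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", torch_dtype=torch.bfloat16)
model.to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
# Configure LoRA: rank-16 adapters on the attention query and value projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,  # adapter updates are scaled by lora_alpha / r = 2
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# Apply LoRA: base weights are frozen, only the adapter matrices are trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Training loop placeholder
# ...
# Save only the adapter weights, to the folder inference.py loads from
model.save_pretrained("lora_weights/")
inference.py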
The inference script loads the base model and applies the LoRA weights at runtime. This allows interacting with the fine-tuned model without permanently modifying the original weights.
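Applying the adapter at runtime and merging it into the weights give the same outputs; the only difference is whether the base weights are rewritten. Below is a minimal pure-Python sketch of that equivalence on a toy layer, with hypothetical helper names (in PEFT, the merged form corresponds to what `merge_and_unload()` produces).

```python
import random


def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def runtime_forward(W, A, B, x, scale):
    """Adapter applied on the fly: W x + scale * B (A x). W is only read, never written."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def merged_weight(W, A, B, scale):
    """Return a new matrix W + scale * B A, i.e. the adapter folded into the weights."""
    BA = [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
          for i in range(len(B))]
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


random.seed(1)
d_in, d_out, r, alpha = 5, 3, 2, 4
scale = alpha / r
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]
B = [[random.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]  # non-zero, as after training
x = [random.uniform(-1, 1) for _ in range(d_in)]
W_original = [row[:] for row in W]

y_runtime = runtime_forward(W, A, B, x, scale)
y_merged = matvec(merged_weight(W, A, B, scale), x)

print("same output:", all(abs(a - b) < 1e-9 for a, b in zip(y_runtime, y_merged)))
print("base weights untouched:", W == W_original)
```

Keeping the adapter separate, as inference.py does, lets one base model be reused with different adapters; merging instead avoids the small extra computation per adapted layer at inference time.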
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model in bfloat16 so it fits in 6GB of VRAM
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", torch_dtype=torch.bfloat16)
base_model.to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
# Load LoRA weights on top of the frozen base model
model = PeftModel.from_pretrained(base_model, "lora_weights/")
model.eval()
# Inference: inputs must be on the same device as the model
prompt = "What are the key financials for AAPL?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# max_new_tokens limits generated tokens only (max_length would also count the prompt)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Prompt completion example before fine-tuning
LoRA-trained model responding to a prompt
Coming soon — RAG implementation in progress