Model Fine Tuning — LoRA & RAG

WIP

April 2026

Hardware: Lenovo LOQ — RTX 3050 6GB VRAM

This page is under construction — full write-up coming soon.

A personal deep dive into model fine-tuning. I downloaded Qwen3 1.7B locally and fine-tuned it using LoRA (Low-Rank Adaptation), and I am currently implementing RAG (Retrieval-Augmented Generation) on top of it. The primary goal was learning: understanding the full pipeline from model acquisition through fine-tuning to inference. More details are covered in the related blog posts linked below.

LoRA (Low-Rank Adaptation)

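LoRA fine-tunes a model by freezing its original weights and training only small low-rank adapter matrices added alongside selected layers. The trainable parameters often amount to about 1% of the total or less. Because gradients and optimizer state are needed only for the adapters, VRAM requirements drop drastically, which makes fine-tuning possible on consumer GPUs.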
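To make the parameter savings concrete, here is a rough back-of-the-envelope sketch. It compares the number of trainable values when updating one weight matrix directly versus training a rank-r adapter pair for it. The 2048 x 2048 layer size is a hypothetical example, not the actual dimensions of Qwen3 1.7B; the rank of 16 matches the configuration used on this page.

```python
def lora_param_counts(d_in: int, d_out: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out        # updating the d_out x d_in matrix W directly
    lora = r * (d_in + d_out)  # A is r x d_in, B is d_out x r
    return full, lora


# Hypothetical layer size (assumption); r=16 as in the LoRA config on this page.
full, lora = lora_param_counts(d_in=2048, d_out=2048, r=16)
ratio = lora / full
print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.2%}")
```

For this single layer the adapter is under 2% of the full matrix. The share across a whole model is usually lower still, since only some modules receive adapters.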

RAG (Retrieval-Augmented Generation)

RAG enhances model responses by retrieving relevant external data at inference time and adding it to the prompt. This combines what the model learned during training with up-to-date information, without retraining.
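Since the RAG part of this project is still in progress, the following is only an illustrative sketch of the retrieve-then-prompt idea, not the implementation used here. It assumes a tiny in-memory document list and ranks documents by bag-of-words cosine similarity; a real setup would more likely use an embedding model and a vector store. All names (`DOCS`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base (assumption, for illustration only).
DOCS = [
    "LoRA trains small low-rank adapter matrices while the base weights stay frozen.",
    "RAG retrieves relevant documents at inference time and adds them to the prompt.",
    "The RTX 3050 laptop GPU used here has 6GB of VRAM.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    """Prepend the retrieved context to the question before it goes to the model."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


prompt = build_prompt("How much VRAM does the GPU have?", DOCS)
print(prompt)
```

The resulting string would then be tokenized and passed to the model in the usual way; the model itself is unchanged.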

training_lora.py

This script handles the LoRA fine-tuning process. It loads the base Qwen3 1.7B model, applies LoRA adapters, and trains on the generated dataset. Training outputs are saved to a separate weights folder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

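# Configure LoRA: rank-16 adapters on the attention query and value projections
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor; effective scale is lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # modules that receive adapters
    lora_dropout=0.05,                    # dropout applied on the adapter path
    task_type="CAUSAL_LM",                # tells PEFT this is a causal language model
)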

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training loop placeholder
# ...

View full file on GitHub →

inference.py

The inference script loads the base model and applies the LoRA weights at runtime. This allows interacting with the fine-tuned model without permanently modifying the original weights.
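As a toy illustration of what applying the adapter at runtime means, the sketch below computes the usual LoRA forward pass, y = Wx + (alpha / r) * B(Ax), in plain Python. The base matrix W is only read, never modified. The 2 x 2 values are made up for the example, and the helper names are hypothetical; the PEFT library handles all of this internally.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """Compute y = W x + (alpha / r) * B (A x) without touching W."""
    base = matvec(W, x)              # output of the frozen base layer
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (toy values)
A = [[1.0, 1.0]]              # r x d_in, with r = 1
B = [[0.5], [0.0]]            # d_out x r
x = [2.0, 3.0]

y_base = matvec(W, x)
y_adapted = lora_forward(W, A, B, x, alpha=2, r=1)
print(y_base, y_adapted)
```

Dropping the adapter path recovers the base model's output exactly, which is why the LoRA weights can be stored and shipped separately from the original model.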

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Load LoRA weights
model = PeftModel.from_pretrained(base_model, "lora_weights/")
model.eval()

# Inference
prompt = "What are the key financials for AAPL?"
inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens caps only the generated text; max_length would also count the prompt tokens
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

View full file on GitHub →

Prompt completion example before fine-tuning

LoRA trained model responding to a prompt

Coming soon — RAG implementation in progress

Links

GitHub →

Related Blog Posts

Traversing the ML World Without a Map — Part 0

Traversing the ML World Without a Map — Part 1
