
Fine-Tuning GPT on Instagram DMs: Building an AI Clone of Myself (Part 1)
I fine-tuned an OpenAI model on years of my own Instagram DMs to build a personal AI chatbot: a version of me people could talk to while I'm asleep. This is Dev Log 1 of the series that became OmarAI. It covers the hard part: turning the Instagram export into a clean fine-tuning dataset without teaching the model to talk to itself. Part 2 covers dataset cleanup with a second LLM; Parts 3–5 cover memory, UI, and post-launch iteration.
TL;DR
- Goal: a fine-tuned GPT that replies like me.
- Data source: Instagram JSON export (messages/inbox/<person>/message_1.json).
- Key bug: the export lists messages newest-first, and I treated them as if oldest-first. Reversing their order before pairing was the fix.
- Key insight: don't pair raw messages. Merge consecutive same-sender messages first, then pair.
Why build an AI clone of myself
We all know AI can't replace me — but I wanted something I could point people at to answer questions about me, in my voice, when I'm not around. This is the dev log for that project.
The whole series in order: Part 1 (this post) · Part 2 · Part 3 · Part 4 · Part 5.
Initial plan: fine-tune on my own messages
The first decision was the data source. I wanted text that sounded unambiguously like me, so I went with Instagram DMs, since Instagram is the platform I use most. Instagram makes this easy: from Settings → Your activity → Download your information, you can request a JSON export of everything you've sent and received.
The catch is that the data needs heavy cleanup before OpenAI's fine-tuning endpoint will do anything useful with it.
The Instagram export structure
The folder tree looks like this:
messages/
  inbox/
    <person-name>/
      message_1.json    # sometimes more (message_2.json etc.)
      photos/, videos/  # media folders
Inside message_1.json:
{
  "participants": [
    { "name": "Omar Musayev" },
    { "name": "Abdul-Aziz Mammadli" }
  ],
  "messages": [
    {
      "sender_name": "Omar Musayev",
      "timestamp_ms": 1712667823182,
      "content": "Not bad, just working on this cool AI project!"
    },
    {
      "sender_name": "Abdul-Aziz Mammadli",
      "timestamp_ms": 1712667805807,
      "content": "Hey Omar, how's it going?"
    }
  ]
}
Two things to notice: the messages array is stored newest-first, so timestamp_ms decreases as you read down it, and the content field often contains Unicode that needs normalizing.
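Both points can be checked in a few lines. The sketch below is an illustration, not the code from this project: `is_newest_first` and `fix_text` are hypothetical helper names. `fix_text` assumes the Unicode problem is the commonly reported one for these exports, where UTF-8 bytes are written out as if each byte were a Latin-1 character (so "é" arrives as `\u00c3\u00a9`). If your export is garbled in some other way, it simply returns the text unchanged.

```python
def is_newest_first(messages: list[dict]) -> bool:
    """True if timestamp_ms never increases while reading down the array."""
    stamps = [m["timestamp_ms"] for m in messages]
    return all(a >= b for a, b in zip(stamps, stamps[1:]))


def fix_text(s: str) -> str:
    """Undo UTF-8-read-as-Latin-1 mojibake; return s unchanged if it doesn't apply."""
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s


# Made-up sample in export order; the first content is a mis-encoded "café later?".
sample = [
    {"sender_name": "A", "timestamp_ms": 1712667823182, "content": "caf\u00c3\u00a9 later?"},
    {"sender_name": "B", "timestamp_ms": 1712667805807, "content": "sure"},
]

print(is_newest_first(sample))         # True
print(fix_text(sample[0]["content"]))  # café later?
```

Running the order check on every file before pairing is cheap, and it catches the case where a file is not sorted the way you expect.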
Attempt 1 — pair every consecutive message
First thing I tried: walk the messages array and treat each pair of adjacent messages as (prompt, response), keeping any pair that included a message from me.
This broke immediately. I often double-texted: two or three messages that formed one thought. Pairing raw messages produced a lot of (Omar, Omar) pairs, which taught the model that talking to myself was a valid conversational pattern. The resulting model had a distinct "split personality" feel.
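To see how bad the problem is in a given chat, you can measure how many of the kept pairs are me replying to me. This is a rough diagnostic sketch; `self_reply_rate` is a hypothetical helper and the sample chat is made up.

```python
def self_reply_rate(messages: list[dict], me: str) -> float:
    """Fraction of kept adjacent pairs in which both messages are from `me`."""
    pairs = [
        (a["sender_name"], b["sender_name"])
        for a, b in zip(messages, messages[1:])
        if me in (a["sender_name"], b["sender_name"])  # the keep rule from Attempt 1
    ]
    if not pairs:
        return 0.0
    bad = sum(1 for a, b in pairs if a == b == me)
    return bad / len(pairs)


chat = [
    {"sender_name": "Friend", "content": "you up?"},
    {"sender_name": "Omar", "content": "yeah"},
    {"sender_name": "Omar", "content": "one sec"},
    {"sender_name": "Omar", "content": "ok what's up"},
    {"sender_name": "Friend", "content": "nm"},
]

print(self_reply_rate(chat, "Omar"))  # 0.5
```

Here a single triple-text is enough to make half of the training pairs self-replies.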
Attempt 2 — detect and merge same-sender runs
Next: detect when the same sender sent consecutive messages, and merge those runs into a single turn before pairing. This felt correct, but I still had a subtle bug — I was iterating newest-first and treating the first message I saw as the prompt, so every pair was effectively reversed.
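A minimal sketch of the merge step, and of what goes wrong when it runs on export order. `merge_runs` is a hypothetical name, joining with a single space is an assumption, and the sample messages are made up.

```python
from itertools import groupby


def merge_runs(messages: list[dict]) -> list[tuple[str, str]]:
    """Collapse consecutive messages from one sender into a single (sender, text) turn."""
    return [
        (sender, " ".join(m["content"] for m in run))
        for sender, run in groupby(messages, key=lambda m: m["sender_name"])
    ]


# Newest-first, the way the export stores it.
export_order = [
    {"sender_name": "Omar", "content": "just woke up"},
    {"sender_name": "Omar", "content": "nothing much"},
    {"sender_name": "Friend", "content": "what are you up to"},
    {"sender_name": "Friend", "content": "yo"},
]

print(merge_runs(export_order)[0])
# ('Omar', 'just woke up nothing much')  <- my reply comes first, so it looks like the prompt
print(merge_runs(export_order[::-1])[0])
# ('Friend', 'yo what are you up to')    <- chronological order restored
```

The order bug does two kinds of damage: the pair direction flips, and the sentences inside each merged turn come out backwards.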
Attempt 3 — the one that actually worked
- Reverse the messages array (so it's oldest-first).
- Merge consecutive same-sender messages into single turns.
- Pair merged turns into (prompt, response).
- Keep only pairs where my merged response was the second turn.
- Format as OpenAI fine-tuning JSONL:
{"messages":[
{"role":"user","content":"Hey Omar, how's it going? also did you ever send me those notes"},
{"role":"assistant","content":"Not bad, just working on this cool AI project. yeah i'll send em rn"}
]}
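The five steps above can be sketched in one function. This is a reconstruction under stated assumptions, not the exact script from this project: `build_examples` and `to_jsonl` are hypothetical names, turns are joined with a single space, and entries without a `content` key (assumed here to be media-only messages) are skipped.

```python
import json
from itertools import groupby


def build_examples(export_messages: list[dict], me: str) -> list[dict]:
    # Step 1: the export is newest-first, so reverse into chronological order.
    # Skipping entries with no "content" key is an assumption about media-only messages.
    ordered = [m for m in reversed(export_messages) if "content" in m]

    # Step 2: merge consecutive same-sender messages into one turn each.
    turns = [
        (sender, " ".join(m["content"] for m in run))
        for sender, run in groupby(ordered, key=lambda m: m["sender_name"])
    ]

    # Steps 3-4: pair adjacent turns and keep those where I am the responder.
    # After merging, adjacent turns always come from different senders.
    examples = []
    for (_, prompt), (reply_sender, reply) in zip(turns, turns[1:]):
        if reply_sender == me:
            examples.append({"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": reply},
            ]})
    return examples


def to_jsonl(examples: list[dict]) -> str:
    # Step 5: JSONL means one complete JSON object per line.
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


export = [  # made-up sample, newest-first
    {"sender_name": "Omar", "content": "yeah i'll send em rn"},
    {"sender_name": "Omar", "content": "Not bad"},
    {"sender_name": "Friend", "content": "also did you send those notes"},
    {"sender_name": "Friend", "content": "Hey Omar"},
]
print(to_jsonl(build_examples(export, "Omar")))
```

Note that the example shown above is pretty-printed for readability; in the actual training file each example has to sit on a single line, which is what `json.dumps` produces here.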
Fine-tuning runs
I fine-tuned two models first to sanity-check the pipeline:
| Model | Personality signal | Typical response length | Notes |
|---|---|---|---|
| babbage-002 | Weak | 1–2 words | Too small to carry voice |
| gpt-4o-mini-2024-07-18 | Real but shallow | Short | Picked up my slang — "I mean," "kinda," "fr" |
The short-answer problem was the key diagnostic signal. The turns I was training on were short, so the model learned to give short outputs at inference. That's when I understood that the data itself, not the model, was the bottleneck.
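You can spot this before paying for a run by looking at how long the assistant turns in the training file are. A rough sketch, assuming the chat-format JSONL shown earlier; `reply_word_counts` is a hypothetical helper and the three lines of data are made up.

```python
import json
from statistics import median


def reply_word_counts(jsonl_text: str) -> list[int]:
    """Word count of the assistant turn in each line of a chat-format JSONL string."""
    counts = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        turns = json.loads(line)["messages"]
        reply = next(t["content"] for t in turns if t["role"] == "assistant")
        counts.append(len(reply.split()))
    return counts


data = "\n".join([
    '{"messages":[{"role":"user","content":"you coming?"},{"role":"assistant","content":"yeah"}]}',
    '{"messages":[{"role":"user","content":"when"},{"role":"assistant","content":"like 10 min"}]}',
    '{"messages":[{"role":"user","content":"ok"},{"role":"assistant","content":"fr"}]}',
])

counts = reply_word_counts(data)
print(counts, median(counts))  # [1, 3, 1] 1
```

If the median reply is a word or two, expect the fine-tuned model to answer in a word or two.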
The fix that carried into Part 2
The next iteration of the pipeline does this:
- Merge all consecutive messages from the other person into one complete prompt.
- Merge all consecutive responses from me into one complete reply.
- Use these merged conversations to form better prompt-response pairs.
Turns end up multi-sentence. The model sees real prompts and produces real responses. I queued a gpt-4o-2024-08-06 fine-tune on the new data and went to sleep. That run hit a dataset-size limit — covered in Part 2 — which forced me to think harder about cleanup and triggered the GPT-3.5-as-curator trick.
Key takeaways
- Reverse the Instagram export before pairing. It's sorted newest-first.
- Merge consecutive same-sender messages before pairing, or you'll teach your model to talk to itself.
- Training turn length sets output length. Short merged turns → short responses. Plan merges with the output you want in mind.
What's next
Part 2: Cleaning Chat Data for LLM Fine-Tuning with GPT-3.5 as a Curator — how I cut a 20k-pair dataset down to 2k high-quality English conversations by letting a cheaper model pick the best ones.
References
- OpenAI — Fine-tuning guide
- Instagram — Download your information
- Buildspace — Fine-tune GPT-3: build an AI that responds to your DMs
- Try OmarAI live
- Series: Part 1 (you are here) · Part 2 · Part 3 · Part 4 · Part 5
Frequently Asked Questions
How do I export my Instagram DMs for fine-tuning?
In Instagram, go to Settings → Your activity → Download your information. Choose JSON format. You'll get a zip that unpacks to a tree of messages/inbox/<person>/message_1.json files, one folder per conversation partner. Each JSON holds a 'messages' array with sender_name, timestamp_ms, and content.
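Once the zip is unpacked, a short sketch like this can collect the files per conversation. `find_conversations` is a hypothetical helper; it relies only on the folder layout described above and makes no assumption about which numbered file holds the oldest messages.

```python
import re
from pathlib import Path

MESSAGE_FILE = re.compile(r"message_(\d+)\.json")


def find_conversations(export_root: str) -> dict[str, list[Path]]:
    """Map each inbox folder name to its message_N.json files, sorted by N."""
    conversations: dict[str, list[Path]] = {}
    for path in Path(export_root).glob("messages/inbox/*/message_*.json"):
        if MESSAGE_FILE.fullmatch(path.name):
            conversations.setdefault(path.parent.name, []).append(path)
    for files in conversations.values():
        # Numeric sort, so message_10.json does not land before message_2.json.
        files.sort(key=lambda p: int(MESSAGE_FILE.fullmatch(p.name).group(1)))
    return conversations
```

Usage would look like `find_conversations("path/to/unzipped/export")`, where the path is wherever you extracted the download. When a conversation spans several files, compare their timestamp_ms values before concatenating them.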
Why do I need to merge consecutive messages from the same person?
Most people — especially teenagers — send multiple short messages as one thought. If you pair raw messages as (prompt, response), you'll end up with many 'Omar → Omar' turns that teach the model to talk to itself. Merging consecutive same-sender messages into one turn fixes this and is the single most important preprocessing step.
Why did the first fine-tuned models give such short answers?
Short inputs in training produce short outputs at inference. My initial pairs were single-message prompts paired with single-message responses, so the model learned to answer in fragments. Merging consecutive messages before pairing fixed it — the model then saw full multi-sentence prompts and responded in kind.
Which OpenAI models work for this?
I tried babbage-002, gpt-4o-mini-2024-07-18, and gpt-4o-2024-08-06. The babbage-002 model is too small to pick up personality. The gpt-4o-mini model was cheap and reasonable but shallow. The gpt-4o model caught more voice and slang but needed cleaner data (see Part 2).
Is this legal and private?
Use only your own exported messages. Don't train on data belonging to other people without consent. Redact any personal info — real names, addresses, numbers — before uploading as training data. OpenAI's fine-tuning endpoint retains data per their published terms, so read those before you hit submit.
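A first redaction pass can be done with a few regular expressions. This is a deliberately crude sketch, not a complete PII scrubber: the patterns, the placeholder tokens, and the `redact` helper are all assumptions for illustration, and they will miss plenty (street addresses, for one) while sometimes catching long digit runs that are not phone numbers.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")  # loose: any 9+ character run of phone-like characters


def redact(text: str, names: tuple[str, ...] = ()) -> str:
    """Replace emails, phone-like numbers, and the given names with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


print(redact("mail me at a.b@example.com or call +1 (555) 123-4567"))
# mail me at [EMAIL] or call [PHONE]
print(redact("ask Abdul-Aziz Mammadli", names=("Abdul-Aziz Mammadli",)))
# ask [NAME]
```

Treat the output as a starting point and read through a sample of the redacted file by hand before uploading it.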