Today my Hindi NER fine-tune broke mid-train ride

News & Updates
Posted by h/rajesh_k • Mar 29, 2026

Today was one of those garbage days where I thought I could squeeze model work in between a bus ride to Palolem and a late-afternoon walk on the beach. At 11:13 AM IST I SSHed into my AWS g4dn.xlarge spot instance (T4) over my phone hotspot and kicked off a fine-tune of bert-base-multilingual-cased on my tiny TouristHIN-ner-v1 dataset (built from TripAdvisor reviews): transformers 4.30.2, PyTorch 1.13.1, CUDA 11.7, Python 3.9, seq_len 128, batch_size 32, lr 3e-5, seed 42.

Training started fine, but after the first eval the loss plunged to 0.01 while eval F1 crashed to 0.02, with the model predicting all O tags. Because my hotspot kept dropping whenever the bus went through a tunnel, I only saw partial logs, assumed data noise or class imbalance, and kept going.

Later, on the beach with actual Wi-Fi, I pulled the full logs and found no fatal exception, just a nasty label misalignment. My quick 09:02 AM edit to convert_conll.py, meant to strip punctuation, had also removed blank lines, so sentence boundaries merged. On top of that I had tokenized without is_split_into_words=True, so the word-to-token mapping was never used: labels got shifted by the [CLS] token and my label_ids array ended up shorter than input_ids, yet Trainer quietly kept training. Debugging sucked because the run had been overwriting its checkpoint every hour; I had an old checkpoint at step 1200 that looked fine and a latest one at step 5400 that had learned nothing.

So I rewrote the preprocessing on my laptop to keep the blank-line sentence markers, fixed line 78 of convert_conll.py where I was popping labels incorrectly, changed the tokenizer call to tokenizer(sent_tokens, is_split_into_words=True, truncation=True, max_length=128), and remapped labels with word_ids(). I relaunched a clean run at 16:45 IST, and after one epoch eval F1 went up to 0.62 on the same seed, which proved the problem was alignment, not model size.

Lesson learned the hard way while working between bus stops and chai stalls: next time I'm not touching preprocessing unless I'm on a proper keyboard with a stable connection.
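For anyone hitting the same thing, here's a minimal sketch of the two fixes. My actual convert_conll.py is messier than this, and the tag names and example sentence below are made up; the word_ids list is the kind of thing a fast tokenizer's encoding.word_ids() returns, simulated here so the sketch runs standalone.

```python
def read_conll(lines):
    """Split CoNLL-style 'token tag' lines into sentences.

    A blank line marks a sentence boundary -- the exact thing my
    punctuation-stripping edit accidentally deleted.
    """
    sentences, words, labels = [], [], []
    for line in lines:
        line = line.strip()
        if not line:                      # blank line = end of sentence; keep it!
            if words:
                sentences.append((words, labels))
                words, labels = [], []
            continue
        parts = line.split()
        words.append(parts[0])
        labels.append(parts[-1])
    if words:                             # flush the final sentence
        sentences.append((words, labels))
    return sentences


def align_labels(word_ids, word_labels, label2id):
    """Map word-level labels onto subword tokens.

    word_ids has one entry per token: None for special tokens like
    [CLS]/[SEP], otherwise the index of the source word. Special tokens
    and continuation subwords get -100 so the loss ignores them.
    """
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:                   # [CLS], [SEP], padding
            aligned.append(-100)
        elif wid != prev:                 # first subword of a new word
            aligned.append(label2id[word_labels[wid]])
        else:                             # continuation subword
            aligned.append(-100)
        prev = wid
    return aligned
```

For example, with label2id = {"O": 0, "B-LOC": 1} and a tokenization of "Palolem beach" that splits "beach" into two subwords (word_ids [None, 0, 1, 1, None] for [CLS] Palolem bea ##ch [SEP]), align_labels returns [-100, 1, 0, -100, -100], the same length as input_ids, which is exactly what my broken run never guaranteed.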

5 COMMENTS

h/coolboy65 • Mar 29, 2026
Been there. Quick tips:
- Keep blank-line sentence separators and add a sanity check that for each sentence len(words)==len(labels).
- Tokenize with is_split_into_words=True and remap labels with word_ids(), set subword labels to -100.
- Save checkpoints more often and push them to S3, or set save_total_limit with load_best_model_at_end=True and metric_for_best_model so the best checkpoint survives and spot interruptions don’t leave you with only the latest (bad) one.
- Add a tiny unit test on a few examples to catch label/token misalignments before a long run.
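Something like this for the last two points. Names are illustrative, and encode() stands in for whatever your real tokenize-and-align step is, so you can run it on a handful of examples before committing GPU hours:

```python
def check_alignment(sentences, encode):
    """Fail fast if words/labels disagree or token/label lengths drift.

    sentences: list of (words, labels) pairs.
    encode(words, labels): must return (input_ids, aligned_label_ids)
    exactly the way the training pipeline does.
    """
    for i, (words, labels) in enumerate(sentences):
        # the basic sanity check: one label per word
        assert len(words) == len(labels), (
            f"sentence {i}: {len(words)} words vs {len(labels)} labels")
        input_ids, label_ids = encode(words, labels)
        # tokens and labels must stay in lockstep after alignment
        assert len(input_ids) == len(label_ids), (
            f"sentence {i}: token/label length mismatch")
        # every real (non -100) label must trace back to some word
        kept = [l for l in label_ids if l != -100]
        assert len(kept) <= len(words), (
            f"sentence {i}: more active labels than words")
    return True
```

Run it on 5-10 sentences from your dataset right before Trainer starts; if the blank-line or word_ids bug is back, it dies in milliseconds instead of after an epoch.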
1 REPLY
h/cosmic_dreamer • Mar 29, 2026
@coolboy65 Good list, especially the sanity check and subword -100.

Wish people tested this before wasting my GPUs.
2 REPLIES
h/coolboy65 • Mar 29, 2026
@cosmic_dreamer Preach.

People should test configs on a tiny run before wasting GPUs.
2 REPLIES
h/rajesh_k • Mar 29, 2026
@coolboy65, I appreciate the tips, but my main issue wasn't just the technical side; it was that the failure hit while I was on a moving bus. It's hard to ship fixes from a phone hotspot in a tunnel.

Those strategies are solid for preventing future mishaps, but I was caught off guard this time. Having a backup plan matters, especially while traveling. Just sharing my experience here.
2 REPLIES
h/himanshusharma • Mar 29, 2026
At least use proper spacing when you write. I know you're frustrated, but who's going to read one giant paragraph?
0 REPLY