
Building an AI Credit-Scoring Engine at Payriff (Fintech Internship)

5 min read · 828 words · Updated Apr 24, 2026

At Payriff over summer 2024 I worked on an AI credit-scoring engine that fuses traditional bank data with mobile-provider signals, and shipped a RAG-based chatbot that teaches developers the Payriff API — cutting onboarding from hours to minutes. This is the internship I don't write about often enough. The work is proprietary so the specifics below are generalized, but the architecture and lessons are the ones I'd give to anyone building AI for fintech onboarding.

TL;DR

  • Credit-scoring engine: combined bank and mobile-provider signals to improve thin-file applicant scoring.
  • Developer chatbot: RAG over the public Payriff API docs; answers questions with code + doc links.
  • Team dynamic: the wins came from pairing ML engineers with credit analysts, not from tuning models in isolation.
  • Stack: Python, FastAPI, OpenAI, standard ML tooling — nothing exotic. The value is in the data pipeline and the domain expertise.

Context

Payriff is a payments platform. The company wanted two AI-shaped problems solved:

  1. Score applicants more accurately, especially "thin-file" customers — those without rich histories in a single bank.
  2. Reduce developer onboarding friction for merchants integrating the Payriff API for the first time.

Both problems had been worked on with conventional software. Both had ceilings that AI could plausibly raise.

Problem 1 — credit scoring with mobile-provider data

The core insight came from the credit analysts on the team, not from any ML paper: mobile-provider behavior correlates with repayment reliability. Someone who has maintained the same plan for 18 months, tops up consistently, and hasn't switched devices in a year tends to repay on time. Someone with an erratic mobile history tends to repay erratically, too.

The engine takes two data sources:

| Source | Signal |
|---|---|
| Banks | Balance, transaction cadence, overdraft frequency, tenure |
| Mobile providers | Plan tenure, top-up regularity, device stability |

And produces a unified score. For thin-file applicants — where the bank signal alone is weak — the mobile signal carries real weight.
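As a minimal sketch of the blending idea (not the production engine — the feature names, weights, and ramp are illustrative assumptions), the mobile sub-score can be given more weight the thinner the bank file is:

```python
# Illustrative only: blend a bank sub-score and a mobile sub-score
# into one unified score, shifting weight toward the mobile signal
# when the applicant's bank history is short ("thin-file").

def thin_file_weight(bank_tenure_months: int) -> float:
    """Weight on the bank sub-score: full weight at >= 24 months of
    history, ramping down linearly toward a 50/50 blend at zero."""
    return 0.5 + 0.5 * min(bank_tenure_months, 24) / 24

def unified_score(bank_score: float, mobile_score: float,
                  bank_tenure_months: int) -> float:
    """Blend two sub-scores in [0, 1] into a single score in [0, 1]."""
    w = thin_file_weight(bank_tenure_months)
    return w * bank_score + (1 - w) * mobile_score
```

The exact blending used in production was learned by the model rather than hand-set; the sketch only shows why a weak bank signal lets the mobile signal carry real weight.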

The modeling work was standard: gradient-boosted trees with interpretable features (no black-box deep nets for regulated decisions), rigorous calibration, careful train/validate/test splits with time-based holdouts. The interesting work was feature engineering in tight collaboration with the credit team. They knew which mobile signals correlated with their existing default portfolio better than any automatic feature selection would have surfaced in three months of tuning.
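The time-based holdout is the part most often done wrong, so here is a minimal sketch of it (column name `applied_on` and the cutoff dates are illustrative, not the real pipeline):

```python
# Split application records chronologically, never randomly: the model
# must be validated on applications it could not have seen at training
# time, mirroring how it will score in production.
from datetime import date

def time_based_split(rows, train_end: date, valid_end: date):
    train = [r for r in rows if r["applied_on"] < train_end]
    valid = [r for r in rows if train_end <= r["applied_on"] < valid_end]
    test  = [r for r in rows if r["applied_on"] >= valid_end]
    return train, valid, test
```

A random split would leak future repayment behavior into training and overstate accuracy, which is exactly the failure mode a regulator (or your own default portfolio) will eventually surface.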

Auditability

Everything that goes into a score is logged:

  • Every feature value (with provenance).
  • Every model version the score came from (pinned, never "latest").
  • Every manual override (with reason and reviewer).

In a regulated stack, interpretability isn't optional — it's the difference between shipping and not shipping.

Problem 2 — the developer-docs chatbot

Developers integrating Payriff were spending hours reading API docs. Not because the docs were bad, but because integration is inherently "I need to do X specifically," and docs are organized by endpoint, not by intent.

I built a RAG chatbot over the public Payriff API documentation so new integrators could ask questions in plain language and get back:

  • A precise code snippet for their language.
  • A link to the exact documentation section the answer came from.
  • (When the retrieval was weak) an honest "I'm not sure — here's the closest docs section."

```
User: "how do I charge a saved card in Node?"
Bot:
  <Node snippet>
  Source: /docs/payments/charge-saved-card
```

Pipeline:

  1. Embed the docs per heading chunk with OpenAI embeddings.
  2. Retrieve top-K chunks for each incoming question.
  3. Feed the LLM the question + retrieved chunks + a strict system prompt that requires a source link.
  4. Fail loudly when retrieval confidence is below a threshold (better to say "I don't know" than to hallucinate a parameter name).
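The retrieval-plus-threshold logic of steps 2 and 4 can be sketched as follows. This is a toy: the bag-of-words embedding stands in for OpenAI embeddings purely to make the confidence-threshold fallback concrete, and the threshold value is an assumption:

```python
# Toy retrieval with an honest "I don't know" fallback. In production
# the embed() call would hit an embeddings API; here it is a simple
# bag-of-words stand-in so the ranking logic is runnable.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: dict, k: int = 2,
             threshold: float = 0.2):
    """Return (top-k chunk ids, confident?). When the best match falls
    below the threshold, the caller answers 'I'm not sure' instead of
    letting the LLM hallucinate a parameter name."""
    q = embed(question)
    ranked = sorted(chunks,
                    key=lambda cid: cosine(q, embed(chunks[cid])),
                    reverse=True)
    confident = cosine(q, embed(chunks[ranked[0]])) >= threshold
    return ranked[:k], confident
```

The strict system prompt in step 3 then only ever sees the returned chunk ids, which is what makes the mandatory source link trustworthy.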

First-day developer onboarding time dropped from "hours of doc-scanning" to "minutes of targeted Q&A" in the internal pilots we ran.

What I learned

  • The ML was the easy part. The data pipeline and the collaboration with non-ML domain experts were the hard part — and the part that made the system accurate.
  • Thin-file scoring with alternative data is an emerging-markets superpower. It lets fair decisions be made for people a traditional bureau underrates.
  • A docs chatbot is a conversion tool. For any platform with a developer audience, it's one of the highest-leverage AI products you can ship.
  • Interpretability is a shipping requirement in regulated stacks — gradient-boosted trees and clear features over deep nets and mysterious embeddings.

Background / resume context

This was an AI Intern role at Payriff in Baku, Azerbaijan (June–September 2024). It's listed on my about page and projects page. For related work from my Purdue AI + Mathematics degree and other projects, see my blog archive and the handwriting-generator writeup.

Key takeaways

  • Mobile-provider data is a strong complement to bank data for thin-file credit scoring.
  • Pair ML engineers with domain experts. The credit analyst knows which features matter better than your AutoML pipeline does.
  • A well-grounded RAG chatbot is the single most valuable AI feature for a developer-facing API platform.


Frequently Asked Questions

Why use mobile-provider data for credit scoring?

In emerging markets, many applicants are thin-file — they don't have a long credit history with a single bank, so traditional scoring models underestimate them. Mobile-provider signals (consistent top-up patterns, steady plan tenure, device stability) correlate with repayment reliability and fill that gap without requiring personal bureau data the applicant may not have.

What does a RAG chatbot for a payments API actually do?

It answers 'how do I do X with the Payriff API?' with an accurate code snippet and a link to the exact doc section. Behind the scenes it embeds the public API docs, retrieves the top-K relevant sections per query, and conditions the LLM on those. It cuts first-day developer onboarding time from hours of doc-scanning to minutes of targeted Q&A.

How do you keep a fintech ML pipeline auditable?

Log every feature and score, version the training data, version the model, and never score a production request from a model that isn't pinned to a specific version. Keep the feature set small enough that a human can reason about each contributor — interpretability isn't optional in a regulated stack.

What's the single biggest lesson from the internship?

Feature engineering with domain experts beats feature engineering in a vacuum. The credit analysts on the team knew which mobile-provider signals correlated with default in their portfolio better than any automatic feature selection would surface in three months of tuning.

Tags: ai-credit-scoring, fintech, machine-learning, rag, chatbot, internship, payriff, azerbaijan
