
A quick checklist to cut through the hype and determine whether a startup is building real tech or just dressing up an API.
Follow these steps to peer under the hood and spot a true AI innovator before the smoke and mirrors fade.
1. The One Killer Question
“If OpenAI (or Anthropic, etc.) shut off your API access tomorrow, what still works?”
Real AI company:
- Talks about their own models, pipelines, data, on-prem fallbacks, alternative providers, fine-tuned checkpoints (sketched below).
- Mentions pain, but has a plan.
API wrapper:
- Mumbles something about:
  - “We’re provider-agnostic”
  - “We’ll just switch vendors”
  - “We’re more of an orchestration layer”
- Translation: if the API dies, we die.
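What a credible answer can look like in code: a thin, provider-agnostic completion layer with a real fallback chain, so the product keeps working when any single vendor disappears. This is a minimal sketch; the provider names and functions are hypothetical placeholders, not any specific vendor’s SDK.

```python
# Minimal sketch of a provider-agnostic completion layer with a fallback chain.
# Provider names and functions are hypothetical placeholders, not a real vendor SDK.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # prompt -> completion text


def hosted_vendor(prompt: str) -> str:
    # Imagine an HTTP call to a hosted API here.
    raise RuntimeError("vendor API unreachable")  # simulate the vendor cutting access


def self_hosted_model(prompt: str) -> str:
    # Imagine a fine-tuned checkpoint served on the company's own GPUs.
    return f"[self-hosted answer to: {prompt}]"


def complete_with_fallback(prompt: str, providers: List[Provider]) -> str:
    """Try each provider in order; no single vendor can kill the product."""
    errors = []
    for p in providers:
        try:
            return p.complete(prompt)
        except Exception as exc:
            errors.append(f"{p.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


chain = [Provider("hosted-vendor", hosted_vendor),
         Provider("self-hosted", self_hosted_model)]
print(complete_with_fallback("Summarise this contract.", chain))
```

The code itself is trivial; the tell is whether the founder can point to where this layer lives in their stack and which models actually sit behind the second entry.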
2. Ask: “What is your real IP?”
“What’s your moat, excluding your UI and excluding the base model (GPT, Claude, etc.)?”
Green flags:
- Domain-specific datasets
- Labelling pipelines, evaluators
- Custom ranking / scoring systems
- In-house tools / agents / retrieval infra
- Clear evaluation framework (benchmarks)
Red flags:
- “Our prompts”
- “Our UX”
- “Our workflow builder”
- “Our brand”
- “Our templates marketplace”
Prompts are not IP. They’re seasoning.
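For contrast, here is the flavour of thing “custom ranking / scoring systems” actually means: a scorer that re-orders retrieved candidates using domain signals the base model doesn’t have. A toy sketch; the field names and weights are made up purely for illustration.

```python
# Toy sketch of a domain-specific re-ranking step.
# Field names and weights are invented for illustration only.
from typing import Dict, List


def score(candidate: Dict, query_specialty: str) -> float:
    """Blend retrieval similarity with proprietary domain signals."""
    s = candidate["similarity"]                    # from the embedding search
    if candidate["specialty"] == query_specialty:  # proprietary domain metadata
        s += 0.2
    s -= 0.05 * candidate["age_years"]             # prefer recent documents
    return s


def rerank(candidates: List[Dict], query_specialty: str) -> List[Dict]:
    return sorted(candidates, key=lambda c: score(c, query_specialty), reverse=True)


docs = [
    {"id": "a", "similarity": 0.82, "specialty": "cardiology", "age_years": 6},
    {"id": "b", "similarity": 0.78, "specialty": "oncology", "age_years": 1},
]
print([d["id"] for d in rerank(docs, "oncology")])  # -> ['b', 'a']
```

The weights themselves are nothing; the moat is the labelled domain metadata and the eval set that proves the re-ranker actually helps.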
3. Follow the Money: Infra & Team
Quick checks:
- “What’s your monthly spend on GPUs / inference infra?”
- “Who on your team has actually trained or fine-tuned a model at scale?”
Red flags:
- No GPU bills, only “OpenAI usage”.
- “We don’t really need ML engineers yet.”
- The CTO is a full-stack dev with no real ML depth.
If nobody has suffered through:
- CUDA errors
- exploding gradients
- data cleaning hell
…it’s probably an API wrapper.
4. The Latency Fingerprint Test
Ask them to:
- Run the product live
- Try a few unscripted, weird queries
- Notice:
  - Response time
  - Style
  - Failure modes
If it:
- Feels exactly like ChatGPT/Claude
- Has similar delay patterns
- Hallucinates in the same style
…you’re basically watching a re-skinned ChatGPT.
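You can make this test half-quantitative: time the same handful of queries against the product and against the raw provider, then compare the distributions. A rough sketch, assuming the product exposes an HTTP endpoint you can hit; the URL and payload shape are placeholders.

```python
# Rough latency-fingerprinting sketch: compare response-time distributions.
# The endpoint URL and payload shape are placeholders, not a real product's API.
import statistics
import time

import requests


def time_endpoint(url: str, payload: dict, runs: int = 5) -> list:
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.post(url, json=payload, timeout=60)
        latencies.append(time.perf_counter() - start)
    return latencies


product = time_endpoint("https://startup.example/api/ask", {"q": "weird, unscripted query"})
print(f"median {statistics.median(product):.2f}s  spread {max(product) - min(product):.2f}s")
# Run the same function against the raw provider's endpoint; if the product's
# numbers track it almost exactly, there's probably nothing between the UI and the model.
```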
5. Ask for the Architecture Diagram
“Show me your technical architecture, from data ingestion to model output.”
Green flags:
- Separate blocks for:
  - Data ingestion
  - Preprocessing
  - Vector DB / retrieval
  - Model(s)
  - Evaluation / monitoring
  - Feedback loop / retraining
Red flags:
- Big box: “LLM provider”
- Arrow to: “Our app”
- Lots of arrows and buzzwords, no data flow clarity.
If the entire brain is one SaaS logo, it’s a wrapper.
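As a sanity check, the green-flag diagram should decompose into stages you could point to in the codebase, roughly like the skeleton below. Every function here is a stand-in for a real component, not working code for one.

```python
# Skeleton of the stages a real architecture diagram should map onto.
# Each function is a stand-in for a real component.

def ingest(sources):          # pull raw documents from APIs, files, crawls
    return [{"text": s} for s in sources]

def preprocess(docs):         # clean, chunk, deduplicate
    return [d["text"].strip().lower() for d in docs]

def index(chunks):            # embed and store in a vector DB / retrieval layer
    return {i: c for i, c in enumerate(chunks)}

def generate(query, store):   # retrieval plus model call(s), with guardrails around them
    context = list(store.values())[:3]
    return f"answer to '{query}' using {len(context)} retrieved chunks"

def evaluate(answer):         # eval / monitoring that feeds a retraining loop
    return {"answer": answer, "score": 0.0}

print(evaluate(generate("example query", index(preprocess(ingest(["Doc A", "Doc B"]))))))
```

A wrapper’s version of this collapses to one function that forwards a prompt.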
6. Ask About Evaluation
“How do you measure model quality? Show me your benchmarks.”
Real AI team:
- Talks about:
  - Accuracy, F1, BLEU, ROUGE, win-rates
  - Custom eval datasets
  - Regression tests
  - A/B experiments
Wrapper team:
- Talks about:
  - “Users love it”
  - “Great feedback”
  - “Engagement is high”
  - “We’re iterating fast”
No eval pipeline = no depth.
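A real eval pipeline doesn’t have to be fancy. Even a small regression suite over a frozen dataset is a world apart from “users love it”. A minimal sketch, assuming a JSONL file of prompts with expected answers and some model_answer function wired to the system under test; both names are stand-ins for the team’s own assets.

```python
# Minimal regression-eval sketch: exact-match accuracy over a frozen eval set.
# "eval_set.jsonl" and model_answer() are assumptions standing in for the team's own assets.
import json


def model_answer(prompt: str) -> str:
    return "placeholder"  # swap in the real system under test


def run_eval(path: str = "eval_set.jsonl") -> float:
    correct, total = 0, 0
    with open(path) as f:
        for line in f:
            case = json.loads(line)  # {"prompt": "...", "expected": "..."}
            got = model_answer(case["prompt"]).strip().lower()
            correct += int(got == case["expected"].strip().lower())
            total += 1
    return correct / total


accuracy = run_eval()
assert accuracy >= 0.85, f"regression: accuracy dropped to {accuracy:.2%}"
```

Exact match is the crudest possible metric; the point is that scores exist, are versioned, and block a release when they drop.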
7. Model Ownership Question
“Which parts of your system are fully under your control, and which are just vendor dependencies?”
You’re looking for:
- In-house models or at least adapted models
- Own embedding / retrieval / ranking stack
- Ability to move between providers without rewriting the whole product
Red flag answer:
“We’re built deeply on OpenAI, but we have a lot of optimizations on top.”
That’s like saying:
“We own a restaurant. Our IP is Swiggy.”
8. Data Story Interrogation
“Walk me through your data pipeline.”
Good answer includes:
- Where data comes from
- How it’s cleaned
- How labels are created
- How it’s stored
- How it’s used for:
  - Fine-tuning
  - RAG
  - Evaluations
Red flags:
- “We don’t really need data, the foundation model is so good.”
- “Clients bring their own data and we just plug it in.”
- “We store it in a vector DB and… magic.”
No data thinking → glorified front-end.
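The good answer maps onto steps you could trace through the codebase. A compressed sketch of what that looks like end to end; the sources, cleaning rules and in-memory “store” are placeholders for real infrastructure.

```python
# Compressed data-pipeline sketch: source -> clean -> label -> store -> reuse.
# Sources, cleaning rules and the in-memory "store" are placeholders for real infrastructure.
import hashlib
from typing import Optional


def clean(record: dict) -> Optional[dict]:
    text = record.get("text", "").strip()
    return {"text": text} if len(text) > 20 else None  # drop junk rows


def label(record: dict) -> dict:
    # In reality: a labelling UI, heuristics, or model-assisted labels reviewed by humans.
    record["label"] = "contract" if "agreement" in record["text"].lower() else "other"
    return record


def store(record: dict, db: dict) -> None:
    key = hashlib.sha256(record["text"].encode()).hexdigest()  # dedupe on content hash
    db[key] = record  # downstream: fine-tuning sets, RAG corpus, eval splits


db = {}
for raw in [{"text": "This Agreement is entered into by the parties..."}, {"text": "ok"}]:
    cleaned = clean(raw)
    if cleaned:
        store(label(cleaned), db)
print(f"{len(db)} record(s) ready for fine-tuning / RAG / eval splits")
```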
9. Ask for a Local / Air-gapped Story
“Could you run a version of this entirely on-prem or air-gapped, if a bank or hospital required it?”
Real AI company:
- Says “yes, but it’s expensive,” and explains:
  - Containerization
  - Self-hosted models
  - Security considerations
API wrapper:
- “We’re cloud-native.”
- “Our value is in the cloud.”
- “Security is handled by OpenAI/AWS/etc.”
Translation: No control, no depth.
10. The “Non-LLM Feature” Trap
“Show me a feature your product has that would still be valuable even if LLMs disappeared tomorrow.”
If they can’t name:
- Workflows
- Integrations
- Dashboards
- Analytics
- Domain-specific tools
…then the only asset is “access to someone else’s model”.
That’s not a startup. That’s a skin.
11. Contract & Pricing Smell Test
Red flags:
- Pricing is purely usage-based on tokens with a fat margin.
- No:
  - Implementation fee
  - Customization
  - Managed service component
- The value prop is: “We make ChatGPT safer/easier for your team.”
That’s basically:
“We are a UI tax on OpenAI.”
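The smell is easy to quantify with back-of-the-envelope arithmetic (all numbers below are made up): if the entire business is reselling tokens at a markup, the “moat” is a spreadsheet cell.

```python
# Back-of-the-envelope token-markup check. All numbers are made up.
provider_cost_per_1k_tokens = 0.01  # what they pay the LLM vendor
price_per_1k_tokens = 0.05          # what they charge the customer
monthly_tokens_k = 200_000          # 200M tokens per month

cost = provider_cost_per_1k_tokens * monthly_tokens_k      # $2,000
revenue = price_per_1k_tokens * monthly_tokens_k           # $10,000
print(f"gross margin: {(revenue - cost) / revenue:.0%}")   # 80%, earned on someone else's model
```

Healthy-looking margin, zero defensibility: the vendor can change that first number any quarter.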
12. Ask Them to Draw the Boundary
“Draw a line between what the LLM does and what your system does.”
Good founders:
- Explicitly separate:
  - Reasoning
  - Retrieval
  - Guardrails
  - Business logic
  - Orchestration
  - Post-processing
Wrappers:
- Handwave:
  - “The LLM handles that.”
  - “We use AI agents.”
  - “We orchestrate tools.”
Every time you hear “agentic”, mentally replace it with “glorified prompt chain”.
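One way to make the boundary concrete: the LLM call should be a single, swappable step inside a pipeline the team owns, not the pipeline itself. A schematic sketch; the function names and guardrail rules are illustrative, not a real framework.

```python
# Schematic sketch of the LLM-vs-your-system boundary.
# Function names and guardrail rules are illustrative, not a real framework.

def retrieve(query: str) -> list:    # yours: retrieval / ranking stack
    return ["relevant snippet 1", "relevant snippet 2"]

def guardrails(query: str) -> bool:  # yours: policy, PII and scope checks
    return "ssn" not in query.lower()

def call_llm(prompt: str) -> str:    # theirs: the only vendor-shaped box
    return f"draft answer for: {prompt[:40]}..."

def postprocess(draft: str) -> str:  # yours: validation, formatting, business rules
    return draft.strip()

def answer(query: str) -> str:
    if not guardrails(query):
        return "Request refused by policy."
    context = retrieve(query)
    draft = call_llm(f"Context: {context}\nQuestion: {query}")
    return postprocess(draft)

print(answer("What does clause 4.2 mean?"))
```

If a founder can draw this and name real components for the “yours” boxes, the agentic talk is earned; if not, it’s a prompt chain.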
Quick Checklist (Investor Mode)
Print this in your head:
- ❓ Can they survive 6 months without OpenAI/Anthropic?
- ❓ Do they have any real in-house ML talent?
- ❓ Is there a proper data + eval pipeline?
- ❓ Is their infra more than: frontend → API → LLM?
- ❓ Do they own anything you can’t recreate in 3 months with a dev and a credit card?
If the honest answer is “no” across the board → API wrapper.