Planning AI Applications
Given the seemingly limitless potential of AI, it's tempting to jump into building applications. If you just want to learn and have fun, jump right in — building is one of the best ways to learn.
But if you're doing this for a living, take a step back and consider why you're building this and how. It's easy to build a cool demo with foundation models. It's hard to create a profitable product.
Use Case Evaluation
The first question to ask is why you want to build this application. Like many business decisions, building an AI application is often a response to risks and opportunities. Here are a few examples of different levels of risks, ordered from high to low:
Existential Threat
Profit & Productivity
Don't Get Left Behind
The Role of AI and Humans in the Application
The role AI plays in a product influences the application's development and its requirements. Apple has a great document explaining different ways AI can be used in a product. Here are three key dimensions:
Critical or Complementary
If an app can still work without AI, AI is complementary to it; if it can't, AI is critical. Face ID wouldn't work without AI-powered facial recognition, whereas Gmail would still work without Smart Compose.
The more critical AI is, the more accurate and reliable it has to be. People are more accepting of mistakes when AI isn't core to the application.
Reactive or Proactive
A reactive feature responds to user requests; a proactive feature appears when there's an opportunity. A chatbot is reactive; traffic alerts on Google Maps are proactive.
Reactive features usually need to respond quickly. Proactive features can be precomputed, but because users didn't ask for them, low-quality output feels intrusive, so they typically have a higher quality bar.
Dynamic or Static
Dynamic features update continually with user feedback; static features update periodically. Face ID adapts as faces change; object detection in Google Photos updates only when the app is upgraded.
Dynamic features might mean each user has their own model — finetuned on their data, or personalized via mechanisms like ChatGPT's memory. Static features typically share one model across users.
It's also important to clarify the role of humans. Will AI provide background support, make decisions directly, or both? For a customer support chatbot, AI responses can be used in different ways:
- AI shows several responses that human agents can reference to write faster responses.
- AI responds only to simple requests and routes more complex requests to humans.
- AI responds to all requests directly, without human involvement.
Microsoft (2023) proposed a framework called Crawl-Walk-Run for gradually increasing AI automation in products:
Crawl
Human involvement is mandatory.
Walk
AI can directly interact with internal employees.
Run
Increased automation, potentially including direct AI interactions with external users.
The role of humans can change over time as the quality of the AI system improves. For example, in the beginning you might use AI to generate suggestions for human agents. If the acceptance rate is high — say, 95% of AI-suggested responses to simple requests are used verbatim — you can let customers interact with AI directly for those simple requests.
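The acceptance-rate check described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the `Suggestion` record, the `"simple"` request label, and the `min_samples` guard are all assumptions introduced here, and the 95% threshold is only the example figure from the text.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """One AI-suggested reply shown to a human agent (hypothetical record)."""
    request_type: str  # e.g. "simple" or "complex"
    suggested: str     # text the AI proposed
    sent: str          # text the agent actually sent to the customer


def acceptance_rate(log: list[Suggestion], request_type: str) -> float:
    """Fraction of suggestions of the given type that were sent verbatim."""
    relevant = [s for s in log if s.request_type == request_type]
    if not relevant:
        return 0.0
    accepted = sum(1 for s in relevant if s.suggested == s.sent)
    return accepted / len(relevant)


def can_automate(
    log: list[Suggestion],
    request_type: str,
    threshold: float = 0.95,  # example figure from the text
    min_samples: int = 100,   # assumed guard against tiny samples
) -> bool:
    """True when enough suggestions were logged and enough were used verbatim."""
    sample_size = sum(1 for s in log if s.request_type == request_type)
    if sample_size < min_samples:
        return False
    return acceptance_rate(log, request_type) >= threshold
```

Verbatim matching is a deliberately strict notion of acceptance; a real system might also count lightly edited suggestions, which would call for a different comparison than string equality.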
AI Product Defensibility
If you're selling AI applications as standalone products, defensibility matters. The low entry barrier is both a blessing and a curse: if something is easy for you to build, it's also easy for your competitors. What moats do you have?
One general partner at a major VC firm told me she's seen many startups whose entire products could be a feature for Google Docs or Microsoft Office. If their products take off, what would stop Google or Microsoft from allocating three engineers to replicate them in two weeks?
In AI, there are generally three types of competitive advantages:
Technology
Data
Distribution
Setting Expectations
Once you've decided to build this AI application yourself, the next step is to figure out what success looks like. The most important metric is how this will impact your business. For a customer support chatbot, business metrics might include:
- What percentage of customer messages do you want the chatbot to automate?
- How many more messages should the chatbot allow you to process?
- How much quicker can you respond using the chatbot?
- How much human labor can the chatbot save you?
To ensure a product isn't put in front of customers before it's ready, set clear expectations for its usefulness threshold: how good it has to be for it to be useful.
Quality
Latency
Cost
Other
If you're not yet sure what metrics you want to use, don't worry — the rest of the book will cover many of these.
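One way to make a usefulness threshold concrete is to write it down as data and check measurements against it. The sketch below assumes a single number for each of quality, latency, and cost. The class names, field names, and example values are hypothetical, and real thresholds would likely include other metrics as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsefulnessThreshold:
    """Minimum bar the product must clear before launch (illustrative fields)."""
    min_quality: float           # e.g. fraction of responses rated acceptable
    max_latency_s: float         # e.g. slowest tolerable response time, in seconds
    max_cost_per_request: float  # e.g. dollars per request


@dataclass(frozen=True)
class Measurement:
    """Values observed when evaluating a candidate system."""
    quality: float
    latency_s: float
    cost_per_request: float


def unmet_criteria(threshold: UsefulnessThreshold, m: Measurement) -> list[str]:
    """Return the names of the criteria the measurement fails, in a fixed order."""
    failures = []
    if m.quality < threshold.min_quality:
        failures.append("quality")
    if m.latency_s > threshold.max_latency_s:
        failures.append("latency")
    if m.cost_per_request > threshold.max_cost_per_request:
        failures.append("cost")
    return failures


# Hypothetical numbers, for illustration only.
threshold = UsefulnessThreshold(min_quality=0.8, max_latency_s=2.0, max_cost_per_request=0.05)
candidate = Measurement(quality=0.75, latency_s=2.5, cost_per_request=0.03)
print(unmet_criteria(threshold, candidate))  # ['quality', 'latency']
```

An empty list means the candidate clears every stated bar; a non-empty list names what still blocks a launch.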
Milestone Planning
Once you've set measurable goals, you need a plan to achieve them. How to get there depends on where you start. Evaluate existing models to understand their capabilities — the stronger the off-the-shelf models, the less work you'll have to do.
For example, if your goal is to automate 60% of customer support tickets and an off-the-shelf model can already handle 30%, the effort needed is much less than if it could automate none. It's likely that your goals will change after evaluation. You may realize the resources needed to get the app to the usefulness threshold exceed its potential return — and decide not to pursue it.
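The arithmetic in this example can be written out as a small sketch. The function names and the dollar figures are hypothetical. The gap in percentage points is only a rough proxy for effort, since closing the last few points is often harder than the first.

```python
def automation_gap(goal_pct: int, baseline_pct: int) -> int:
    """Percentage points still to be gained beyond the off-the-shelf baseline."""
    return max(0, goal_pct - baseline_pct)


def worth_pursuing(expected_return: float, estimated_cost: float) -> bool:
    """Go/no-go check: proceed only if the expected return exceeds the cost."""
    return expected_return > estimated_cost


# Goal from the text: automate 60% of tickets.
print(automation_gap(60, 30))  # 30 points left when the baseline handles 30%
print(automation_gap(60, 0))   # 60 points left when the baseline handles none

# Hypothetical figures: reaching the threshold costs more than it returns.
print(worth_pursuing(expected_return=100_000, estimated_cost=250_000))  # False
```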
Maintenance
Product planning doesn't stop at achieving its goals. You need to think about how this product will change over time and how it should be maintained. AI's fast pace of change adds extra challenge. Building on top of foundation models today means committing to riding this bullet train.
Many changes are good. Limitations are being addressed: context lengths are getting longer, model outputs are getting better, and inference is getting faster and cheaper. Figure 1-11 shows the evolution of inference cost and model performance on Massive Multitask Language Understanding (MMLU) (Hendrycks et al., 2020), a popular foundation model benchmark, between 2022 and 2024.

Figure 1-11. The cost of AI reasoning rapidly drops over time. Image from Nguyen (2024).
But even good changes cause friction. You'll have to constantly run a cost-benefit analysis of each technology investment.
Pricing Whiplash
Vendor Risk
Model Swaps
Regulations
Now that you've committed to building an AI product, let's look into the engineering stack needed to build these applications.
Footnotes
- Smaller startups, however, might have to prioritize product focus and can't afford to have even one person to "look around."
- A running joke in the early days of generative AI is that AI startups are OpenAI or Claude wrappers.
- During the process of writing this book, I could hardly talk to any AI startup without hearing the phrase "data flywheel."
- Disclaimer: I'm an investor in Photoroom.