Introduction to Building AI Applications with Foundation Models

Planning AI Applications

How to evaluate use cases, build vs buy, set success metrics, plan milestones, and maintain AI products in a fast-moving landscape.

Given the seemingly limitless potential of AI, it's tempting to jump into building applications. If you just want to learn and have fun, jump right in — building is one of the best ways to learn.

But if you're doing this for a living, take a step back and consider why you're building this and how. It's easy to build a cool demo with foundation models. It's hard to create a profitable product.

Use Case Evaluation

The first question to ask is why you want to build this application. Like many business decisions, building an AI application is often a response to risks and opportunities. Here are a few examples of different levels of risks, ordered from high to low:

Existential Threat

If you don't do this, competitors with AI can make you obsolete. If AI poses a major existential threat to your business, incorporating AI must have the highest priority. In the 2023 Gartner study, 7% of respondents cited business continuity as their reason for embracing AI. This is more common for businesses involving document processing and information aggregation (financial analysis, insurance, data processing) and creative work like advertising, web design, and image production. For how industries rank in their exposure to AI, see the 2023 OpenAI study "GPTs are GPTs" (Eloundou et al., 2023).

Profit & Productivity

If you don't do this, you'll miss opportunities to boost profits and productivity. Most companies embrace AI for the opportunities it brings. AI can make user acquisition cheaper through more effective copywriting, product descriptions, and visuals; increase retention by improving customer support and personalizing the user experience; and help with sales lead generation, internal communication, market research, and competitor tracking.

Don't Get Left Behind

You're unsure where AI will fit yet, but you don't want to be left behind. While a company shouldn't chase every hype train, many have failed by waiting too long (cue Kodak, Blockbuster, and BlackBerry). Investing resources into understanding how a transformational technology can impact your business isn't a bad idea if you can afford it. At bigger companies, this can be part of R&D.¹

Once you've found a good reason to develop this use case, consider whether you have to build it yourself. If AI poses an existential threat to your business, you might want to do AI in-house instead of outsourcing it to a competitor. However, if you're using AI to boost profits and productivity, you might have plenty of buy options that save time and money while delivering better performance.

The Role of AI and Humans in the Application

The role AI plays in a product influences the application's development and its requirements. Apple has a great document explaining different ways AI can be used in a product. Here are three key dimensions:

Critical or Complementary

If an app can still work without AI, AI is complementary to the app; if it can't, AI is critical. Face ID wouldn't work without AI-powered facial recognition, whereas Gmail would still work without Smart Compose.

The more critical AI is, the more accurate and reliable it has to be. People are more accepting of mistakes when AI isn't core to the application.

Reactive or Proactive

A reactive feature responds to user requests; a proactive feature appears when there's an opportunity. A chatbot is reactive; traffic alerts on Google Maps are proactive.

Reactive features usually need to happen fast. Proactive features can be precomputed, but since users didn't ask for them, low quality feels intrusive — so they typically have a higher quality bar.

Dynamic or Static

Dynamic features update continually with user feedback; static features update periodically. Face ID adapts as faces change; object detection in Google Photos updates only when the app is upgraded.

Dynamic features might mean each user has their own model — finetuned on their data, or personalized via mechanisms like ChatGPT's memory. Static features typically share one model across users.

It's also important to clarify the role of humans. Will AI provide background support, make decisions directly, or both? For a customer support chatbot, AI responses can be used in different ways:

AI shows several responses that human agents can reference to write faster responses.
AI responds only to simple requests and routes more complex requests to humans.
AI responds to all requests directly, without human involvement.

Involving humans in AI's decision-making processes is called human-in-the-loop.
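The three support setups above can be sketched as a small routing rule. This is only an illustration: the `Mode` names and the `responder` function are hypothetical, and how a request gets labeled as simple (for example, by a classifier) is assumed to happen elsewhere.

```python
from enum import Enum


class Mode(Enum):
    SUGGEST = "suggest"        # AI drafts responses, a human agent writes the reply
    TRIAGE = "triage"          # AI answers simple requests, humans take the rest
    AUTONOMOUS = "autonomous"  # AI answers every request directly


def responder(mode: Mode, is_simple: bool) -> str:
    """Return who sends the final reply: "human" or "ai"."""
    if mode is Mode.SUGGEST:
        return "human"
    if mode is Mode.TRIAGE:
        return "ai" if is_simple else "human"
    return "ai"
```

Only the first two modes keep a human in the loop for at least some requests.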

Microsoft (2023) proposed a framework called Crawl-Walk-Run for gradually increasing AI automation in products:

Crawl

Human involvement is mandatory.

Walk

AI can directly interact with internal employees.

Run

Increased automation, potentially including direct AI interactions with external users.

The role of humans can change over time as the quality of the AI system improves. For example, in the beginning you might use AI to generate suggestions for human agents. If the acceptance rate is high — say, 95% of AI-suggested responses to simple requests are used verbatim — you can let customers interact with AI directly for those simple requests.
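A minimal sketch of how that acceptance-rate check might be computed from suggestion logs. The `SuggestionLog` fields, the function names, and the `min_samples` guard are assumptions made for illustration; only the 95% verbatim threshold comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class SuggestionLog:
    suggested: str   # what the AI proposed
    sent: str        # what the human agent actually sent
    is_simple: bool  # whether the request was labeled as simple


def verbatim_acceptance_rate(logs: list[SuggestionLog]) -> float:
    """Share of simple requests where the agent sent the AI suggestion unchanged."""
    simple = [log for log in logs if log.is_simple]
    if not simple:
        return 0.0
    accepted = sum(log.suggested.strip() == log.sent.strip() for log in simple)
    return accepted / len(simple)


def ready_for_direct_replies(
    logs: list[SuggestionLog], threshold: float = 0.95, min_samples: int = 200
) -> bool:
    """True when enough simple requests were seen and the rate meets the threshold."""
    simple_count = sum(log.is_simple for log in logs)
    return simple_count >= min_samples and verbatim_acceptance_rate(logs) >= threshold
```

The sample-size guard is there so that a high rate on a handful of requests doesn't trigger the switch; the right minimum depends on your traffic.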

AI Product Defensibility

If you're selling AI applications as standalone products, defensibility matters. The low entry barrier is both a blessing and a curse: if something is easy for you to build, it's also easy for your competitors. What moats do you have?

Building applications on top of foundation models means providing a layer on top of these models.² If the underlying models expand in capabilities, your layer might be subsumed by the models, rendering your app obsolete. Imagine building a PDF-parsing app on top of ChatGPT assuming it can't parse PDFs well. Your ability to compete weakens once that assumption fails. (Even then, a PDF-parsing app might still make sense built on top of open source models, for users who want to host in-house.)

One general partner at a major VC firm told me she's seen many startups whose entire products could be a feature for Google Docs or Microsoft Office. If their products take off, what would stop Google or Microsoft from allocating three engineers to replicate them in two weeks?

In AI, there are generally three types of competitive advantages:

Technology

With foundation models, the core technologies of most companies will be similar. Hard to differentiate here.

Data

Big companies likely have more existing data. But a startup that gets to market first and gathers usage data to continually improve their products can make data their moat. Even when usage data can't train models directly, it gives invaluable insights into user behavior and product shortcomings to guide future data collection.³

Distribution

The ability to get your product in front of users. This advantage likely belongs to big companies.

Many successful companies' original products could've been features of larger products. Calendly could've been a feature of Google Calendar. Mailchimp could've been a feature of Gmail. Photoroom could've been a feature of Google Photos.⁴ Many startups eventually overtake bigger competitors by building a feature those competitors overlooked. Perhaps yours can be next.

Setting Expectations

Once you've decided to build this AI application yourself, the next step is to figure out what success looks like. The most important metric is how this will impact your business. For a customer support chatbot, business metrics might include:

What percentage of customer messages do you want the chatbot to automate?
How many more messages should the chatbot allow you to process?
How much quicker can you respond using the chatbot?
How much human labor can the chatbot save you?

A chatbot can answer more messages, but that doesn't mean it'll make users happy, so also track customer satisfaction and customer feedback in general. The "User Feedback" section discusses how to design a feedback system.

To ensure a product isn't put in front of customers before it's ready, have clear expectations on its usefulness threshold: how good it has to be for it to be useful.

Quality

Measure the quality of the chatbot's responses.

Latency

Latency metrics include TTFT (time to first token), TPOT (time per output token), and total latency. What's acceptable depends on your use case: if human agents currently take a median of one hour to respond, anything faster than that might be good enough.

Cost

How much it costs per inference request.

Other

Metrics such as interpretability and fairness.
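As a rough sketch, the latency metrics can be derived from three timestamps per request and then checked, along with quality and cost, against a usefulness threshold. The class, the field names, and every number below are hypothetical. TPOT is computed here as the average time per token after the first one, which is one common convention rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    sent_at: float         # seconds, when the request was sent
    first_token_at: float  # seconds, when the first output token arrived
    done_at: float         # seconds, when the last output token arrived
    output_tokens: int

    @property
    def ttft(self) -> float:
        return self.first_token_at - self.sent_at

    @property
    def tpot(self) -> float:
        # Average time per output token after the first one.
        if self.output_tokens <= 1:
            return 0.0
        return (self.done_at - self.first_token_at) / (self.output_tokens - 1)

    @property
    def total_latency(self) -> float:
        return self.done_at - self.sent_at


def check_thresholds(
    measured: dict[str, float], limits: dict[str, tuple[str, float]]
) -> dict[str, bool]:
    """Return, per metric, whether the measured value satisfies its limit."""
    results = {}
    for name, (kind, limit) in limits.items():
        value = measured[name]
        results[name] = value >= limit if kind == "min" else value <= limit
    return results


timing = RequestTiming(sent_at=0.0, first_token_at=0.5, done_at=2.5, output_tokens=101)
measured = {"quality": 0.82, "ttft_s": timing.ttft, "cost_usd": 0.004}
limits = {"quality": ("min", 0.80), "ttft_s": ("max", 1.0), "cost_usd": ("max", 0.003)}
print(check_thresholds(measured, limits))
# {'quality': True, 'ttft_s': True, 'cost_usd': False}
```

In this made-up example the product clears the quality and latency bars but is still too expensive per request to ship.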

If you're not yet sure what metrics you want to use, don't worry — the rest of the book will cover many of these.

Milestone Planning

Once you've set measurable goals, you need a plan to achieve them. How to get there depends on where you start. Evaluate existing models to understand their capabilities — the stronger the off-the-shelf models, the less work you'll have to do.

For example, if your goal is to automate 60% of customer support tickets and an off-the-shelf model can already handle 30%, the effort needed is much less than if it could automate none. It's likely that your goals will change after evaluation. You may realize the resources needed to get the app to the usefulness threshold exceed its potential return — and decide not to pursue it.
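The arithmetic behind that example can be written down in a few lines. This sketch assumes you have a labeled evaluation set where each ticket is marked as resolved by the model or not; the function names are hypothetical.

```python
def automation_rate(outcomes: list[bool]) -> float:
    """Share of evaluation tickets the model resolved without a human."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def remaining_gap(goal: float, baseline: float) -> float:
    """Fraction of tickets still to be automated to reach the goal (never negative)."""
    return max(0.0, goal - baseline)


# Hypothetical evaluation: the off-the-shelf model resolves 3 of 10 sample tickets.
baseline = automation_rate([True] * 3 + [False] * 7)
gap = remaining_gap(goal=0.60, baseline=baseline)
```

With a 30% baseline, half of the 60% goal is already covered; with a 0% baseline, the whole gap remains.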

The last mile challenge. Initial success with foundation models can be misleading. Base capabilities are already quite impressive, so building a fun demo doesn't take long. But a good demo doesn't promise a good end product. It might take a weekend to build a demo but months — or years — to build a product.

In the UltraChat paper, Ding et al. (2023) shared that "the journey from 0 to 60 is easy, whereas progressing from 60 to 100 becomes exceedingly challenging." LinkedIn (2024) shared the same sentiment: it took them one month to achieve 80% of the experience they wanted, but four more months to surpass 95%. A lot of time was spent on product kinks and hallucinations. The slow speed of achieving each subsequent 1% gain was discouraging.

Maintenance

Product planning doesn't stop at achieving its goals. You need to think about how this product will change over time and how it should be maintained. AI's fast pace of change adds extra challenge. Building on top of foundation models today means committing to riding this bullet train.

Many changes are good. Limitations are being addressed: context lengths are getting longer, model outputs are getting better, and inference is getting faster and cheaper. Figure 1-11 shows the evolution of inference cost and model performance on Massive Multitask Language Understanding (MMLU) (Hendrycks et al., 2020), a popular foundation model benchmark, between 2022 and 2024.

Figure 1-11. The cost of AI reasoning rapidly drops over time. Image from Nguyen (2024).

But even good changes cause friction. You'll have to constantly run a cost-benefit analysis of each technology investment.

Pricing Whiplash

You may decide to build a model in-house because it seems cheaper than paying model providers, only to find out after three months that providers have cut their prices in half, making in-house the more expensive option.
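A back-of-the-envelope comparison shows how a price cut can flip the decision. All figures below are invented for illustration, and the cost model (a fixed monthly cost plus a marginal cost per request for in-house, a flat per-request price for the provider) is a simplifying assumption.

```python
def monthly_api_cost(requests: int, price_per_request: float) -> float:
    return requests * price_per_request


def monthly_inhouse_cost(requests: int, fixed_cost: float, marginal_cost: float) -> float:
    return fixed_cost + requests * marginal_cost


def cheaper_option(
    requests: int, api_price: float, fixed_cost: float, marginal_cost: float
) -> str:
    api = monthly_api_cost(requests, api_price)
    inhouse = monthly_inhouse_cost(requests, fixed_cost, marginal_cost)
    return "in-house" if inhouse < api else "api"


# Hypothetical numbers: 1M requests a month, $6,000 fixed in-house cost.
before = cheaper_option(1_000_000, api_price=0.010, fixed_cost=6000.0, marginal_cost=0.001)
after = cheaper_option(1_000_000, api_price=0.005, fixed_cost=6000.0, marginal_cost=0.001)
```

Here `before` is "in-house" ($7,000 versus $10,000), while `after` is "api" once the provider halves its price ($7,000 versus $5,000).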

Vendor Risk

You might invest in a third-party solution and tailor your infrastructure around it, only for the provider to go out of business after failing to secure funding.

Model Swaps

As model providers converge to the same API, swapping one for another is getting easier. But each model has its quirks, strengths, and weaknesses — workflows, prompts, and data still need adjustment. Without proper versioning and evaluation infrastructure, the process can cause headaches.
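One way to make swaps less painful is to run a fixed set of evaluation cases against both the current and the candidate model before switching. The sketch below assumes a hypothetical interface where a model is any function from a prompt to text; the stub models and the single test case are placeholders, not real provider calls.

```python
from typing import Callable

Model = Callable[[str], str]                      # hypothetical: prompt in, text out
EvalCase = tuple[str, Callable[[str], bool]]      # (prompt, check on the output)


def pass_rate(model: Model, cases: list[EvalCase]) -> float:
    """Fraction of evaluation cases whose check passes on the model's output."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    passed = sum(check(model(prompt)) for prompt, check in cases)
    return passed / len(cases)


def safe_to_swap(
    current: Model, candidate: Model, cases: list[EvalCase], tolerance: float = 0.0
) -> bool:
    """True if the candidate does not regress beyond the tolerance on the same cases."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - tolerance


# Stub models standing in for two providers.
def current_model(prompt: str) -> str:
    return "Refunds are processed within 5 business days."


def candidate_model(prompt: str) -> str:
    return "We will get back to you."


cases = [("How long do refunds take?", lambda out: "5 business days" in out)]
print(safe_to_swap(current_model, candidate_model, cases))  # False
```

Keeping the cases under version control alongside the prompts means the same check can be rerun whenever a model or prompt changes.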

Regulations

Many countries treat technologies surrounding AI as national security issues. Compliance can be costly: GDPR was estimated to cost businesses $9 billion to comply with. Compute availability can change overnight (see the US October 2023 Executive Order). If your GPU vendor is suddenly banned from selling to your country, you're in trouble.

Some changes can be fatal. Regulations around intellectual property (IP) and AI usage are still evolving. If you build your product on top of a model trained using other people's data, can you be certain your product's IP will always belong to you? Many IP-heavy companies, such as game studios, hesitate to use AI for fear of losing their IP later on.

If you've committed to building an AI product, the next step is to look at the engineering stack needed to build these applications.

¹ Smaller startups, however, might have to prioritize product focus and can't afford to have even one person to "look around."
² A running joke in the early days of generative AI is that AI startups are OpenAI or Claude wrappers.
³ During the process of writing this book, I could hardly talk to any AI startup without hearing the phrase "data flywheel."
⁴ Disclaimer: I'm an investor in Photoroom.
