# The AI Engineering Stack

AI engineering's rapid growth induced an incredible amount of **hype and FOMO**. The number of new tools, techniques, models, and applications introduced every day can be overwhelming.

Instead of chasing constantly shifting sand, let's look into the **fundamental building blocks** of AI engineering.

## Where AI Engineering Comes From

To understand AI engineering, it's important to recognize that **AI engineering evolved out of ML engineering**. When a company starts experimenting with foundation models, it's natural that its existing ML team should lead the effort. Some companies treat AI engineering the same as ML engineering, as shown in [Figure 1-12](./media/fig-1-12.png).

![Figure 1-12. Many companies put AI engineering and ML engineering under the same umbrella.](./media/fig-1-12.png)

> Figure 1-12. Many companies put AI engineering and ML engineering under the same umbrella.

Some companies have separate job descriptions for AI engineering, as shown in [Figure 1-13](./media/fig-1-13.png).

![Figure 1-13. Some companies have separate job descriptions for AI engineering, as shown in the job headlines on LinkedIn from December 17, 2023.](./media/fig-1-13.png)

> Figure 1-13. Some companies have separate job descriptions for AI engineering, as shown in the job headlines on LinkedIn from December 17, 2023.

::note
Regardless of where organizations position AI engineers and ML engineers, their roles have **significant overlap**. Existing ML engineers can add AI engineering to their lists of skills to expand their job prospects. However, there are also AI engineers with **no previous ML experience**.
::

To best understand AI engineering and how it differs from traditional ML engineering, the following section breaks down the different layers of the AI application building process and looks at the role each layer plays in AI engineering and ML engineering.

## Three Layers of the AI Stack

There are three layers to any AI application stack. When developing an AI application, you'll likely **start from the top layer and move down as needed**:

::card-group
  ::card{icon="i-lucide-app-window" title="Application Development"}
  With models readily available, anyone can use them to develop applications. This is the layer that has seen the **most action** in the last two years, and it is still rapidly evolving. Application development involves providing a model with good prompts and the necessary context. This layer requires **rigorous evaluation**. Good applications also demand good interfaces.
  ::

  ::card{icon="i-lucide-cpu" title="Model Development"}
  This layer provides tooling for developing models, including frameworks for **modeling, training, finetuning, and inference optimization**. Because data is central to model development, this layer also contains dataset engineering. Model development also requires rigorous evaluation.
  ::

  ::card{icon="i-lucide-server" title="Infrastructure"}
  At the bottom of the stack is infrastructure, which includes tooling for **model serving, managing data and compute, and monitoring**.
  ::
::

These three layers and examples of responsibilities for each layer are shown in [Figure 1-14](./media/fig-1-14.png).

![Figure 1-14. Three layers of the AI engineering stack.](./media/fig-1-14.png)

> Figure 1-14. Three layers of the AI engineering stack.

To get a sense of how the landscape has evolved with foundation models, in March 2024, I searched GitHub for all AI-related repositories with at least 500 stars. Given the prevalence of GitHub, I believe this data is a good proxy for understanding the ecosystem. In my analysis, I also included repositories for applications and models, which are the products of the application development and model development layers, respectively. I found a total of **920 repositories**. [Figure 1-15](./media/fig-1-15.png) shows the cumulative number of repositories in each category month-over-month.

![Figure 1-15. Cumulative count of repositories by category over time.](./media/fig-1-15.png)

> Figure 1-15. Cumulative count of repositories by category over time.

::tip
The data shows a **big jump** in the number of AI toolings in 2023, after the introduction of Stable Diffusion and ChatGPT. The categories that saw the highest increases were **applications and application development**. The infrastructure layer saw some growth, but it was much less. Even though models and applications have changed, the **core infrastructural needs** — resource management, serving, monitoring, etc. — **remain the same**.
::
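The counting behind this analysis reduces to bucketing repositories by creation month and accumulating a running total per category. A minimal sketch of that step, using invented sample records (the real analysis pulled repositories with 500+ stars via GitHub search, and the `category` labels here are hypothetical):

```python
from collections import Counter

# Hypothetical sample of repo records, shaped loosely like what GitHub's
# repository search returns; categories are assigned by hand, as in the text.
repos = [
    {"name": "llm-app", "created_at": "2023-03-10", "category": "applications"},
    {"name": "prompt-kit", "created_at": "2023-04-02", "category": "application development"},
    {"name": "serve-fast", "created_at": "2022-11-20", "category": "infrastructure"},
    {"name": "tiny-model", "created_at": "2023-03-28", "category": "models"},
]

def cumulative_by_month(repos, category):
    """Count repos in a category created up to and including each month."""
    months = sorted({r["created_at"][:7] for r in repos})
    monthly = Counter(r["created_at"][:7] for r in repos if r["category"] == category)
    total, series = 0, {}
    for month in months:
        total += monthly.get(month, 0)
        series[month] = total
    return series

series = cumulative_by_month(repos, "applications")
```

Plotting one such series per category gives a chart like Figure 1-15.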
While the level of excitement and creativity around foundation models is unprecedented, **many principles of building AI applications remain the same**. For enterprise use cases, AI applications still need to solve business problems, and, therefore, it's still essential to map from business metrics to ML metrics and vice versa. You still need to do systematic experimentation. With classical ML engineering, you experiment with different hyperparameters. With foundation models, you experiment with different models, prompts, retrieval algorithms, sampling variables, and more. (Sampling variables are discussed in Chapter 2.) We still want to make models run faster and cheaper. It's still important to set up a feedback loop so that we can iteratively improve our applications with production data.

This means that much of what ML engineers have learned and shared over the last decade is still applicable. This collective experience makes it easier for everyone to begin building AI applications. However, built on top of these enduring principles are many innovations unique to AI engineering, which we'll explore in this book.

## AI Engineering Versus ML Engineering

While the unchanging principles of deploying AI applications are reassuring, it's also important to understand how things have changed. This is helpful both for teams that want to adapt their existing platforms for new AI use cases and for developers who want to know which skills to learn to stay competitive in a new market.

At a high level, building applications using foundation models today differs from traditional ML engineering in **three major ways**:

::steps
### Pre-trained models replace training your own

Without foundation models, you have to **train your own models** for your applications. With AI engineering, you use a model someone else has trained for you. This means that AI engineering focuses **less on modeling and training, and more on model adaptation**.

### Bigger models, more compute pressure

AI engineering works with models that are **bigger, consume more compute resources, and incur higher latency** than traditional ML engineering. This means more pressure for efficient training and inference optimization. A corollary is that many companies now need more GPUs and work with bigger compute clusters than they previously did, creating more need for engineers who know how to work with **GPUs and big clusters**.[^23]

### Open-ended outputs make evaluation harder

AI engineering works with models that can produce **open-ended outputs**. Open-ended outputs give models the flexibility to be used for more tasks, but they are also harder to evaluate. This makes **evaluation a much bigger problem** in AI engineering.
::

In short, AI engineering differs from ML engineering in that it's **less about model development and more about adapting and evaluating models**. Before we move on, let's clarify what *model adaptation* means. In general, model adaptation techniques can be divided into two categories, depending on whether they require updating model weights.

::card-group
  ::card{icon="i-lucide-message-square" title="Prompt-Based Techniques"}
  *Adapt a model **without updating model weights**.* You adapt a model by giving it instructions and context instead of changing the model itself.

  Prompt engineering is **easier to get started with** and requires less data. Many successful applications have been built with just prompt engineering. Its ease of use lets you experiment with more models, increasing your chance of finding one that's unexpectedly good for your application.

  However, it might **not be enough** for complex tasks or applications with strict performance requirements.
  ::

  ::card{icon="i-lucide-sliders-horizontal" title="Finetuning"}
  *Adapt a model **by updating model weights**.* You adapt a model by making changes to the model itself.

  In general, finetuning techniques are **more complicated and require more data**, but they can improve quality, latency, and cost significantly.

  Many things aren't possible without changing model weights, such as adapting a model to a **new task it wasn't exposed to during training**.
  ::
::
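The two adaptation families can be contrasted with a toy stand-in for a model. Here the "model" is just a two-weight linear predictor, not a real foundation model, and every name is illustrative: prompting only reads the weights, while finetuning produces new ones via a gradient step.

```python
weights = [0.5, -0.2]  # pretend these came from pre-training

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def prompt_based(w, x, instructions):
    # Prompt-based adaptation: weights are read, never written.
    # Only the input side (here, the instructions string) changes.
    return {"instructions": instructions, "output": predict(w, x)}

def finetune_step(w, x, target, lr=0.1):
    # Finetuning: one gradient-descent step on squared error,
    # returning *new* weights.
    err = predict(w, x) - target
    return [w[0] - lr * err * x[0], w[1] - lr * err * x[1]]

before = list(weights)
prompt_based(weights, [1.0, 1.0], "be concise")          # weights untouched
updated = finetune_step(weights, [1.0, 1.0], target=1.0)  # new weights produced
```

The asymmetry is the whole point: the first call can be repeated with any instructions at zero training cost, while the second permanently changes what the model is.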
Now, let's zoom into the application development and model development layers to see how each has changed with AI engineering, starting with what existing ML engineers are more familiar with.

### Model Development

*Model development* is the layer most commonly associated with traditional ML engineering. It has three main responsibilities: **modeling and training, dataset engineering, and inference optimization**. Evaluation is also required, but because most people will first come across it in the application development layer, I'll discuss evaluation in the next section.

#### Modeling and training

*Modeling and training* refers to the process of coming up with a model architecture, training it, and finetuning it. Examples of tools in this category are **Google's TensorFlow, Hugging Face's Transformers, and Meta's PyTorch**.

Developing ML models requires specialized ML knowledge. It requires knowing different types of ML algorithms (such as **clustering, logistic regression, decision trees, and collaborative filtering**) and neural network architectures (such as **feedforward, recurrent, convolutional, and transformer**). It also requires understanding how a model learns, including concepts such as gradient descent, loss functions, and regularization.

::tip
With the availability of foundation models, **ML knowledge is no longer a must-have** for building AI applications. I've met many wonderful and successful AI application builders who aren't at all interested in learning about gradient descent. However, ML knowledge is still extremely valuable, as it expands the set of tools that you can use and helps with troubleshooting when a model doesn't work as expected.
::

::note
**On the Differences Among Training, Pre-Training, Finetuning, and Post-Training**

Training always involves changing model weights, but **not all changes to model weights constitute training**. For example, *quantization* — reducing the precision of model weights — technically changes the model's weight values but isn't considered training.

The term *training* can often be used in place of pre-training, finetuning, and post-training, which refer to different training phases.
::
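The quantization distinction can be made concrete: quantizing changes the stored weight values without any learning signal involved. A minimal symmetric int8-style sketch with toy numbers (not a real quantization library):

```python
def quantize_int8(weights):
    # Map floats into the signed int8 range [-127, 127] with one scale factor.
    # No loss, no gradients, no data — so this is not training.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]   # low-precision integer weights
    dequant = [v * scale for v in q]          # approximate reconstructions
    return q, dequant, scale

weights = [0.82, -0.41, 0.05]
q, dequant, scale = quantize_int8(weights)
```

The reconstructed values differ slightly from the originals — the weights really did change — yet nothing was learned, which is exactly why this doesn't count as training.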
::accordion
  ::accordion-item{icon="i-lucide-sparkle" label="Pre-training"}
  *Pre-training refers to* training a model **from scratch** — the model weights are randomly initialized. For LLMs, pre-training often involves training a model for text completion.

  Out of all training steps, pre-training is often the **most resource-intensive** by a long shot. For the InstructGPT model, pre-training takes up to [98% of the overall compute and data resources](https://openai.com/index/instruction-following/). Pre-training also takes a long time. A small mistake during pre-training can incur a significant financial loss and set back the project significantly.

  Due to the resource-intensive nature of pre-training, this has become an art that only a few practice. Those with expertise in pre-training large models, however, are heavily sought after.[^24]
  ::

  ::accordion-item{icon="i-lucide-sliders-horizontal" label="Finetuning"}
  Finetuning means **continuing to train a previously trained model** — the model weights are obtained from the previous training process. Because the model already has certain knowledge from pre-training, finetuning typically requires **fewer resources** (e.g., data and compute) than pre-training.
  ::

  ::accordion-item{icon="i-lucide-arrow-right-circle" label="Post-training"}
  Many people use *post-training* to refer to the process of training a model after the pre-training phase. Conceptually, post-training and finetuning are the same and can be used interchangeably. However, sometimes people use them differently to signify different goals:

  - It's usually **post-training** when it's done by **model developers**. For example, OpenAI might post-train a model to make it better at following instructions before releasing it.
  - It's **finetuning** when it's done by **application developers**. For example, you might finetune an OpenAI model (which might have been post-trained itself) to adapt it to your needs.

  Pre-training and post-training make up a spectrum.[^25] Their processes and toolings are very similar. Their differences are explored further in Chapters 2 and 7.
  ::

  ::accordion-item{icon="i-lucide-x-circle" label="What is NOT training"}
  Some people use the term *training* to refer to **prompt engineering**, which isn't correct. I read a [*Business Insider* article](https://www.businessinsider.com/i-trained-ai-chatbot-on-my-journals-inner-child-2022-12) where the author said she trained ChatGPT to mimic her younger self by feeding her childhood journal entries into ChatGPT.

  Colloquially, the author's usage of the word *training* is correct, as she's teaching the model to do something. But technically, if you teach a model what to do via the **context input** into the model, you're doing **prompt engineering**. Similarly, I've seen people using the term *finetuning* when what they do is prompt engineering.
  ::
::

#### Dataset engineering

*Dataset engineering* refers to **curating, generating, and annotating** the data needed for training and adapting AI models.

::card-group
  ::card{icon="i-lucide-list-checks" title="Close-Ended (Traditional ML)"}
  Most use cases are **close-ended** — a model's output can only be among predefined values. For example, spam classification with only two possible outputs, "spam" and "not spam".
  ::

  ::card{icon="i-lucide-message-square-text" title="Open-Ended (Foundation Models)"}
  Foundation models are **open-ended**. Annotating open-ended queries is much harder than annotating close-ended queries — it's easier to determine whether an email is spam than to write an essay. So **data annotation is a much bigger challenge** for AI engineering.
  ::
::

Another difference is that **traditional ML engineering works more with tabular data**, whereas foundation models work with **unstructured data**. In AI engineering, data manipulation is more about **deduplication, tokenization, context retrieval, and quality control**, including removing sensitive information and toxic data. Dataset engineering is the focus of Chapter 8.
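Two of the data-manipulation steps named above are ordinary code. A minimal sketch of exact deduplication via content hashing, plus a deliberately crude whitespace "tokenizer" (real pipelines use subword tokenizers such as BPE; all inputs here are invented):

```python
import hashlib

def dedupe(docs):
    # Exact deduplication after light normalization: hash each document
    # and keep only the first copy of each hash.
    seen, unique = set(), []
    for doc in docs:
        h = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(doc)
    return unique

def tokenize(text):
    # Toy whitespace tokenizer — a stand-in for a real subword tokenizer.
    return text.split()

docs = ["The cat sat.", "the cat sat.", "A dog ran."]
unique = dedupe(docs)  # the case-only duplicate collapses to one copy
```

Production pipelines also do near-duplicate detection (e.g., MinHash-style similarity) rather than only exact hashing, but the shape of the work is the same.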
::note
Many people argue that because models are now commodities, **data will be the main differentiator**, making dataset engineering more important than ever. How much data you need depends on the adaptation technique you use:

**Training from scratch** > **Finetuning** > **Prompt engineering**

Regardless of how much data you need, expertise in data is useful when examining a model, as its training data gives important clues about that model's strengths and weaknesses.
::

#### Inference optimization

*Inference optimization* means **making models faster and cheaper**. Inference optimization has always been important for ML engineering. Users never say no to faster models, and companies can always benefit from cheaper inference. However, as foundation models scale up to incur even higher inference cost and latency, inference optimization has become **even more important**.

::warning
One challenge with foundation models is that they are often **autoregressive** — tokens are generated sequentially. If it takes 10 ms for a model to generate a token, it'll take a second to generate an output of 100 tokens, and even longer for longer outputs. As users are notoriously impatient, getting AI applications' latency down to the [100 ms latency](https://www.forbes.com/sites/adrianbridgwater/2020/12/03/location-location-partition-how-to-beat-latency-as-cloud-grows/) expected for a typical internet application is a **huge challenge**. Inference optimization has become an active subfield in both industry and academia.
::
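The arithmetic in this warning generalizes to a simple linear latency model. The optional time-to-first-token term is my addition, not from the text, but it is where much real-world latency hides:

```python
def decode_latency_ms(num_tokens, ms_per_token=10.0, first_token_ms=0.0):
    # Sequential (autoregressive) decoding: total latency grows linearly
    # with the number of generated tokens.
    return first_token_ms + num_tokens * ms_per_token

one_second = decode_latency_ms(100)  # the 10 ms/token, 100-token example
```

At 10 ms per token, only a handful of tokens fit inside a 100 ms budget, which is why streaming partial output and reducing per-token cost both matter so much.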
A summary of how the importance of different categories of model development changes with AI engineering is shown in Table 1-4.

> Table 1-4. How different responsibilities of model development have changed with foundation models.

| Category | Building with traditional ML | Building with foundation models |
| --- | --- | --- |
| Modeling and training | ML knowledge is required for training a model from scratch | ML knowledge is a nice-to-have, not a must-have[^a] |
| Dataset engineering | More about feature engineering, especially with tabular data | Less about feature engineering and more about data deduplication, tokenization, context retrieval, and quality control |
| Inference optimization | Important | Even more important |

Inference optimization techniques, including **quantization, distillation, and parallelism**, are discussed in Chapters 7 through 9.

### Application Development

With traditional ML engineering, where teams build applications using their proprietary models, the **model quality is a differentiator**. With foundation models, where many teams use the same model, **differentiation must be gained through the application development process**.

The application development layer consists of three responsibilities: **evaluation, prompt engineering, and AI interface**.

#### Evaluation

*Evaluation* is about **mitigating risks and uncovering opportunities**. Evaluation is necessary throughout the whole model adaptation process — to select models, to benchmark progress, to determine whether an application is ready for deployment, and to detect issues and opportunities for improvement in production.

While evaluation has always been important in ML engineering, it's even more important with foundation models. The challenges of evaluating foundation models are discussed in Chapter 3. To summarize, these challenges chiefly arise from foundation models' **open-ended nature and expanded capabilities**.

::card-group
  ::card{icon="i-lucide-target" title="Close-Ended Tasks"}
  In tasks like **fraud detection**, there are usually expected ground truths to compare your model's outputs against. If a model's output differs from the expected output, you know the model is wrong.
  ::

  ::card{icon="i-lucide-message-circle-question" title="Open-Ended Tasks"}
  For a task like **chatbots**, there are so many possible responses to each prompt that it is **impossible to curate an exhaustive list of ground truths** to compare a model's response to.
  ::
::
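This split shows up directly in evaluation code: exact match against a ground truth works for close-ended outputs and breaks down for open-ended ones. A toy illustration with invented strings:

```python
def exact_match(pred, gold):
    # Works when outputs come from a small predefined set of values.
    return pred.strip().lower() == gold.strip().lower()

# Close-ended: spam classification has one ground truth per example.
spam_ok = exact_match("Spam", "spam")  # True

# Open-ended: two equally good chatbot replies need not match at all,
# so there is no exhaustive answer key to compare against.
chat_ok = exact_match(
    "Sure, happy to help!",
    "Of course - what do you need?",
)  # False
```

For open-ended outputs, evaluation has to fall back on proxies — human raters, model-based judges, or task-specific checks — none of which are as cheap or reliable as exact match.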
The existence of so many adaptation techniques also makes evaluation harder. A system that performs poorly with one technique might perform much better with another. When **Google launched Gemini in December 2023**, they claimed that Gemini is better than ChatGPT in the MMLU benchmark ([Hendrycks et al., 2020](https://arxiv.org/abs/2009.03300)). Google had evaluated Gemini using a prompt engineering technique called [CoT@32](https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf). In this technique, Gemini was shown 32 examples, while ChatGPT was shown only 5 examples. **When both were shown five examples, ChatGPT performed better**, as shown in Table 1-5.

> Table 1-5. Different prompts can cause models to perform very differently, as seen in Gemini's technical report (December 2023).

| | Gemini Ultra | Gemini Pro | GPT-4 | GPT-3.5 | PaLM 2-L | Claude 2 | Inflection-2 | Grok 1 | Llama-2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU performance | 90.04%<br>CoT@32 | 79.13%<br>CoT@8 | 87.29%<br>CoT@32 (via API) | 70%<br>5-shot | 78.4%<br>5-shot | 78.5%<br>5-shot CoT | 79.6%<br>5-shot | 73.0%<br>5-shot | 68.0% |
| | 83.7%<br>5-shot | 71.8%<br>5-shot | 86.4%<br>5-shot (reported) | | | | | | |

#### Prompt engineering and context construction

*Prompt engineering* is about **getting AI models to express the desirable behaviors from the input alone, without changing the model weights**. The Gemini evaluation story highlights the impact of prompt engineering on model performance. By using a different prompt engineering technique, **Gemini Ultra's performance on MMLU went from 83.7% to 90.04%**.

::tip
It's possible to get a model to do amazing things with just prompts. The right instructions can get a model to perform the task you want, in the format of your choice. Prompt engineering is **not just about telling a model what to do**. It's also about giving the model the necessary **context and tools** to do a given task. For complex tasks with long context, you might also need to provide the model with a **memory management system** so the model can keep track of its history.
::
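Mechanically, the CoT@32-versus-5-shot difference is just how many worked examples get packed into the prompt before the real question. A sketch of k-shot prompt assembly, with invented examples:

```python
# Hypothetical worked examples; a real 5- or 32-shot prompt would carry
# more of them, often with reasoning chains in the answers.
examples = [
    {"q": "2 + 2 = ?", "a": "4"},
    {"q": "Capital of France?", "a": "Paris"},
]

def build_few_shot_prompt(question, examples, k):
    # Concatenate the first k worked examples, then append the question.
    shots = "\n\n".join(f"Q: {e['q']}\nA: {e['a']}" for e in examples[:k])
    return f"{shots}\n\nQ: {question}\nA:"

prompt = build_few_shot_prompt("3 + 5 = ?", examples, k=2)
```

The model weights never change between a 5-shot and a 32-shot run; only this string does — which is why the same model can post such different benchmark numbers.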
Chapter 5 discusses prompt engineering, and Chapter 6 discusses context construction.

#### AI interface

*AI interface* means **creating an interface for end users to interact with your AI applications**. Before foundation models, only organizations with sufficient resources to develop AI models could develop AI applications. These applications were often embedded into the organizations' existing products. For example, **fraud detection** was embedded into Stripe, Venmo, and PayPal. **Recommender systems** were part of social networks and media apps like Netflix, TikTok, and Spotify.

With foundation models, **anyone can build AI applications**. You can serve your AI applications as standalone products or embed them into other products, including products developed by other people. For example, **ChatGPT and Perplexity** are standalone products, whereas **GitHub Copilot** is commonly used as a plug-in in VSCode, **Grammarly** as a browser extension for Google Docs, and **Midjourney** can be used via its standalone web app or via its integration in Discord.

Here are some of the interfaces that are gaining popularity for AI applications:

::card-group
  ::card{icon="i-lucide-monitor" title="Standalone Apps"}
  Web, desktop, and mobile apps.[^26]
  ::

  ::card{icon="i-lucide-puzzle" title="Browser Extensions"}
  Let users **quickly query AI models** while browsing.
  ::

  ::card{icon="i-lucide-message-square" title="Chat App Integrations"}
  Chatbots integrated into chat apps like **Slack, Discord, WeChat, and WhatsApp**.
  ::

  ::card{icon="i-lucide-plug" title="Plug-ins & APIs"}
  Many products — **VSCode, Shopify, Microsoft 365** — provide APIs to integrate AI as plug-ins and add-ons. These APIs can also be used by AI agents to interact with the world (Chapter 6).
  ::
::

While the chat interface is the most commonly used, AI interfaces can also be **voice-based** (such as with voice assistants) or **embodied** (such as in augmented and virtual reality).

::note
These new AI interfaces also mean **new ways to collect and extract user feedback**. The conversation interface makes it much easier for users to give feedback in natural language, but this feedback is **harder to extract**. User feedback design is discussed in Chapter 10.
::

A summary of how the importance of different categories of app development changes with AI engineering is shown in Table 1-6.

> Table 1-6. The importance of different categories in app development for AI engineering and ML engineering.

| Category | Building with traditional ML | Building with foundation models |
| --- | --- | --- |
| AI interface | Less important | Important |
| Prompt engineering | Not applicable | Important |
| Evaluation | Important | More important |

## AI Engineering Versus Full-Stack Engineering

The increased emphasis on application development, especially on interfaces, brings AI engineering closer to **full-stack development**.[^27] The rising importance of interfaces leads to a shift in the design of AI toolings to attract more frontend engineers.

::card-group
  ::card{icon="i-lucide-code" title="Then — Python-centric ML"}
  Traditionally, ML engineering is **Python-centric**. Before foundation models, the most popular ML frameworks supported mostly Python APIs.
  ::

  ::card{icon="i-lucide-globe" title="Now — JavaScript Joins"}
  Today, Python is still popular, but there is increasing support for **JavaScript APIs**: [LangChain.js](https://github.com/langchain-ai/langchainjs), [Transformers.js](https://github.com/huggingface/transformers.js), [OpenAI's Node library](https://github.com/openai/openai-node), and [Vercel's AI SDK](https://github.com/vercel/ai).
  ::
::

While many AI engineers come from traditional ML backgrounds, more are increasingly coming from **web development or full-stack backgrounds**. An advantage that full-stack engineers have over traditional ML engineers is their ability to **quickly turn ideas into demos, get feedback, and iterate**.

::tip
With traditional ML engineering, you usually start with **gathering data and training a model**. Building the product comes last. However, with AI models readily available today, it's possible to **start with building the product first**, and only invest in data and models once the product shows promise, as visualized in Figure 1-16.
::

![Figure 1-16. The new AI engineering workflow rewards those who can iterate fast. Image recreated from "The Rise of the AI Engineer"](./media/fig-1-16.png)

> Figure 1-16. The new AI engineering workflow rewards those who can iterate fast. Image recreated from "The Rise of the AI Engineer".
Image recreated from \"The Rise of the AI Engineer\" (",[89,1279,1282],{"href":1280,"rel":1281},"https:\u002F\u002Fwww.latent.space\u002Fp\u002Fai-engineer?hide_intro_popup=true",[484],"Shawn Wang, 2023",").",[61,1285,1286,1287,1290,1291,93],{},"In traditional ML engineering, model development and product development are often ",[65,1288,1289],{},"disjointed processes",", with ML engineers rarely involved in product decisions at many organizations. However, with foundation models, ",[65,1292,1293],{},"AI engineers tend to be much more involved in building the product",[1295,1296,1299,1304],"section",{"className":1297,"dataFootnotes":57},[1298],"footnotes",[77,1300,1303],{"className":1301,"id":299},[1302],"sr-only","Footnotes",[1305,1306,1307,1318,1332,1341,1350,1359],"ol",{},[524,1308,1310,1311],{"id":1309},"user-content-fn-23","As the head of AI at a Fortune 500 company told me: his team knows how to work with 10 GPUs, but they don't know how to work with 1,000 GPUs. ",[89,1312,1317],{"href":1313,"ariaLabel":1314,"className":1315,"dataFootnoteBackref":57},"#user-content-fnref-23","Back to reference 1",[1316],"data-footnote-backref","↩",[524,1319,1321,1322,202,1327],{"id":1320},"user-content-fn-24","And they are offered ",[89,1323,1326],{"href":1324,"rel":1325},"https:\u002F\u002Fwww.nytimes.com\u002F2018\u002F04\u002F19\u002Ftechnology\u002Fartificial-intelligence-salaries-openai.html",[484],"incredible compensation packages",[89,1328,1317],{"href":1329,"ariaLabel":1330,"className":1331,"dataFootnoteBackref":57},"#user-content-fnref-24","Back to reference 2",[1316],[524,1333,1335,1336],{"id":1334},"user-content-fn-25","If you find the terms \"pre-training\" and \"post-training\" lacking in imagination, you're not alone. The AI research community is great at many things, but naming isn't one of them. We already talked about how \"large language models\" is hardly a scientific term because of the ambiguity of the word \"large\". 
And I really wish people would stop publishing papers with the title \"X is all you need.\" ",[89,1337,1317],{"href":1338,"ariaLabel":1339,"className":1340,"dataFootnoteBackref":57},"#user-content-fnref-25","Back to reference 3",[1316],[524,1342,1344,1345],{"id":1343},"user-content-fn-a","Many people would dispute this claim, saying that ML knowledge is a must-have. ",[89,1346,1317],{"href":1347,"ariaLabel":1348,"className":1349,"dataFootnoteBackref":57},"#user-content-fnref-a","Back to reference 4",[1316],[524,1351,1353,1354],{"id":1352},"user-content-fn-26","Streamlit, Gradio, and Plotly Dash are common tools for building AI web apps. ",[89,1355,1317],{"href":1356,"ariaLabel":1357,"className":1358,"dataFootnoteBackref":57},"#user-content-fnref-26","Back to reference 5",[1316],[524,1360,1362,1363],{"id":1361},"user-content-fn-27","Anton Bacaj told me that \"AI engineering is just software engineering with AI models thrown in the stack.\" ",[89,1364,1317],{"href":1365,"ariaLabel":1366,"className":1367,"dataFootnoteBackref":57},"#user-content-fnref-27","Back to reference 6",[1316],{"title":57,"searchDepth":1369,"depth":1369,"links":1370},2,[1371,1372,1373,1381,1382],{"id":79,"depth":1369,"text":80},{"id":137,"depth":1369,"text":138},{"id":249,"depth":1369,"text":250,"children":1374},[1375,1377,1378,1379,1380],{"id":266,"depth":1376,"text":267},3,{"id":280,"depth":1376,"text":281},{"id":304,"depth":1376,"text":305},{"id":388,"depth":1376,"text":166},{"id":782,"depth":1376,"text":153},{"id":1187,"depth":1369,"text":1188},{"id":299,"depth":1369,"text":1303},"The three layers of the AI stack, how AI engineering differs from ML engineering and full-stack development, and how foundation models reshape model and application development.","md",{},{"icon":39},{"title":36,"description":1383},"HU7hMFwTNodatw5Q_L27moKngP-Yee81UMkVjN2Jpac",[1390,1392],{"title":31,"path":32,"stem":33,"description":1391,"icon":34,"children":-1},"How to evaluate use cases, build vs buy, set success 
metrics, plan milestones, and maintain AI products in a fast-moving landscape.",{"title":41,"path":42,"stem":43,"description":1393,"icon":44,"children":-1},"A recap of how foundation models gave rise to AI engineering, the application patterns enabled, and the framework this book provides.",1778484800915]