
20 AI Terms I Finally Understood as a Frontend Developer

I build UIs for a living, but AI features keep landing on my plate. Here are the 20 concepts I sat down and actually learned, explained the way I wish someone had explained them to me.

Jul 2, 2026 · 10 min read

I am a frontend developer. My comfort zone is components, state, layout, and shaving milliseconds off a render. But over the last year the work has shifted. Half the features I ship now talk to a model somewhere, and the conversations with backend and ML folks were full of terms I would nod along to without truly getting.

So I sat down and learned them properly, starting from a video that walks through the 20 terms every engineer building AI apps keeps running into. These are my notes, rewritten from a frontend point of view. Not the ML researcher version, the "I need to build the interface for this and not sound lost in standup" version.

The foundation: how a model reads and thinks

Before the individual terms, here is the whole pipeline on one screen. Almost every concept below is just one box in this flow:

  "all that glitters"          the input text
           |
           v
  [ TOKENIZATION ]             break text into reusable chunks
           |
           v
  [ all ][ that ][ glitter ][ s ]      tokens
           |
           v
  [ VECTORS ]                  map each token to a coordinate of meaning
           |
           v
  [ ATTENTION ]                let each token read its neighbors for context
           |
           v
  [ FEEDFORWARD + REPEAT ]     stack many layers (the transformer)
           |
           v
  predict next token -> "is"   one piece at a time, then loop

1. Large Language Model (LLM)

Strip away the hype and an LLM is a neural network trained to predict the next token in a sequence. You feed it "all that glitters" and it predicts "is not gold" one piece at a time. That is the whole trick. Everything fancy is built on top of next-token prediction.

For me the mental unlock was this: the model is not looking up answers, it is guessing the most probable continuation. That single fact explains most of the weird behavior you have to design around in the UI.
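
To make "guessing the most probable continuation" concrete, here is a minimal sketch of a generation loop. The probability table is entirely made up and only looks at the last token, whereas a real LLM computes probabilities with a neural network over the whole context. The names `toyModel`, `mostLikely`, and `generate` are hypothetical.

```typescript
// Hypothetical next-token probabilities. A real model computes these
// with a neural network; this table is hand-written for illustration.
type Distribution = Record<string, number>;

const toyModel: Record<string, Distribution> = {
  glitters: { is: 0.9, was: 0.1 },
  is: { not: 0.7, gold: 0.3 },
  not: { gold: 0.8, silver: 0.2 },
  gold: { "<end>": 1.0 },
};

// Greedy decoding: always take the single most probable token.
function mostLikely(dist: Distribution): string {
  let best = "<end>";
  let bestP = -1;
  for (const token of Object.keys(dist)) {
    const p = dist[token] ?? 0;
    if (p > bestP) {
      best = token;
      bestP = p;
    }
  }
  return best;
}

// Predict one token, append it, and feed the longer sequence back in.
function generate(prompt: string[], maxTokens: number): string[] {
  const output = [...prompt];
  for (let i = 0; i < maxTokens; i++) {
    const last = output[output.length - 1] ?? "";
    const dist = toyModel[last];
    if (dist === undefined) break; // the toy table knows nothing about this token
    const next = mostLikely(dist);
    if (next === "<end>") break;
    output.push(next);
  }
  return output;
}

console.log(generate(["all", "that", "glitters"], 10).join(" "));
// prints "all that glitters is not gold"
```

Nothing in the loop checks whether the output is true. It only follows the probabilities, which is why confident wrong answers are possible.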

2. Tokenization

Before the model does anything, it breaks your input into tokens. Not by spaces, but by reusable subword chunks. "Glitters" might split into "glitter" and "s", because that "s" suffix carries meaning the model can reuse across "shimmers", "murmurs", and "flickers". The exact split depends on the tokenizer, which learns its chunks from how often character sequences show up in training text.

Why a frontend dev cares: tokens are the unit you get billed on and the unit context windows are measured in. When you are deciding how much chat history to send, you are really deciding how many tokens to spend.
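
A toy tokenizer shows the idea. This is a sketch under loud assumptions: the vocabulary is hypothetical and tiny, and greedy longest-match is a simplification of what production tokenizers do. The point is only that text becomes a countable list of chunks.

```typescript
// Hypothetical vocabulary. Real tokenizers learn tens of thousands of
// entries from data instead of using a hand-written list.
const vocab = ["all", "that", "glitter", "shimmer", "flicker", "s", " "];

// Greedy longest-match: at each position take the longest vocabulary
// entry that fits, falling back to a single character.
function tokenize(text: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < text.length) {
    let match = text.charAt(i);
    for (const piece of vocab) {
      if (piece.length > match.length && text.startsWith(piece, i)) {
        match = piece;
      }
    }
    tokens.push(match);
    i += match.length;
  }
  return tokens;
}

const tokens = tokenize("all that glitters");
console.log(tokens);        // ["all", " ", "that", " ", "glitter", "s"]
console.log(tokens.length); // 6 tokens to pay for
```

For real budgets, use the tokenizer that ships with your model provider, since every model splits text differently.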

3. Vectors

Each token gets mapped to a coordinate in a high-dimensional space. Words with similar meaning sit close together, unrelated words sit far apart. That coordinate is a vector, usually called an embedding, and the process is vectorization.

This is the part that made embeddings click for me. "Meaning" becomes math. Similarity becomes distance. That is the foundation for search features that understand intent instead of matching exact keywords.

Picture the space, simplified down to two dimensions. Related words cluster, unrelated ones drift apart:

        ^  (meaning axis)
        |
  king  * --- * queen
        |
  man   * --- * woman
        |
        |                 * banana
        |            * apple   * guava
        |                 * mango
        +----------------------------------> (meaning axis)

  small distance = similar meaning.  "apple" sits near
  other fruit, far from "king". A search for "mango" lands
  in the right neighborhood even without an exact word match.
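
The picture above translates into a few lines of code. This sketch uses made-up two-dimensional coordinates (real embeddings have hundreds or thousands of dimensions and come from a model) and plain Euclidean distance. `Point`, `distance`, and `nearest` are hypothetical names.

```typescript
type Point = { word: string; vector: number[] };

// Straight-line (Euclidean) distance between two vectors.
function distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Return the stored word whose vector is closest to the query vector.
function nearest(query: number[], points: Point[]): string {
  let bestWord = "";
  let bestDistance = Infinity;
  for (const p of points) {
    const d = distance(query, p.vector);
    if (d < bestDistance) {
      bestWord = p.word;
      bestDistance = d;
    }
  }
  return bestWord;
}

// Made-up coordinates that mirror the sketch: royalty top left, fruit bottom right.
const space: Point[] = [
  { word: "king", vector: [1, 9] },
  { word: "queen", vector: [3, 9] },
  { word: "apple", vector: [7, 3] },
  { word: "banana", vector: [9, 4] },
];

const mango = [8, 2];
console.log(nearest(mango, space)); // prints "apple"
```

"mango" is not in the stored list at all, yet the lookup still lands on a fruit. That is the behavior semantic search builds on.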

4. Attention

The mechanism at the heart of the 2017 transformer paper, "Attention Is All You Need". A word like "apple" is ambiguous on its own. "Tasty apple" is a fruit, "Apple revenue" is a company, "apple of my eye" is a person you love. Attention lets the model look at the other words in the context and nudge the vector for "apple" toward the right meaning.

This is why models feel like they understand you. They are reading the neighborhood around each word, not each word in isolation.

                ambiguous "apple"
                        o
                      /   \
                    /       \
                  /           \     attention reads the nearby word...
                /               \
           "tasty"           "revenue"
              |                  |
              v                  v
          o  apple            apple  o
          near fruit        near company
          (banana,          (Google,
           mango)            Microsoft)

  same word, pushed toward a different meaning by context.
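
Here is a stripped-down sketch of that nudge. It is scaled dot-product attention with the learned query, key, and value projections left out, so each token's raw vector plays all three roles. The two dimensions ("fruit-ness" and "company-ness") and every number are invented for illustration.

```typescript
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Turn raw scores into weights that are positive and sum to 1.
function softmax(scores: number[]): number[] {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Context-aware vector for the token at `index`: score it against every
// token in the sequence, then take the weighted average of their vectors.
function attend(tokens: Vec[], index: number): Vec {
  const query = tokens[index];
  if (query === undefined) throw new Error("index out of range");
  const scale = Math.sqrt(query.length);
  const weights = softmax(tokens.map((key) => dot(query, key) / scale));
  return query.map((_, d) =>
    tokens.reduce((sum, value, t) => sum + (weights[t] ?? 0) * (value[d] ?? 0), 0)
  );
}

// Invented vectors: [fruit-ness, company-ness].
const tasty: Vec = [2, 0];
const revenue: Vec = [0, 2];
const apple: Vec = [1, 1]; // ambiguous on its own

console.log(attend([tasty, apple], 1));   // fruit-ness now dominates
console.log(attend([apple, revenue], 0)); // company-ness now dominates
```

The same starting vector for "apple" ends up in two different places depending on its neighbor, which is the whole point of the diagram.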

5. Self-supervised learning

The reason models could be trained on internet-scale text without paying anyone to label it. Instead of humans labeling every example, you hide part of some existing text and make the model predict the hidden part. The answer is already in the data, so no human labeling is needed.

That scalability is the quiet reason this field exploded. Labeled training data went from expensive and scarce to essentially free.
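
A small sketch of how plain text turns into labeled examples with no human in the loop. It hides the next word at each position, which is one way to do the "hide and predict" step. Splitting on whitespace is a simplification (real pipelines work on tokens), and `makeNextWordExamples` is a hypothetical name.

```typescript
interface TrainingExample {
  context: string[]; // what the model gets to see
  target: string;    // the hidden word it must predict
}

// Every prefix of the text becomes an input, and the word that
// follows it becomes the label. The text labels itself.
function makeNextWordExamples(text: string): TrainingExample[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const examples: TrainingExample[] = [];
  for (let i = 1; i < words.length; i++) {
    const target = words[i];
    if (target === undefined) continue;
    examples.push({ context: words.slice(0, i), target });
  }
  return examples;
}

for (const ex of makeNextWordExamples("all that glitters is not gold")) {
  console.log(`${ex.context.join(" ")} -> ${ex.target}`);
}
```

One six-word sentence yields five training examples, and the same trick scales to any amount of text.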

6. Transformer

People, including me at first, use "transformer" and "LLM" interchangeably. They are not the same. An LLM is the product that predicts the next token. A transformer is one specific architecture for doing that: input tokens go through an attention block, then a feedforward network, and that stack repeats across many layers. Early layers disambiguate words, later layers pick up sarcasm, implication, and complex relationships.

Think of the LLM as the car and the transformer as the engine. You could swap in a different engine, like a diffusion or state space model, and still have a car.

  input tokens
        |
        v
  +----------------------+
  |  ATTENTION           |   layer 1: disambiguate words
  |  FEEDFORWARD NETWORK |   ("crane" the bird vs the machine)
  +----------------------+
        |
        v
  +----------------------+
  |  ATTENTION           |   layer 2: subtler relationships
  |  FEEDFORWARD NETWORK |   (sarcasm, implication, tone)
  +----------------------+
        |
        v
       ...  (stacked 12 to 100+ times)
        |
        v
  predict the next token

  LLM = the product (predict next token)
  Transformer = one engine that does it

Making a base model actually useful

7. Fine-tuning

A base model predicts generic next tokens. Fine-tuning takes that base and runs it through curated question-and-answer pairs so it learns to respond a specific way. A medical model learns to answer in medical terms, a support bot learns your company's tone. The same base model can spawn many fine-tuned versions.

8. Few-shot prompting

Before sending a query, you add a few examples of the kind of answer you want, right there in the prompt. It is example-driven prompting, and it happens at request time. As a frontend dev, this is often something I control directly in the payload I assemble.
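
A sketch of what assembling that payload can look like. The `role`/`content` message shape mirrors what many chat APIs accept, but it is an assumption here, so check your provider's docs. `buildFewShotMessages` and the example content are hypothetical.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface Example {
  question: string;
  answer: string;
}

// Each example becomes a fake user/assistant exchange placed
// before the real query, so the model can copy the pattern.
function buildFewShotMessages(
  instruction: string,
  examples: Example[],
  query: string
): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: "system", content: instruction }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.question });
    messages.push({ role: "assistant", content: ex.answer });
  }
  messages.push({ role: "user", content: query });
  return messages;
}

const messages = buildFewShotMessages(
  "Classify the sentiment as positive or negative.",
  [
    { question: "I love this app", answer: "positive" },
    { question: "It crashes constantly", answer: "negative" },
  ],
  "Checkout was painless"
);
console.log(messages.length); // 6: one system, two example pairs, one query
```

Every example costs tokens on every request, so a handful of well-chosen shots usually beats a long list.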

9. Retrieval-Augmented Generation (RAG)

Instead of hoping the model already knows your company policies, you fetch the most relevant documents in real time and send them along with the query. The model uses that context to answer. This is how you get accurate, company-specific responses without retraining anything.

The video makes a fair point that the field moves so fast some people already call RAG old news, but understanding it is still essential because so much production tooling is built on it.

  user query                          "what is your refund policy?"
      |
      v
  +--------+     1. embed query, similarity search
  | SERVER | -----------------------------> [ VECTOR DB ]
  +--------+                                      |
    ^   |                                         | 2. relevant docs
    |   | <---------------------------------------+
    |   |
    |   |  3. query + examples + fetched docs
    |   v
    |  [ LLM ]  4. grounded answer
    |     |
    +-----+  5. response back to the user
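
Step 3 of the flow, sketched in code. The retriever is passed in as a function so the prompt assembly stays independent of how documents are found. The keyword-matching `fakeRetrieve` is a stand-in for a real similarity search, and the policy texts and all names are invented for illustration.

```typescript
interface PolicyDoc {
  topic: string;
  text: string;
}

type Retriever = (query: string, k: number) => PolicyDoc[];

// Fetch the top-k documents and place them in the prompt ahead of the question.
function buildRagPrompt(query: string, retrieve: Retriever, k: number = 2): string {
  const docs = retrieve(query, k);
  const context = docs.map((d, i) => `[${i + 1}] ${d.text}`).join("\n");
  return [
    "Answer using only the context below. If the answer is not there, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${query}`,
  ].join("\n");
}

// Invented policy snippets and a keyword-based stand-in for vector search.
const policyDocs: PolicyDoc[] = [
  { topic: "refund", text: "Refunds are issued within 14 days of purchase." },
  { topic: "shipping", text: "Orders ship within 2 business days." },
];

const fakeRetrieve: Retriever = (query, k) =>
  policyDocs
    .filter((d) => query.toLowerCase().indexOf(d.topic) !== -1)
    .slice(0, k);

console.log(buildRagPrompt("what is your refund policy?", fakeRetrieve));
```

The instruction to answer only from the context is what keeps the response grounded. Without it the model is free to fall back on whatever it guessed during training.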

10. Vector database

The engine behind RAG's retrieval step. You store documents as vectors, then when a query like "I am upset and want a refund" comes in, you do a similarity search. The word "upset" might not appear in any policy doc, but its vector sits close to documents about "low ratings" or "drop offs", so you find them anyway. That semantic matching is the whole point.
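
An in-memory sketch of that similarity search. A real vector database adds persistence and approximate indexes so it does not have to scan every document. The class name and the three-dimensional vectors are invented, and in practice an embedding model would produce the vectors.

```typescript
interface StoredDoc {
  text: string;
  vector: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
// Assumes neither vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  private docs: StoredDoc[] = [];

  add(text: string, vector: number[]): void {
    this.docs.push({ text, vector });
  }

  // Score every document against the query and return the k best texts.
  search(queryVector: number[], k: number): string[] {
    return this.docs
      .map((doc) => ({ text: doc.text, score: cosineSimilarity(queryVector, doc.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map((hit) => hit.text);
  }
}

// Invented vectors: [complaint, billing, shipping].
const store = new InMemoryVectorStore();
store.add("Refunds after low ratings", [0.9, 0.6, 0.0]);
store.add("Delivery time estimates", [0.0, 0.1, 0.9]);
store.add("Handling drop offs", [0.8, 0.2, 0.1]);

// Pretend embedding of "I am upset and want a refund".
console.log(store.search([0.9, 0.5, 0.0], 2));
// ["Refunds after low ratings", "Handling drop offs"]
```

None of the stored titles contain the word "upset", yet the two complaint-related documents come back first.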

11. Model Context Protocol (MCP)

RAG handles context that lives inside your system. MCP is a standard way to pull in context and actions from outside it. The model can decide it needs external data, and an MCP client connects to external MCP servers, say Indigo's or Air India's, to fetch live flight details and even book a flight. The result comes back through the client to the user.

This is the piece that turns a chatbot into something that gets things done, not just describes how to.
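
On the wire, an MCP tool invocation is a JSON-RPC 2.0 message. As far as I understand the spec, the method is `tools/call` with a tool name and an arguments object, but treat the exact shape as an assumption and check the current specification. The tool name `search_flights` and its arguments are hypothetical, since each server defines its own tools.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

let nextRequestId = 1;

// Build the message an MCP client would send to a server
// once the model has decided it needs a tool.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical tool exposed by an airline's MCP server.
const request = buildToolCall("search_flights", {
  from: "BLR",
  to: "DEL",
  date: "2026-07-10",
});
console.log(JSON.stringify(request, null, 2));
```

In a real app you would rarely build this by hand because an MCP client library does it for you. Seeing the message helps when you are debugging what actually went over the wire.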

12. Context engineering

The umbrella term for everything above: few-shot examples, RAG documents, MCP tool calls, all assembled into the context you hand the model. The two hard new problems here are managing user preferences and summarizing context so you do not blow past the window. A common trick is a sliding window: send the last hundred messages verbatim and a five-sentence summary of everything older.

The distinction that stuck with me: prompt engineering is stateless, one prompt at a time. Context engineering is long-lived and evolves with the user's history and preferences.
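
The sliding window trick described above fits in one function. In this sketch the summarizer is passed in as a function, because in practice it would be another model call. `buildContext` and the stub summarizer are hypothetical.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

type Summarizer = (older: Message[]) => string;

interface AssembledContext {
  summary: string | null; // null when nothing had to be compressed
  recent: Message[];      // sent verbatim
}

// Keep the last `windowSize` messages as they are and
// squash everything older into a single summary.
function buildContext(
  history: Message[],
  windowSize: number,
  summarize: Summarizer
): AssembledContext {
  if (history.length <= windowSize) {
    return { summary: null, recent: history.slice() };
  }
  const cut = history.length - windowSize;
  return {
    summary: summarize(history.slice(0, cut)),
    recent: history.slice(cut),
  };
}

// Stub summarizer. A real one would ask a model for a few sentences.
const stubSummarizer: Summarizer = (older) => `${older.length} earlier messages`;

const history: Message[] = [
  { role: "user", content: "m1" },
  { role: "assistant", content: "m2" },
  { role: "user", content: "m3" },
  { role: "assistant", content: "m4" },
  { role: "user", content: "m5" },
];

console.log(buildContext(history, 2, stubSummarizer));
// summary: "3 earlier messages", recent: m4 and m5
```

Because the summary call is expensive, it makes sense to cache it and refresh it only when the window moves.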

Where the field is heading

13. Agents

A long-running process that can query an LLM, hit external systems, and coordinate with other agents to meet a goal. A travel agent could watch for cheap flights and book them for you when the moment is right, based on your preferences. This is the most long-term, ambitious direction right now.

14. Reinforcement learning from human feedback (RLHF)

How you train a model to behave the way people actually want. You have seen it in ChatGPT when it shows two responses and asks which is better. The chosen one gets a plus, the other a minus, and the model learns to favor paths that make users happy. The video's honest caveat: RL learns from outcomes, it does not build a true mental model. Show it a fair coin landing heads five times and it may bet heads, while a human knows it is still fifty-fifty.

15. Chain of thought

Train the model to break a problem into steps instead of blurting a final answer, and the quality jumps. Harder problems get more steps, easier ones get fewer.

16. Reasoning models

Models built to figure out how to solve a problem step by step, sometimes with chain of thought, sometimes with tree or graph of thought. DeepSeek-R1 and OpenAI's o1 and o3 are examples. As a frontend dev, the practical impact is latency: reasoning takes time, so your UI needs to handle longer, streamed responses gracefully.
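
On the UI side, one way to handle this is a small reducer that turns stream events into view state. The event names here (`thinking`, `token`, `done`) are hypothetical, so map them to whatever your streaming API actually emits.

```typescript
// Hypothetical event shapes for a streamed response.
type StreamEvent =
  | { type: "thinking" }
  | { type: "token"; text: string }
  | { type: "done" };

interface ChatViewState {
  status: "idle" | "thinking" | "streaming" | "complete";
  text: string;
}

// Pure function: previous state plus one event gives the next state.
// Easy to unit test and easy to plug into a useReducer-style hook.
function reduceStream(state: ChatViewState, event: StreamEvent): ChatViewState {
  switch (event.type) {
    case "thinking":
      return { status: "thinking", text: state.text };
    case "token":
      return { status: "streaming", text: state.text + event.text };
    case "done":
      return { status: "complete", text: state.text };
  }
}

const initialState: ChatViewState = { status: "idle", text: "" };

const events: StreamEvent[] = [
  { type: "thinking" },
  { type: "token", text: "Still " },
  { type: "token", text: "fifty-fifty." },
  { type: "done" },
];

let viewState = initialState;
for (const event of events) {
  viewState = reduceStream(viewState, event);
}
console.log(viewState); // { status: "complete", text: "Still fifty-fifty." }
```

The separate "thinking" status lets the interface show progress during the reasoning phase instead of a frozen screen.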

17. Multimodal models

Models that go beyond text to accept and generate images, audio, and video. The idea is that they can understand the world more deeply than text-only models because they learn meaning across modes. For anyone building interfaces, this is a big deal: the inputs and outputs we design around are no longer just text boxes.

The efficiency layer

18. Small Language Models (SLM)

Fewer parameters: as a rough yardstick, 3 million to 300 million versus an LLM's 3 to 300 billion, though the boundary is fuzzy and some models marketed as small run to a few billion. They are trained on narrower, often company-specific data. A sales bot will not give you a weather report, but it does not need to. Smaller, cheaper, faster, and more private.

19. Distillation

One common way to build an SLM. A large "teacher" model and a small "student" model both try to predict the output. When the student matches the teacher, great. When it misses, its weights adjust. You are compressing a big network's knowledge into a small one that is cheaper to run and easier to host.
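
"Matches" and "misses" need a number before anything can adjust. A common choice, and the one assumed in this sketch, is the KL divergence between the teacher's and the student's softened probability distributions. The logits are made up and the weight update itself (gradient descent) is left out.

```typescript
// Convert raw scores to probabilities. A higher temperature flattens
// the distribution so the student also sees the teacher's second choices.
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((z) => z / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((z) => Math.exp(z - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// KL divergence from teacher to student: 0 when they agree exactly,
// larger the more the student's distribution differs.
function distillationLoss(
  teacherLogits: number[],
  studentLogits: number[],
  temperature: number = 2
): number {
  const teacher = softmax(teacherLogits, temperature);
  const student = softmax(studentLogits, temperature);
  return teacher.reduce((sum, p, i) => {
    const q = student[i] ?? 1;
    return p > 0 ? sum + p * Math.log(p / q) : sum;
  }, 0);
}

// Made-up scores over three candidate next tokens.
const teacherLogits = [4, 1, 0];
console.log(distillationLoss(teacherLogits, [3.5, 1.2, 0.1])); // small: close match
console.log(distillationLoss(teacherLogits, [0, 1, 4]));       // large: a miss
```

Training then nudges the student's weights in whichever direction makes this number smaller.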

20. Quantization

The weights in a network are numbers, often 32 bits each. Quantization shrinks them to something like 8 bits, which cuts the memory for weights to roughly a quarter, usually at a small cost in accuracy. It typically happens after training, so it does not cut training cost, but it cuts inference cost, which is what you pay every time the model responds in production.
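
A minimal sketch of the idea, assuming symmetric 8-bit quantization with one scale for the whole array. Real schemes are more elaborate (per-channel scales, 4-bit formats, calibration data), and the weights here are made up.

```typescript
interface Quantized {
  values: Int8Array; // one byte per weight
  scale: number;     // multiply by this to get back to roughly the original
}

// Map the largest absolute weight to 127 and round everything else
// to the nearest step on that grid.
function quantizeInt8(weights: number[]): Quantized {
  const maxAbs = weights.reduce((m, w) => Math.max(m, Math.abs(w)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(weights.map((w) => Math.round(w / scale)));
  return { values, scale };
}

// Recover approximate weights. Each one is off by at most half a step.
function dequantize(q: Quantized): number[] {
  const restored: number[] = [];
  for (let i = 0; i < q.values.length; i++) {
    restored.push((q.values[i] ?? 0) * q.scale);
  }
  return restored;
}

const weights = [0.5, -1.27, 0.013, 1.27];
const quantized = quantizeInt8(weights);

console.log(new Float32Array(weights).byteLength); // 16 bytes at 32 bits each
console.log(quantized.values.byteLength);          // 4 bytes at 8 bits each
console.log(dequantize(quantized));                // close to the originals, not identical
```

The rounding error is the price of the smaller footprint. For most weights it is small enough that the model's answers barely change.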

What this changed for me

I did not learn these to become an ML engineer. I learned them so I can read an API's docs and know what "context window", "embeddings", or "tool calling" actually mean, so I can design UIs that respect streaming and latency, and so I can sit in a planning meeting and follow the plan instead of nodding along.

The best part is what the video ends on: once you understand these terms, the hype and nonsense in this space become easier to spot. You can tell the difference between a real capability and a marketing slide. For a frontend developer trying to build genuinely good AI features, that clarity is worth more than any single framework.

If you are in the same spot, learning AI from the interface side in, I hope these notes save you the confusion I went through.
