I'm using Python notebooks to prototype my LLM prompts before implementing them in code.

When a traditional developer thinks about prototyping a code-only component, a few different approaches come to mind.

  1. Write a simple script to execute the new component in isolation, then print the results and inspect them manually.

  2. Improving on #1 above, use a unit test framework to validate outputs.

  3. Write a small, basic application (nothing fancy, maybe a file or two) and then play with it to figure out if you like it.
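As a sketch of what approach #2 looks like in practice (the `slugify` function here is a hypothetical component, not from any real codebase):

```python
# A hypothetical pure function being prototyped: deterministic in, deterministic out.
def slugify(title: str) -> str:
    """Turn an article title into a URL slug."""
    return "-".join(word.lower() for word in title.split() if word.isalnum())


# Unit tests validate outputs automatically; with pytest you'd just run
# `pytest` on this file and every assert gets checked.
def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"


def test_slugify_drops_punctuation_tokens():
    assert slugify("Python & Notebooks") == "python-notebooks"
```

This works precisely because the expected output is a known quantity you can write down in advance.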

Unfortunately, this doesn't work when you're prototyping an LLM-powered component that depends on prompts. AI is just too different from traditional software development.

Instead of these approaches, I've found that writing Python notebooks works extremely well, even if my final implementation is going to be in some other language.

How Python notebooks are helping me

I like working on the experimental part using Python notebooks, in isolation from the main codebase. Python notebooks were designed with data scientists and researchers in mind, and they're excellent for prompt prototyping and experimentation.

It's hard to describe exactly why, but it's easier. Much of it comes from the fact that each cell can be run individually, which means you're essentially building a rich control interface that expands with every new code cell.
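Here's a rough sketch of what that cell-by-cell control interface feels like. The `# %%` markers denote cell boundaries (a common notebook convention), and `call_llm` is a stub standing in for a real API call, which you'd swap for your provider of choice:

```python
# %% Cell 1: the prompt template I'm iterating on. Edit and re-run just this cell.
PROMPT_TEMPLATE = "Summarize the following text in {n_sentences} sentences:\n\n{text}"

# %% Cell 2: a stub standing in for the real LLM API call.
def call_llm(prompt: str) -> str:
    return f"[model response to a {len(prompt)}-char prompt]"

# %% Cell 3: the "control knobs" live in their own cell; tweak and re-run.
text = "Notebooks let you re-run any cell in isolation."
n_sentences = 2

# %% Cell 4: build the prompt and eyeball the output. Each new cell extends the UI.
prompt = PROMPT_TEMPLATE.format(n_sentences=n_sentences, text=text)
print(call_llm(prompt))
```

Changing one knob and re-running one cell is the whole workflow; there's no edit-compile-run loop to slow you down.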

ChatGPT helps a lot, too! ChatGPT with GPT-4 is very good at writing Python code, especially in small chunks, which means I can generate code cells quickly. From interactive UIs to beautiful graphs, using ChatGPT is way faster than coding it all myself.

Once I finish experimenting with the prompt, I'll typically copy/paste it into my final codebase (whether that's TypeScript or Java or still Python).

The difference between traditional code and LLM prompts

The main difference is in the outputs.

Traditional software development is all about working with known quantities. The output of a traditional code-based component is more or less deterministic and structured, and it rewards design thinking at the level of interfaces in order to optimize code quality. Defining unit tests works really well in this case. It's very principles-based.

But when you're prototyping the core of your LLM app, the output is non-deterministic and much more qualitative. It requires iterative improvements to the prompt in order to optimize the fitness of the output. So it really is very different from iterating on system design! It's way more experimental! And that's why a prototyping approach is necessary.
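To make the contrast concrete, here's a sketch of what that qualitative iteration loop looks like: several prompt variants run side by side so you can read and compare the outputs. The `fake_llm` function is a stand-in for a real model call, and the variants are made up for illustration:

```python
# Stand-in for a real LLM call; swap in your actual API client here.
def fake_llm(prompt: str) -> str:
    return f"(response shaped by: {prompt.splitlines()[0]})"

# Hypothetical prompt variants under comparison.
variants = {
    "v1 terse":   "Summarize this:\n{text}",
    "v2 role":    "You are a careful editor. Summarize this:\n{text}",
    "v3 bullets": "Summarize this as three bullet points:\n{text}",
}

text = "Prompts are judged qualitatively, not with exact-match asserts."
outputs = {name: t.format(text=text) for name, t in variants.items()}
outputs = {name: fake_llm(p) for name, p in outputs.items()}

# No assert can tell you which output "wins" -- you read them and judge.
for name, out in outputs.items():
    print(f"{name}: {out}")
```

Note there's no `assert expected == actual` anywhere; the "test" is your own judgment of the outputs, which is exactly why notebooks beat unit tests for this phase.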

What these notebooks are missing

I wish there was a "Postman for AI." That's what's missing. Some sort of app where I can define a whole data pipeline, and it can export that pipeline to a usable component in my language of choice.

Notebooks are also not great for code quality. Code quality becomes Phase 2, when I integrate the resulting prompt into my main codebase.

Overall though, the tradeoff is worth it.