Agentic checkout in ~100 lines of Python
A few months ago Stripe announced their upcoming Order Intents API at Stripe Sessions. There’s a lot to like about this API, and if Stripe can maintain their usual quality bar it’ll be a big unlock for a wide variety of different use cases. They didn’t show too much of the API surface during the demo, but the parts we have seen feel a lot like Two Tap’s old universal cart API which I personally really like. Passing product page URLs feels like the right abstraction over passing around opaque product identifiers.
Jeff Weinstein later went on TBPN and said that he’s “surprised we got self-driving cars before ubiquitous online commerce mediated by agents,” and I agree with that take. While we wait for Order Intents to become generally available, it’s never been easier to simply roll your own checkout agent. LLMs and the software ecosystem surrounding them have gotten so good that it only takes ~50 lines of code for a minimal implementation!
If you don’t have beta access to this new Stripe API but still want to experiment with agentic checkout, then this blog post will show you how.
If you aren’t interested in reading a lengthy blog post, the full code is available on GitHub.
The heart of our agent will be the excellent Browser Use library. It first dropped about 5 months ago, and it’s rapidly become a go-to tool for me both professionally and personally. In fact, the code we’re writing today is actually modeled after a cronjob I’m running on my homelab which periodically buys my preferred sunscreen.
Table of contents
Tutorial
Setup
You’ll first need a store to test against. While you could theoretically use a real online store, you’ll find that testing your agent becomes expensive rather quickly. You’re much better off signing up with the ecommerce platform of your choice and creating a development/test/sandbox store to play with. I prefer Shopify for this use case (instructions here), but anything is good.
Then we can scaffold a new Python project. I’m using uv
as my package/project manager; if you do not already have it installed, then you can find installation instructions here.
$ cd ~/Code$ mkdir checkout && cd checkout$ uv init$ uv pip install browser-use==0.3.2 python-dotenv$ touch .env checkout.py
I’m pinning the browser-use
version because this package updates fast. Since I first drafted this post a few months ago, substantial functionality (and breaking changes) have been introduced.
Throw an OpenAI API key into the .env
file, and opt out of Browser Use’s telemetry:
ANONYMIZED_TELEMETRY=falseOPENAI_API_KEY="sk-proj-..."
And finally add some scaffolding to agent.py
:
from dotenv import load_dotenv load_dotenv() import asyncio async def main(): ... asyncio.run(main())
We now have everything we require for agentic checkout.
Implementing a PoC
You can see the basic implementation below in all its glory. After tweaking the prompt to include your own details, you can run it with uv run agent.py
.
from browser_use import Agent, BrowserSessionfrom langchain_openai import ChatOpenAI llm = ChatOpenAI(model="gpt-4o") task = f"""Purchase the following products:* https://sophiabits-dev.myshopify.com/products/example-shirt The password for the store is "girtoh" Customer details:* Email: <your email>* First name: Sophia* Last name: Willows Shipping address:* Address: <your address> Payment details:* Number: 1* Expiry: 02/26* CVV: 123* Name on card: Sophia Willows Do not tick "remember me".""".strip() async def main(): session = BrowserSession( # keep the browser alive after the agent runs keep_alive=True, # improve anti-bot capabilities stealth=True, # don't reuse cookies between script runs storage_state={}, user_data_dir=None, ) await session.start() agent = Agent( browser_session=session, llm=llm, task=task, ) history = await agent.run() print(history) # close browser await session.kill() asyncio.run(main())
I am testing with a Shopify store, and Shopify development stores come with a “bogus” payment gateway which doesn’t accept real card details. “1” is a special card number which simulates a successful transaction when used. Likewise, Shopify development stores are locked behind a password—so I am giving the agent the password for my test store so it can gain access to the store.
When you run the agent a Chromium window will pop up which lets you visually see what the agent is doing, and browser-use
will log the internal state of the agent as it works through the task. It looks something like this:
INFO [agent] 📍 Step 1INFO [agent] 🤷 Eval: Unknown - Starting fresh with the input task.INFO [agent] 🧠 Memory: No previous steps remembered yet. 0 out of 1 products attempted for purchase.INFO [agent] 🎯 Next goal: Open the product URL in a new tab to begin the purchasing process.INFO [agent] 🛠️ Action 1/1: {"open_tab":{"url":"https://sophiabits-dev.myshopify.com/products/example-shirt"}}INFO [controller] 🔗 Opened new tab with https://sophiabits-dev.myshopify.com/products/example-shirtINFO [agent] 📍 Step 2INFO [agent] 👍 Eval: Success - The store URL has been accessed and is prompting for a password.INFO [agent] 🧠 Memory: The password prompt has been accessed for the store.0 out of 1 products attempted for purchase.INFO [agent] 🎯 Next goal: Enter the store password to access the site.INFO [agent] 🛠️ Action 1/2: {"input_text":{"index":0,"text":"***"}}INFO [agent] 🛠️ Action 2/2: {"click_element":{"index":1}}INFO [controller] ⌨️ Input girtoh into index 0INFO [controller] 🖱️ Clicked button with index 1: Enter...INFO [agent] 📄 Result: Successfully purchased 'Example T-Shirt'. Order confirmed with shipping and payment details provided. Confirmation displayed on thank you page.INFO [agent] ✅ Task completedINFO [agent] ✅ Successfully
That really is all there is to it—no tricks. Most of the engineers I’ve shown the above code snippet to have been immediately skeptical, but you can take this code as-is and it will work on the overwhelming majority of ecommerce sites. Getting to this point a year ago would have been a fiendishly difficult undertaking full of frustrating failure modes, but today the level of abstraction for building browser automations is extremely high. If you take nothing else away from this post, it should be that robotic process automation of the web is now a commodity.
This isn’t to say that the code we’ve written is perfect. There are many deficiencies with this agent, and the following sections will walk through resolving some of them. But if you did stop here then the code I’ve given you is a perfectly good solution for keeping you topped up on sunscreen.
Improving determinism
If you run the code a few times you’ll notice that the agent doesn’t behave very consistently. In particular, it randomly employs a variety of different strategies for adding products to the cart:
- It could navigate directly to each product URL.
- It could click the “Catalog” link in the top navigation bar, and then try to find the desired product(s) amongst all other products in the store!
- It could also try to search for each product via the search field at the top of the default Shopify theme.
The last two strategies are not particularly good. The LLM can easily get confused if the store contains multiple similarly-named products, and the second strategy can be costly if it’s necessary to paginate through a lot of products to find the particular ones we’re trying to purchase.
This is one of the big tradeoffs you make when using agents. You don’t have as much control over how the computer goes from A to B compared to a more traditional programmatic solution, or even compared to something simpler like prompt chaining. The advantage is that the agent is—hopefully—more resilient in the face of changing circumstances (e.g. checking out from a Shopify store vs a BigCommerce store), but it comes at the cost of determinism and steerability.
This particular problem is fairly easy to solve by iterating on our prompt:
task = f"""Purchase the following products:* https://sophiabits-dev.myshopify.com/products/example-shirt Add products to the cart by directly navigating to each product URL.<...>""".strip()
This is enough to consistently get the agent to follow strategy (1) when adding products to the cart, saving both time and cost.
Refining the prompt can take you pretty far. The current batch of LLMs are really good at following instructions and using tools. When building agents I think you should almost always try manipulating your prompt before trying other engineering techniques; especially early on, the effect size of prompt improvements can be extreme.
Of course, you do still need to know what to engineer around in the first place. I wouldn’t have assumed ahead of time that the agent would try searching for products because navigating directly to each product feels like the obvious way to solve the problem.
Running the agent multiple times is a good way to quickly get a feel for possible improvements, but to make systematic improvements you’ll want to make this happen programmatically sooner rather than later. Coming up with a suite of evals that you can benchmark against and use to analyze execution traces is helpful for ensuring that your prompt changes are actually improving things overall, and not causing regressions elsewhere.
Extracting structured data
Our agent can buy products for us, but we don’t yet have any structured information about the order. Our Python file has a print(history)
line which outputs all of the agent’s history which you could theoretically parse the order details from manually, but there is a much better strategy.
The Agent
class takes an optional Controller
instance, and a Controller
can be passed a Pydantic model which defines the expected output schema for the agent. It works similarly to structured outputs in OpenAI and Gemini models.
All we need to do is pass a Controller
instance and parse the final message in our agent’s history:
# <snip> from browser_use import Controllerfrom pydantic import BaseModel class OrderItem(BaseModel): title: str quantity: int price: float class Order(BaseModel): id: str items: list[OrderItem] currency_code: str shipping: float subtotal: float tax: float total: float async def main(): # <snip> controller = Controller(output_model=Order) agent = Agent( browser_session=session, controller=controller, llm=llm, task=task, ) history = await agent.run() result = history.final_result() parsed = Order.model_validate_json(result) print(parsed) # <snip> asyncio.run(main())
You can add doc comments to model fields and they’ll be used to help guide generation of the final result. It could be useful in this case to add a comment on OrderItem.price
to clarify that we want to receive the per-unit price rather than the total price of the line item, for instance.
Running the updated code yields the Order
object we’re after:
{ "id": "YJUDM6LM1", "items": [ { "title": "Example T-Shirt - Green / S", "quantity": 1, "price": 25.0 } ], "url": "https://sophiabits-dev.myshopify.com/checkouts/cn/Z2NwLXVzLWNlbnRyYWwxOjAxSlRTSjZIMDhUU0RZOE4zUTUwMFlCOVE2/thank-you", "currency_code": "NZD", "shipping": 30.0, "subtotal": 25.0, "tax": 0.0, "total": 55.0}
And looking at the post-checkout success page, you can see that the monetary values and order line items match what we have in the structured output:

LLMs are great at generating structured output these days! Getting good results used to involve a lot of hacking, but you really don’t need to work too hard here anymore. The main thing to work on optimizing here is the agent experience of your schemas; naming fields clearly and explaining how you want them filled inside doc comments can improve generation substantially.
Guardrails
You should not ship any LLM-based system without guardrails of some kind in place. Despite all their improvements LLMs are still
Despite improvements, LLMs have a very well-documented tendency to go off the rails and plenty of businesses have been humiliated by poorly-behaving AI. Agentic checkout in particular requires an elevated level of care because it involves money.
There are plenty of failure modes to think about here. If a malicious merchant were to drastically increase the price of the T-shirt I’ve been purchasing then the only thing stopping our agent from dutifully purchasing that T-shirt for $100K would be my credit card limit.
It’s possible to solve this with some more prompting (”don’t complete checkout if the price is greater than $X”), but it’s generally better to implement rules-based guardrails programmatically. Determinism and understandability is very important for something like this.
So we will break up our code. Rather than having one monolithic task responsible for adding all of the products to a cart and then checking out, we can split it up into three phases:
- Preparing a cart for checkout. The agent will add all of the products to a cart, open up the checkout page, load shipping methods, and then report back the total cost of the order.
- Running guardrails. If the order costs more than expected, we’ll bail out of checkout.
- Checkout. The agent will go ahead and complete the checkout process.
We’ll introduce another Pydantic model named Checkout
, and split out two functions from main
:
# <snip> class Checkout(BaseModel): currency_code: str """Total price of the checkout, including shipping, tax, etc.""" total_price: float class Request(BaseModel): product_urls: list[str] store_password: str customer_email: str customer_first_name: str customer_last_name: str shipping_address: str payment_number: str payment_cvv: str payment_expiry: str # step (1)async def prepare_checkout(session: BrowserSession, request: Request): product_urls_list = "\n".join([f"* {url}" for url in request.product_urls]) task = f"""Open the checkout page for an order containing the following products:{product_urls_list} Add products to the cart by directly navigating to each product URL. The password for the store is "{request.store_password}" Shipping address: {request.shipping_address} Enter the shipping address so that shipping methods are available. Then return the final details of the checkout page.""".strip() controller = Controller(output_model=Checkout) agent = Agent( browser_session=session, controller=controller, llm=llm, task=task, ) history = await agent.run() result = history.final_result() return Checkout.model_validate_json(result) # step (3)async def do_checkout(session: BrowserSession, request: Request): task = f"""Complete the checkout page and submit the order. Customer details:* Email: {request.customer_email}* First name: {request.customer_first_name}* Last name: {request.customer_last_name} Shipping address: {request.shipping_address} Payment details:* Number: {request.payment_number}* Expiry: {request.payment_expiry}* CVV: {request.payment_cvv}* Name on card: {request.customer_first_name} {request.customer_last_name} Do not tick "remember me". After checkout, return the details of the order.""".strip() controller = Controller(output_model=Order) agent = Agent( browser_session=session, controller=controller, llm=llm, task=task, ) history = await agent.run() result = history.final_result() return Order.model_validate_json(result) async def main(): session = BrowserSession( keep_alive=True, stealth=True, storage_state={}, user_data_dir=None, ) await session.start() checkout = await prepare_checkout( session, ["https://sophiabits-dev.myshopify.com/products/example-shirt"], ) # step (2) if checkout.currency_code != "NZD" or checkout.total_price > 60.0: raise Exception("Order is too expensive!") order = await do_checkout(session, request) print(order.model_dump_json(indent=2)) await session.kill()
Running this code with uv run agent.py
will result in the following:
- A browser window will open.
- The agent will negotiate the store’s password page, add all of our products to a cart, and then open up the checkout page. Once there the agent will key in the shipping address so that shipping methods load, and then return the total price of the order and the URL of the checkout page.
- Execution returns to our Python code, where we verify the order isn’t significantly more expensive than we’re willing to pay. My test store prices the T-shirt with shipping at $55, but I’m happy for my agent to flex up to $60 if the merchant makes changes.
- Using the same browser window, the agent completes the checkout process. Order details are returned back to our Python code, and we’re free to store those somewhere for later.
- The browser is closed.
Hiding sensitive data
We’re currently embedding raw credit card details and the buyer’s PII directly into the agent’s task, which means the LLM has full visibility of it.
This is problematic in two ways:
- Unless you have a zero data retention agreement with your LLM provider, then they’ll store this data for some period of time. A data breach could expose this sensitive information.
- Depending on what tools you have configured for the agent, it might be possible for an attacker to exfiltrate your sensitive data. A
send_email
tool, for example, would be vulnerable.
The industry’s largely settled on something similar to Simon Willison’s “Dual LLM” approach to mitigating this problem. The browser-use
implementation works like this:
- Instead of including the data directly in the LLM’s message history, the LLM is instead presented with an opaque string like
x_credit_card_number
. - If the sensitive value is present in the current page’s HTML, it gets masked back to
x_credit_card_number
before the HTML is shown to the LLM. - When the LLM makes a tool call that requires the sensitive data, the LLM will make the tool call using the opaque secret identifier.
browser-use
then transparently translates that call into a real function call outside of the LLM using the actual secret value.
This doesn’t protect you from the secret value entering the LLM’s context window through vision, so it’s necessary to disable vision for maximum security.
# <snip> async def do_checkout(session: BrowserSession, request: Request): task = f"""Complete the checkout page and submit the order. Customer details:* Email: {request.customer_email}* First name: {request.customer_first_name}* Last name: {request.customer_last_name} Shipping address: {request.shipping_address} Payment details:* Number: x_card_number* Expiry: x_card_expiry* CVV: x_card_cvv* Name on card: {request.customer_first_name} {request.customer_last_name} Do not tick "remember me". After checkout, return the details of the order.""".strip() controller = Controller(output_model=Order) agent = Agent( browser_session=session, controller=controller, llm=llm, task=task, sensitive_data={ "x_card_number": request.payment_number, "x_card_expiry": request.payment_expiry, "x_card_cvv": request.payment_cvv, }, use_vision=False, ) history = await agent.run() result = history.final_result() return Order.model_validate_json(result) # <snip>
This roughly doubles end-to-end checkout latency for me, although your mileage may vary depending on what ecommerce platform you’re testing against and which model you’re using. Purchasing from The Warehouse (which appears to run on Salesforce Commerce Cloud) isn’t as badly affected by the loss of vision as my test Shopify store.
I wouldn’t worry too much about this, though. There’s lots of room for improving latency elsewhere. In the same way that LLMs can be overkill for certain problems, the same is true for agent loops. Our checkout agent doesn’t really need quite so much agency as we’re giving it right now.
Consider this part of our prompt:
Open the checkout page for an order containing the following products:{product_urls_list}Add products to the cart by directly navigating to each product URL.
There isn’t really a particularly good reason to have the agent figure out how to navigate to each product and add it to the cart when we could simply do it ourselves. Looping over each product URL, calling session.navigate_to(product_url)
, and then finally dispatching an agent to only add the product to our cart saves a lot of agent turns.
Reminder: Don’t push this to production!
There are a lot of missing pieces here that would need implementation before you ship this out to real users.
- What happens if the agent is told to check out from a phishing site that resembles a legitimate website?
- What happens if a user review on the product page contains an attempt at prompt injection? There’s a lot of UGC on an online store, and approaches like CaMeL are hard to use in a system like this one.
- How do you get around anti-bot protection, particularly if you were to deploy this system on a public cloud?
- The long tail is huge! Mitre 10 requires shoppers to select a store or key in a delivery location before it’s possible to add items to a cart. The number of edge cases is likely a big part of the reason why there’s no good general-purpose API for this yet.
- All the usual ecommerce quirks. For example if the user specifically requested to buy from the EU store then your agent should probably respect that, but regional redirects could be a problem if your agent isn’t using an EU-based IP address.
Final thoughts
When I look at the agent.py
, I’m struck by the proportion of plain English it contains relative to Python. Especially so if you strip out all the boilerplate. Karpathy’s claim that “the hottest new programming language is English” seems increasingly prescient. There’s still a lot of traditional software engineering work to be done—LLMs are not building browser-use
yet—but the level of abstraction engineers get to work at has never been higher. It’s a good time to build things!